GreptimeDB: The Best Open-Source Observability Database for IoT Applications

Introduction: Why IoT Observability Is Hard

The Internet of Things is no longer a buzzword; it is the backbone of modern energy grids, smart factories, connected vehicles, and precision agriculture. Each sensor, gateway or embedded controller continuously emits time-stamped telemetry (metrics, logs and traces) that engineers must retain, inspect and correlate in real time. Traditional relational systems choke on this write-heavy, append-only workload; single-purpose metric stores (for example, InfluxDB 1.x) struggle with the high cardinality of tens of millions of distinct device IDs; and classic ELK stacks become cost-prohibitive once data retention exceeds a few days.

GreptimeDB—a Rust-based, high-performance, open-source time-series database—was built from day one to address exactly these pain points. It offers a unified observability platform with native SQL and PromQL interfaces, aggressive compression, object-storage tiering and edge-to-cloud scalability. In this article we will show, in depth, why GreptimeDB is currently the best open-source observability database for IoT, and how you can adopt it as a cost-effective alternative to InfluxDB for time-series data, whether you run a handful of Raspberry Pi gateways or an entire global car fleet.

IoT-Specific Observability Requirements

2.1 Unbounded Cardinality

A single wind-farm can expose thousands of distinct turbine IDs; an automotive OEM will soon exceed one billion Trace IDs per quarter. Databases that put high-cardinality strings in primary keys quickly exhaust memory. GreptimeDB’s design guidelines—'choose ≤ 5 low-cardinality tags as PRIMARY KEY and create SKIPPING INDEX on high-cardinality fields'—let you ingest billions of rows while keeping look-ups selective.
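To make the effect of key choice concrete, the following Python sketch counts how many distinct series a given key implies for a synthetic fleet. The fleet shape (10 farms, 20 sections, 50 sensors per section) and the helper name series_count are assumptions made for illustration; they do not describe GreptimeDB internals.

```python
from itertools import product


def series_count(rows, key_cols):
    """Count distinct key tuples, i.e. the number of series the key implies."""
    return len({tuple(row[c] for c in key_cols) for row in rows})


# Synthetic fleet: 10 farms x 20 sections x 50 sensors (hypothetical sizes).
rows = [
    {"farm_id": f"farm-{f}", "section_id": f"sec-{s}", "sensor_id": f"sn-{f}-{s}-{n}"}
    for f, s, n in product(range(10), range(20), range(50))
]

# Key made only of low-cardinality tags.
low_card = series_count(rows, ["farm_id", "section_id"])
# Same key with the high-cardinality sensor ID added.
high_card = series_count(rows, ["farm_id", "section_id", "sensor_id"])
print(low_card, high_card)
```

In this toy fleet the key without sensor_id yields 200 series and the key with it yields 10,000; the gap grows linearly with the number of devices per section.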

2.2 Edge-to-Cloud Architecture

Latency-critical decisions (e.g., shutting down an overheated lithium battery) must run at the edge, whereas fleet-wide anomaly detection is a cloud concern. GreptimeDB offers both GreptimeDB Edge and a cloud-native server; the two share the same storage format, which makes bidirectional sync straightforward.

2.3 Unified Telemetry

Many IoT organisations still forward device metrics to InfluxDB, logs to Elasticsearch and traces to Jaeger, paying three storage bills and operating three clusters. GreptimeDB 0.9 introduced a Pipeline engine plus full-text index, so the same cluster can parse raw JSON logs, store metrics and even execute Prometheus alert rules.

GreptimeDB Architecture Tailored for IoT

3.1 Rust-Based Core for Bare-Metal Performance

Written in Rust, GreptimeDB avoids the GC pauses and null-pointer crashes common in Java alternatives. Benchmarks on a Qualcomm 8295-based vehicle head unit recorded 700 K points per second at under 6 % single-core CPU and 135 MB RAM, which matters for power-constrained industrial gateways.

3.2 Object-Storage First, SSD Optional

GreptimeDB’s storage layer places immutable Parquet files straight onto Amazon S3 or MinIO and keeps only hot blocks in a multi-tier cache. The JSONBench report shows object storage to be 3–5x cheaper than EBS while maintaining query latency via read-cache. For IoT workloads where 90 % of reads target the last 24 h but compliance needs 180-day retention, this architecture drives down total cost of ownership.
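A rough way to reason about this trade-off is to compare keeping everything on block storage with keeping everything in object storage plus a small block-storage cache for the hot share. The sketch below does that arithmetic in Python. The data volume and both per-GB prices are placeholder assumptions, not quotes from any provider, and request or egress fees are ignored.

```python
def tiered_cost(total_gb, hot_fraction, cache_price, object_price):
    """All data lives in object storage; only the hot share is also cached on block storage."""
    return total_gb * object_price + total_gb * hot_fraction * cache_price


def block_only_cost(total_gb, block_price):
    """Baseline: every retained byte sits on block storage."""
    return total_gb * block_price


# Hypothetical inputs: 180 days retained, the last 24 h kept hot.
RETENTION_DAYS = 180
HOT_FRACTION = 1 / RETENTION_DAYS   # roughly 0.56 % of the data
TOTAL_GB = 18_000                   # e.g. 100 GB per day for 180 days
BLOCK_PRICE = 0.08                  # $/GB-month, placeholder
OBJECT_PRICE = 0.02                 # $/GB-month, placeholder (4x cheaper)

baseline = block_only_cost(TOTAL_GB, BLOCK_PRICE)
tiered = tiered_cost(TOTAL_GB, HOT_FRACTION, BLOCK_PRICE, OBJECT_PRICE)
print(f"block only: ${baseline:.2f}/month, tiered: ${tiered:.2f}/month")
```

With these placeholder numbers the cache adds only a few dollars on top of the object-storage bill, so the overall saving stays close to the raw price ratio between the two storage classes.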

3.3 SQL + PromQL = Developer Happiness

Operators can issue familiar SQL:

sql

SELECT AVG(temperature)
FROM sensor_metrics
WHERE region = 'us-east' AND ts > now() - INTERVAL '5 minute';

And Grafana dashboards can keep using PromQL:

plaintext

avg_over_time(sensor_metrics_temperature{region="us-east"}[5m])

Both queries work against the same data because GreptimeDB transparently maps Prometheus labels to internal columns. This dual API removes schema-migration friction when replacing Prometheus remote-write or InfluxDB Telegraf pipelines.
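For client code, both dialects can be reached over HTTP. The Python sketch below only builds the two requests and does not send them. It assumes a local instance listening on port 4000, a SQL endpoint at /v1/sql that takes the statement as a form field named sql, and a Prometheus-compatible query endpoint under /v1/prometheus/api/v1; check these paths against the documentation for your release before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:4000"  # assumed HTTP address of a local instance


def sql_request(sql, db="public"):
    """Build a POST to the (assumed) SQL endpoint with the statement as a form field."""
    params = urlencode({"db": db})
    body = urlencode({"sql": sql}).encode()
    return Request(f"{BASE}/v1/sql?{params}", data=body, method="POST")


def promql_request(query):
    """Build a GET to the (assumed) Prometheus-compatible instant-query endpoint."""
    params = urlencode({"query": query})
    return Request(f"{BASE}/v1/prometheus/api/v1/query?{params}")


sql_req = sql_request(
    "SELECT AVG(temperature) FROM sensor_metrics "
    "WHERE region = 'us-east' AND ts > now() - INTERVAL '5 minute'"
)
prom_req = promql_request(
    'avg_over_time(sensor_metrics_temperature{region="us-east"}[5m])'
)

# Nothing is sent here; pass either object to urllib.request.urlopen() to run it.
print(sql_req.get_method(), sql_req.full_url)
print(prom_req.get_method(), prom_req.full_url)
```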

Data-Modelling Best Practices for Massive Device Fleets

Below we create a production-grade table for a smart-agriculture scenario with one million sensor nodes:

sql

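CREATE TABLE agri_metrics (
  farm_id       STRING,
  section_id    STRING,
  sensor_id     STRING SKIPPING INDEX,  -- high cardinality: indexed, not part of the key
  soil_moisture DOUBLE,
  air_temp      DOUBLE,
  battery_lvl   DOUBLE,
  ts            TIMESTAMP TIME INDEX,
  PRIMARY KEY (farm_id, section_id)
)
-- Partition rules precede WITH; the boundary value below is an illustrative placeholder
PARTITION ON COLUMNS (farm_id) (
  farm_id < 'farm_m',
  farm_id >= 'farm_m'
)
WITH (
  'merge_mode' = 'last_non_null',  -- a later NULL never overwrites an earlier value
  'append_mode' = 'false'          -- keep deduplication on (primary key, ts)
);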

Why this works

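• Low-cardinality keys (farm_id, section_id) keep the deduplication footprint small.
• The high-cardinality sensor_id stays out of the key but remains searchable through its SKIPPING INDEX. Because deduplication then applies per (farm_id, section_id, ts), two sensors in the same section must not report with identical timestamps, or their rows will be merged.
• The last_non_null merge mode minimises unnecessary rewrites, prolonging NAND flash life on edge deployments.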
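The merge behaviour can be pictured with a small Python model. This is a simplified sketch of last-non-null semantics, not GreptimeDB's storage code: rows that share the same key columns and timestamp are folded together, and a None field in a later write leaves the earlier value in place. The function name and the sample rows are hypothetical.

```python
def merge_last_non_null(rows, key_cols, ts_col="ts"):
    """Fold rows sharing (key, ts): later values win field by field,
    but a None never overwrites a value that is already present."""
    merged = {}
    for row in rows:  # rows are assumed to arrive in write order
        key = tuple(row[c] for c in key_cols) + (row[ts_col],)
        current = merged.setdefault(key, {})
        for col, value in row.items():
            if value is not None or col not in current:
                current[col] = value
    return list(merged.values())


writes = [
    {"farm_id": "f1", "section_id": "s1", "ts": 1000, "soil_moisture": 0.31, "air_temp": 21.5},
    # A later partial update for the same key and timestamp: only air_temp changed.
    {"farm_id": "f1", "section_id": "s1", "ts": 1000, "soil_moisture": None, "air_temp": 22.0},
]
result = merge_last_non_null(writes, ["farm_id", "section_id"])
print(result)
```

The merged row keeps soil_moisture from the first write and takes air_temp from the second, which is why a device only needs to send the fields that changed.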

Advanced Tip: If JSON logs arrive from the same sensors, define a sibling table with a FULLTEXT INDEX and join the two tables in a single SQL query. GreptimeDB's columnar engine pushes predicates down, so only the matching rows are read.

Edge-to-Cloud Synchronisation Workflow

Step 1: Deploy GreptimeDB Edge on gateway

bash

curl -LO https://github.com/GreptimeTeam/greptimedb/releases/download/v2.0/greptime-edge-aarch64
chmod +x greptime-edge-aarch64  # a freshly downloaded file is not executable
./greptime-edge-aarch64 --config edge.toml

Run a Vector sidecar to tail syslogs and write them to localhost:4001.

Step 2: Configure Flow task for compression and upload

sql

CREATE FLOW export_to_s3
AS
SELECT *
FROM agri_metrics
WHERE ts < now() - INTERVAL '1 hour'
INTO S3 's3://iot-bucket/farm' FORMAT PARQUET
OPTIONS (compression='zstd', concurrency=4);

Step 3: Ingest the files on the cloud side with GreptimeCloud auto-import

Because edge and cloud share the same Parquet layout, no ETL step is needed.

Detailed tutorial at this page.

Real-World Case Study: Electric-Vehicle Telemetry

A tier-1 EV maker integrated GreptimeDB Edge into its Qualcomm 8295 infotainment system. Key numbers:

• 700 K PPS sustained writes (CAN + ADAS)
• CPU < 15 % peak, RAM ≈ 135 MB
• 42 MB compressed export versus 1.3 GB ASC raw logs (30–40x compression)
• Two-minute lag from edge ingestion to cloud dashboard visibility
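The compression figure can be checked with a short calculation. The Python snippet below uses only the two sizes quoted above and assumes decimal units (1 GB = 1000 MB).

```python
RAW_MB = 1.3 * 1000      # 1.3 GB of raw ASC logs, taking 1 GB = 1000 MB
COMPRESSED_MB = 42       # size of the compressed export

ratio = RAW_MB / COMPRESSED_MB
saved_fraction = 1 - COMPRESSED_MB / RAW_MB
print(f"ratio: {ratio:.1f}x, uplink traffic saved: {saved_fraction:.1%}")
```

The ratio comes out at roughly 31x, the lower end of the quoted range, which corresponds to about 97 % less data sent over the cellular link for this sample.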

This compression saves millions of dollars in cellular traffic costs annually while still letting engineers run full Prometheus alerts such as:

plaintext

deriv(can_battery_temp_celsius{vehicle_id=~".+"}[30s]) > 0.5

Kubernetes-Native Deployment Patterns

7.1 GreptimeDB Operator

Once the chart repository is added, a single Helm install deploys a multi-node cluster with self-monitoring:

bash

helm repo add greptime https://greptimeteam.github.io/helm-charts
helm install iot-db greptime/greptimedb-cluster \
  --set monitoring.enabled=true \
  --set storage.s3.bucket=iot-prod --namespace greptime

The Operator injects a low-overhead Vector sidecar in every pod, collects logs and writes them back into an isolated monitoring instance. This satisfies air-gapped factories that cannot run full Loki/Jaeger stacks.

7.2 Prometheus Long-Term Storage

If you already run Prometheus scrape jobs on devices, point remote_write at GreptimeDB's Prometheus remote-write endpoint. The database stores the samples as compressed Parquet, freeing your Prometheus server from multi-month retention obligations.
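As a starting point, the Python helper below renders the prometheus.yml fragment for such a setup. The host name is a placeholder, and the path /v1/prometheus/write with a db query parameter on port 4000 is an assumption about the remote-write URL; verify it against the documentation for your GreptimeDB version.

```python
def remote_write_yaml(host, port=4000, db="public"):
    """Render a prometheus.yml fragment that forwards samples to GreptimeDB.

    The URL layout is an assumption, not something stated in this article.
    """
    url = f"http://{host}:{port}/v1/prometheus/write?db={db}"
    return f"remote_write:\n  - url: {url}\n"


# Hypothetical host name, used only for illustration.
fragment = remote_write_yaml("greptimedb.example.internal")
print(fragment)
```

Paste the printed fragment into the Prometheus configuration on each device or gateway, then reload Prometheus.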

Cost-Benefit Analysis

Savings stem from:
• Columnar compression ratio 3–5x higher than ELK
• Built-in tiering to cheap S3 / Glacier
• No separate log or trace backend licenses

Conclusion

GreptimeDB combines a Rust-based high-performance core, an object-storage-first architecture, SQL + PromQL duality, and edge-to-cloud sync to deliver the most cost-effective observability solution for IoT. Whether you need real-time analytics for logs and metrics at a remote wind-farm, Prometheus-compatible long-term storage for a Kubernetes factory floor, or a scalable observability platform for cloud-native applications, GreptimeDB stands out as the best open-source observability database for IoT. Start today with GreptimeCloud’s free tier, or follow the quick-start guide to deploy on Kubernetes in under ten minutes.

About Greptime

GreptimeDB is an open-source, cloud-native database purpose-built for real-time observability. Built in Rust and optimized for cloud-native environments, it provides unified storage and processing for metrics, logs, and traces, delivering sub-second insights from edge to cloud at any scale.

GreptimeDB OSS – The open-sourced database for small to medium-scale observability and IoT use cases, ideal for personal projects or dev/test environments.
GreptimeDB Enterprise – A robust observability database with enhanced security, high availability, and enterprise-grade support.
GreptimeCloud – A fully managed, serverless DBaaS with elastic scaling and zero operational overhead. Built for teams that need speed, flexibility, and ease of use out of the box.

🚀 We’re open to contributors—get started with issues labeled good first issue and connect with our community.

