
Introduction: Why IoT Observability Is Hard
The Internet of Things is no longer a buzzword; it is the backbone of modern energy grids, smart factories, connected vehicles, and precision agriculture. Each sensor, gateway or embedded controller continuously emits time-stamped telemetry (metrics, logs and traces) that engineers must retain, inspect and correlate in real time. Traditional relational systems choke on this write-heavy, append-only workload; single-purpose metric stores (for example, InfluxDB 1.x) struggle with the high cardinality of tens of millions of distinct device IDs; and classic ELK stacks become cost-prohibitive once data retention exceeds a few days.
GreptimeDB—a Rust-based, high-performance, open-source time-series database—was built from day one to address exactly these pain points. It offers a unified observability platform with native SQL and PromQL interfaces, aggressive compression, object-storage tiering and edge-to-cloud scalability. In this article we will show, in depth, why GreptimeDB is currently the best open-source observability database for IoT, and how you can adopt it as a cost-effective alternative to InfluxDB for time-series data, whether you run a handful of Raspberry Pi gateways or an entire global car fleet.
IoT-Specific Observability Requirements
2.1 Unbounded Cardinality
A single wind farm can expose thousands of distinct turbine IDs; an automotive OEM will soon exceed one billion trace IDs per quarter. Databases that put high-cardinality strings in primary keys quickly exhaust memory. GreptimeDB’s design guideline, 'choose ≤ 5 low-cardinality tags as PRIMARY KEY and create SKIPPING INDEX on high-cardinality fields', lets you ingest billions of rows while keeping look-ups selective.
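To make the effect of key choice concrete, here is a minimal Python sketch. The fleet layout (10 sites, 5 lines per site, 200 devices per line) and all identifiers are hypothetical, chosen only for illustration. It counts how many distinct key tuples a store has to track when the device ID is kept out of the key versus included in it.

```python
from itertools import product

# Hypothetical fleet layout (illustrative numbers only):
# 10 sites x 5 production lines x 200 devices per line.
rows = [
    (f"site-{s}", f"line-{l}", f"dev-{s}-{l}-{d}")
    for s, l, d in product(range(10), range(5), range(200))
]

def distinct_keys(rows, key_columns):
    """Count the distinct key tuples formed by the given column positions."""
    return len({tuple(row[i] for i in key_columns) for row in rows})

low_card = distinct_keys(rows, (0, 1))      # key = (site, line)
high_card = distinct_keys(rows, (0, 1, 2))  # key = (site, line, device)

print(low_card, high_card)  # 50 10000
```

Adding the device ID multiplies the number of key tuples by the number of devices per line, which is the growth the guideline is meant to avoid.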
2.2 Edge-to-Cloud Architecture
Latency-critical decisions (e.g., shutting down an overheated lithium battery) must run at the edge, whereas fleet-wide anomaly detection is a cloud concern. GreptimeDB offers GreptimeDB Edge for on-device deployments and a cloud-native server; both share the same storage format, which keeps synchronisation between them simple.
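The split between local decisions and deferred upload can be sketched in a few lines of Python. This is an illustration of the pattern only, not GreptimeDB Edge code: the class name, the 60 °C threshold and the batch size of 3 are all assumptions made for the example, and the `uploaded` list stands in for whatever sync mechanism ships data to the cloud.

```python
from collections import deque

TEMP_LIMIT_C = 60.0  # hypothetical shutdown threshold
BATCH_SIZE = 3       # hypothetical number of readings per upload batch

class EdgeNode:
    """Decide locally on every reading, forward readings to the cloud in batches."""

    def __init__(self):
        self.buffer = deque()
        self.uploaded = []  # stand-in for the cloud side

    def handle(self, reading):
        # The safety decision is made immediately, without a network round trip.
        action = "shutdown" if reading["temp_c"] > TEMP_LIMIT_C else "ok"
        self.buffer.append(reading)
        if len(self.buffer) >= BATCH_SIZE:
            self.flush()
        return action

    def flush(self):
        """Ship the buffered readings as one batch and clear the buffer."""
        if self.buffer:
            self.uploaded.append(list(self.buffer))
            self.buffer.clear()

node = EdgeNode()
actions = [node.handle({"temp_c": t}) for t in (25.0, 61.5, 30.0, 28.0)]
print(actions)  # ['ok', 'shutdown', 'ok', 'ok']
```

After the four readings, one batch of three has been "uploaded" and one reading is still waiting in the buffer.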
2.3 Unified Telemetry
Many IoT organisations still forward device metrics to InfluxDB, logs to Elasticsearch and traces to Jaeger, paying three storage bills and operating three clusters. GreptimeDB 0.9 introduced a Pipeline engine plus full-text index, so the same cluster can parse raw JSON logs, store metrics and even execute Prometheus alert rules.
GreptimeDB Architecture Tailored for IoT
3.1 Rust-Based Core for Bare-Metal Performance
Written in Rust, GreptimeDB eliminates GC pauses and null-pointer crashes common in Java alternatives. In benchmarks on a Qualcomm 8295-based vehicle head unit, it sustained 700 K points per second at < 6 % single-core CPU and 135 MB RAM. That is crucial for power-constrained industrial gateways.
3.2 Object-Storage First, SSD Optional
GreptimeDB’s storage layer places immutable Parquet files straight onto Amazon S3 or MinIO and keeps only hot blocks in a multi-tier cache. The JSONBench report shows object storage to be 3–5x cheaper than EBS while maintaining query latency via read-cache. For IoT workloads where 90 % of reads target the last 24 h but compliance needs 180-day retention, this architecture drives down total cost of ownership.
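A rough Python calculation shows how the tiering argument plays out. All inputs are placeholders rather than quoted prices or benchmark results: 10 GB of compressed data per day, block storage at 0.08 per GB-month, object storage at 0.02 per GB-month (a 4x ratio, inside the 3–5x range mentioned above), 180-day retention and a one-day hot cache.

```python
def monthly_storage_cost(daily_gb, retention_days, hot_days,
                         block_price_gb, object_price_gb):
    """Return (all_block, tiered) monthly cost for a steady-state dataset.

    all_block: every retained byte lives on block storage.
    tiered:    every retained byte lives on object storage, and the hot
               window is additionally cached on block storage.
    """
    total_gb = daily_gb * retention_days
    hot_gb = daily_gb * hot_days
    all_block = total_gb * block_price_gb
    tiered = total_gb * object_price_gb + hot_gb * block_price_gb
    return all_block, tiered

# Placeholder inputs; substitute your own volumes and provider prices.
all_block, tiered = monthly_storage_cost(
    daily_gb=10, retention_days=180, hot_days=1,
    block_price_gb=0.08, object_price_gb=0.02,
)
print(f"all block storage:          {all_block:.2f}")
print(f"object storage + hot cache: {tiered:.2f}")
```

With these inputs the tiered layout costs roughly a quarter of the all-block layout, because the hot cache covers only 1/180 of the retained data.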
3.3 SQL + PromQL = Developer Happiness
Operators can issue familiar SQL:
SELECT AVG(temperature)
FROM sensor_metrics
WHERE region='us-east' AND ts > now()-INTERVAL '5 minute';
And Grafana dashboards can keep using PromQL:
avg_over_time(sensor_metrics_temperature{region="us-east"}[5m])
This works because GreptimeDB transparently maps Prometheus labels to internal columns. The dual API reduces schema-migration friction when replacing Prometheus remote-write or InfluxDB Telegraf pipelines.
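Both queries can also be issued programmatically over HTTP. The Python sketch below only builds the requests and does not send them. It assumes a local instance on port 4000, the `public` database, a SQL endpoint at `/v1/sql` and a Prometheus-compatible endpoint under `/v1/prometheus/api/v1/query`; check these paths against the documentation for your version before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:4000"  # assumption: local instance, default HTTP port

def sql_request(sql, db="public"):
    """Build (but do not send) a form-encoded POST to the SQL endpoint."""
    query_string = urlencode({"db": db})
    return Request(
        f"{BASE_URL}/v1/sql?{query_string}",
        data=urlencode({"sql": sql}).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

def promql_request(query):
    """Build (but do not send) a GET to the Prometheus-compatible query endpoint."""
    query_string = urlencode({"query": query})
    return Request(f"{BASE_URL}/v1/prometheus/api/v1/query?{query_string}")

sql_req = sql_request(
    "SELECT AVG(temperature) FROM sensor_metrics "
    "WHERE region='us-east' AND ts > now() - INTERVAL '5 minute'"
)
prom_req = promql_request(
    'avg_over_time(sensor_metrics_temperature{region="us-east"}[5m])'
)
print(sql_req.get_method(), sql_req.full_url)
print(prom_req.get_method(), prom_req.full_url)
```

Passing either object to `urllib.request.urlopen` would execute the query against a running server.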
Data-Modelling Best Practices for Massive Device Fleets
Below we create a production-grade table for a smart-agriculture scenario with one million sensor nodes:
CREATE TABLE agri_metrics (
farm_id STRING,
section_id STRING,
sensor_id STRING SKIPPING INDEX, -- high-cardinality, kept out of the key
soil_moisture DOUBLE,
air_temp DOUBLE,
battery_lvl DOUBLE,
ts TIMESTAMP TIME INDEX,
PRIMARY KEY (farm_id, section_id)
)
PARTITION ON COLUMNS (farm_id) ( -- ensures horizontal scalability
farm_id < 'm', -- illustrative split point; choose boundaries that fit your farm_id values
farm_id >= 'm'
)
WITH (
'merge_mode'='last_non_null', -- only update changed fields
'append_mode'='false'
);
Why this works
- Low-cardinality keys (farm_id, section_id) control deduplication footprint.
- High-cardinality sensor_id is moved out of the key but remains searchable via SKIPPING INDEX.
- last_non_null merge mode minimises unnecessary rewrites, prolonging NAND flash life for edge deployments.
Advanced Tip: If JSON logs arrive from the same sensors, define a sibling table with a FULLTEXT INDEX and query across both tables in a single SQL join. GreptimeDB’s columnar engine will push predicates down, avoiding row bloat.
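The last_non_null merge mode used in the table definition can be illustrated with a small Python model. This is a conceptual sketch of the semantics, not the engine's implementation: rows that share the same primary key and timestamp are folded into one, and for each field the most recently written non-null value wins.

```python
def merge_last_non_null(rows):
    """Fold rows (given in write order) that share (farm_id, section_id, ts).

    For every column, a later non-null value overwrites an earlier one;
    a later NULL (None) leaves the earlier value untouched.
    """
    merged = {}
    for row in rows:
        key = (row["farm_id"], row["section_id"], row["ts"])
        current = merged.setdefault(key, {})
        for column, value in row.items():
            if value is not None:
                current[column] = value
    return list(merged.values())

writes = [
    {"farm_id": "f1", "section_id": "s1", "ts": 1000,
     "soil_moisture": 0.31, "air_temp": 21.5, "battery_lvl": 0.90},
    # A partial update: only air_temp changed, the other fields are NULL.
    {"farm_id": "f1", "section_id": "s1", "ts": 1000,
     "soil_moisture": None, "air_temp": 22.0, "battery_lvl": None},
]

result = merge_last_non_null(writes)
print(result[0]["soil_moisture"], result[0]["air_temp"], result[0]["battery_lvl"])
# 0.31 22.0 0.9
```

One consequence worth checking for your own schema: because the model identifies a row by primary key plus timestamp, two sensors in the same section that report at exactly the same timestamp would be merged into one row. If that can happen in your fleet, it may be safer to include sensor_id in the key or to use append-only mode.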
Edge-to-Cloud Synchronisation Workflow
Step 1: Deploy GreptimeDB Edge on the gateway
curl -L -o greptime-edge https://github.com/GreptimeTeam/greptimedb/releases/download/v2.0/greptime-edge-aarch64
chmod +x greptime-edge
./greptime-edge --config edge.toml
Run a Vector sidecar to tail syslogs and remote-write them to localhost:4001.
Step 2: Configure Flow task for compression and upload
CREATE FLOW export_to_s3
AS
SELECT *
FROM agri_metrics
WHERE ts < now() - INTERVAL '1 hour'
INTO S3 's3://iot-bucket/farm' FORMAT PARQUET
OPTIONS (compression='zstd', concurrency=4);
Step 3: Cloud side, ingest files with GreptimeCloud auto-import. The same Parquet layout means zero ETL.
Detailed tutorial at this page.
Real-World Case Study: Electric-Vehicle Telemetry
A tier-1 EV maker integrated GreptimeDB Edge into its Qualcomm 8295 infotainment system. Key numbers:
- 700 K PPS sustained writes (CAN + ADAS)
- CPU < 15 % peak, RAM ≈ 135 MB
- 42 MB compressed export versus 1.3 GB of raw ASC logs (30–40× compression)
- Two-minute lag from edge ingestion to cloud dashboard visibility
This compression saves millions of dollars in cellular traffic costs annually while enabling engineers to run full Prometheus alerts such as:
deriv(can_battery_temp_celsius{vehicle_id=~".+"}[30s]) > 0.5
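The alert above is meant to flag vehicles whose battery temperature climbs faster than 0.5 °C per second over a 30-second window. For a gauge such as temperature, that rate of change is typically estimated with a least-squares slope, which is the approach PromQL's deriv() takes. The Python sketch below shows the calculation on made-up samples; the function name and sample values are illustrative only.

```python
def least_squares_slope(samples):
    """Slope of the best-fit line through (t_seconds, value) samples.

    Returns the change in value per second.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must not all share the same timestamp")
    return numerator / denominator

# Made-up battery temperatures (°C) sampled every 10 s over a 30 s window.
window = [(0, 40.0), (10, 46.0), (20, 52.0), (30, 58.0)]
slope = least_squares_slope(window)
alert = slope > 0.5
print(round(slope, 3), alert)  # 0.6 True
```

A counter-oriented function such as rate() assumes a value that only ever increases and treats any decrease as a counter reset, so it is a poor fit for temperature readings that rise and fall.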
Kubernetes-Native Deployment Patterns
7.1 GreptimeDB Operator
Once the chart repository is added and the GreptimeDB Operator is installed in the cluster, a single Helm install deploys a multi-node cluster with self-monitoring:
helm repo add greptime https://greptimeteam.github.io/helm-charts
helm install iot-db greptime/greptimedb-cluster \
--set monitoring.enabled=true \
--set storage.s3.bucket=iot-prod --namespace greptime
The Operator injects a low-overhead Vector sidecar into every pod, collects logs and writes them back into an isolated monitoring instance. This suits air-gapped factories that cannot run full Loki/Jaeger stacks.
7.2 Prometheus Long-Term Storage
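If you already run Prometheus scrape jobs on devices, point remote_write at GreptimeDB’s Prometheus remote-write endpoint (/v1/prometheus/write on the HTTP port) rather than the OTLP endpoint, which expects OpenTelemetry payloads. The database stores the samples as compressed Parquet, freeing your Prometheus server from multi-month retention obligations.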
Cost-Benefit Analysis
Savings stem from:
• Columnar compression ratio 3–5x higher than ELK
• Built-in tiering to cheap S3 / Glacier
• No separate log or trace backend licenses
Conclusion
GreptimeDB combines a Rust-based high-performance core, an object-storage-first architecture, SQL + PromQL duality, and edge-to-cloud sync to deliver the most cost-effective observability solution for IoT. Whether you need real-time analytics for logs and metrics at a remote wind-farm, Prometheus-compatible long-term storage for a Kubernetes factory floor, or a scalable observability platform for cloud-native applications, GreptimeDB stands out as the best open-source observability database for IoT. Start today with GreptimeCloud’s free tier, or follow the quick-start guide to deploy on Kubernetes in under ten minutes.
About Greptime
GreptimeDB is an open-source, cloud-native database purpose-built for real-time observability. Built in Rust and optimized for cloud-native environments, it provides unified storage and processing for metrics, logs, and traces, delivering sub-second insights from edge to cloud at any scale.
GreptimeDB OSS – The open-source database for small to medium-scale observability and IoT use cases, ideal for personal projects or dev/test environments.
GreptimeDB Enterprise – A robust observability database with enhanced security, high availability, and enterprise-grade support.
GreptimeCloud – A fully managed, serverless DBaaS with elastic scaling and zero operational overhead. Built for teams that need speed, flexibility, and ease of use out of the box.