
The Foundation of Observability Success
Poor data modeling can make even the fastest database crawl. We've seen teams struggle with query performance issues that trace back to fundamental schema design problems.
Observability data has unique characteristics that require thoughtful modeling approaches. Unlike traditional business applications, monitoring data involves high-volume writes, time-based queries, and varying cardinality patterns.
Understanding Column Cardinality
Think of your data like a library:
- Low-cardinality columns are like book genres (Sci-Fi, History, Art)
- High-cardinality columns are like ISBNs or user IDs
This distinction directly impacts performance. In one e-commerce project, region had only 7 values while user_id reached billions.
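Before settling on a schema, a quick distinct count reveals which columns are safe to use as keys. A minimal sketch (the access_logs table and its columns are illustrative, not from a real project):

```sql
-- Gauge cardinality of candidate key columns (illustrative names)
SELECT
  COUNT(DISTINCT region)  AS region_cardinality,   -- expect tens of values
  COUNT(DISTINCT user_id) AS user_id_cardinality   -- may reach millions or more
FROM access_logs;
```

Columns on the low end of this report are primary-key candidates; columns on the high end should stay as ordinary fields.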
Primary Key Design Rules
For time-series databases, follow these guidelines:
- Choose low-cardinality columns only
- Keep key combination under 100k unique values
- Limit to ≤5 key columns
- Prefer strings and integers over floats
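Applied to a concrete table, these rules favor stable, low-cardinality tag columns over measurements. A sketch using the same DDL style as the examples below (table and column names are illustrative):

```sql
-- Good: low-cardinality, stable tag columns as the primary key
CREATE TABLE http_requests (
  service STRING,        -- dozens of values
  status_code INT,       -- a handful of values
  latency DOUBLE,        -- measurement: never a key column
  ts TIMESTAMP,
  PRIMARY KEY(service, status_code),
  TIME INDEX(ts)
);
-- Avoid: PRIMARY KEY(request_id) -- billions of unique values
-- would explode the key space far past the 100k guideline
```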
Wide Tables vs. Multiple Tables
Best practice: Store related metrics in wide tables, especially when collected together.
CREATE TABLE node_metrics (
  host STRING,
  cpu_user DOUBLE,
  cpu_system DOUBLE,
  memory_used DOUBLE,
  disk_read_bytes DOUBLE,
  net_in_bytes DOUBLE,
  ts TIMESTAMP,
  PRIMARY KEY(host),
  TIME INDEX(ts)
);
Benefits include:
- 30-50% better compression
- Simplified queries (no JOINs needed)
- Improved query performance
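With all node metrics in one wide table, a typical dashboard query stays a single scan. A sketch against the node_metrics table above (the interval syntax follows standard SQL and may vary slightly by version):

```sql
-- One table, no JOINs: fetch several metrics for a host in one pass
SELECT ts, cpu_user, cpu_system, memory_used
FROM node_metrics
WHERE host = 'web-01'
  AND ts >= now() - INTERVAL '1 hour'
ORDER BY ts;
```

Had each metric lived in its own table, this query would need a JOIN per metric on (host, ts).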
Index Strategy for Different Query Types
GreptimeDB offers three index types optimized for specific patterns:
Inverted Index
Best for low-cardinality filtering:
- Supports =, <, >, IN, and BETWEEN
- Efficient categorical queries
- Moderate storage overhead
Skipping Index
Great for high-cardinality equality filters:
- Minimal write performance impact
- Very storage efficient
- Limited to equality queries only
Fulltext Index
Essential for log keyword search:
- Supports complex text matching
- English analyzer for better relevance
- Higher storage requirements
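The three index types can be combined in one log table, each attached to the column whose query pattern it serves. A sketch of the column-level syntax (option names such as the fulltext analyzer setting may differ by GreptimeDB version; consult the current docs):

```sql
-- Illustrative log table mixing all three index types
CREATE TABLE app_logs (
  level STRING INVERTED INDEX,    -- low-cardinality filter: level = 'ERROR'
  trace_id STRING SKIPPING INDEX, -- high-cardinality equality lookup
  message STRING FULLTEXT INDEX,  -- keyword search over log text
  ts TIMESTAMP,
  TIME INDEX(ts)
);
```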
Partition Strategy for Scale
When data exceeds TB scale, distributed partitioning becomes essential:
CREATE TABLE global_metrics (
  region STRING,
  datacenter STRING,
  host STRING,
  cpu DOUBLE,
  memory DOUBLE,
  ts TIMESTAMP,
  PRIMARY KEY(region, datacenter, host),
  TIME INDEX(ts)
) PARTITION ON COLUMNS (region);
Best practice: Partition by columns with even distribution and query-aligned patterns.
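Partition rules can also be spelled out explicitly so each range lands on a predictable partition. A sketch of that form (the exact rule syntax may vary by GreptimeDB version; the region boundaries are illustrative):

```sql
-- Explicit range rules over the partition column
CREATE TABLE global_metrics_ranged (
  region STRING,
  host STRING,
  cpu DOUBLE,
  ts TIMESTAMP,
  PRIMARY KEY(region, host),
  TIME INDEX(ts)
) PARTITION ON COLUMNS (region) (
  region < 'f',
  region >= 'f' AND region < 'n',
  region >= 'n'
);
```

Uneven rules lead to hot partitions, which is why the even-distribution guideline above matters.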
Performance Tuning Guidelines
Based on real-world experience:
- Start simple without primary keys for write-heavy workloads
- Avoid over-indexing (impacts write performance)
- Consider partitioning when tables exceed 500GB
- Set appropriate TTL policies for data retention
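Retention can be declared at table creation via the ttl table option, so expired data is dropped without an external cleanup job. A minimal sketch (table and columns are illustrative):

```sql
-- Keep raw metrics for 30 days, then let the database expire them
CREATE TABLE short_lived_metrics (
  host STRING,
  cpu DOUBLE,
  ts TIMESTAMP,
  PRIMARY KEY(host),
  TIME INDEX(ts)
) WITH (ttl = '30d');
```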
GreptimeDB's flexible architecture supports these optimization strategies while maintaining operational simplicity.
About Greptime
GreptimeDB is an open-source, cloud-native database purpose-built for real-time observability. Built in Rust, it provides unified storage and processing for metrics, logs, and traces, delivering sub-second insights from edge to cloud at any scale.
GreptimeDB OSS – The open-source database for small- to medium-scale observability and IoT use cases, ideal for personal projects or dev/test environments.
GreptimeDB Enterprise – A robust observability database with enhanced security, high availability, and enterprise-grade support.
GreptimeCloud – A fully managed, serverless DBaaS with elastic scaling and zero operational overhead. Built for teams that need speed, flexibility, and ease of use out of the box.
🚀 We’re open to contributors—get started with issues labeled good first issue and connect with our community.
💬 Slack | 🐦 Twitter | 💼 LinkedIn