
Poor data modeling decisions can turn your observability platform into a performance nightmare. Whether you're struggling with slow queries, exploding storage costs, or unmanageable complexity, the root cause often traces back to schema design. Let's explore how to model your data effectively in GreptimeDB to avoid these pitfalls.
Understanding Column Cardinality: The Foundation of Smart Design
Before diving into specifics, it helps to understand one concept: column cardinality, the number of distinct values in a column. Columns fall roughly into two groups:
- Low-cardinality columns like region (7-10 values) or service_name (dozens)
- High-cardinality columns like user_id or trace_id (millions or billions)
This distinction shapes nearly every modeling decision in time-series databases like GreptimeDB.
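To see where your own columns fall, you can count distinct values in a sample of rows. The Python sketch below does this in memory. The sample rows and the 0.9 distinct-to-row ratio used to flag high-cardinality columns are illustrative assumptions. A small sample gives only a rough indication of true cardinality.

```python
from collections import defaultdict

# Hypothetical sample rows; in practice, sample from your own telemetry.
rows = [
    {"region": "us-east", "service_name": "checkout", "trace_id": "a1f3"},
    {"region": "us-east", "service_name": "cart", "trace_id": "b7c2"},
    {"region": "eu-west", "service_name": "checkout", "trace_id": "c9d4"},
    {"region": "eu-west", "service_name": "checkout", "trace_id": "d0e8"},
]


def column_cardinality(rows):
    """Return the number of distinct values observed in each column."""
    distinct = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            distinct[column].add(value)
    return {column: len(values) for column, values in distinct.items()}


def high_cardinality_columns(rows, ratio=0.9):
    """Flag columns whose distinct count is close to the row count.

    The 0.9 ratio is an arbitrary cutoff chosen for illustration.
    """
    counts = column_cardinality(rows)
    return sorted(c for c, n in counts.items() if n / len(rows) >= ratio)


print(column_cardinality(rows))        # {'region': 2, 'service_name': 2, 'trace_id': 4}
print(high_cardinality_columns(rows))  # ['trace_id']
```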
The Primary Key Balancing Act
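Primary key design has an outsized effect on performance. In GreptimeDB, each distinct combination of primary key values identifies a separate time series, so the key determines how many series the database must track. Follow these rules of thumb:
- Prefer low-cardinality columns for primary keys
- Keep the number of distinct key combinations (time series) under 100K
- Use no more than 5 key columns
- For pure logging workloads, consider append-only mode with no primary key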
For example, rather than putting a high-cardinality column such as trace_id in the primary key, key on low-cardinality dimensions:
-- Primary key uses only low-cardinality tags; trace_id is left out.
CREATE TABLE service_metrics (
  datacenter STRING,
  service_name STRING,
  cpu_util DOUBLE,
  memory_util DOUBLE,
  ts TIMESTAMP,
  PRIMARY KEY(datacenter, service_name),
  TIME INDEX(ts)
);
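You can check a candidate primary key against these rules of thumb before a schema goes live. Below is a minimal Python sketch. It assumes you can load a representative sample of rows as dictionaries. `series_count` and `check_primary_key` are hypothetical helpers, not part of GreptimeDB. A sample can miss values, so its count is a lower bound on the real number of series.

```python
# Limits taken from the rules of thumb above.
MAX_KEY_COLUMNS = 5
MAX_SERIES = 100_000


def series_count(rows, key_columns):
    """Count distinct primary-key combinations, i.e. distinct time series."""
    return len({tuple(row[c] for c in key_columns) for row in rows})


def check_primary_key(rows, key_columns):
    """Return a list of rule violations for a candidate primary key."""
    problems = []
    if len(key_columns) > MAX_KEY_COLUMNS:
        problems.append(f"{len(key_columns)} key columns (limit: {MAX_KEY_COLUMNS})")
    n = series_count(rows, key_columns)
    if n >= MAX_SERIES:
        problems.append(f"{n} key combinations (limit: under {MAX_SERIES})")
    return problems


# Synthetic data: 3 datacenters x 20 services x 2,000 requests, one trace_id each.
rows = [
    {"datacenter": f"dc{d}", "service_name": f"svc{s}", "trace_id": f"{d}-{s}-{i}"}
    for d in range(3)
    for s in range(20)
    for i in range(2_000)
]

print(check_primary_key(rows, ["datacenter", "service_name"]))
# []
print(check_primary_key(rows, ["datacenter", "service_name", "trace_id"]))
# ['120000 key combinations (limit: under 100000)']
```

Adding trace_id multiplies the series count by the number of requests. That is the kind of explosion these rules are meant to prevent.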
Choosing the Right Index Strategy
GreptimeDB offers three index types, each with specific use cases:
- Inverted Indexes: Best for filtering low-cardinality columns, including range conditions
- Skipping Indexes: Ideal for equality filters on high-cardinality columns
- Fulltext Indexes: Best for keyword search in log text
Don't over-index! Each index increases write overhead and storage consumption.
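The three bullets above amount to a small decision rule. The sketch below encodes them as a hypothetical Python helper, `suggest_index`. The cardinality cutoff is an assumption. The helper only restates this section's guidance; GreptimeDB does not compute anything like it for you.

```python
from typing import Optional

# Hypothetical cutoff for "high cardinality"; tune it to your own data.
HIGH_CARDINALITY = 100_000


def suggest_index(distinct_values: int, filter_kind: Optional[str]) -> Optional[str]:
    """Map a column's cardinality and typical filter to an index type.

    filter_kind is "keyword", "equality", "range", or None if the column
    is never filtered. Returns None when no index is suggested.
    """
    if filter_kind is None:
        return None  # unfiltered column: avoid the write and storage overhead
    if filter_kind == "keyword":
        return "fulltext"  # keyword search inside log text
    if distinct_values >= HIGH_CARDINALITY:
        # Skipping indexes target equality filters; the rules above give no
        # recommendation for range filters on high-cardinality columns.
        return "skipping" if filter_kind == "equality" else None
    return "inverted"  # low-cardinality equality or range filters


print(suggest_index(8, "range"))             # inverted
print(suggest_index(5_000_000, "equality"))  # skipping
print(suggest_index(1_000_000, "keyword"))   # fulltext
print(suggest_index(50, None))               # None
```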
Wide vs. Multi-Table Schemas
Contrary to normalization practice in relational databases, observability data often performs better in wide tables that store related metrics as columns of a single table:
- Wide table benefits: Better compression (30-50% less storage), simplified queries (no JOINs)
- When to use multiple tables: Different collection frequencies, varying tag schemas, or access control requirements
Practical Deduplication Strategies
For telemetry data that arrives as partial updates, GreptimeDB's last_non_null merge mode is invaluable:
-- device_id is the only tag; temperature and humidity may arrive in separate writes.
CREATE TABLE device_telemetry (
  device_id STRING,
  temperature DOUBLE,
  humidity DOUBLE,
  ts TIMESTAMP,
  PRIMARY KEY(device_id),
  TIME INDEX(ts)
) WITH ('merge_mode'='last_non_null');
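When rows share the same primary key and timestamp, this mode merges them field by field, so a NULL in a newer write does not overwrite an existing value. You can therefore update individual fields without resending unmodified ones, which suits IoT devices and partial metric updates.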
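These merge semantics can be modeled in a few lines of Python. The sketch below is a simplified in-memory model, not GreptimeDB's implementation. It assumes rows are applied in write order and uses `None` to stand in for SQL NULL. For contrast, `merge_last_row` keeps only the most recent complete row for each key.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def merge_last_non_null(rows: List[Row], key=("device_id", "ts")) -> Dict[Tuple, Row]:
    """Merge rows that share (primary key, timestamp) in write order.

    A later non-null value replaces the earlier one; a None leaves the
    earlier value in place.
    """
    merged: Dict[Tuple, Row] = {}
    for row in rows:
        current = merged.setdefault(tuple(row[c] for c in key), {})
        for column, value in row.items():
            if value is not None:
                current[column] = value
            else:
                current.setdefault(column, None)  # keep any earlier value
    return merged


def merge_last_row(rows: List[Row], key=("device_id", "ts")) -> Dict[Tuple, Row]:
    """For contrast: keep only the most recent row for each key."""
    return {tuple(row[c] for c in key): dict(row) for row in rows}


writes = [
    {"device_id": "d1", "ts": 1000, "temperature": 21.5, "humidity": None},
    {"device_id": "d1", "ts": 1000, "temperature": None, "humidity": 40.0},
    {"device_id": "d1", "ts": 1000, "temperature": 22.0, "humidity": None},
]

print(merge_last_non_null(writes)[("d1", 1000)])
# {'device_id': 'd1', 'ts': 1000, 'temperature': 22.0, 'humidity': 40.0}
print(merge_last_row(writes)[("d1", 1000)])
# {'device_id': 'd1', 'ts': 1000, 'temperature': 22.0, 'humidity': None}
```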
Scaling to Petabyte Level
When your observability data grows beyond terabytes, leverage distributed partitioning:
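CREATE TABLE global_metrics (
  region STRING,
  datacenter STRING,
  host STRING,
  cpu DOUBLE,
  memory DOUBLE,
  ts TIMESTAMP,
  PRIMARY KEY(region, datacenter, host),
  TIME INDEX(ts)
)
-- Range rules on region; these split points are illustrative only.
PARTITION ON COLUMNS (region) (
  region < 'eu',
  region >= 'eu' AND region < 'us',
  region >= 'us'
);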
Choose partition columns that distribute data evenly and align with your query patterns.
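One way to judge whether partition rules distribute data evenly is to replay a sample of partition-column values through them and compare partition sizes. The Python sketch below assumes three hypothetical range rules on `region`. It reports skew as the largest partition divided by the mean partition size. Both the split points and the skew metric are illustrative choices.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical range rules on region, similar in spirit to the split points a
# PARTITION ON COLUMNS clause lists. Together they must cover every value once.
RULES: Dict[str, Callable[[str], bool]] = {
    "p0": lambda region: region < "eu",
    "p1": lambda region: "eu" <= region < "us",
    "p2": lambda region: region >= "us",
}


def assign_partition(region: str) -> str:
    """Return the single partition whose rule matches this region."""
    matches = [name for name, rule in RULES.items() if rule(region)]
    if len(matches) != 1:
        raise ValueError(f"{region!r} matched {matches}; rules must not overlap or leave gaps")
    return matches[0]


def partition_skew(regions: List[str]) -> float:
    """Largest partition size divided by the mean size (1.0 means perfectly even)."""
    counts = Counter(assign_partition(r) for r in regions)
    sizes = [counts[name] for name in RULES]
    return max(sizes) / (sum(sizes) / len(sizes))


even = ["ap-south"] * 300 + ["eu-west"] * 300 + ["us-east"] * 300
skewed = ["ap-south"] * 100 + ["eu-west"] * 100 + ["us-east"] * 800

print(partition_skew(even))              # 1.0
print(round(partition_skew(skewed), 2))  # 2.4
```

A skew well above 1.0 suggests moving the split points or choosing a partition column whose values are spread more uniformly.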
Start Building Smarter
Applying these data modeling principles can dramatically improve query performance, reduce storage costs, and simplify your observability architecture. For more detailed guidance, explore GreptimeDB's official Data Modeling Guide.
Whether you're just starting with observability or refactoring an existing implementation, invest time in proper data modeling—it's the foundation that will support your growing observability needs.
About Greptime
GreptimeDB is an open-source, cloud-native database purpose-built for real-time observability. Built in Rust and optimized for cloud-native environments, it provides unified storage and processing for metrics, logs, and traces, delivering sub-second insights from edge to cloud at any scale.
GreptimeDB OSS – The open-source database for small to medium-scale observability and IoT use cases, ideal for personal projects or dev/test environments.
GreptimeDB Enterprise – A robust observability database with enhanced security, high availability, and enterprise-grade support.
GreptimeCloud – A fully managed, serverless DBaaS with elastic scaling and zero operational overhead. Built for teams that need speed, flexibility, and ease of use out of the box.
🚀 We’re open to contributors—get started with issues labeled good first issue and connect with our community.