
Data Modeling Best Practices for Observability Workloads

The Foundation of Observability Success

Poor data modeling can make even the fastest database crawl. We've seen teams struggle with query performance issues that trace back to fundamental schema design problems.

Observability data has unique characteristics that require thoughtful modeling approaches. Unlike traditional business applications, monitoring data involves high-volume writes, time-based queries, and varying cardinality patterns.

Understanding Column Cardinality

Think of your data like a library:

  • Low-cardinality columns are like book genres (Sci-Fi, History, Art)
  • High-cardinality columns are like ISBNs or user IDs

This distinction directly impacts performance. In one e-commerce project, region had only 7 values while user_id reached billions.
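
Before choosing keys, it helps to measure cardinality directly. A minimal sketch, assuming a hypothetical orders table with the region and user_id columns from the example above:

```sql
-- Gauge candidate key columns by counting distinct values (table name is illustrative):
SELECT count(DISTINCT region)  AS region_cardinality,   -- low: a handful of values
       count(DISTINCT user_id) AS user_id_cardinality   -- high: millions or more
FROM orders;
```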

Primary Key Design Rules

For time-series databases, follow these guidelines (a sketch combining them follows the list):

  • Choose low-cardinality columns only
  • Keep the key's combined cardinality under 100k unique values
  • Limit to ≤5 key columns
  • Prefer strings and integers over floats
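
Put together, a minimal sketch of a key that follows these rules; the http_requests table and its columns are hypothetical:

```sql
CREATE TABLE http_requests (
  service     STRING,     -- low cardinality: dozens of services
  status_code STRING,     -- low cardinality: a handful of classes
  request_id  STRING,     -- high cardinality: stays out of the key
  latency_ms  DOUBLE,     -- float: stays out of the key
  ts          TIMESTAMP,
  PRIMARY KEY(service, status_code),  -- 2 low-cardinality columns, bounded combinations
  TIME INDEX(ts)
);
```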

Wide Tables vs. Multiple Tables

Best practice: Store related metrics in wide tables, especially when collected together.

```sql
CREATE TABLE node_metrics (
  host STRING,
  cpu_user DOUBLE,
  cpu_system DOUBLE,
  memory_used DOUBLE,
  disk_read_bytes DOUBLE,
  net_in_bytes DOUBLE,
  ts TIMESTAMP,
  PRIMARY KEY(host),
  TIME INDEX(ts)
);
```

Benefits include:

  • 30-50% better compression
  • Simplified queries (no JOINs needed; see the query sketch below)
  • Improved query performance
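
For example, a single scan over the node_metrics table above can answer questions that would otherwise span several narrow tables. A minimal sketch (the time window is illustrative):

```sql
-- One pass over the wide table, no JOINs:
SELECT host,
       avg(cpu_user + cpu_system) AS avg_cpu,
       max(memory_used)           AS peak_memory
FROM node_metrics
WHERE ts >= now() - INTERVAL '1 hour'
GROUP BY host;
```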

Index Strategy for Different Query Types

GreptimeDB offers three index types optimized for specific patterns:

Inverted Index

Best for low-cardinality filtering:

  • Supports =, <, >, IN, BETWEEN
  • Efficient categorical queries
  • Moderate storage overhead

Skipping Index

Great for high-cardinality equality filters:

  • Minimal write performance impact
  • Very storage efficient
  • Limited to equality queries only

Fulltext Index

Essential for log keyword search:

  • Supports complex text matching
  • English analyzer for better relevance
  • Higher storage requirements
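
All three index types can be declared per column at table creation. A minimal sketch, assuming GreptimeDB's column-level index syntax and the matches() fulltext function; the app_logs table and its columns are illustrative:

```sql
CREATE TABLE app_logs (
  host     STRING INVERTED INDEX,   -- low-cardinality filtering (=, <, >, IN, BETWEEN)
  trace_id STRING SKIPPING INDEX,   -- high-cardinality equality lookups
  message  STRING FULLTEXT INDEX WITH (analyzer = 'English', case_sensitive = 'false'),
  ts       TIMESTAMP,
  PRIMARY KEY(host),
  TIME INDEX(ts)
);

-- Keyword search over the fulltext-indexed column:
SELECT * FROM app_logs
WHERE matches(message, 'timeout OR refused')
  AND ts >= now() - INTERVAL '15 minutes';
```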

Partition Strategy for Scale

When data exceeds TB scale, distributed partitioning becomes essential:

```sql
CREATE TABLE global_metrics (
  region STRING,
  datacenter STRING,
  host STRING,
  cpu DOUBLE,
  memory DOUBLE,
  ts TIMESTAMP,
  PRIMARY KEY(region, datacenter, host),
  TIME INDEX(ts)
)
PARTITION ON COLUMNS (region) (
  -- GreptimeDB expects explicit, non-overlapping partition rules;
  -- the region bounds here are illustrative.
  region < 'eu',
  region >= 'eu' AND region < 'us',
  region >= 'us'
);
```

Best practice: Partition by columns with even distribution and query-aligned patterns.
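
A simple way to check whether a candidate column distributes evenly is to compare row counts per value, as in this sketch against the global_metrics table above:

```sql
-- Heavily skewed counts suggest a poor partition key:
SELECT region, count(*) AS rows_per_region
FROM global_metrics
GROUP BY region
ORDER BY rows_per_region DESC;
```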

Performance Tuning Guidelines

Based on real-world experience:

  • Start simple without primary keys for write-heavy workloads
  • Avoid over-indexing (impacts write performance)
  • Consider partitioning when tables exceed 500GB
  • Set appropriate TTL policies for data retention (see the sketch below)
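
For the retention point, a minimal sketch assuming GreptimeDB's table-level ttl option (the table name and retention windows are illustrative):

```sql
-- Set retention at creation time via the ttl table option:
CREATE TABLE short_lived_metrics (
  host STRING,
  cpu  DOUBLE,
  ts   TIMESTAMP,
  PRIMARY KEY(host),
  TIME INDEX(ts)
) WITH (ttl = '30d');   -- rows older than 30 days become eligible for removal

-- Tighten retention later (assuming the ALTER TABLE ... SET option form):
ALTER TABLE short_lived_metrics SET 'ttl' = '7d';
```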

GreptimeDB's flexible architecture supports these optimization strategies while maintaining operational simplicity.


About Greptime

GreptimeDB is an open-source, cloud-native database purpose-built for real-time observability. Built in Rust and optimized for cloud-native environments, it provides unified storage and processing for metrics, logs, and traces, delivering sub-second insights from edge to cloud at any scale.

  • GreptimeDB OSS – The open-source database for small to medium-scale observability and IoT use cases, ideal for personal projects or dev/test environments.

  • GreptimeDB Enterprise – A robust observability database with enhanced security, high availability, and enterprise-grade support.

  • GreptimeCloud – A fully managed, serverless DBaaS with elastic scaling and zero operational overhead. Built for teams that need speed, flexibility, and ease of use out of the box.

🚀 We’re open to contributors—get started with issues labeled good first issue and connect with our community.

GitHub | 🌐 Website | 📚 Docs

💬 Slack | 🐦 Twitter | 💼 LinkedIn

