
Database Architecture Deep Dive! Why GreptimeDB Dominates JSONBench Performance



GreptimeDB's #1 ranking in JSONBench cold queries didn't happen by accident. The performance advantage stems from fundamental architectural decisions that optimize for modern cloud workloads. This deep dive reveals the storage innovations that enable GreptimeDB to outperform ClickHouse, VictoriaLogs, and other established databases.

The JSONBench Battlefield

JSONBench processes 1 billion JSON documents from Bluesky social media data, executing complex analytical queries that stress-test database architectures. The benchmark reveals how databases handle:

  • Large-scale data ingestion (terabytes of JSON)
  • Complex analytical queries across nested JSON structures
  • Cold storage performance (data not in memory)
  • Resource efficiency under sustained load

GreptimeDB's first place in cold queries points to the strength of its storage architecture, while its strong hot query performance shows that the caching strategies work in practice.

Multi-Tiered Storage: The Secret Weapon

GreptimeDB's storage architecture implements multiple cache tiers inspired by OS page cache design:

Write Cache Layer

Recent data stays in fast local storage:

  • Time-organized data layout enables rapid access to latest entries
  • Configurable cache size (hours to days of data)
  • Write-through cache ensures data durability

Read Cache Optimization

LRU-based cache management for historical data:

  • Parquet data pages cached on local disk
  • Significantly faster than object storage access
  • Automatic cache warming for frequently accessed data
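
To make the LRU idea concrete, here is a minimal sketch of a least-recently-used page cache. It is not GreptimeDB's implementation: the class name `LRUPageCache`, the string keys, and the entry-count capacity are assumptions made for illustration (a real cache would bound bytes on disk rather than entries).

```python
from collections import OrderedDict
from typing import Optional


class LRUPageCache:
    """Tiny LRU cache keyed by a page identifier; capacity counts entries."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._pages: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        page = self._pages.get(key)
        if page is not None:
            self._pages.move_to_end(key)  # mark as most recently used
        return page

    def put(self, key: str, page: bytes) -> None:
        self._pages[key] = page
        self._pages.move_to_end(key)
        while len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # evict the least recently used page
```

Reading a page moves it to the "recent" end, so pages that queries keep touching survive while untouched ones are evicted first.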

Metadata & Index Caching

Critical metadata remains in memory:

  • Table schema and routing information
  • Parquet file metadata for query planning
  • Index data for fast lookups

This three-tier approach balances performance with cost efficiency, explaining why GreptimeDB maintains consistent performance whether data is in memory or cold storage.
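
The read path through such tiers can be sketched as a chain of lookups that falls back to slower storage only on a miss. This is a simplified illustration, not GreptimeDB's code: `TieredReader`, the in-memory dictionaries standing in for the memory and local-disk tiers, and the `fetch_remote` callback standing in for an object storage read are all hypothetical.

```python
from typing import Callable, Dict, Optional


class TieredReader:
    """Looks up a key in memory, then on local disk, then in object storage."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self.memory: Dict[str, bytes] = {}
        self.local_disk: Dict[str, bytes] = {}  # stand-in for files on local disk
        self.fetch_remote = fetch_remote  # stand-in for an object storage GET
        self.remote_reads = 0

    def read(self, key: str) -> bytes:
        if key in self.memory:
            return self.memory[key]
        data: Optional[bytes] = self.local_disk.get(key)
        if data is None:
            data = self.fetch_remote(key)
            self.remote_reads += 1
            self.local_disk[key] = data  # populate the disk tier on a miss
        self.memory[key] = data  # promote to the fastest tier
        return data
```

Only the first read of a key pays the object storage cost; later reads are served from the faster tiers.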

Object Storage Economics vs. Performance

The decision to build on object storage reflects the economics of cloud infrastructure:

Cost Comparison (per GB/month)

Performance Mitigation

GreptimeDB's caching strategies keep most common queries from paying object storage latency:

  • 90% of queries hit cache layers
  • Predictive cache warming for known access patterns
  • Parallel data fetching when object storage access is required
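
The effect of a cache hit ratio on average read latency follows from a simple weighted average. The sketch below works through that arithmetic; the function name and the latency figures are hypothetical and chosen only for illustration, not measured values.

```python
def expected_latency_ms(hit_ratio: float, cache_ms: float, object_store_ms: float) -> float:
    """Average read latency when a fraction `hit_ratio` of reads is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * object_store_ms


# Hypothetical latencies: 1 ms for a cache hit, 100 ms for an object storage read.
print(expected_latency_ms(0.9, cache_ms=1.0, object_store_ms=100.0))
```

With these assumed numbers, the remaining misses still dominate the average, which is why parallel fetching on the miss path matters as well.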

Columnar Storage Optimizations

GreptimeDB's columnar layout provides multiple advantages:

Compression Efficiency

  • Column-specific compression algorithms
  • 30-40x compression ratios for time-series data
  • Dictionary encoding for string columns
  • Delta encoding for timestamp sequences
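
Dictionary and delta encoding are easy to show in a few lines. The following is a generic sketch of the two techniques, not GreptimeDB's or Parquet's actual encoders; the function names are made up for this example.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with an index into a list of distinct values."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes


def delta_encode(timestamps: List[int]) -> List[int]:
    """Keep the first timestamp, then store differences between neighbours."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for previous, current in zip(timestamps, timestamps[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode with a running sum."""
    values: List[int] = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values
```

Repeated strings collapse to small integers, and regularly spaced timestamps collapse to small, repetitive deltas, both of which compress far better than the raw values.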

Query Performance

  • Column pruning reads only necessary data
  • Vectorized operations leverage modern CPU capabilities
  • SIMD instruction sets accelerate analytical queries

JSON Handling Optimization

Nested JSON structures are decomposed into efficient columnar representations:

text
-- JSON document automatically flattened
{
  "user": {"id": 123, "name": "alice"},
  "event": {"type": "click", "timestamp": "2024-01-01T10:00:00Z"}
}

-- Becomes columnar structure
user_id: 123
user_name: "alice"
event_type: "click"
event_timestamp: 2024-01-01T10:00:00Z
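
The flattening shown above can be sketched as a short recursive function. This only illustrates the naming scheme (`parent_child`) from the example; it is an assumption about the mapping, not GreptimeDB's actual JSON handling, and it leaves arrays and type conversion untouched.

```python
from typing import Any, Dict


def flatten(document: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into `parent_child` column names."""
    columns: Dict[str, Any] = {}
    for key, value in document.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            columns.update(flatten(value, name))  # recurse into nested objects
        else:
            columns[name] = value
    return columns


doc = {
    "user": {"id": 123, "name": "alice"},
    "event": {"type": "click", "timestamp": "2024-01-01T10:00:00Z"},
}
print(flatten(doc))
```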

LSM-Tree Adaptations for Observability

GreptimeDB's LSM-tree implementation includes observability-specific optimizations:

Write Buffer Design

  • Apache Arrow in-memory format for efficiency
  • Dictionary encoding reduces memory overhead
  • Time-series merging for related metrics

Compaction Strategies

  • Time-based partitioning aligns with query patterns
  • Background task scheduling prevents resource conflicts
  • Adaptive compaction based on data characteristics
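
Time-based partitioning for compaction can be pictured as bucketing data files by time window, so that files covering the same period are merged together. The sketch below is a simplified assumption of how such grouping might look; `DataFile`, its fields, and `group_by_time_window` are hypothetical names, not GreptimeDB's internals.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, NamedTuple


class DataFile(NamedTuple):
    name: str
    min_ts: int  # earliest timestamp in the file, in seconds


def group_by_time_window(files: List[DataFile], window_secs: int) -> Dict[int, List[str]]:
    """Bucket files by the time window that contains their earliest timestamp."""
    if window_secs <= 0:
        raise ValueError("window_secs must be positive")
    windows: DefaultDict[int, List[str]] = defaultdict(list)
    for data_file in files:
        window_start = (data_file.min_ts // window_secs) * window_secs
        windows[window_start].append(data_file.name)
    return dict(windows)
```

A query over a recent time range then touches only the files in the matching windows, which is the alignment with query patterns the list above refers to.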

The Rust Advantage

Rust's memory safety and performance characteristics provide foundational advantages:

Memory Management

  • Zero-cost abstractions eliminate runtime overhead
  • Memory safety prevents crashes under high load
  • Predictable performance without garbage collection pauses

Concurrency

  • Fearless concurrency enables efficient parallel processing
  • Actor-based architecture for component isolation
  • Async/await for non-blocking IO operations

Benchmarking Results Analysis

JSONBench cold query performance:

  1. GreptimeDB: Consistently fastest across query types
  2. ClickHouse: Strong but variable performance
  3. VictoriaLogs: Good average performance
  4. Others: Significantly slower

The performance gap reflects architectural choices:

  • GreptimeDB: Optimized for cloud-native workloads
  • ClickHouse: Optimized for high-memory environments
  • Traditional databases: Not designed for object storage

Advanced Features Beyond Benchmarks

Pipeline Processing

Built-in ETL capabilities eliminate external processing overhead:

yaml
processors:
  - json:
      field: message
      target_field: parsed
  - dissect:
      field: parsed.log
      pattern: "%{timestamp} %{level} %{message}"
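
To show what a dissect-style pattern does to a log line, here is a small sketch that turns `%{name}` placeholders into a regular expression with named groups. It mimics the behaviour for illustration only and is not the pipeline engine's implementation; the `dissect` function below is hypothetical and assumes each placeholder name is a valid identifier used once.

```python
import re
from typing import Dict, Optional


def dissect(pattern: str, line: str) -> Optional[Dict[str, str]]:
    """Split `line` into named fields according to a `%{name}` pattern."""
    # With one capture group, re.split alternates: literal, name, literal, name, ...
    parts = re.split(r"%\{([A-Za-z_]\w*)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)  # literal text between placeholders
        else:
            regex += f"(?P<{part}>.+?)"  # lazily capture the field value
    match = re.fullmatch(regex, line)
    return match.groupdict() if match else None


fields = dissect(
    "%{timestamp} %{level} %{message}",
    "2024-01-01T10:00:00Z INFO service started",
)
print(fields)
```

Each field ends at the next literal delimiter, and the final field takes the rest of the line, so the message may itself contain spaces.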

Vector Search Integration

Semantic similarity within JSON documents:

sql
-- Cosine distance = 1 - cosine similarity, so distance < 0.2 keeps similarity > 0.8
SELECT * FROM json_docs
WHERE vec_cos_distance(content_embedding, :query_vector) < 0.2;
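
The query above filters rows by how closely a stored embedding matches the query vector. For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, and cosine distance is one minus that value. A plain-Python sketch of the calculation (the function name is ours, not a database function):

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1.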

Alongside vector search, GreptimeDB offers native full-text indexing with multiple backend options:

  • Bloom filters for general-purpose search
  • Tantivy for high-precision queries
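
A Bloom filter answers "might this term be present?" with no false negatives and a tunable rate of false positives, which lets a query skip data that certainly lacks the term. The sketch below is a textbook version for illustration, not GreptimeDB's index format; the class name, bit count, and hashing scheme are assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size Bloom filter using salted SHA-256 hashes."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

The compact bit array explains the general-purpose trade-off: cheap to store and check, but it can only rule data out, whereas a full inverted index such as Tantivy can pinpoint matches.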

Operational Advantages

Cloud-native design reduces operational complexity:

Kubernetes Integration

  • Native Kubernetes deployment
  • Automatic scaling based on workload
  • Service mesh integration for observability

Monitoring & Metrics

  • Built-in metrics export to Prometheus
  • Distributed tracing support
  • Health check endpoints for load balancers

The Architectural Philosophy

GreptimeDB's design philosophy prioritizes:

  1. Cloud-first architecture from day zero
  2. Cost efficiency without performance compromise
  3. Operational simplicity over feature complexity
  4. Unified data model for observability workloads

This architectural coherence explains why GreptimeDB outperforms databases that retrofit cloud features onto legacy designs.

The JSONBench victory validates years of architectural decisions focused on real-world cloud workloads. GreptimeDB doesn't just store data efficiently – it transforms how organizations think about observability database architecture.

Ready to experience next-generation database performance? GreptimeDB's architecture represents the future of cloud-native data management.


About Greptime

GreptimeDB is an open-source, cloud-native database purpose-built for real-time observability. Built in Rust and optimized for cloud-native environments, it provides unified storage and processing for metrics, logs, and traces, delivering sub-second insights from edge to cloud at any scale.

  • GreptimeDB OSS – The open-sourced database for small to medium-scale observability and IoT use cases, ideal for personal projects or dev/test environments.

  • GreptimeDB Enterprise – A robust observability database with enhanced security, high availability, and enterprise-grade support.

  • GreptimeCloud – A fully managed, serverless DBaaS with elastic scaling and zero operational overhead. Built for teams that need speed, flexibility, and ease of use out of the box.

🚀 We’re open to contributors—get started with issues labeled good first issue and connect with our community.
