
GreptimeDB's #1 ranking in JSONBench cold queries didn't happen by accident. The performance advantage stems from fundamental architectural decisions that optimize for modern cloud workloads. This deep dive reveals the storage innovations that enable GreptimeDB to outperform ClickHouse, VictoriaLogs, and other established databases.
The JSONBench Battlefield
JSONBench processes 1 billion JSON documents from Bluesky social media data, executing complex analytical queries that stress-test database architectures. The benchmark reveals how databases handle:
- Large-scale data ingestion (terabytes of JSON)
- Complex analytical queries across nested JSON structures
- Cold storage performance (data not in memory)
- Resource efficiency under sustained load
GreptimeDB's victory in cold queries demonstrates superior storage architecture design, while strong hot query performance proves the caching strategies work effectively.
Multi-Tiered Storage: The Secret Weapon
GreptimeDB's storage architecture implements multiple cache tiers inspired by OS page cache design:
Write Cache Layer
Recent data stays in fast local storage:
- Time-organized data layout enables rapid access to latest entries
- Configurable cache size (hours to days of data)
- Write-through cache ensures data durability
Read Cache Optimization
LRU-based cache management for historical data:
- Parquet data pages cached on local disk
- Significantly faster than object storage access
- Automatic cache warming for frequently accessed data
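To make the LRU policy concrete, here is a minimal sketch in Python. It is an illustrative toy, not GreptimeDB's implementation: the `PageCache` class, the `fetch` callback standing in for an object-store read, and the page-count capacity are all assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable


class PageCache:
    """Toy LRU cache for data pages (hypothetical, for illustration only)."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]):
        self.capacity = capacity      # maximum number of cached pages
        self.fetch = fetch            # slow-path loader, e.g. an object-store read
        self.pages = OrderedDict()    # key -> page bytes, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.pages:
            self.pages.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.pages[key]
        self.misses += 1
        data = self.fetch(key)              # cache miss: read from object storage
        self.pages[key] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data
```

A page that keeps getting read stays near the "recent" end of the ordering and survives eviction, which is the behavior the bullets above describe for frequently accessed data.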
Metadata & Index Caching
Critical metadata remains in memory:
- Table schema and routing information
- Parquet file metadata for query planning
- Index data for fast lookups
This three-tier approach balances performance with cost efficiency, explaining why GreptimeDB maintains consistent performance whether data is in memory or cold storage.
Object Storage Economics vs. Performance
The storage economics decision reflects deep understanding of cloud infrastructure:
Cost Comparison (per GB/month)
- Amazon S3 Standard: $0.023
- Amazon EBS gp3: $0.080
- Cost difference: S3 Standard is roughly 3.5x cheaper per GB than EBS gp3
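The ratio behind that figure follows directly from the two list prices. The short calculation below reproduces it and applies it to a hypothetical 10 TB dataset. The dataset size is an assumption for illustration, and the calculation covers storage only, ignoring request, IOPS, throughput, and data-transfer charges.

```python
S3_STANDARD_PER_GB = 0.023   # USD per GB-month, figure quoted above
EBS_GP3_PER_GB = 0.080       # USD per GB-month, figure quoted above


def monthly_cost(gb: float, price_per_gb: float) -> float:
    """Storage-only monthly cost in USD."""
    return gb * price_per_gb


ratio = EBS_GP3_PER_GB / S3_STANDARD_PER_GB
dataset_gb = 10_000  # hypothetical 10 TB dataset
s3 = monthly_cost(dataset_gb, S3_STANDARD_PER_GB)
ebs = monthly_cost(dataset_gb, EBS_GP3_PER_GB)

print(f"EBS gp3 / S3 Standard price ratio: {ratio:.2f}x")
print(f"10 TB on S3 Standard: ${s3:,.0f}/month, on EBS gp3: ${ebs:,.0f}/month")
```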
Performance Mitigation
GreptimeDB's caching strategies eliminate object storage latency for common queries:
- 90% of queries hit cache layers
- Predictive cache warming for known access patterns
- Parallel data fetching when object storage access is required
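The last bullet, parallel fetching on a cache miss, can be sketched as issuing several ranged reads concurrently instead of one after another. The `fetch_ranges` helper and the `fake_read` stand-in below are hypothetical. A real reader would issue ranged GET requests against the object store.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

ByteRange = Tuple[int, int]  # (offset, length)


def fetch_ranges(read_range: Callable[[int, int], bytes],
                 ranges: List[ByteRange],
                 max_workers: int = 8) -> List[bytes]:
    """Fetch several byte ranges concurrently, returned in request order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(read_range, off, length) for off, length in ranges]
        return [f.result() for f in futures]


# Stand-in for an object-store ranged read (hypothetical; no network IO here).
blob = bytes(range(256))


def fake_read(offset: int, length: int) -> bytes:
    return blob[offset:offset + length]


chunks = fetch_ranges(fake_read, [(0, 4), (100, 2), (250, 6)])
```

Because object-store latency is dominated by round trips rather than bandwidth, overlapping the requests hides most of the per-request delay.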
Columnar Storage Optimizations
GreptimeDB's columnar layout provides multiple advantages:
Compression Efficiency
- Column-specific compression algorithms
- 30-40x compression ratios for time-series data
- Dictionary encoding for string columns
- Delta encoding for timestamp sequences
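Dictionary and delta encoding are simple enough to show directly. The functions below are a generic sketch of the two techniques, not the encoders GreptimeDB or Parquet actually ship. The function names are chosen for this example.

```python
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with an index into a table of distinct values."""
    table: List[str] = []
    index = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(table)
            table.append(v)
        codes.append(index[v])
    return table, codes


def delta_encode(timestamps: List[int]) -> List[int]:
    """Keep the first value, then store differences between neighbours."""
    if not timestamps:
        return []
    return [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out
```

Repeated strings collapse to small integers, and regularly spaced timestamps collapse to runs of small, similar deltas. Both forms compress far better than the raw values.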
Query Performance
- Column pruning reads only necessary data
- Vectorized operations leverage modern CPU capabilities
- SIMD instruction sets accelerate analytical queries
JSON Handling Optimization
Nested JSON structures are decomposed into efficient columnar representations:
-- JSON document automatically flattened
{
  "user": {"id": 123, "name": "alice"},
  "event": {"type": "click", "timestamp": "2024-01-01T10:00:00Z"}
}
-- Becomes columnar structure
user_id: 123
user_name: "alice"
event_type: "click"
event_timestamp: 2024-01-01T10:00:00Z
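The flattening shown above can be reproduced with a short recursive function. This is a sketch of the idea only: joining path segments with an underscore mirrors the column names in the example, and arrays and type inference are deliberately left out. How GreptimeDB handles those cases is not covered here.

```python
import json
from typing import Any, Dict


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested objects into a single level of path-named columns."""
    out = {}
    for key, value in doc.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, name))   # recurse into nested objects
        else:
            out[name] = value                  # leaf value becomes a column
    return out


raw = ('{"user": {"id": 123, "name": "alice"}, '
       '"event": {"type": "click", "timestamp": "2024-01-01T10:00:00Z"}}')
row = flatten(json.loads(raw))
```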
LSM-Tree Adaptations for Observability
GreptimeDB's LSM-tree implementation includes observability-specific optimizations:
Write Buffer Design
- Apache Arrow in-memory format for efficiency
- Dictionary encoding reduces memory overhead
- Time-series merging for related metrics
Compaction Strategies
- Time-based partitioning aligns with query patterns
- Background task scheduling prevents resource conflicts
- Adaptive compaction based on data characteristics
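One way to picture time-based compaction is to bucket files by time window and merge only within a bucket, so a query over a time range touches few files. The sketch below is a simplified illustration under assumed names (`SstFile`, `group_by_window`, `pick_compactions`). It is not GreptimeDB's actual compaction picker.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class SstFile(NamedTuple):
    name: str
    min_ts: int   # earliest timestamp in the file, in seconds
    max_ts: int   # latest timestamp in the file, in seconds


def group_by_window(files: List[SstFile], window_secs: int) -> Dict[int, List[SstFile]]:
    """Bucket files by the time window that contains their max timestamp."""
    buckets = defaultdict(list)
    for f in files:
        window_start = f.max_ts // window_secs * window_secs
        buckets[window_start].append(f)
    return dict(buckets)


def pick_compactions(files: List[SstFile], window_secs: int,
                     min_files: int = 2) -> Dict[int, List[SstFile]]:
    """Return only the windows that hold enough files to be worth merging."""
    return {w: fs for w, fs in group_by_window(files, window_secs).items()
            if len(fs) >= min_files}


files = [
    SstFile("a", 0, 100),
    SstFile("b", 50, 3000),
    SstFile("c", 3700, 4000),
    SstFile("d", 3600, 7199),
    SstFile("e", 7200, 7300),
]
plan = pick_compactions(files, window_secs=3600)
```

With a one-hour window, files `a` and `b` fall into the first bucket and `c` and `d` into the second, while `e` sits alone in the third and is left untouched.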
The Rust Advantage
Rust's memory safety and performance characteristics provide foundational advantages:
Memory Management
- Zero-cost abstractions eliminate runtime overhead
- Memory safety prevents crashes under high load
- Predictable performance without garbage collection pauses
Concurrency
- Fearless concurrency enables efficient parallel processing
- Actor-based architecture for component isolation
- Async/await for non-blocking IO operations
Benchmarking Results Analysis
JSONBench cold query performance:
- GreptimeDB: Consistently fastest across query types
- ClickHouse: Strong but variable performance
- VictoriaLogs: Good average performance
- Others: Significantly slower
The performance gap reflects architectural choices:
- GreptimeDB: Optimized for cloud-native workloads
- ClickHouse: Optimized for high-memory environments
- Traditional databases: Not designed for object storage
Advanced Features Beyond Benchmarks
Pipeline Processing
Built-in ETL capabilities eliminate external processing overhead:
processors:
  - json:
      field: message
      target_field: parsed
  - dissect:
      field: parsed.log
      pattern: "%{timestamp} %{level} %{message}"
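To show what these two steps do to a record, the Python sketch below parses a JSON string and then splits one of its fields with a dissect-style pattern. The `dissect` function is a simplified stand-in written for this example and makes no claim to match the pipeline engine's exact matching rules. The sample event is also hypothetical.

```python
import json
import re
from typing import Dict, Optional


def dissect(pattern: str, text: str) -> Optional[Dict[str, str]]:
    """Split `text` using a dissect-style pattern such as "%{a} %{b}"."""
    tokens = list(re.finditer(r"%\{(\w+)\}", pattern))
    regex = ""
    pos = 0
    for i, tok in enumerate(tokens):
        regex += re.escape(pattern[pos:tok.start()])   # literal delimiter text
        # Every field is lazy except the last, which takes the rest of the line.
        body = ".*" if i == len(tokens) - 1 else ".*?"
        regex += f"(?P<{tok.group(1)}>{body})"
        pos = tok.end()
    regex += re.escape(pattern[pos:])
    match = re.fullmatch(regex, text)
    return match.groupdict() if match else None


event = {"message": '{"log": "2024-01-01T10:00:00Z INFO user logged in"}'}
parsed = json.loads(event["message"])                                 # the json step
fields = dissect("%{timestamp} %{level} %{message}", parsed["log"])   # the dissect step
```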
Vector Search Integration
Semantic similarity within JSON documents:
-- Cosine distance = 1 - cosine similarity, so this keeps rows with similarity above 0.8
SELECT * FROM json_docs
WHERE vec_cos_distance(content_embedding, :query_vector) < 0.2;
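The quantity behind that filter is cosine similarity (equivalently, 1 minus cosine distance). A plain-Python version of the formula is shown below for reference. It is a generic definition, not the database's implementation, and returning 0.0 for zero-length vectors is a convention chosen for this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|), in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0   # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def is_match(a: Sequence[float], b: Sequence[float], threshold: float = 0.8) -> bool:
    """Same filter as the query: keep pairs whose similarity exceeds the threshold."""
    return cosine_similarity(a, b) > threshold
```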
Full-Text Search
Native full-text indexing with multiple backend options:
- Bloom filters for general-purpose search
- Tantivy for high-precision queries
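A Bloom filter answers "might this block contain the term?" with no false negatives and a small rate of false positives, which is why it suits cheap, general-purpose skipping. The class below is a textbook sketch with assumed parameters (1024 bits, 3 hashes derived from SHA-256). It does not reflect GreptimeDB's index format.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Textbook Bloom filter over string tokens (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, token: str) -> Iterator[int]:
        # Derive several bit positions by salting the token with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, token: str) -> None:
        for pos in self._positions(token):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, token: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(token))
```

A query engine can consult such a filter per data block and skip every block that reports a definite miss, then verify the remaining candidates by reading the data.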
Operational Advantages
Cloud-native design reduces operational complexity:
Kubernetes Integration
- Native Kubernetes deployment
- Automatic scaling based on workload
- Service mesh integration for observability
Monitoring & Metrics
- Built-in metrics export to Prometheus
- Distributed tracing support
- Health check endpoints for load balancers
The Architectural Philosophy
GreptimeDB's design philosophy prioritizes:
- Cloud-first architecture from day zero
- Cost efficiency without performance compromise
- Operational simplicity over feature complexity
- Unified data model for observability workloads
This architectural coherence explains why GreptimeDB outperforms databases that retrofit cloud features onto legacy designs.
The JSONBench victory validates years of architectural decisions focused on real-world cloud workloads. GreptimeDB doesn't just store data efficiently – it transforms how organizations think about observability database architecture.
Ready to experience next-generation database performance? GreptimeDB's architecture represents the future of cloud-native data management.
About Greptime
GreptimeDB is an open-source, cloud-native database purpose-built for real-time observability. Built in Rust and optimized for cloud-native environments, it provides unified storage and processing for metrics, logs, and traces, delivering sub-second insights from edge to cloud at any scale.
GreptimeDB OSS – The open-sourced database for small to medium-scale observability and IoT use cases, ideal for personal projects or dev/test environments.
GreptimeDB Enterprise – A robust observability database with enhanced security, high availability, and enterprise-grade support.
GreptimeCloud – A fully managed, serverless DBaaS with elastic scaling and zero operational overhead. Built for teams that need speed, flexibility, and ease of use out of the box.
🚀 We’re open to contributors: get started with issues labeled "good first issue" and connect with our community.