
Log analysis just got significantly more powerful. GreptimeDB v0.14 introduces dual-backend full-text indexing with both Bloom filter and Tantivy backends, plus new query operators that make text search more intuitive and efficient. This isn't just an incremental update - it's a complete rethinking of how observability databases handle text search at scale.
The Evolution of Full-Text Search
Traditional log analysis forces uncomfortable trade-offs. You either get fast indexing with limited search capabilities or powerful search with massive storage overhead. GreptimeDB v0.14 eliminates this choice by providing two specialized backends optimized for different use cases.
The new release brings 247 merged pull requests including 100 feature enhancements, with significant focus on full-text indexing improvements that directly impact how organizations handle log analysis workflows.
New Query Operators: matches_term and @@
Version 0.14 introduces intuitive text matching with the new matches_term function and its @@ operator shorthand:
-- Using matches_term function
SELECT * FROM logs WHERE matches_term(message, 'error') OR matches_term(message, 'fail');
-- Using @@ operator (shorthand)
SELECT * FROM logs WHERE message @@ 'error' OR message @@ 'fail';
These operators provide exact term matching with intelligent boundary detection:
- Case-sensitive matching for precise results
- Word boundary detection prevents partial matches
- Multi-word phrase support for complex search patterns
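The matching rules listed above can be modeled in a few lines of Python. This is only a sketch of the described semantics, not GreptimeDB's implementation: the function name `matches_term_sketch` is hypothetical, and it assumes a "word boundary" means any non-alphanumeric character or the start or end of the string.

```python
def matches_term_sketch(text: str, term: str) -> bool:
    """Hypothetical model of exact term matching.

    Returns True when `term` occurs in `text` with a non-alphanumeric
    character (or the string edge) on both sides. Matching is case-sensitive.
    """
    if not term:
        return False
    start = text.find(term)
    while start != -1:
        end = start + len(term)
        # Assumed boundary rule: string edge or a non-alphanumeric neighbor.
        left_ok = start == 0 or not text[start - 1].isalnum()
        right_ok = end == len(text) or not text[end].isalnum()
        if left_ok and right_ok:
            return True
        # Keep scanning: a later occurrence may sit on proper boundaries.
        start = text.find(term, start + 1)
    return False
```

Under these assumptions, `error` matches `disk error: retrying` but not `errors found` (partial word) or `Disk Error` (different case), and a phrase such as `connection timeout` is matched as a single unit.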
Dual-Backend Architecture: Bloom vs Tantivy
GreptimeDB v0.14's dual-backend approach allows users to choose the optimal indexing strategy based on their specific workload characteristics:
Bloom Backend: Optimized for General-Purpose Search
Characteristic | Performance |
---|---|
Best For | General-purpose log search across diverse patterns |
Storage Overhead | ~10% of raw data size (extremely efficient) |
Query Performance | Consistent across all query types |
Memory Usage | Minimal impact on system resources |
Example Storage Comparison:
- Raw log data: 10GB
- Bloom index: 1GB
- Total storage: 11GB (10% overhead)
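For intuition about why a Bloom-style index can stay this small, here is a generic textbook Bloom filter in Python. It is not GreptimeDB's implementation, which this article does not describe; the class name `BloomSketch`, the bit-array size, and the hashing scheme are all illustrative assumptions.

```python
import hashlib


class BloomSketch:
    """Generic Bloom filter: a fixed-size bit array plus several hash functions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term: str):
        # Derive several bit positions per term by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term: str) -> None:
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(term)
        )
```

The filter stores bits rather than the terms themselves, so its size does not grow with term length. A negative answer lets a query skip a block of data entirely, while a positive answer still has to be confirmed against the actual rows.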
Tantivy Backend: Precision-Focused Architecture
Characteristic | Performance |
---|---|
Best For | High-selectivity queries (TraceID, unique identifiers) |
Storage Overhead | ~100% of raw data size (inverted index) |
Selective Queries | 5x faster than Bloom for unique lookups |
General Queries | 5x slower than Bloom for broad searches |
Example Storage Comparison:
- Raw log data: 10GB
- Tantivy index: 10GB
- Total storage: 20GB (100% overhead)
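The two storage comparisons follow from the same arithmetic, sketched below. The helper name `total_storage_gb` and the `BACKEND_OVERHEAD` table are illustrative; the overhead ratios are the approximate figures quoted in this article, not measured constants.

```python
# Approximate index overhead per backend, as quoted in this article.
BACKEND_OVERHEAD = {"bloom": 0.10, "tantivy": 1.00}


def total_storage_gb(raw_gb: float, index_overhead: float) -> float:
    """Raw data plus an index sized as a fraction of the raw data."""
    return raw_gb + raw_gb * index_overhead


for backend, overhead in BACKEND_OVERHEAD.items():
    print(f"{backend}: {total_storage_gb(10, overhead):.1f} GB total")
```

Plugging in your own raw data volume gives a rough first estimate of the storage cost of each backend before running a real benchmark.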
Performance Benchmarking Results
Real-world testing reveals significant performance differences based on query selectivity:
High-Selectivity Queries (TraceID, UserID)
Backend | Relative Performance |
---|---|
Tantivy | 5x faster than Bloom |
Bloom | 1x (baseline) |
LIKE Query | 50x slower |
Low-Selectivity Queries (Common Terms)
Backend | Relative Performance |
---|---|
Bloom | 1x (baseline) |
Tantivy | 5x slower |
LIKE Query | 1x (equivalent) |
Choosing the Right Backend Strategy
Decision matrix for backend selection:
Use Bloom Backend When:
- Diverse query patterns across your log corpus
- Storage efficiency is a primary concern
- Consistent performance matters more than peak speed
- Budget constraints limit infrastructure resources
Use Tantivy Backend When:
- Trace ID lookups dominate your query patterns
- Unique identifier searches are performance-critical
- Storage costs are less constraining than query speed
- High-precision matching is essential
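The decision matrix above can be turned into a small helper for reasoning about a workload. This is a hypothetical sketch: the `Workload` fields, the function name `choose_backend`, and the 0.5 threshold are assumptions made for illustration, not recommendations from GreptimeDB.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Fraction of queries that look up unique identifiers (0.0 to 1.0).
    high_selectivity_share: float
    # True when storage cost outweighs peak query speed.
    storage_constrained: bool


def choose_backend(workload: Workload, threshold: float = 0.5) -> str:
    """Return 'bloom' or 'tantivy' following the decision matrix above."""
    if workload.storage_constrained:
        # Bloom's ~10% overhead wins when storage is the main concern.
        return "bloom"
    if workload.high_selectivity_share >= threshold:
        # Trace ID style lookups dominate: favor the inverted index.
        return "tantivy"
    # Diverse or broad queries: favor consistent performance.
    return "bloom"
```

The threshold is the part most worth tuning against your own query logs, since the break-even point depends on how much the slower broad searches cost you.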
Advanced Configuration Options
Backend-specific configuration allows fine-tuning for optimal performance:
-- Bloom backend configuration
CREATE TABLE logs_bloom (
  message STRING FULLTEXT WITH (backend = 'bloom'),
  service STRING,
  ts TIMESTAMP,
  TIME INDEX(ts)
);
-- Tantivy backend configuration
CREATE TABLE logs_tantivy (
  message STRING FULLTEXT WITH (backend = 'tantivy'),
  trace_id STRING,
  ts TIMESTAMP,
  TIME INDEX(ts)
);
Integration with Time-Series Queries
Full-text search combines seamlessly with time-series filtering:
-- Hybrid query combining text search and time filtering
SELECT service, COUNT(*) as error_count
FROM logs
WHERE ts > now() - INTERVAL '1 hour'
AND message @@ 'timeout'
AND service != 'health-check'
GROUP BY service
ORDER BY error_count DESC;
This unified approach eliminates the need for separate log and metrics storage systems.
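To make explicit what the hybrid query computes, here is an in-memory Python model of the same filter, group, and sort steps. It is a simplification for illustration only: the function name `error_counts` and the row layout are hypothetical, and term matching is approximated by whitespace tokenization rather than the database's own matching rules.

```python
from collections import Counter
from datetime import datetime, timedelta


def error_counts(rows, now, term="timeout", window=timedelta(hours=1),
                 excluded_service="health-check"):
    """Count matching log rows per service, most frequent first."""
    cutoff = now - window
    counts = Counter(
        row["service"]
        for row in rows
        if row["ts"] > cutoff                      # time filter
        and term in row["message"].split()         # simplified term match
        and row["service"] != excluded_service     # service filter
    )
    return counts.most_common()


now = datetime(2025, 1, 1, 12, 0)
rows = [
    {"ts": now - timedelta(minutes=10), "service": "api", "message": "upstream timeout"},
    {"ts": now - timedelta(minutes=20), "service": "api", "message": "read timeout on socket"},
    {"ts": now - timedelta(minutes=5), "service": "db", "message": "lock timeout"},
    {"ts": now - timedelta(hours=2), "service": "db", "message": "lock timeout"},
    {"ts": now - timedelta(minutes=1), "service": "health-check", "message": "probe timeout"},
    {"ts": now - timedelta(minutes=3), "service": "api", "message": "request ok"},
]
print(error_counts(rows, now))  # [('api', 2), ('db', 1)]
```

In the database, the time predicate and the text predicate are evaluated against the same table, which is what removes the need to join results from two separate systems.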
Memory and Resource Optimization
GreptimeDB's columnar storage provides significant advantages for text indexing:
Memory Efficiency
- Bloom filters: 400MB memory usage for 10GB dataset
- Traditional solutions: Often require 12GB+ memory
- 32x memory efficiency compared to Elasticsearch
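As a quick check on these figures, the ratio can be computed directly. The helper name `memory_ratio` is illustrative, and the calculation assumes decimal units (1 GB = 1000 MB).

```python
def memory_ratio(baseline_gb: float, index_mb: float) -> float:
    """How many times more memory the baseline uses than the index."""
    return baseline_gb * 1000 / index_mb


lower_bound = memory_ratio(12, 400)     # using exactly 12 GB as the baseline
quoted_case = memory_ratio(12.8, 400)   # baseline implied by the 32x figure
print(f"{lower_bound:.0f}x, {quoted_case:.0f}x")
```

With exactly 12 GB as the baseline the ratio is 30x; the quoted 32x corresponds to a baseline of about 12.8 GB, which is consistent with the "12GB+" figure.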
Compression Benefits
- Structured log parsing: 13% storage usage vs raw logs
- Column-specific compression: Optimized for each data type
- Automatic data lifecycle: Intelligent tiering based on access patterns
Migration and Adoption Strategy
Transitioning to enhanced full-text indexing:
- Evaluate query patterns to determine optimal backend choice
- Start with Bloom backend for general-purpose workloads
- Migrate high-selectivity queries to Tantivy backend
- Monitor performance metrics to validate configuration choices
Real-World Performance Impact
Organizations migrating to GreptimeDB v0.14's enhanced indexing report:
- 50-80% reduction in query response times
- 70% decrease in storage costs compared to Elasticsearch
- Simplified operations with unified metrics and logs storage
The dual-backend architecture enables organizations to optimize for their specific use cases without compromising on functionality or performance.
Ready to enhance your log analysis capabilities? GreptimeDB v0.14's full-text indexing delivers the performance and flexibility needed for modern observability workloads.
About Greptime
GreptimeDB is an open-source, cloud-native database purpose-built for real-time observability. Built in Rust and optimized for cloud-native environments, it provides unified storage and processing for metrics, logs, and traces, delivering sub-second insights from edge to cloud at any scale.
GreptimeDB OSS – The open-sourced database for small to medium-scale observability and IoT use cases, ideal for personal projects or dev/test environments.
GreptimeDB Enterprise – A robust observability database with enhanced security, high availability, and enterprise-grade support.
GreptimeCloud – A fully managed, serverless DBaaS with elastic scaling and zero operational overhead. Built for teams that need speed, flexibility, and ease of use out of the box.
🚀 We’re open to contributors—get started with issues labeled good first issue and connect with our community.