
Enhanced Full-Text Indexing in GreptimeDB v0.14! Bloom vs Tantivy Backend Analysis



Log analysis just got significantly more powerful. GreptimeDB v0.14 introduces dual-backend full-text indexing, with a choice of Bloom filter or Tantivy backends, plus new query operators that make text search more intuitive and efficient. This isn't just an incremental update; it rethinks how an observability database handles text search at scale.

Traditional log analysis forces an uncomfortable trade-off: fast indexing with limited search capabilities, or powerful search with heavy storage overhead. GreptimeDB v0.14 addresses this by offering two specialized backends, each optimized for a different use case.

The release includes 247 merged pull requests, 100 of them feature enhancements, with a significant focus on full-text indexing improvements that directly affect log analysis workflows.

New Query Operators: matches_term and @@

Version 0.14 adds the matches_term function for exact term matching, along with the @@ operator as its shorthand:

```sql
-- Using the matches_term function
SELECT * FROM logs WHERE matches_term(message, 'error') OR matches_term(message, 'fail');

-- Using the @@ operator (shorthand for matches_term)
SELECT * FROM logs WHERE message @@ 'error' OR message @@ 'fail';
```

Both forms perform exact term matching with boundary detection:

  • Matching is case-sensitive, so 'error' does not match 'Error'
  • Word boundaries prevent partial matches, so 'error' does not match 'errors'
  • Multi-word terms such as 'critical error' are matched as a whole phrase
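
To make these rules concrete, the Python sketch below models term matching on plain strings. It is an approximation under stated assumptions, not GreptimeDB's implementation: it treats any non-alphanumeric character, or the start or end of the text, as a term boundary, and the local `matches_term` helper only borrows the SQL function's name.

```python
def matches_term(text: str, term: str) -> bool:
    """Case-sensitive term match; the term must not touch an alphanumeric
    character on either side (assumed boundary rule)."""
    if not term:
        return False
    start = text.find(term)  # str.find is case-sensitive
    while start != -1:
        end = start + len(term)
        left_ok = start == 0 or not text[start - 1].isalnum()
        right_ok = end == len(text) or not text[end].isalnum()
        if left_ok and right_ok:
            return True
        # Try the next occurrence, e.g. "xerror error" matches on the second one.
        start = text.find(term, start + 1)
    return False


# Model of: WHERE message @@ 'error' OR message @@ 'fail'
logs = ["An error occurred!", "3 errors found", "Error: disk full", "login fail"]
hits = [m for m in logs if matches_term(m, "error") or matches_term(m, "fail")]
print(hits)  # ['An error occurred!', 'login fail']
```

Under this model, `matches_term("critical errors", "critical error")` returns `False`, because the trailing `s` is alphanumeric and so breaks the boundary.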

Dual-Backend Architecture: Bloom vs Tantivy

GreptimeDB v0.14's dual-backend approach allows users to choose the optimal indexing strategy based on their specific workload characteristics:

Bloom Backend

| Characteristic | Bloom backend |
| --- | --- |
| Best for | General-purpose log search across diverse query patterns |
| Storage overhead | ~10% of raw data size |
| Query performance | Consistent across query types |
| Memory usage | Minimal impact on system resources |
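
A Bloom filter answers "might this segment contain term X?" with no false negatives and a tunable false-positive rate, so a query can skip any segment whose filter says "definitely absent". The Python sketch below illustrates that general technique; the bit-array size, hash scheme, whitespace tokenizer, and per-segment granularity are assumptions for illustration, not GreptimeDB's actual index layout.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: each item sets num_hashes positions in a bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


def build_segment_filter(messages: list[str]) -> BloomFilter:
    """Index every whitespace-separated token in one segment (hypothetical tokenizer)."""
    bf = BloomFilter()
    for msg in messages:
        for token in msg.split():
            bf.add(token)
    return bf


segments = [
    ["GET /api ok", "GET /health ok"],
    ["POST /api timeout", "trace_id=abc123 failed"],
]
filters = [build_segment_filter(seg) for seg in segments]
# Only segments whose filter says "possibly present" need to be read and checked.
to_scan = [i for i, bf in enumerate(filters) if bf.might_contain("timeout")]
```

Because each filter keeps only a few bits per distinct token instead of row positions, the index stays small relative to the raw data; the trade-off is that candidate segments must still be read and checked.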

Example Storage Comparison:

  • Raw log data: 10GB
  • Bloom index: 1GB
  • Total storage: 11GB (10% overhead)

Tantivy Backend: Precision-Focused Architecture

| Characteristic | Tantivy backend |
| --- | --- |
| Best for | High-selectivity queries (trace IDs, other unique identifiers) |
| Storage overhead | ~100% of raw data size (inverted index) |
| Selective queries | ~5x faster than Bloom for unique-ID lookups |
| General queries | ~5x slower than Bloom for broad searches |
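
Tantivy is a Rust full-text search library built around an inverted index, which maps each term to the rows that contain it (its posting list). The toy Python version below, with a hypothetical whitespace tokenizer, shows why a unique-ID lookup is cheap (its posting list has a single entry) and why the index grows with the data (every row id is stored under every token it contains). Tantivy's real structures, such as its term dictionary and compressed postings, are considerably more elaborate.

```python
from collections import defaultdict


def build_inverted_index(messages: list[str]) -> dict[str, list[int]]:
    """Map each whitespace token to the ascending row ids that contain it."""
    postings = defaultdict(list)
    for row_id, msg in enumerate(messages):
        for token in set(msg.split()):  # set(): record each row once per token
            postings[token].append(row_id)  # row ids arrive in ascending order
    return dict(postings)


messages = [
    "GET /api ok",
    "GET /health ok",
    "POST /api timeout",
    "trace_id=abc123 failed",
]
index = build_inverted_index(messages)

# A unique identifier resolves straight to its single row.
print(index.get("trace_id=abc123", []))  # [3]
# A common token's posting list grows with the table.
print(index.get("ok", []))  # [0, 1]
```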

Example Storage Comparison:

  • Raw log data: 10GB
  • Tantivy index: 10GB
  • Total storage: 20GB (100% overhead)
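
Both storage examples follow from one formula, total = raw × (1 + index ratio). A quick Python sketch for projecting it onto other data volumes, using the approximate ratios quoted above (about 0.1 for Bloom and 1.0 for Tantivy); `INDEX_RATIO` and `total_storage_gb` are illustrative names:

```python
# Approximate index-size ratios from the figures above (index size / raw size).
INDEX_RATIO = {"bloom": 0.10, "tantivy": 1.00}


def total_storage_gb(raw_gb: float, backend: str) -> float:
    """Raw data plus its full-text index, in GB."""
    return raw_gb * (1 + INDEX_RATIO[backend])


for backend in ("bloom", "tantivy"):
    print(backend, round(total_storage_gb(10, backend), 2))
# bloom 11.0
# tantivy 20.0
```

At these ratios, 500GB of raw logs would need roughly 550GB with Bloom versus roughly 1,000GB with Tantivy.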

Performance Benchmarking Results

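Real-world testing reveals significant performance differences depending on query selectivity, that is, how small a fraction of rows a query matches. A trace ID lookup is highly selective; a search for a common term is not.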

High-Selectivity Queries (TraceID, UserID)

| Backend | Relative performance |
| --- | --- |
| Tantivy | ~5x faster than Bloom |
| Bloom | 1x (baseline) |
| LIKE query | ~50x slower than Bloom |

Low-Selectivity Queries (Common Terms)

| Backend | Relative performance |
| --- | --- |
| Bloom | 1x (baseline) |
| Tantivy | ~5x slower than Bloom |
| LIKE query | ~1x (roughly equivalent to Bloom) |
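
One way to read these results is that an index helps only by letting a query skip data. The Python model below assumes an ideal per-segment filter (no false positives) over a synthetic workload and counts how many segments a query must still read: a rare trace ID narrows the work to one segment, while a term present in every row leaves every segment to scan, which is roughly what a LIKE query does anyway. It ignores index lookup cost and explains the trend, not the measured ratios.

```python
def segments_to_scan(segments: list[list[str]], term: str) -> int:
    """Segments an ideal skip filter cannot rule out: those containing the term."""
    return sum(1 for seg in segments if any(term in msg.split() for msg in seg))


# Synthetic workload: 100 segments x 100 rows; "INFO" appears in every row,
# while one trace id appears in exactly one row.
segments = [[f"INFO request {s}-{r}" for r in range(100)] for s in range(100)]
segments[42][7] = "INFO request trace_id=abc123"

for term in ("trace_id=abc123", "INFO"):
    scanned = segments_to_scan(segments, term)
    print(f"{term}: scan {scanned} of {len(segments)} segments")
# trace_id=abc123: scan 1 of 100 segments
# INFO: scan 100 of 100 segments
```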

Choosing the Right Backend Strategy

Decision matrix for backend selection:

Use Bloom Backend When:

  • Diverse query patterns across your log corpus
  • Storage efficiency is a primary concern
  • Consistent performance matters more than peak speed
  • Budget constraints limit infrastructure resources

Use Tantivy Backend When:

  • Trace ID lookups dominate your query patterns
  • Unique identifier searches are performance-critical
  • Storage costs are less constraining than query speed
  • High-precision matching is essential
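
The decision matrix can be condensed into a rule of thumb. This Python sketch is a hypothetical heuristic, not official guidance: the `Workload` fields and the 0.5 threshold are illustrative assumptions to be tuned against your own query logs.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    high_selectivity_share: float  # fraction of queries that look up unique IDs (0..1)
    storage_constrained: bool      # True when disk cost or budget is the main limit


def choose_backend(w: Workload, threshold: float = 0.5) -> str:
    """Hypothetical rule of thumb mirroring the decision matrix above."""
    if w.storage_constrained:
        return "bloom"  # ~10% index overhead vs ~100% for Tantivy
    if w.high_selectivity_share >= threshold:
        return "tantivy"  # trace-ID style lookups dominate
    return "bloom"  # diverse query patterns: favor consistent performance


print(choose_backend(Workload(high_selectivity_share=0.8, storage_constrained=False)))  # tantivy
```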

Advanced Configuration Options

The backend is chosen per column when the full-text index is declared:

```sql
-- Bloom backend: general-purpose log search with a compact index
CREATE TABLE logs_bloom (
    message STRING FULLTEXT INDEX WITH (backend = 'bloom'),
    service STRING,
    ts TIMESTAMP,
    TIME INDEX(ts)
);

-- Tantivy backend: fast lookups of highly selective terms such as trace IDs
CREATE TABLE logs_tantivy (
    message STRING FULLTEXT INDEX WITH (backend = 'tantivy'),
    trace_id STRING,
    ts TIMESTAMP,
    TIME INDEX(ts)
);
```

Integration with Time-Series Queries

Full-text search combines seamlessly with time-series filtering:

```sql
-- Hybrid query combining text search and time filtering
SELECT service, COUNT(*) AS error_count
FROM logs
WHERE ts > now() - INTERVAL '1 hour'
  AND message @@ 'timeout'
  AND service != 'health-check'
GROUP BY service
ORDER BY error_count DESC;
```

This unified approach eliminates the need for separate log and metrics storage systems.
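
For readers who want to check the query's semantics, the Python sketch below applies the same steps to in-memory rows: time window, term match, service exclusion, grouping, and descending sort. The row layout and sample data are hypothetical, and `has_term` restates an assumed boundary rule (no letter or digit directly before or after the term) rather than GreptimeDB's exact tokenization.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone


def has_term(text: str, term: str) -> bool:
    # Assumed boundary rule: no letter or digit directly before or after the term.
    pattern = r"(?<![^\W_])" + re.escape(term) + r"(?![^\W_])"
    return re.search(pattern, text) is not None


def errors_by_service(rows, now, term="timeout", window=timedelta(hours=1)):
    """Mirror of the SQL: time filter, term match, exclusion, GROUP BY, ORDER BY DESC."""
    counts = Counter(
        r["service"]
        for r in rows
        if r["ts"] > now - window
        and has_term(r["message"], term)
        and r["service"] != "health-check"
    )
    return counts.most_common()  # [(service, error_count), ...] by count, descending


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"ts": now - timedelta(minutes=5), "service": "api", "message": "upstream timeout after 30s"},
    {"ts": now - timedelta(minutes=9), "service": "api", "message": "db timeout"},
    {"ts": now - timedelta(minutes=20), "service": "auth", "message": "timeout: token service"},
    {"ts": now - timedelta(hours=2), "service": "auth", "message": "timeout"},  # outside window
    {"ts": now - timedelta(minutes=1), "service": "health-check", "message": "timeout"},  # excluded
    {"ts": now - timedelta(minutes=2), "service": "auth", "message": "timeouts rising"},  # not a whole term
]
print(errors_by_service(rows, now))  # [('api', 2), ('auth', 1)]
```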

Memory and Resource Optimization

GreptimeDB's columnar storage provides significant advantages for text indexing:

Memory Efficiency

  • Bloom filters: about 400MB of memory for a 10GB dataset
  • Traditional solutions: often require 12GB+ of memory
  • Roughly 32x lower memory usage than Elasticsearch

Compression Benefits

  • Structured log parsing: stored data takes about 13% of the space of the raw logs
  • Column-specific compression: Optimized for each data type
  • Automatic data lifecycle: Intelligent tiering based on access patterns

Migration and Adoption Strategy

Transitioning to enhanced full-text indexing:

  1. Evaluate query patterns to determine optimal backend choice
  2. Start with Bloom backend for general-purpose workloads
  3. Migrate high-selectivity queries to Tantivy backend
  4. Monitor performance metrics to validate configuration choices
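
Step 1 can start from data you already have: replay the terms in a query log against a sample of stored messages and measure what share of queries target rare terms. The Python sketch below is hypothetical; the whitespace tokenizer, the 0.1% rarity threshold, and the `high_selectivity_share` helper are assumptions, and the resulting share is the kind of input the decision matrix above calls for.

```python
def high_selectivity_share(query_terms: list[str], sample: list[str],
                           rare_threshold: float = 0.001) -> float:
    """Fraction of logged query terms that match at most rare_threshold of sampled rows."""
    if not query_terms or not sample:
        return 0.0
    tokenized = [set(msg.split()) for msg in sample]  # hypothetical whitespace tokenizer

    def selectivity(term: str) -> float:
        return sum(term in tokens for tokens in tokenized) / len(tokenized)

    rare = [t for t in query_terms if selectivity(t) <= rare_threshold]
    return len(rare) / len(query_terms)


sample = [f"INFO request {i}" for i in range(5000)]
sample += ["ERROR trace_id=abc123", "ERROR trace_id=def456"]
query_log = ["trace_id=abc123", "trace_id=def456", "INFO", "request"]
print(high_selectivity_share(query_log, sample))  # 0.5
```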

Real-World Performance Impact

Organizations migrating to GreptimeDB v0.14's enhanced indexing report:

  • 50-80% reduction in query response times
  • 70% decrease in storage costs compared to Elasticsearch
  • Simplified operations with unified metrics and logs storage

The dual-backend architecture enables organizations to optimize for their specific use cases without compromising on functionality or performance.

Ready to enhance your log analysis capabilities? GreptimeDB v0.14's full-text indexing delivers the performance and flexibility needed for modern observability workloads.


About Greptime

GreptimeDB is an open-source, cloud-native database purpose-built for real-time observability. Built in Rust and optimized for cloud-native environments, it provides unified storage and processing for metrics, logs, and traces, delivering sub-second insights from edge to cloud at any scale.

  • GreptimeDB OSS – The open-source database for small to medium-scale observability and IoT use cases, ideal for personal projects or dev/test environments.

  • GreptimeDB Enterprise – A robust observability database with enhanced security, high availability, and enterprise-grade support.

  • GreptimeCloud – A fully managed, serverless DBaaS with elastic scaling and zero operational overhead. Built for teams that need speed, flexibility, and ease of use out of the box.

🚀 We’re open to contributors: get started with issues labeled good first issue and connect with our community.
