Full-Text Search Meets Time-Series: GreptimeDB's Unified Observability Approach

Traditional observability stacks force you into uncomfortable trade-offs. Fast metrics queries or powerful log search. Real-time performance or historical analysis. GreptimeDB v0.14 shatters these limitations with advanced full-text indexing that seamlessly integrates with time-series data.

The Problem with Fragmented Observability

Most organizations run something like this:

Prometheus for metrics
Elasticsearch for logs
Jaeger for traces

Each system excels individually, but correlation across data types becomes a nightmare. When your application fails at 3 AM, you're frantically jumping between dashboards, trying to piece together the story.

GreptimeDB's unified approach changes this fundamentally. Metrics, logs, and traces share the same storage engine, query language, and operational model.

Full-Text Search Evolution in v0.14

The latest release introduces dual-backend full-text indexing that adapts to your specific use case:

Bloom Filter Backend

Perfect for general-purpose log search:

Low storage overhead: ~10% of raw data size
Consistent performance across query patterns
Stable resource usage for production workloads
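
To make the trade-off concrete, here is a minimal Python sketch of how a Bloom filter can let a query skip chunks of log data that cannot contain a term. It illustrates the general technique only and is not GreptimeDB's implementation; the `BloomFilter` class, the `tokenize` helper, the chunk layout, and the filter sizes are all assumptions made for the example.

```python
import hashlib
import re


class BloomFilter:
    """Fixed-size bit set that answers "definitely absent" or "possibly present"."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, term: str):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term: str) -> None:
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


def tokenize(message: str) -> list[str]:
    # Hypothetical tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", message.lower())


# Hypothetical chunks of log lines, each summarized by its own small filter.
chunks = [
    ["upstream timeout after 30s", "retrying request"],
    ["user login ok", "cache warmed"],
]
filters = []
for chunk in chunks:
    bloom = BloomFilter()
    for line in chunk:
        for term in tokenize(line):
            bloom.add(term)
    filters.append(bloom)

# Only chunks whose filter says "possibly present" need to be scanned.
candidates = [i for i, bloom in enumerate(filters) if bloom.might_contain("timeout")]
print(candidates)
```

A filter like this can return false positives but never false negatives, so a skipped chunk is always safe to skip. Its size is a fixed number of bits per chunk regardless of how many distinct terms appear, which is one reason this style of index stays small relative to the data.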

Tantivy Backend

Optimized for high-selectivity queries:

5x faster for unique identifier searches (trace IDs, user IDs)
Inverted index architecture for precise matching
Higher storage cost but unmatched precision
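
For contrast, an inverted index maps every term directly to the rows that contain it. The sketch below shows the idea in Python; it is a toy illustration rather than Tantivy's actual data structure, and the `build_inverted_index` helper, the tokenization rule, and the sample trace IDs are assumptions.

```python
import re
from collections import defaultdict


def build_inverted_index(messages: list[str]) -> dict[str, set[int]]:
    """Map each lowercase alphanumeric term to the set of row ids containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, message in enumerate(messages):
        for term in re.findall(r"[a-z0-9]+", message.lower()):
            index[term].add(row_id)
    return index


# Hypothetical log messages with made-up trace IDs.
messages = [
    "trace_id=4bf92f3577b34da6 status=ok",
    "trace_id=a3ce929d0e0e4736 timeout",
    "trace_id=4bf92f3577b34da6 retry",
]
index = build_inverted_index(messages)

# A lookup goes straight to the matching rows without scanning any message.
rows_for_trace = sorted(index.get("4bf92f3577b34da6", set()))
print(rows_for_trace)
```

Because every distinct term gets its own entry, the index grows with the vocabulary of the logs, which is consistent with the higher storage cost noted above. In exchange, a search for a rare value such as a trace ID touches only the rows that hold it.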

The new matches_term function and @@ operator make log analysis intuitive:

sql

-- Find all error-related entries
SELECT * FROM logs WHERE message @@ 'error' OR message @@ 'fail';

-- Combine with time-series filtering
SELECT * FROM logs 
WHERE ts > '2024-01-01' 
  AND message @@ 'timeout' 
  AND service = 'api-gateway';

Real-World Log Processing Performance

Benchmark results show GreptimeDB ingesting about three times faster than Elasticsearch and approaching ClickHouse, while using the least memory of the three:

Ingestion Performance (rows/second):

GreptimeDB: 120,000-130,000
ClickHouse: 150,000
Elasticsearch: 40,000

Resource Efficiency:

GreptimeDB: 400MB memory usage
ClickHouse: 600MB memory usage
Elasticsearch: 12GB+ memory usage

The roughly 30x memory advantage over Elasticsearch (400MB versus 12GB+) is particularly striking for resource-constrained environments.
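
The ratios implied by the figures quoted above can be checked with a few lines of Python. The only assumption is that "12GB+" is treated as a lower bound of 12 × 1024 MB; the variable and function names are illustrative.

```python
# Figures as quoted in the benchmark lists above.
ingest_rows_per_s = {
    "GreptimeDB": (120_000, 130_000),   # quoted as a range
    "ClickHouse": (150_000, 150_000),
    "Elasticsearch": (40_000, 40_000),
}
# "12GB+" is taken as a lower bound of 12 * 1024 MB (assumption).
memory_mb = {"GreptimeDB": 400, "ClickHouse": 600, "Elasticsearch": 12 * 1024}


def ratio_range(pair: tuple[int, int], baseline: int) -> tuple[float, float]:
    """Divide both ends of a (low, high) range by a single baseline value."""
    return (pair[0] / baseline, pair[1] / baseline)


ingest_vs_es = ratio_range(ingest_rows_per_s["GreptimeDB"], 40_000)
ingest_vs_ch = ratio_range(ingest_rows_per_s["GreptimeDB"], 150_000)
mem_vs_es = memory_mb["Elasticsearch"] / memory_mb["GreptimeDB"]
mem_vs_ch = memory_mb["ClickHouse"] / memory_mb["GreptimeDB"]

print(f"ingestion vs Elasticsearch: {ingest_vs_es[0]:.2f}x to {ingest_vs_es[1]:.2f}x")
print(f"ingestion vs ClickHouse:    {ingest_vs_ch[0]:.2f}x to {ingest_vs_ch[1]:.2f}x")
print(f"memory, Elasticsearch / GreptimeDB: {mem_vs_es:.1f}x")
print(f"memory, ClickHouse / GreptimeDB:    {mem_vs_ch:.1f}x")
```

On these numbers GreptimeDB ingests about 3x as fast as Elasticsearch and somewhat slower than ClickHouse, while using about a thirtieth of Elasticsearch's memory.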

Compression That Actually Matters

In structured mode, GreptimeDB stores logs in about 13% of the raw data size. This isn't just about saving disk space: it also reduces bandwidth costs in distributed deployments and enables longer data retention periods.
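
As a rough worked example of what that ratio means for retention, the sketch below compares how many days of logs fit on the same disk with and without compression. The 1 TB disk and 100 GB/day raw volume are hypothetical inputs; only the 13% figure comes from the text.

```python
COMPRESSED_FRACTION = 0.13  # stored size / raw size, as quoted above


def retention_days(disk_gb: float, raw_gb_per_day: float, fraction: float = 1.0) -> float:
    """Days of logs that fit on disk when each raw GB occupies `fraction` GB."""
    return disk_gb / (raw_gb_per_day * fraction)


# Hypothetical deployment: 1 TB of disk, 100 GB of raw logs per day.
raw_days = retention_days(1000, 100)
compressed_days = retention_days(1000, 100, COMPRESSED_FRACTION)

print(f"uncompressed: {raw_days:.1f} days")
print(f"at 13%:       {compressed_days:.1f} days")
```

The same disk holds roughly 1 / 0.13, or about 7.7 times, as many days of data.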

The Pipeline engine automatically parses unstructured logs into optimized columnar format:

yaml

processors:
  - dissect:
      fields:
        - line
      patterns:
        - '%{ip} - - [%{ts}] "%{method} %{path}" %{status} %{size}'
  - date:
      fields:
        - ts
      formats:
        - "%d/%b/%Y:%H:%M:%S %Z"

This transformation improves both query performance and storage efficiency while maintaining the flexibility to handle diverse log formats.
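
To show what the dissect and date steps do to a single line, here is a small Python approximation. It is not the Pipeline engine itself: the `dissect_to_regex` helper and the sample log line are assumptions, and the timestamp is parsed with Python's `%z` directive because the sample uses a numeric offset.

```python
import re
from datetime import datetime

PATTERN = '%{ip} - - [%{ts}] "%{method} %{path}" %{status} %{size}'


def dissect_to_regex(pattern: str) -> re.Pattern:
    """Turn %{name} placeholders into named groups and escape the literals."""
    # re.split with a capture group alternates: literal, name, literal, name, ...
    parts = re.split(r"%\{(\w+)\}", pattern)
    pieces = []
    for i, part in enumerate(parts):
        pieces.append(re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)")
    return re.compile("^" + "".join(pieces) + "$")


# Hypothetical access-log line shaped to fit the pattern.
line = '192.168.0.1 - - [25/May/2024:20:16:37 +0000] "GET /index.html" 200 612'

match = dissect_to_regex(PATTERN).match(line)
row = match.groupdict()
# Python's strptime needs %z for a numeric offset such as +0000.
row["ts"] = datetime.strptime(row["ts"], "%d/%b/%Y:%H:%M:%S %z")
print(row)
```

Each placeholder becomes a named column, and the timestamp string becomes a typed value, which is the shape a columnar store needs to index and compress each field separately.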

The Observability Data Model Revolution

Here's what makes this approach powerful. Instead of maintaining separate schemas for metrics and logs, GreptimeDB uses a unified table structure:

sql

CREATE TABLE observability_data (
  service STRING,
  environment STRING,
  message STRING FULLTEXT,
  level STRING INVERTED INDEX,
  latency_ms DOUBLE,
  error_count INT,
  ts TIMESTAMP,
  PRIMARY KEY(service, environment),
  TIME INDEX(ts)
);

Single queries can now correlate metrics with log events:

sql

SELECT 
  service,
  AVG(latency_ms) as avg_latency,
  COUNT(*) as log_entries
FROM observability_data 
WHERE ts > now() - INTERVAL '1 hour'
  AND (latency_ms > 1000 OR matches(message, 'error'))
GROUP BY service;
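
The logic of this query can be tried locally with Python's built-in `sqlite3` module as a stand-in. This is only an approximation: SQLite has no full-text `matches` function, so `LIKE '%error%'` substitutes for it, timestamps are plain integers with a fixed `now`, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE observability_data (
        service TEXT, environment TEXT, message TEXT,
        level TEXT, latency_ms REAL, error_count INTEGER, ts INTEGER
    )
    """
)
# Hypothetical rows; ts is seconds on an arbitrary clock.
rows = [
    ("api-gateway", "prod", "upstream error: timeout", "ERROR", 1500.0, 1, 9000),
    ("api-gateway", "prod", "request ok", "INFO", 1200.0, 0, 9500),
    ("api-gateway", "prod", "request ok", "INFO", 80.0, 0, 9600),
    ("api-gateway", "prod", "disk error", "ERROR", 5000.0, 1, 1000),  # older than one hour
    ("auth", "prod", "token error", "ERROR", 40.0, 1, 9700),
    ("auth", "prod", "login ok", "INFO", 30.0, 0, 9800),
]
conn.executemany("INSERT INTO observability_data VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

now = 10_000  # fixed "current time" so the result is reproducible
result = conn.execute(
    """
    SELECT service, AVG(latency_ms) AS avg_latency, COUNT(*) AS log_entries
    FROM observability_data
    WHERE ts > ? - 3600
      AND (latency_ms > 1000 OR message LIKE '%error%')
    GROUP BY service
    ORDER BY service
    """,
    (now,),
).fetchall()
print(result)
```

A row is counted when it is either slow or mentions an error, so a single pass yields both the latency average and the number of relevant log entries per service.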

Advanced Features Beyond Basic Search

Vector search capabilities in v0.10+ enable semantic log analysis. Find logs with similar meanings even when exact wording differs:

sql

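-- query_vector is a placeholder for the embedding of the search text
SELECT *, vec_dot_product(embedding, query_vector) AS similarity
FROM logs
WHERE vec_dot_product(embedding, query_vector) > 0.8
ORDER BY similarity DESC;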

This is particularly valuable for anomaly detection and incident pattern recognition.
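
The filtering and ranking in that query can be sketched in plain Python. The three-dimensional vectors below are made up for illustration (real embeddings come from a model and have many more dimensions), and the helper names are assumptions. The vectors are normalized first, so the dot product equals cosine similarity and a threshold such as 0.8 has a consistent meaning.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Hypothetical log messages with toy embeddings.
log_embeddings = {
    "connection timed out": [0.9, 0.1, 0.0],
    "request timeout exceeded": [0.8, 0.3, 0.1],
    "user logged in": [0.0, 0.2, 0.9],
}
query_vector = normalize([1.0, 0.2, 0.0])

scored = [
    (message, dot(normalize(vec), query_vector))
    for message, vec in log_embeddings.items()
]
# Keep rows above the threshold, most similar first.
similar = sorted(
    [(m, s) for m, s in scored if s > 0.8],
    key=lambda pair: pair[1],
    reverse=True,
)
for message, score in similar:
    print(f"{score:.3f}  {message}")
```

The two timeout-related messages score close to 1.0 despite sharing few exact words, while the unrelated login message falls well below the threshold.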

Performance Optimization Strategies

Cold vs. Hot Query Optimization:

Cold queries: GreptimeDB's object storage integration shines here
Hot queries: In-memory caching and write buffers provide sub-second response times

Partitioning by service or environment enables massive scale-out while maintaining query performance.

Ready to unify your observability stack? GreptimeDB's full-text search capabilities represent the next evolution in observability databases – where time-series analytics and log search finally work together seamlessly.

About Greptime

GreptimeDB is an open-source, cloud-native database purpose-built for real-time observability. Built in Rust and optimized for cloud-native environments, it provides unified storage and processing for metrics, logs, and traces, delivering sub-second insights from edge to cloud at any scale.

GreptimeDB OSS – The open-source database for small to medium-scale observability and IoT use cases, ideal for personal projects or dev/test environments.
GreptimeDB Enterprise – A robust observability database with enhanced security, high availability, and enterprise-grade support.
GreptimeCloud – A fully managed, serverless DBaaS with elastic scaling and zero operational overhead. Built for teams that need speed, flexibility, and ease of use out of the box.

🚀 We’re open to contributors. Get started with issues labeled "good first issue" and connect with our community.
