
The Pipeline Engine! Transforming Unstructured Logs into Actionable Insights

Raw logs often hide their most valuable insights behind messy, unstructured text. Traditional log processing approaches force painful compromises between storage efficiency, query performance, and data fidelity. GreptimeDB's Pipeline engine elegantly solves these challenges, transforming how organizations extract value from their log data.

The Log Processing Challenge

Consider this typical nginx access log:

plain
192.168.97.8 - - [15/Oct/2024:08:41:09 +0000] "GET /query/endpoint-8a68-48a4-8a4e-e92f9fcb0a38 HTTP/1.1" 200 664 "https://www.github.com" "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:116.0) Gecko/20100101 Firefox/116.0"

Hidden within this string are valuable structured fields:

  • Client IP address
  • Request timestamp
  • HTTP method, path, and protocol
  • Status code and response size
  • Referrer and user agent information
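To make the list above concrete, here is a minimal sketch that pulls those fields out of the sample line with Python's standard `re` module. The group names (`ip`, `ts`, `ua`, and so on) are illustrative choices, and a hand-written regular expression is only a stand-in for what a log pipeline does, not GreptimeDB's implementation.

```python
import re

# Sample line copied from the article.
LINE = (
    '192.168.97.8 - - [15/Oct/2024:08:41:09 +0000] '
    '"GET /query/endpoint-8a68-48a4-8a4e-e92f9fcb0a38 HTTP/1.1" 200 664 '
    '"https://www.github.com" '
    '"Mozilla/5.0 (Windows NT 6.2; WOW64; rv:116.0) Gecko/20100101 Firefox/116.0"'
)

# Hypothetical group names, chosen to mirror the bullet list above.
ACCESS_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def extract_fields(line):
    """Return the named fields of one access-log line as strings."""
    match = ACCESS_LOG.match(line)
    if match is None:
        raise ValueError("line does not look like an nginx access log entry")
    return match.groupdict()


fields = extract_fields(LINE)
for name, value in fields.items():
    print(f"{name:>8}: {value}")
```

Every value is still a string at this point. Turning `status` and `size` into integers and `ts` into a real timestamp is a separate step.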

Storing this as a raw string makes searching and analysis painfully inefficient. Yet extracting this structure traditionally required external tools like Logstash or complex ETL pipelines—until now.

GreptimeDB's Pipeline Engine: Built-in Log Intelligence

Introduced in v0.9, GreptimeDB's Pipeline engine brings log parsing and transformation directly into the database core.

How It Works: The Processor Chain

The Pipeline engine processes logs through configurable transformation stages:

  1. Dissect processors extract structured fields using pattern matching
  2. Date processors parse timestamps into standardized formats
  3. Transform stage converts data types (e.g., string to integer)

This process converts unstructured strings into neatly organized columns, ready for efficient querying and storage.
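The three stages above can be imitated in a few lines of Python. This is a simplified sketch of the idea only: `dissect`, `parse_date`, and `cast` are hypothetical helpers written for this example, and the real engine's matching rules may differ. The dissect pattern is the one used in the article's configuration. Python's `strptime` needs lowercase `%z` to read the numeric `+0000` offset.

```python
import re
from datetime import datetime, timezone

LINE = (
    '192.168.97.8 - - [15/Oct/2024:08:41:09 +0000] '
    '"GET /query/endpoint-8a68-48a4-8a4e-e92f9fcb0a38 HTTP/1.1" 200 664 '
    '"https://www.github.com" '
    '"Mozilla/5.0 (Windows NT 6.2; WOW64; rv:116.0) Gecko/20100101 Firefox/116.0"'
)

PATTERN = (
    '%{ip} %{? ignored} %{? ignored} [%{ts}] '
    '"%{method} %{path} %{protocol}" %{status} %{size} "%{referer}" "%{ua}"'
)


def dissect(line, pattern):
    """Stage 1 (simplified): capture the text between the literal delimiters."""
    parts, pos = [], 0
    for key in re.finditer(r'%\{(\??)\s*(\w+)\}', pattern):
        parts.append(re.escape(pattern[pos:key.start()]))
        skip, name = key.groups()
        # Keys marked with "?" are matched but not kept.
        parts.append('.*?' if skip else f'(?P<{name}>.*?)')
        pos = key.end()
    parts.append(re.escape(pattern[pos:]))
    match = re.fullmatch(''.join(parts), line)
    if match is None:
        raise ValueError("line does not match the dissect pattern")
    return match.groupdict()


def parse_date(record, field, fmt):
    """Stage 2: turn a timestamp string into a timezone-aware datetime."""
    return {**record, field: datetime.strptime(record[field], fmt)}


def cast(record, fields, to_type):
    """Stage 3: convert selected string fields to another type."""
    return {**record, **{f: to_type(record[f]) for f in fields}}


row = dissect(LINE, PATTERN)
row = parse_date(row, "ts", "%d/%b/%Y:%H:%M:%S %z")
row = cast(row, ["status", "size"], int)
print(row)
```

Each stage takes a record and returns a new one, which is what lets the stages be chained in any configured order.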

Declarative Configuration

Pipeline configurations use a simple, declarative format:

yaml
processors:
  - dissect:
      fields:
        - line
      patterns:
        - '%{ip} %{? ignored} %{? ignored} [%{ts}] "%{method} %{path} %{protocol}" %{status} %{size} "%{referer}" "%{ua}"'
  - date:
      fields:
        - ts
      formats:
        - "%d/%b/%Y:%H:%M:%S %z"
transform:
  - fields:
      - status
      - size
    type: int32
  - fields:
      - ip
      - method
      - path
      - protocol
      - referer
      - ua
    type: string
  - field: ts
    type: time
    index: time

This approach makes complex log parsing accessible without requiring programming expertise.

Real-World Benefits

The Pipeline engine delivers transformative advantages for log management:

1. Dramatic Storage Efficiency

By extracting and typing fields properly, GreptimeDB can:

  • Apply column-specific compression algorithms
  • Store integers and timestamps in native formats
  • Eliminate redundant storage of repeated values

The result can be 5-10x better compression than raw string storage, which substantially reduces storage costs.
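One way to get a feel for this effect is to compress the same records twice: once as raw text lines and once as typed columns. The sketch below does that with Python's `zlib` on synthetic rows. Everything in it is an assumption made for illustration: the data is randomly generated, `zlib` and delta encoding stand in for whatever encodings GreptimeDB applies, and the printed numbers say nothing about real-world ratios.

```python
import random
import struct
import zlib
from datetime import datetime, timezone

random.seed(0)  # deterministic synthetic data, not real traffic

N = 5000
start = int(datetime(2024, 10, 15, 8, 41, 9, tzinfo=timezone.utc).timestamp())
rows = [
    {
        "ip": f"192.168.97.{random.randint(1, 50)}",
        "ts": start + i,  # one request per second
        "path": f"/query/endpoint-{random.randint(1, 20)}",
        "status": random.choice([200, 200, 200, 404, 500]),
        "size": random.randint(100, 5000),
    }
    for i in range(N)
]


def as_raw_line(row):
    """Render a row as a shortened access-log style text line."""
    ts = datetime.fromtimestamp(row["ts"], tz=timezone.utc)
    stamp = ts.strftime("%d/%b/%Y:%H:%M:%S +0000")
    return (f'{row["ip"]} - - [{stamp}] "GET {row["path"]} HTTP/1.1" '
            f'{row["status"]} {row["size"]}')


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


# Layout 1: one blob of text lines.
raw_blob = "\n".join(as_raw_line(r) for r in rows).encode()
raw_size = len(zlib.compress(raw_blob))

# Layout 2: one blob per column, numbers stored as fixed-width integers.
columns = {
    "ip": "\n".join(r["ip"] for r in rows).encode(),
    "path": "\n".join(r["path"] for r in rows).encode(),
    "status": struct.pack(f"<{N}i", *(r["status"] for r in rows)),
    "size": struct.pack(f"<{N}i", *(r["size"] for r in rows)),
    "ts": struct.pack(f"<{N}q", *delta_encode([r["ts"] for r in rows])),
}
columnar_size = sum(len(zlib.compress(blob)) for blob in columns.values())

print(f"raw text, compressed:      {raw_size} bytes")
print(f"typed columns, compressed: {columnar_size} bytes")
```

Swapping in your own log lines and field types is the only way to see what the difference looks like for your data.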

2. Query Performance Improvements

Structured data enables more efficient querying:

  • 50-100x faster filtering on extracted numeric fields
  • Native time-series functions on properly parsed timestamps
  • Elimination of expensive string parsing during query execution

Queries that previously scanned gigabytes of text now leverage optimized columnar operations.
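The difference between the two query styles can be sketched with a small example. Below, the same question ("which paths returned a 5xx status?") is answered first by re-parsing raw lines at query time and then by filtering a typed column. SQLite is used purely as a stand-in so the example runs anywhere; the table name `access_logs` and the three log lines are made up for illustration, and no timing claim is implied.

```python
import re
import sqlite3

# Synthetic lines in a shortened access-log format (illustrative only).
RAW_LINES = [
    '192.168.97.8 - - [15/Oct/2024:08:41:09 +0000] "GET /a HTTP/1.1" 200 664',
    '192.168.97.9 - - [15/Oct/2024:08:41:10 +0000] "GET /b HTTP/1.1" 500 120',
    '192.168.97.8 - - [15/Oct/2024:08:41:11 +0000] "POST /c HTTP/1.1" 503 98',
]

LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]+" (?P<status>\d{3}) (?P<size>\d+)'
)


def server_errors_from_raw(lines):
    """Raw storage: every query has to parse every line again."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and int(m["status"]) >= 500:
            hits.append(m["path"])
    return hits


# Structured storage: parse once at ingest time, store typed columns.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE access_logs (ip TEXT, path TEXT, status INTEGER, size INTEGER)"
)
for line in RAW_LINES:
    m = LINE_RE.match(line)
    db.execute(
        "INSERT INTO access_logs VALUES (?, ?, ?, ?)",
        (m["ip"], m["path"], int(m["status"]), int(m["size"])),
    )


def server_errors_from_columns(conn):
    """Query time: a plain numeric comparison, no string parsing."""
    cur = conn.execute(
        "SELECT path FROM access_logs WHERE status >= 500 ORDER BY rowid"
    )
    return [path for (path,) in cur]


print(server_errors_from_raw(RAW_LINES))
print(server_errors_from_columns(db))
```

Both functions return the same paths. The structured version pays the parsing cost once at ingest instead of on every query.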

3. Enhanced Analytics Capabilities

Structured log data unlocks powerful analytical capabilities:

  • Aggregate metrics by status code, path, or client IP
  • Calculate percentiles on response times
  • Join log data with metrics for deeper correlation

These insights would be impractical or impossible with raw string logs.
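As a small illustration of the first two bullets, the sketch below aggregates already-parsed rows by status code and client IP and computes quartiles with Python's `statistics` module. The rows are invented for the example. The sample log format in this article carries no response-time field, so response size is used for the percentile calculation instead.

```python
from collections import Counter
from statistics import quantiles

# Invented rows, shaped like the output of the parsing steps (illustrative only).
rows = [
    {"ip": "192.168.97.8", "path": "/a", "status": 200, "size": 664},
    {"ip": "192.168.97.9", "path": "/b", "status": 500, "size": 120},
    {"ip": "192.168.97.8", "path": "/c", "status": 503, "size": 98},
    {"ip": "192.168.97.7", "path": "/a", "status": 200, "size": 2048},
    {"ip": "192.168.97.9", "path": "/d", "status": 404, "size": 512},
]

# Counting by a field is a one-liner once the field is its own column.
requests_by_status = Counter(r["status"] for r in rows)
requests_by_ip = Counter(r["ip"] for r in rows)

# Quartiles need numeric values, which is why `size` must be an integer.
sizes = sorted(r["size"] for r in rows)
q1, median, q3 = quantiles(sizes, n=4, method="inclusive")

print("requests by status:", dict(requests_by_status))
print("requests by ip:    ", dict(requests_by_ip))
print("size quartiles:    ", q1, median, q3)
```

In a database the same questions would be expressed as grouped aggregate queries. The point here is only that they become simple once fields are typed.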

The Pipeline engine works seamlessly with GreptimeDB's full-text indexing, combining the best of both worlds:

  • Extract high-value fields for structured analysis
  • Maintain full-text search on message content
  • Blend precise field filtering with fuzzy text matching

This hybrid approach makes GreptimeDB uniquely powerful for comprehensive log analytics.

Getting Started

Ready to transform your approach to log processing? GreptimeDB's Pipeline engine is available now in v0.9 and later. Check out the comprehensive documentation to begin extracting more value from your log data with less infrastructure and complexity.
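As a rough orientation, the sketch below builds the two HTTP requests involved in trying a pipeline: uploading the YAML file and sending log lines through it. Nothing is sent over the network. The endpoint paths (`/v1/events/pipelines/{name}` and `/v1/events/logs`), the query parameters, and the default port 4000 are assumptions taken from GreptimeDB's HTTP API documentation for log ingestion and may differ between versions, so verify them against the docs for your release. The names `nginx_access` and `nginx_logs` are made up for the example.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:4000"  # assumption: local instance, default HTTP port


def pipeline_upload_url(name):
    """URL to POST the pipeline YAML file to (assumed endpoint path)."""
    return f"{BASE_URL}/v1/events/pipelines/{name}"


def log_ingest_url(db, table, pipeline_name):
    """URL to POST log records to (assumed endpoint path and parameters)."""
    query = urlencode({"db": db, "table": table, "pipeline_name": pipeline_name})
    return f"{BASE_URL}/v1/events/logs?{query}"


def log_payload(lines):
    """JSON body: one object per log line.

    The key is `line` because the article's dissect processor reads
    the field named `line`.
    """
    return json.dumps([{"line": text} for text in lines])


upload_url = pipeline_upload_url("nginx_access")
ingest_url = log_ingest_url("public", "nginx_logs", "nginx_access")
body = log_payload(['192.168.97.8 - - [15/Oct/2024:08:41:09 +0000] "GET / HTTP/1.1" 200 664 "-" "curl"'])

print(upload_url)
print(ingest_url)
print(body)
```

Any HTTP client can send these requests, with the pipeline file attached as a form upload and the JSON body sent with a `Content-Type: application/json` header.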

Stop treating your logs as opaque strings—unlock their structured insights with GreptimeDB today.


About Greptime

GreptimeDB is an open-source, cloud-native database purpose-built for real-time observability. Built in Rust and optimized for cloud-native environments, it provides unified storage and processing for metrics, logs, and traces, delivering sub-second insights from edge to cloud at any scale.

  • GreptimeDB OSS – The open-source database for small to medium-scale observability and IoT use cases, ideal for personal projects or dev/test environments.

  • GreptimeDB Enterprise – A robust observability database with enhanced security, high availability, and enterprise-grade support.

  • GreptimeCloud – A fully managed, serverless DBaaS with elastic scaling and zero operational overhead. Built for teams that need speed, flexibility, and ease of use out of the box.

🚀 We’re open to contributors—get started with issues labeled good first issue and connect with our community.

