Background
OceanBase, launched in 2010, is a natively distributed database independently developed by Ant Group. Its unified architecture combines distributed scalability with centralized performance, delivers full Oracle/MySQL compatibility, supports diverse workloads including Transaction Processing (TP) and real-time Analytical Processing (AP), and natively integrates vector search and multi-modal hybrid search. OceanBase has helped more than 2,000 customers upgrade their databases across industries including financial services, telecom, retail, and internet.
In 2022, OceanBase launched its cloud database service, OB Cloud, to help customers build modern data architectures and simplify their tech stack with an integrated cloud database architecture. Operating across 170+ availability zones in 50+ regions, OB Cloud leverages infrastructure from Alibaba Cloud, Huawei Cloud, Tencent Cloud, AWS, and Google Cloud to provide consistent global performance, meeting diverse business growth needs.
The Loki Performance Challenge
OB Cloud initially deployed Grafana Loki across multiple cloud environments to unify log storage and streamline operational experiments. Inspired by Prometheus, Loki is an efficient log aggregation system which indexes only log metadata (labels), not raw log content. Its support for object storage also contributes to lower storage costs, making it a popular choice. OB Cloud's log storage architecture based on Loki is shown below:

Fluent Bit agents deployed on each node collect application pod logs and ingest them into Loki. The log viewer invokes the log query service to retrieve logs based on search conditions (e.g. keywords) and render results. The log query service constructs Loki-compatible queries and executes them via Loki's API.
As workloads scaled, significant limitations in Loki emerged. Queries against large log volumes frequently timed out. Furthermore, Loki's indexing is restricted to labels, offering no acceleration for searches within the log body text. As a result, the query service had to restrict its default query range to just a few minutes.
Migrating to GreptimeDB
After evaluating alternatives, OB Cloud migrated to GreptimeDB for log management. In the new architecture, Fluent Bit agents write directly to GreptimeDB while the query service leverages GreptimeDB's SQL interface for retrieval. This transition yielded immediate improvements: queries that previously timed out on Loki now resolve consistently within sub-second to single-second latency, enabling users to search across hours or days of logs rather than minutes.

Technical Practices
Multi-Cloud Native Deployment Architecture
To support major global cloud vendors such as Alibaba Cloud, Huawei Cloud, Tencent Cloud, AWS, and Google Cloud, OB Cloud needs an internal log service that works across multiple clouds. Its GreptimeDB deployment architecture is illustrated below:

OB Cloud deploys dedicated GreptimeDB clusters within each cloud environment, integrating directly with each vendor's native object storage (S3, OSS, COS). GreptimeDB's native support for multi-cloud object storage suits OB Cloud's requirements well, and its unified SQL interface simplified integration. Kubernetes-native deployment and management, together with a built-in dashboard that simplifies usage and debugging, make day-to-day operations considerably more convenient.
Pipeline-Based Log Processing
Fluent Bit outputs JSON-formatted logs that GreptimeDB processes through customizable pipelines. These pipelines extract critical fields like hostnames and filenames into indexed columns for efficient filtering.
During implementation, we found that some OB Cloud applications produce log entries that span multiple lines, which caused the simple dissect processor to fail. For these logs, we used the regex processor to extract log fields:
processors:
  - regex:
      fields:
        - message
      patterns:
        - '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d+)?)[ ,](?P<message>(?s:.*))$'
      ignore_missing: true
  - date:
      fields:
        - message_timestamp
      formats:
        - '%Y-%m-%d %H:%M:%S%.3f'
        - '%Y-%m-%d %H:%M:%S%.6f'
        - '%Y-%m-%d %H:%M:%S%.9f'
        - '%Y-%m-%d %H:%M:%S%.f'
        - '%Y-%m-%d %H:%M:%S'
      timezone: 'Asia/Shanghai'
      ignore_missing: true
transform:
  - fields:
      - message
    type: string
  - fields:
      - file
      - host
    type: string
    index: tag
  - fields:
      - message_timestamp,timestamp
    type: epoch, ns
    index: timestamp
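To see why the regex handles multi-line entries, the same pattern can be tried outside GreptimeDB. The following is a minimal Python sketch, not part of the pipeline itself: the sample record and the parse_log helper are made up for illustration, and a fixed UTC+8 offset is assumed as a stand-in for the Asia/Shanghai timezone. It mirrors the two pipeline steps: split off the leading timestamp, then convert it to epoch nanoseconds.

```python
import re
from datetime import datetime, timedelta, timezone

# Same pattern as the regex processor; (?s:.*) lets "." match across newlines,
# so everything after the timestamp lands in one message group.
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d+)?)[ ,](?P<message>(?s:.*))$"
)

# Assumption: a fixed UTC+8 offset stands in for 'Asia/Shanghai'.
SHANGHAI = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_log(raw: str):
    """Split a raw record into (epoch_ns, message); return None if it does not match."""
    m = LOG_PATTERN.match(raw)
    if m is None:
        return None
    ts_text = m.group("timestamp")
    # strptime's %f accepts at most six fractional digits, so nanosecond
    # inputs would need truncating first; that case is not handled here.
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in ts_text else "%Y-%m-%d %H:%M:%S"
    ts = datetime.strptime(ts_text, fmt).replace(tzinfo=SHANGHAI)
    # Integer arithmetic avoids float rounding on large epoch values.
    epoch_ns = ((ts - EPOCH) // timedelta(microseconds=1)) * 1000
    return epoch_ns, m.group("message")


# Hypothetical multi-line record, e.g. an error followed by a stack trace.
record = "2024-05-01 12:00:00.123 ERROR request failed\n  at handler()\n  at main()"
epoch_ns, message = parse_log(record)
print(epoch_ns)
print(message)
```

The stack-trace lines stay attached to the message instead of being cut off at the first newline, which is the behavior the pipeline above relies on.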
Tuning Fluent Bit
During traffic spikes, we observed mem_buf overlimit warnings in Fluent Bit's logs, indicating backpressure. After diagnosis, we discovered that the issue was related to Fluent Bit's backpressure mechanism and our configuration.
- Fluent Bit uses an in-memory buffer (mem_buf) to hold collected logs. If this buffer fills up, Fluent Bit pauses collection, triggering the mem_buf overlimit message, which indicates backpressure.
- The mem_buf_limit parameter controls the buffer size and thus Fluent Bit's memory usage.
- Fluent Bit flushes data from the mem_buf to its outputs at intervals defined by the flush parameter.
- If flush is set too high relative to mem_buf_limit, Fluent Bit might not send data fast enough to keep pace with high log generation rates, causing delays during peaks.
- Log file rotation could cause delays because Fluent Bit's default interval for checking the file list is 60 seconds (Refresh_Interval). Reducing this interval minimizes collection latency after rotation.
Fine-tuning Fluent Bit's flush, mem_buf_limit, and Refresh_Interval parameters proved highly effective in reducing log delays and backpressure occurrences.
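The relationship between flush and mem_buf_limit can be reasoned about with simple arithmetic. The following is a deliberately simplified model, not Fluent Bit's actual implementation: it assumes a constant ingest rate and a buffer that starts empty and is only drained at flush time. The function names and the numbers are invented for illustration.

```python
def seconds_until_full(ingest_bytes_per_s: float, mem_buf_limit_bytes: int) -> float:
    """Time for a constant log stream to fill an empty buffer."""
    return mem_buf_limit_bytes / ingest_bytes_per_s


def hits_overlimit(ingest_bytes_per_s: float, flush_s: float, mem_buf_limit_bytes: int) -> bool:
    """True if the buffer fills up before the next flush can drain it."""
    return seconds_until_full(ingest_bytes_per_s, mem_buf_limit_bytes) < flush_s


MIB = 1024 * 1024

# Illustrative numbers only: a 5 MiB buffer under a 2 MiB/s burst fills in 2.5 s.
slow_flush = hits_overlimit(2 * MIB, flush_s=5, mem_buf_limit_bytes=5 * MIB)
fast_flush = hits_overlimit(2 * MIB, flush_s=1, mem_buf_limit_bytes=5 * MIB)
print(slow_flush)  # buffer fills before a 5 s flush
print(fast_flush)  # a 1 s flush drains it in time
```

Under this model, either lowering flush or raising mem_buf_limit moves a workload out of the overlimit region. Which of the two is preferable depends on the memory available on each node and on how well the output can absorb more frequent writes.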
Advanced Indexing Strategies
Unlike Loki, which relies on brute-force text scanning, GreptimeDB offers more powerful indexing capabilities and lets users select the optimal index type for specific query patterns. A common pattern in OB Cloud involves searching log text for keywords. Loki could only perform brute-force text matching for this, which scaled poorly.
GreptimeDB's brute-force text search is already fast, and it also provides full-text indexes to accelerate phrase matching. Users can enable them as needed:
CREATE TABLE db_log (
  ts TIMESTAMP TIME INDEX,
  message TEXT FULLTEXT INDEX
);
Query using the matches_term function to find logs containing system failure:
SELECT * FROM db_log WHERE matches_term(message, 'system failure');
OB Cloud creates indexes on relevant log text fields to accelerate keyword searches.
For structured logs, commonly searched fields can be extracted into dedicated columns. Indexes (such as secondary indexes) can then be created on these columns, allowing direct filtering and significantly boosting query speed.
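On the query side, combining a column filter with a keyword search is mostly a matter of assembling the right SQL. The sketch below shows one way a query service might do that. It is an assumption-laden example rather than OB Cloud's implementation: the helper names are hypothetical, and it presumes a host column has been extracted into the table as in the pipeline section. Only the matches_term function and the message column come from the examples above.

```python
def sql_quote(value: str) -> str:
    """Quote a value as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_log_query(table: str, keyword: str, host=None, limit: int = 100) -> str:
    """Combine an optional column filter with a matches_term keyword search.

    The table name is interpolated as-is and is assumed to come from trusted
    configuration, never from user input.
    """
    conditions = []
    if host is not None:
        # Filter on a dedicated column first; this is where a column index helps.
        conditions.append(f"host = {sql_quote(host)}")
    conditions.append(f"matches_term(message, {sql_quote(keyword)})")
    where = " AND ".join(conditions)
    return f"SELECT * FROM {table} WHERE {where} LIMIT {int(limit)}"


sql = build_log_query("db_log", "system failure", host="node-1")
print(sql)
```

Doubling single quotes is the standard SQL convention for string literals. Whether further escaping is needed for a particular GreptimeDB protocol or client is not covered here, and a client library with parameter binding would be preferable where one is available.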
Results: Scaling Without Compromise
The new log architecture powered by GreptimeDB is now live across all OB Cloud environments, processing hundreds of millions of log entries daily. The migration from Loki to GreptimeDB achieved remarkable performance and cost improvements: query response times improved by 10x, previously timeout-prone queries now execute in sub-second timeframes, and Total Cost of Ownership (TCO) realized a 30% reduction.
Enhanced Log Management at Scale
Log query response times and reliability have improved dramatically. By supporting diverse index types and efficient keyword search, GreptimeDB makes it faster to locate specific logs within massive datasets, which in turn speeds up troubleshooting. This makes it easier for OB Cloud users to manage large-scale business data, particularly during troubleshooting and performance monitoring.
Cloud-Native Deployment Simplified
GreptimeDB's cloud-native design and native object storage compatibility ensure operational flexibility in heterogeneous environments, simplifying OB Cloud's multi-cloud deployment and management. Crucially, it also allows OB Cloud to deliver a consistent service experience worldwide. For enterprise users, this multi-cloud flexibility enhances system resilience and scalability while maintaining high availability.
Improved User Experience and Scalability
Optimizations in OB Cloud's log processing, particularly the fine-tuning of Fluent Bit, have further boosted system scalability and stability. The seamless integration between GreptimeDB and Fluent Bit maintains efficient log collection and storage even under heavy-load scenarios. By tuning caching and ingestion configurations, OB Cloud ensures the logging system remains responsive during traffic spikes, which enhances the overall user experience.