⭐ Open-Source OLAP Engine

StarRocks
Big Data Platform Design
Complete Guide

A next-generation, vectorized MPP OLAP database designed for sub-second analytics on petabyte-scale data — without pre-aggregation or data cubes.

Vectorized Engine
MPP Architecture
Sub-second Queries
PB-scale Storage
Real-time + Batch
What is StarRocks?

StarRocks (formerly DorisDB) is a next-generation, open-source, fully vectorized MPP (Massively Parallel Processing) OLAP database engine. It began as a fork of Apache Doris (originally developed at Baidu as Palo) and has since diverged substantially, adding its own vectorized execution engine, cost-based optimizer, and cloud-native shared-data architecture. Apache Doris remains a separate Apache Software Foundation top-level project; StarRocks is developed independently and was contributed to the Linux Foundation in 2023.

🏛️ Origin & Lineage
StarRocks originated from Apache Doris (previously Palo at Baidu). The core team forked it to build a more performant, cloud-native OLAP engine. StarRocks was open-sourced in 2021 (initially under the Elastic License) and relicensed under Apache 2.0 when the project joined the Linux Foundation in 2023. It has seen rapid enterprise adoption at Alibaba, JD.com, Meituan, and other global tech companies.
Apache 2.0 · MySQL Protocol · CBO Optimizer
🎯 Core Purpose
StarRocks is purpose-built for high-concurrency, low-latency analytical queries on massive datasets. It supports OLAP, real-time analytics, and ad-hoc queries in a single unified engine — replacing the need for separate pre-computation layers like Kylin or Druid aggregation cubes.
OLAP · Real-time · Ad-hoc
💡 One-line definition: StarRocks is a vectorized MPP OLAP database that delivers sub-second SQL analytics on petabytes of data — supporting both real-time streaming ingestion and batch processing — without pre-computed cubes or aggregation trees.
📊 Analytics Types
• Interactive dashboard queries
• Ad-hoc exploratory analytics
• Real-time metrics monitoring
• Multidimensional OLAP analysis
• High-concurrency reporting
⚙️ Engine Type
• Fully vectorized execution
• Columnar storage format
• Cost-Based Optimizer (CBO)
• Pipeline execution engine
• Adaptive query scheduling
🔗 Compatibility
• MySQL wire protocol (port 9030)
• ANSI SQL-99 + extensions
• JDBC/ODBC drivers
• Spark/Flink connectors
• BI tool native support

⚙️ How it Works

StarRocks processes queries using a pipeline-based, fully vectorized MPP engine. A query enters through a Frontend (FE) node, is parsed and optimized by the CBO, split into plan fragments that are distributed to Backend (BE) nodes, executed in parallel with vectorized operators, and finally merged and returned to the client.

Query Execution Flow
🖥️
SQL Client
Sends SQL via MySQL protocol
🧠
FE Parser
Parse + analyze SQL AST
🔮
CBO Optimizer
Choose optimal execution plan
🗺️
Query Planner
Split into plan fragments
BE Executors
Vectorized parallel execution
📦
Results
Merge, return to client
🚀 Vectorized Execution Engine
Instead of processing one row at a time (like traditional engines), StarRocks processes data in batches (vectors) using SIMD (Single Instruction Multiple Data) CPU instructions. This means a single CPU instruction can process 4–16 values simultaneously, dramatically reducing overhead per row and maximizing cache efficiency. All operators — scan, filter, join, aggregation — are fully vectorized.
🔢 Cost-Based Optimizer (CBO)
StarRocks uses a sophisticated CBO that collects table/column statistics (row counts, NDV, histograms) and uses them to evaluate the cost of alternative query plans. It chooses optimal join orders, join algorithms (hash join, broadcast join, shuffle join), and operator push-downs. The CBO is inspired by the Cascades framework used in commercial databases such as SQL Server.
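Those statistics drive the CBO's cost estimates and are collected with ANALYZE jobs. A minimal sketch, assuming a table named events:

```sql
-- Trigger a statistics collection job for the CBO (table name is illustrative)
ANALYZE TABLE events;

-- Build a histogram on a skewed, frequently filtered column
ANALYZE TABLE events UPDATE HISTOGRAM ON event_type;

-- Inspect the status of collection jobs
SHOW ANALYZE STATUS;
```

In recent versions StarRocks also collects statistics automatically in the background; explicit ANALYZE is mainly useful after large loads or schema changes.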
🔄 Pipeline Execution Model
StarRocks v2.0+ introduced a fully pipelined execution model. Query plan fragments are broken into pipeline operators that process data in a streaming fashion. Operators are non-blocking — data flows continuously from scan → filter → join → agg without waiting for each stage to complete. This removes pipeline stalls and dramatically reduces latency for complex multi-stage queries.
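The pipeline engine is on by default in recent versions and is controlled per session. A sketch using the session variable names from the StarRocks docs:

```sql
-- Pipeline execution engine toggle (enabled by default since the 2.x series)
SET enable_pipeline_engine = true;

-- Degree of parallelism per pipeline; 0 lets the engine choose adaptively
SET pipeline_dop = 0;
```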
🗄️ Columnar Storage (Segment)
Data is stored in columnar format (Segment files). Each segment contains multiple column files with zone maps, bitmap indexes, and Bloom filter indexes. During scan, only required columns are read. StarRocks also uses Short-circuit scan — when a query targets indexed columns, it skips irrelevant data pages entirely using per-page zone maps.
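Zone maps are maintained automatically; bitmap and Bloom filter indexes are opt-in. A sketch, assuming an events table with a low-cardinality event_type column and a high-cardinality user_id column:

```sql
-- Bitmap index on a low-cardinality column (accelerates equality filters)
CREATE INDEX idx_event_type ON events (event_type) USING BITMAP;

-- Bloom filter index on a high-cardinality column, set as a table property
ALTER TABLE events SET ("bloom_filter_columns" = "user_id");
```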
📂 Storage Models
StarRocks offers four table models, each optimized for a different workload:
🔑 Duplicate Key
All rows stored as-is. Best for log/event data with flexible queries. No dedup or aggregation.
📊 Aggregate Key
Pre-aggregates metrics at load time. Best for metrics/KPI tables. SUM, MAX, MIN, REPLACE supported.
🔄 Unique Key
Deduplicates rows by key (last-write-wins). Best for CDC/upsert workloads. Supports DELETE.
🌐 Primary Key
True UPDATE/DELETE support with row-level MVCC. Best for real-time upsert from CDC streams.
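The model is chosen in the CREATE TABLE key clause. Two contrasting sketches (table and column names are illustrative):

```sql
-- Duplicate Key: append-only, keeps every row as-is (logs/events)
CREATE TABLE access_log (
  log_time DATETIME NOT NULL,
  url VARCHAR(256),
  status INT
) ENGINE=OLAP
DUPLICATE KEY (log_time)
DISTRIBUTED BY HASH(url) BUCKETS 32;

-- Primary Key: row-level UPDATE/DELETE, ideal as a CDC upsert target
CREATE TABLE dim_users (
  user_id BIGINT NOT NULL,
  name VARCHAR(64),
  updated_at DATETIME
) ENGINE=OLAP
PRIMARY KEY (user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 32;
```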
🔥 Hot/Cold Tiering: StarRocks supports tiered storage. Hot data lives on local NVMe SSDs on BE nodes. Cold data can be offloaded to object storage (S3/HDFS) while remaining fully queryable. The engine uses Datacache (local disk/memory cache) to accelerate repeated queries on cold data.

🏗️ Internal Architecture

StarRocks uses a classic separation of Frontend (Metadata/Coordination) and Backend (Compute/Storage) nodes. In shared-nothing deployments, BE nodes own data locally. In shared-data (cloud-native) mode, storage is decoupled to object storage.

StarRocks Internal Node Architecture
Client Layer
💻
MySQL Client
Port 9030
📊
BI Tools
Tableau / Superset
🔧
Spark / Flink
Connector
🌐
REST API
Stream Load
Frontend (FE) Layer — Coordination & Metadata
👑
FE Leader
BDB-JE consensus
🤝
FE Follower 1
Read replica
🤝
FE Follower 2
Read replica
🔮
CBO Optimizer
Inside FE
Backend (BE) Layer — Compute & Storage
BE Node 1
Execute + Storage
BE Node 2
Execute + Storage
BE Node 3
Execute + Storage
BE Node N
Horizontal scale
Storage Layer
💾
Local NVMe SSD
Hot data
☁️
S3 / HDFS
Cold / Shared data
🗂️
Segment Files
Columnar format
🧠 Frontend (FE) Responsibilities
SQL parsing & semantic analysis — builds AST and validates schema.

Query optimization — CBO generates and selects optimal physical plan.

Metadata management — stores all catalog, table, partition, and tablet metadata using BDB-JE (Berkeley DB Java Edition) with quorum-based replication and automatic leader election for HA.

Load scheduling — coordinates data ingestion jobs (Broker Load, Routine Load, Stream Load).
⚡ Backend (BE) Responsibilities
Data storage — owns tablets (data shards) in columnar Segment files on local disk or object storage.

Query execution — receives plan fragments from FE, executes vectorized operators (scan, filter, join, agg, sort).

Data exchange — shuffles intermediate data between BEs during joins/aggregations over the internal network.

Compaction — background merging of delta rowsets into base rowsets for read efficiency.
🗄️ Data Organization: Tablet → Rowset → Segment → Column
🗂️
Table
Partitioned by date/hash
📋
Partition
Range or List partition
💊
Tablet
Bucketed shard on BE
📦
Rowset
Write batch / delta
📄
Segment
Columnar .dat file
🔢
Column
Encoded + indexed
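This hierarchy can be inspected directly with SHOW statements, assuming a table named events:

```sql
-- Partitions of a table and their ranges
SHOW PARTITIONS FROM events;

-- Tablets (bucketed shards) backing the table, with replica placement on BEs
SHOW TABLET FROM events;
```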

🏆 Why StarRocks is Exceptional

In published benchmarks, StarRocks frequently runs 3–10x faster than Hive, Presto/Trino, and ClickHouse on multi-table join analytics. Here's a deeper look at its key strengths.

⚡ Performance: Sub-second at Scale
StarRocks achieves sub-second query latency on petabyte-scale data through multiple complementary techniques: full vectorization (SIMD), CBO-driven plan optimization, non-blocking pipeline execution, columnar scans with predicate pushdown, and intelligent data skipping via zone maps and Bloom filters. In TPC-H and SSB benchmark results published by the project, StarRocks outperforms Presto/Trino by 3–10x.
🔄 Unified Real-time + Batch
StarRocks is the rare OLAP engine that handles both streaming and batch analytics natively. Data can be ingested from Kafka via Routine Load (real-time, lag <1s), from files via Broker Load/INSERT (batch), or via Flink/Spark connectors. The same table and query engine handles both freshly ingested real-time data and historical batch data seamlessly — no separate lambda architecture needed.
🔗 External Table / Data Lakehouse
StarRocks can directly query data in Hive, Iceberg, Delta Lake, Hudi, JDBC, Elasticsearch, and S3/HDFS as External Catalogs — without copying data. This makes it a powerful Lakehouse query engine. It can join internal StarRocks tables with external Iceberg/Hive tables in a single SQL query, enabling federation across the data lake and data warehouse.
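A sketch of registering an Iceberg external catalog backed by a Hive Metastore and joining across catalogs (catalog, database, table, and host names are all illustrative):

```sql
CREATE EXTERNAL CATALOG iceberg_lake
PROPERTIES (
  "type" = "iceberg",
  "iceberg.catalog.type" = "hive",
  "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- Join an internal StarRocks table with an Iceberg lake table in one query
SELECT o.order_id, c.region, o.amount
FROM default_catalog.sales.orders AS o
JOIN iceberg_lake.crm.customers AS c
  ON o.customer_id = c.customer_id;
```

No data is copied; the Iceberg table is scanned in place with partition pruning and predicate pushdown.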
💰 Cost Efficiency
Compared to Snowflake, BigQuery, or Redshift, StarRocks is open-source and self-managed. No data egress fees, no per-query costs. The hot/cold tiering capability means you keep hot data on fast SSDs and cold data on cheap object storage (S3), dramatically reducing infrastructure cost while maintaining query performance via the Datacache layer.
🔒 High Availability & No SPOF
FE nodes replicate metadata through BDB-JE's quorum-based consensus (typically 3 FE nodes: 1 leader + 2 followers). BE data is replicated with a configurable replication factor (default 3). No external ZooKeeper or NameNode dependency. This shared-nothing, self-healing design means individual node failures are handled transparently with automatic re-routing and re-replication.
📈 High Concurrency
StarRocks supports thousands of concurrent query sessions. For dashboard and BI scenarios, Materialized Views (MV) can be created to pre-aggregate common query patterns. Queries on the MV are automatically rewritten by the optimizer — users don't need to know about the MVs. This enables hundreds of concurrent BI users hitting the same table cluster without degradation.
📊 Reported results: In Alibaba's internal benchmarks, StarRocks reportedly processed a 10TB TPC-DS workload ~5x faster than Presto. JD.com reports serving 100,000+ daily queries on 10PB+ of data with p99 latency under 3 seconds. Meituan migrated 80% of its Kylin OLAP workloads to StarRocks, reducing query times from minutes to seconds.
Key Feature Highlights
🧮 Fully Vectorized
🔮 CBO Optimizer
🔄 Pipeline Execution
📊 Materialized Views
🌐 External Catalogs
☁️ Shared-Data Mode
🔑 Primary Key Model
📡 Routine Load (Kafka)
🧊 Iceberg / Delta / Hudi
🔐 RBAC / Row-Level Security
📉 Tiered Hot/Cold Storage
⚡ SIMD Vectorization
🔗 Cross-Catalog JOIN
📦 MySQL Protocol
StarRocks vs. Alternatives
Feature             | StarRocks          | ClickHouse    | Presto/Trino  | Apache Druid
Multi-table JOINs   | ✅ Excellent       | ⚠️ Limited    | ✅ Good       | ❌ Poor
Real-time Ingestion | ✅ <1s latency     | ✅ Good       | ❌ Query only | ✅ Sub-second
Data Lake Query     | ✅ Native Catalogs | ⚠️ Limited    | ✅ Excellent  | ❌ No
UPDATE / DELETE     | ✅ Primary Key     | ⚠️ Mutation   | ❌ No         | ⚠️ Segment
Materialized Views  | ✅ Auto-rewrite    | ✅ Manual     | ❌ No         | ⚠️ Rollup only
Open Source License | ✅ Apache 2.0      | ✅ Apache 2.0 | ✅ Apache 2.0 | ✅ Apache 2.0

🏭 Production Setup at Petabyte Scale

Deploying StarRocks at PB scale requires careful hardware sizing, replication topology, partitioning strategy, and operational tooling. Below is a production-grade reference deployment.

💻 Hardware Sizing Reference (PB-scale)
Role          | Count   | CPU         | RAM        | Storage               | Network
FE Leader     | 1       | 16–32 cores | 64–128 GB  | 500 GB SSD (metadata) | 25 GbE
FE Follower   | 2       | 16–32 cores | 64–128 GB  | 500 GB SSD (metadata) | 25 GbE
BE Node (hot) | 20–100+ | 32–64 cores | 256–512 GB | 8x 3.8 TB NVMe SSD    | 25–100 GbE
S3 / HDFS     | n/a     | n/a         | n/a        | Unlimited (cold data) | 10 GbE min
Step-by-Step Production Setup
01
Deploy Frontend (FE) Cluster
Deploy 3 FE nodes with Raft consensus. One leader elected automatically. Use dedicated SSD for metadata (BDB-JE journal). Configure priority_networks to specify the right network interface.
fe.conf # fe/conf/fe.conf
meta_dir = /data/starrocks/fe/meta
priority_networks = 10.0.0.0/24
http_port = 8030
rpc_port = 9020
query_port = 9030
edit_log_port = 9010
sys_log_level = INFO
# Start leader first, then add followers via SQL:
# ALTER SYSTEM ADD FOLLOWER "fe2:9010";
02
Deploy Backend (BE) Nodes
Deploy BE nodes with NVMe SSDs. Configure multiple storage paths for parallelism. BE nodes self-register with FE via heartbeat after you add them.
be.conf # be/conf/be.conf
storage_root_path = /nvme0/starrocks;/nvme1/starrocks;/nvme2/starrocks
priority_networks = 10.0.0.0/24
be_port = 9060
be_http_port = 8040
heartbeat_service_port = 9050
brpc_port = 8060
mem_limit = 80%
# Register BE to FE cluster:
# ALTER SYSTEM ADD BACKEND "be1:9050";
03
Design Table Schema with Partitioning
For PB-scale tables, use two-level partitioning: Range partition by date (enables partition pruning), then Hash bucket by high-cardinality dimension. Choose bucket count based on data size (target ~1–10 GB per tablet). Use Primary Key model for CDC data.
SQL CREATE TABLE events (
  event_date DATE NOT NULL,
  user_id BIGINT NOT NULL,
  event_type VARCHAR(64),
  properties JSON,
  amount DECIMAL(18,4),
  created_at DATETIME
) ENGINE=OLAP
PRIMARY KEY (event_date, user_id)
PARTITION BY RANGE(event_date) (
  START ("2024-01-01") END ("2026-12-31")
  EVERY (INTERVAL 1 MONTH)
)
DISTRIBUTED BY HASH(user_id) BUCKETS 128
PROPERTIES (
  "replication_num" = "3",
  "storage_medium" = "SSD",
  "storage_cooldown_ttl" = "30 day"
);
04
Configure Real-time Kafka Ingestion (Routine Load)
StarRocks Routine Load creates a persistent consumer group on Kafka. It auto-commits offsets and handles failure/recovery internally. Throughput can be tuned with desired_concurrent_number.
SQL CREATE ROUTINE LOAD events_kafka_load ON events
COLUMNS TERMINATED BY ",",
COLUMNS (event_date, user_id, event_type, properties, amount, created_at)
PROPERTIES (
  "desired_concurrent_number" = "8",
  "max_batch_interval" = "10",
  "max_error_number" = "1000",
  "format" = "csv"
)
FROM KAFKA (
  "kafka_broker_list" = "broker1:9092,broker2:9092",
  "kafka_topic" = "prod.events",
  "kafka_partitions" = "0,1,2,3,4,5,6,7"
);
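Once created, the job runs continuously. It can be monitored, paused for maintenance, and resumed without losing its Kafka offsets:

```sql
-- Check consumed offsets, lag, and error-row counts for the job
SHOW ROUTINE LOAD FOR events_kafka_load;

-- Pause for maintenance, then resume; committed offsets are retained
PAUSE ROUTINE LOAD FOR events_kafka_load;
RESUME ROUTINE LOAD FOR events_kafka_load;
```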
05
Create Async Materialized Views for High-Concurrency BI
For PB-scale tables, create MVs on common aggregation patterns. Queries touching the base table are automatically rewritten by the optimizer to hit the MV, often yielding orders-of-magnitude speedups for dashboard queries without changing application code.
SQL CREATE MATERIALIZED VIEW mv_daily_events
DISTRIBUTED BY HASH(event_date) BUCKETS 32
REFRESH ASYNC EVERY(INTERVAL 5 MINUTE)
AS
SELECT
  event_date,
  event_type,
  COUNT(*) AS event_count,
  SUM(amount) AS total_amount,
  COUNT(DISTINCT user_id) AS dau
FROM events
GROUP BY event_date, event_type;
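Whether the optimizer actually rewrote a query against the MV can be verified from the query plan; a sketch:

```sql
-- If transparent rewrite succeeded, the plan's scan node should reference
-- mv_daily_events rather than the base events table
EXPLAIN
SELECT event_date, SUM(amount) AS total_amount
FROM events
GROUP BY event_date;
```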
06
Configure Resource Groups & Query Queuing
At PB scale with hundreds of concurrent users, use Resource Groups to isolate workloads. Assign ETL/batch jobs to low-priority groups and BI dashboards to high-priority groups. Configure concurrency limits and CPU quotas per group to prevent noisy-neighbor issues.
SQL CREATE RESOURCE GROUP bi_group
TO (user='bi_user', role='analyst')
WITH (
  "cpu_core_limit" = "32",
  "mem_limit" = "40%",
  "concurrency_limit" = "50",
  "type" = "normal"
);

CREATE RESOURCE GROUP etl_group
TO (user='etl_user')
WITH (
  "cpu_core_limit" = "8",
  "mem_limit" = "20%",
  "type" = "normal"
);
☁️ Shared-Data Mode (Cloud Native): In StarRocks 3.0+, the shared-data deployment separates Compute Nodes (CN) from storage. CNs are stateless and auto-scalable. Data lives on S3/GCS/HDFS. Local disk is used only as Datacache. This is ideal for cloud Kubernetes deployments where compute scales independently of storage.

🌐 Open-Source Ecosystem Fit

StarRocks does not operate in isolation — it's most powerful as the query and serving layer in a broader big data platform. Below are the best open-source frameworks for each layer of the data stack.

BATCH PROCESSING
Apache Spark
Batch ETL Engine
Best fit for large-scale batch transformations (ELT). Use StarRocks Spark Connector to write transformed results directly into StarRocks tables. Spark handles the heavy data shuffling; StarRocks serves the results.
ETL/ELT · PySpark
dbt (data build tool)
SQL Transformation
dbt-starrocks adapter allows SQL-based data modeling, testing, and documentation directly on StarRocks. Use dbt for in-warehouse ELT transformations with version-controlled models and lineage tracking.
SQL Models · Lineage
Apache Airflow
Workflow Orchestration
Orchestrates batch pipelines: trigger Spark jobs, wait for completion, load into StarRocks, run dbt models. The StarRocksHook connects directly via MySQL protocol. Use DAGs to schedule periodic bulk loads.
Scheduling · DAGs
STREAMING
Apache Flink
Stream Processing
The best pairing for StarRocks. The flink-connector-starrocks sink uses the Stream Load protocol to write data in micro-batches with exactly-once semantics. Flink handles complex CEP, windowing, and enrichment; StarRocks stores and serves the output.
Exactly-Once · CEP
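A Flink SQL sketch of such a sink, using the option names from the flink-connector-starrocks documentation (hosts, database, and credentials are placeholders, and kafka_source is an assumed upstream table):

```sql
-- Flink SQL: declare a StarRocks sink table backed by Stream Load
CREATE TABLE starrocks_sink (
  event_date DATE,
  user_id BIGINT,
  amount DECIMAL(18, 4)
) WITH (
  'connector' = 'starrocks',
  'jdbc-url' = 'jdbc:mysql://fe-host:9030',
  'load-url' = 'fe-host:8030',
  'database-name' = 'analytics',
  'table-name' = 'events',
  'username' = 'etl_user',
  'password' = '***'
);

-- Continuously sink the enriched stream into StarRocks
INSERT INTO starrocks_sink
SELECT event_date, user_id, amount FROM kafka_source;
```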
Apache Kafka
Message Bus
Kafka acts as the streaming backbone. StarRocks consumes directly from Kafka via Routine Load (built-in consumer, no extra infra). Kafka decouples producers from StarRocks and provides buffering for burst traffic.
Pub/Sub · Routine Load
Debezium + Kafka
CDC (Change Data Capture)
Debezium captures row-level changes from MySQL/PostgreSQL/MongoDB and publishes CDC events to Kafka. Flink or StarRocks Routine Load consumes the CDC stream into StarRocks Primary Key tables for real-time sync.
CDC · Upsert
STORAGE / LAKE
Apache Iceberg
Table Format (Lake)
Preferred lake table format for StarRocks. StarRocks can query Iceberg tables via External Catalog with full partition pruning and predicate pushdown. Supports time-travel queries. Spark/Flink write to Iceberg; StarRocks queries it directly.
ACID · Time-travel
Apache Hive / HDFS
Legacy Data Lake
StarRocks supports Hive Metastore and HDFS natively via Hive External Catalog. Query existing Hive ORC/Parquet tables without migration. Ideal for gradually migrating from Hive to StarRocks as the query engine.
ORC/Parquet · Federation
Apache Hudi
Incremental Table Format
Hudi's MOR (Merge-On-Read) tables provide efficient upsert/delete in the data lake. StarRocks can query Hudi tables via its External Catalog, enabling unified lakehouse analytics across historical Hudi data and real-time StarRocks data.
MOR/COW · Incremental
BI / VISUALIZATION
Apache Superset
Open-Source BI
The best open-source BI pairing for StarRocks. Superset connects via the MySQL dialect or a SQLAlchemy connector and supports interactive dashboards, SQL Lab ad-hoc queries, alerts, and a wide range of chart types. A dedicated StarRocks SQLAlchemy dialect is also available.
Dashboards · SQL Lab
Grafana
Metrics & Monitoring
Grafana connects to StarRocks via MySQL data source plugin. Ideal for operational dashboards and real-time metric monitoring. Works well for time-series metrics stored in StarRocks from Kafka streaming pipelines.
Time-series · Alerts
Metabase
Self-serve Analytics
User-friendly self-serve BI tool. Connects via MySQL protocol. Non-technical users can build dashboards with drag-and-drop. Metabase's automatic query generation works well with StarRocks' fast response times.
Self-serve · No-code
CATALOG / GOVERNANCE
Apache Atlas
Data Catalog & Lineage
Tracks metadata, data lineage, and governance policies. Integrates with Hive Metastore (shared by StarRocks External Catalog). Enables end-to-end lineage from source systems through StarRocks to BI dashboards.
Apache Ranger
Security & Access Control
Ranger provides centralized security policy management. StarRocks can integrate with Ranger for row/column-level security policies, RBAC, and audit logging — essential for enterprise PB-scale deployments.
OpenMetadata
Modern Data Catalog
Modern open-source data catalog with native StarRocks connector. Discovers tables, columns, and generates data quality profiling reports. Enables self-serve data discovery for analysts.

🗺️ Full Big Data Architecture with StarRocks

This is the recommended full-stack open-source big data architecture with StarRocks as the core analytical engine — covering data sources, ingestion, processing, storage, query, and BI layers.

🏛️ Complete Open-Source Big Data Platform Architecture
① Data Sources
🗄️ RDBMS (MySQL/PG) · 📱 App Events (Mobile/Web) · 📡 IoT Sensors (Devices) · 📂 File Uploads (CSV/JSON) · 🌐 APIs / SaaS (3rd Party)
↓ Ingestion
② Ingestion Layer
🔄 Debezium (CDC) · 📨 Apache Kafka (Stream Bus) · 🚀 Stream Load (HTTP Direct) · 📤 Broker Load (Batch Files)
↓ Processing
③ Processing Layer
Apache Flink (Streaming ETL) · 🌊 Apache Spark (Batch ETL) · 🔧 dbt (SQL Transform) · 📅 Airflow (Orchestration)
↓ Store / Serve
④ Storage Layer
🧊 Apache Iceberg (Data Lake) · ☁️ S3 / HDFS (Object Store) · 🗂️ Apache Hive (Legacy Lake) · 🌊 Apache Hudi (Incremental)
↓ Query & Serve
⑤ ⭐ CORE QUERY ENGINE (StarRocks)
🧠 FE Nodes (CBO + Metadata) · BE Nodes (Vectorized Exec) · 📊 Materialized Views (Auto-rewrite) · 🌐 External Catalogs (Iceberg/Hive/Delta)
↓ Consume
⑥ BI & Consumption Layer
📊 Apache Superset (Dashboards) · 📈 Grafana (Metrics) · 💻 Jupyter / Python (Data Science) · 🔌 APIs / Apps (Downstream)
↓ Govern
⑦ Governance & Security Layer
🗺️ Apache Atlas (Data Catalog) · 🔐 Apache Ranger (Security) · 🎯 OpenMetadata (Discovery) · 📋 Audit Logs (Compliance)
🏆 Why this architecture wins: StarRocks sits at the center as a unified OLAP engine — serving fresh streaming data from Kafka via Routine Load, querying historical data in Iceberg/Hive via External Catalogs, and serving hundreds of concurrent BI users via Materialized View auto-rewrite. Flink handles complex CEP streaming; Spark handles heavy batch; Airflow orchestrates; Superset/Grafana visualize — all without data silos or extra serving layers.
Framework Role Summary
Layer      | Framework       | Role in Platform                                 | Interface with StarRocks
Batch      | Apache Spark    | Large-scale data transformation, ML feature prep | spark-connector-starrocks (Sink/Source)
Batch      | dbt             | SQL-based in-warehouse ELT transformations       | dbt-starrocks adapter (MySQL dialect)
Batch      | Apache Airflow  | Pipeline orchestration, scheduling               | StarRocksHook, MySQLOperator
Streaming  | Apache Flink    | Real-time ETL, CEP, windowed aggregation         | flink-connector-starrocks (Stream Load)
Streaming  | Apache Kafka    | Message bus, event streaming backbone            | Routine Load (native consumer)
Streaming  | Debezium        | CDC from RDBMS to Kafka                          | Via Kafka → Routine Load / Flink
Storage    | Apache Iceberg  | Open table format for data lake                  | External Catalog (Iceberg Catalog)
Storage    | S3 / HDFS       | Raw file storage, cold data tier                 | Broker Load, Shared-data mode
Storage    | Apache Hudi     | Incremental upsert table format                  | External Catalog (Hudi Catalog)
Query      | StarRocks FE    | CBO optimization, metadata, SQL parsing          | Core engine
Query      | StarRocks BE    | Vectorized execution, columnar storage           | Core engine
BI         | Apache Superset | Self-serve dashboards, SQL Lab                   | MySQL connector / SQLAlchemy
BI         | Grafana         | Real-time operational metrics dashboards         | MySQL data source plugin
Governance | Apache Atlas    | Data lineage and catalog                         | Via Hive Metastore integration
Governance | Apache Ranger   | Centralized security and RBAC                    | Ranger StarRocks plugin
⭐ StarRocks Big Data Platform Guide
Built for engineers designing petabyte-scale analytics platforms.
StarRocks is licensed under Apache 2.0 — github.com/StarRocks/starrocks