⭐ Open-Source OLAP Engine

StarRocks
Big Data Platform Design
Complete Guide

A next-generation, vectorized MPP OLAP database designed for sub-second analytics on petabyte-scale data — without pre-aggregation or data cubes.

Vectorized Engine
MPP Architecture
Sub-second Queries
PB-scale Storage
Real-time + Batch
What is StarRocks?

StarRocks (formerly DorisDB) is a next-generation, open-source, fully vectorized MPP (Massively Parallel Processing) OLAP database engine. It began as a fork of Apache Doris (originally developed at Baidu as Palo) and has since diverged substantially, adding its own vectorized execution engine, cost-based optimizer, and cloud-native shared-data architecture. Apache Doris remains a separate Apache Software Foundation top-level project; StarRocks is developed independently and was contributed to the Linux Foundation in 2023.

🏛️ Origin & Lineage
StarRocks originated from Apache Doris (previously Palo at Baidu). The core team forked it to build a more performant, cloud-native OLAP engine. StarRocks was open-sourced in 2021 (initially under the Elastic License) and relicensed under Apache 2.0 when the project joined the Linux Foundation in 2023. It has seen rapid enterprise adoption at Alibaba, JD.com, Meituan, and other global tech companies.
Apache 2.0 · MySQL Protocol · CBO Optimizer
🎯 Core Purpose
StarRocks is purpose-built for high-concurrency, low-latency analytical queries on massive datasets. It supports OLAP, real-time analytics, and ad-hoc queries in a single unified engine — replacing the need for separate pre-computation layers like Kylin or Druid aggregation cubes.
OLAP · Real-time · Ad-hoc
💡 One-line definition: StarRocks is a vectorized MPP OLAP database that delivers sub-second SQL analytics on petabytes of data — supporting both real-time streaming ingestion and batch processing — without pre-computed cubes or aggregation trees.
📊 Analytics Types
• Interactive dashboard queries
• Ad-hoc exploratory analytics
• Real-time metrics monitoring
• Multidimensional OLAP analysis
• High-concurrency reporting
⚙️ Engine Type
• Fully vectorized execution
• Columnar storage format
• Cost-Based Optimizer (CBO)
• Pipeline execution engine
• Adaptive query scheduling
🔗 Compatibility
• MySQL wire protocol (port 9030)
• ANSI SQL-99 + extensions
• JDBC/ODBC drivers
• Spark/Flink connectors
• BI tool native support

⚙️ How it Works

StarRocks processes queries using a pipeline-based, fully vectorized MPP engine. A query enters through a Frontend (FE) node, is parsed and optimized by the CBO, split into plan fragments that are distributed to Backend (BE) nodes, executed in parallel with vectorized operators, and finally merged and returned to the client.

Query Execution Flow
🖥️
SQL Client
Sends SQL via MySQL protocol
🧠
FE Parser
Parse + analyze SQL AST
🔮
CBO Optimizer
Choose optimal execution plan
🗺️
Query Planner
Split into plan fragments
BE Executors
Vectorized parallel execution
📦
Results
Merge, return to client
🚀 Vectorized Execution Engine
Instead of processing one row at a time (like traditional engines), StarRocks processes data in batches (vectors) using SIMD (Single Instruction Multiple Data) CPU instructions. This means a single CPU instruction can process 4–16 values simultaneously, dramatically reducing overhead per row and maximizing cache efficiency. All operators — scan, filter, join, aggregation — are fully vectorized.
🔢 Cost-Based Optimizer (CBO)
StarRocks uses a sophisticated CBO that collects table/column statistics (row counts, NDV, histograms) and uses them to evaluate the cost of alternative query plans. It chooses optimal join orders, join algorithms (hash join, broadcast join, shuffle join), and operator push-downs. The CBO is inspired by the Cascades framework used in commercial databases such as SQL Server.
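Those statistics drive the CBO's cost estimates and are collected with ANALYZE jobs. A minimal sketch, assuming a table named events:

```sql
-- Trigger a statistics collection job for the CBO (table name is illustrative)
ANALYZE TABLE events;

-- Build a histogram on a skewed, frequently filtered column
ANALYZE TABLE events UPDATE HISTOGRAM ON event_type;

-- Inspect the status of collection jobs
SHOW ANALYZE STATUS;
```

In recent versions StarRocks also collects statistics automatically in the background; explicit ANALYZE is mainly useful after large loads or schema changes.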
🔄 Pipeline Execution Model
StarRocks v2.0+ introduced a fully pipelined execution model. Query plan fragments are broken into pipeline operators that process data in a streaming fashion. Operators are non-blocking — data flows continuously from scan → filter → join → agg without waiting for each stage to complete. This removes pipeline stalls and dramatically reduces latency for complex multi-stage queries.
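The pipeline engine is on by default in recent versions and is controlled per session. A sketch using the session variable names from the StarRocks docs:

```sql
-- Pipeline execution engine toggle (enabled by default since the 2.x series)
SET enable_pipeline_engine = true;

-- Degree of parallelism per pipeline; 0 lets the engine choose adaptively
SET pipeline_dop = 0;
```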
🗄️ Columnar Storage (Segment)
Data is stored in columnar format (Segment files). Each segment contains multiple column files with zone maps, bitmap indexes, and Bloom filter indexes. During scan, only required columns are read. StarRocks also uses Short-circuit scan — when a query targets indexed columns, it skips irrelevant data pages entirely using per-page zone maps.
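Zone maps are maintained automatically; bitmap and Bloom filter indexes are opt-in. A sketch, assuming an events table with a low-cardinality event_type column and a high-cardinality user_id column:

```sql
-- Bitmap index on a low-cardinality column (accelerates equality filters)
CREATE INDEX idx_event_type ON events (event_type) USING BITMAP;

-- Bloom filter index on a high-cardinality column, set as a table property
ALTER TABLE events SET ("bloom_filter_columns" = "user_id");
```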
📂 Storage Models
StarRocks offers four table models, each optimized for a different workload:
🔑 Duplicate Key
All rows stored as-is. Best for log/event data with flexible queries. No dedup or aggregation.
📊 Aggregate Key
Pre-aggregates metrics at load time. Best for metrics/KPI tables. SUM, MAX, MIN, REPLACE supported.
🔄 Unique Key
Deduplicates rows by key (last-write-wins). Best for CDC/upsert workloads. Supports DELETE.
🌐 Primary Key
True UPDATE/DELETE support with row-level MVCC. Best for real-time upsert from CDC streams.
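The model is chosen in the CREATE TABLE key clause. Two contrasting sketches (table and column names are illustrative):

```sql
-- Duplicate Key: append-only, keeps every row as-is (logs/events)
CREATE TABLE access_log (
  log_time DATETIME NOT NULL,
  url VARCHAR(256),
  status INT
) ENGINE=OLAP
DUPLICATE KEY (log_time)
DISTRIBUTED BY HASH(url) BUCKETS 32;

-- Primary Key: row-level UPDATE/DELETE, ideal as a CDC upsert target
CREATE TABLE dim_users (
  user_id BIGINT NOT NULL,
  name VARCHAR(64),
  updated_at DATETIME
) ENGINE=OLAP
PRIMARY KEY (user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 32;
```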
🔥 Hot/Cold Tiering: StarRocks supports tiered storage. Hot data lives on local NVMe SSDs on BE nodes. Cold data can be offloaded to object storage (S3/HDFS) while remaining fully queryable. The engine uses Datacache (local disk/memory cache) to accelerate repeated queries on cold data.

🏗️ Internal Architecture

StarRocks uses a classic separation of Frontend (Metadata/Coordination) and Backend (Compute/Storage) nodes. In shared-nothing deployments, BE nodes own data locally. In shared-data (cloud-native) mode, storage is decoupled to object storage.

StarRocks Internal Node Architecture
Client Layer
💻
MySQL Client
Port 9030
📊
BI Tools
Tableau / Superset
🔧
Spark / Flink
Connector
🌐
REST API
Stream Load
Frontend (FE) Layer — Coordination & Metadata
👑
FE Leader
BDB-JE consensus
🤝
FE Follower 1
Read replica
🤝
FE Follower 2
Read replica
🔮
CBO Optimizer
Inside FE
Backend (BE) Layer — Compute & Storage
BE Node 1
Execute + Storage
BE Node 2
Execute + Storage
BE Node 3
Execute + Storage
BE Node N
Horizontal scale
Storage Layer
💾
Local NVMe SSD
Hot data
☁️
S3 / HDFS
Cold / Shared data
🗂️
Segment Files
Columnar format
🧠 Frontend (FE) Responsibilities
SQL parsing & semantic analysis — builds AST and validates schema.

Query optimization — CBO generates and selects optimal physical plan.

Metadata management — stores all catalog, table, partition, and tablet metadata using BDB-JE (Berkeley DB Java Edition) with quorum-based replication and automatic leader election for HA.

Load scheduling — coordinates data ingestion jobs (Broker Load, Routine Load, Stream Load).
⚡ Backend (BE) Responsibilities
Data storage — owns tablets (data shards) in columnar Segment files on local disk or object storage.

Query execution — receives plan fragments from FE, executes vectorized operators (scan, filter, join, agg, sort).

Data exchange — shuffles intermediate data between BEs during joins/aggregations over the internal network.

Compaction — background merging of delta rowsets into base rowsets for read efficiency.
🗄️ Data Organization: Tablet → Rowset → Segment → Column
🗂️
Table
Partitioned by date/hash
📋
Partition
Range or List partition
💊
Tablet
Bucketed shard on BE
📦
Rowset
Write batch / delta
📄
Segment
Columnar .dat file
🔢
Column
Encoded + indexed
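This hierarchy can be inspected directly with SHOW statements, assuming a table named events:

```sql
-- Partitions of a table and their ranges
SHOW PARTITIONS FROM events;

-- Tablets (bucketed shards) backing the table, with replica placement on BEs
SHOW TABLET FROM events;
```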

🏆 Why StarRocks is Exceptional

In published benchmarks, StarRocks frequently runs 3–10x faster than Hive, Presto/Trino, and ClickHouse on multi-table join analytics. Here's a deeper look at its key strengths.

⚡ Performance: Sub-second at Scale
StarRocks achieves sub-second query latency on petabyte-scale data through multiple complementary techniques: full vectorization (SIMD), CBO-driven plan optimization, non-blocking pipeline execution, columnar scans with predicate pushdown, and intelligent data skipping via zone maps and Bloom filters. In TPC-H and SSB benchmark results published by the project, StarRocks outperforms Presto/Trino by 3–10x.
🔄 Unified Real-time + Batch
StarRocks is the rare OLAP engine that handles both streaming and batch analytics natively. Data can be ingested from Kafka via Routine Load (real-time, lag <1s), from files via Broker Load/INSERT (batch), or via Flink/Spark connectors. The same table and query engine handles both freshly ingested real-time data and historical batch data seamlessly — no separate lambda architecture needed.
🔗 External Table / Data Lakehouse
StarRocks can directly query data in Hive, Iceberg, Delta Lake, Hudi, JDBC, Elasticsearch, and S3/HDFS as External Catalogs — without copying data. This makes it a powerful Lakehouse query engine. It can join internal StarRocks tables with external Iceberg/Hive tables in a single SQL query, enabling federation across the data lake and data warehouse.
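A sketch of registering an Iceberg external catalog backed by a Hive Metastore and joining across catalogs (catalog, database, table, and host names are all illustrative):

```sql
CREATE EXTERNAL CATALOG iceberg_lake
PROPERTIES (
  "type" = "iceberg",
  "iceberg.catalog.type" = "hive",
  "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- Join an internal StarRocks table with an Iceberg lake table in one query
SELECT o.order_id, c.region, o.amount
FROM default_catalog.sales.orders AS o
JOIN iceberg_lake.crm.customers AS c
  ON o.customer_id = c.customer_id;
```

No data is copied; the Iceberg table is scanned in place with partition pruning and predicate pushdown.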
💰 Cost Efficiency
Compared to Snowflake, BigQuery, or Redshift, StarRocks is open-source and self-managed. No data egress fees, no per-query costs. The hot/cold tiering capability means you keep hot data on fast SSDs and cold data on cheap object storage (S3), dramatically reducing infrastructure cost while maintaining query performance via the Datacache layer.
🔒 High Availability & No SPOF
FE nodes replicate metadata through BDB-JE's quorum-based consensus (typically 3 FE nodes: 1 leader + 2 followers). BE data is replicated with a configurable replication factor (default 3). No external ZooKeeper or NameNode dependency. This shared-nothing, self-healing design means individual node failures are handled transparently with automatic re-routing and re-replication.
📈 High Concurrency
StarRocks supports thousands of concurrent query sessions. For dashboard and BI scenarios, Materialized Views (MV) can be created to pre-aggregate common query patterns. Queries on the MV are automatically rewritten by the optimizer — users don't need to know about the MVs. This enables hundreds of concurrent BI users hitting the same table cluster without degradation.
📊 Reported results: In Alibaba's internal benchmarks, StarRocks reportedly processed a 10TB TPC-DS workload ~5x faster than Presto. JD.com reports serving 100,000+ daily queries on 10PB+ of data with p99 latency under 3 seconds. Meituan migrated 80% of its Kylin OLAP workloads to StarRocks, reducing query times from minutes to seconds.
Key Feature Highlights
🧮 Fully Vectorized
🔮 CBO Optimizer
🔄 Pipeline Execution
📊 Materialized Views
🌐 External Catalogs
☁️ Shared-Data Mode
🔑 Primary Key Model
📡 Routine Load (Kafka)
🧊 Iceberg / Delta / Hudi
🔐 RBAC / Row-Level Security
📉 Tiered Hot/Cold Storage
⚡ SIMD Vectorization
🔗 Cross-Catalog JOIN
📦 MySQL Protocol
StarRocks vs. Alternatives
Feature             | StarRocks          | ClickHouse    | Presto/Trino  | Apache Druid
Multi-table JOINs   | ✅ Excellent       | ⚠️ Limited    | ✅ Good       | ❌ Poor
Real-time Ingestion | ✅ <1s latency     | ✅ Good       | ❌ Query only | ✅ Sub-second
Data Lake Query     | ✅ Native Catalogs | ⚠️ Limited    | ✅ Excellent  | ❌ No
UPDATE / DELETE     | ✅ Primary Key     | ⚠️ Mutation   | ❌ No         | ⚠️ Segment
Materialized Views  | ✅ Auto-rewrite    | ✅ Manual     | ❌ No         | ⚠️ Rollup only
Open Source License | ✅ Apache 2.0      | ✅ Apache 2.0 | ✅ Apache 2.0 | ✅ Apache 2.0

🏭 Production Setup at Petabyte Scale

Deploying StarRocks at PB scale requires careful hardware sizing, replication topology, partitioning strategy, and operational tooling. Below is a production-grade reference deployment.

💻 Hardware Sizing Reference (PB-scale)
Role          | Count   | CPU         | RAM        | Storage               | Network
FE Leader     | 1       | 16–32 cores | 64–128 GB  | 500 GB SSD (metadata) | 25 GbE
FE Follower   | 2       | 16–32 cores | 64–128 GB  | 500 GB SSD (metadata) | 25 GbE
BE Node (hot) | 20–100+ | 32–64 cores | 256–512 GB | 8x 3.8 TB NVMe SSD    | 25–100 GbE
S3 / HDFS     | n/a     | n/a         | n/a        | Unlimited (cold data) | 10 GbE min
Step-by-Step Production Setup
01
Deploy Frontend (FE) Cluster
Deploy 3 FE nodes with Raft consensus. One leader elected automatically. Use dedicated SSD for metadata (BDB-JE journal). Configure priority_networks to specify the right network interface.
fe.conf # fe/conf/fe.conf
meta_dir = /data/starrocks/fe/meta
priority_networks = 10.0.0.0/24
http_port = 8030
rpc_port = 9020
query_port = 9030
edit_log_port = 9010
sys_log_level = INFO
# Start leader first, then add followers via SQL:
# ALTER SYSTEM ADD FOLLOWER "fe2:9010";
02
Deploy Backend (BE) Nodes
Deploy BE nodes with NVMe SSDs. Configure multiple storage paths for parallelism. BE nodes self-register with FE via heartbeat after you add them.
be.conf # be/conf/be.conf
storage_root_path = /nvme0/starrocks;/nvme1/starrocks;/nvme2/starrocks
priority_networks = 10.0.0.0/24
be_port = 9060
be_http_port = 8040
heartbeat_service_port = 9050
brpc_port = 8060
mem_limit = 80%
# Register BE to FE cluster:
# ALTER SYSTEM ADD BACKEND "be1:9050";
03
Design Table Schema with Partitioning
For PB-scale tables, use two-level partitioning: Range partition by date (enables partition pruning), then Hash bucket by high-cardinality dimension. Choose bucket count based on data size (target ~1–10 GB per tablet). Use Primary Key model for CDC data.
SQL CREATE TABLE events (
  event_date DATE NOT NULL,
  user_id BIGINT NOT NULL,
  event_type VARCHAR(64),
  properties JSON,
  amount DECIMAL(18,4),
  created_at DATETIME
) ENGINE=OLAP
PRIMARY KEY (event_date, user_id)
PARTITION BY RANGE(event_date) (
  START ("2024-01-01") END ("2026-12-31")
  EVERY (INTERVAL 1 MONTH)
)
DISTRIBUTED BY HASH(user_id) BUCKETS 128
PROPERTIES (
  "replication_num" = "3",
  "storage_medium" = "SSD",
  "storage_cooldown_ttl" = "30 day"
);
04
Configure Real-time Kafka Ingestion (Routine Load)
StarRocks Routine Load creates a persistent consumer group on Kafka. It auto-commits offsets and handles failure/recovery internally. Throughput can be tuned with desired_concurrent_number.
SQL CREATE ROUTINE LOAD events_kafka_load ON events
COLUMNS TERMINATED BY ",",
COLUMNS (event_date, user_id, event_type, properties, amount, created_at)
PROPERTIES (
  "desired_concurrent_number" = "8",
  "max_batch_interval" = "10",
  "max_error_number" = "1000",
  "format" = "csv"
)
FROM KAFKA (
  "kafka_broker_list" = "broker1:9092,broker2:9092",
  "kafka_topic" = "prod.events",
  "kafka_partitions" = "0,1,2,3,4,5,6,7"
);
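Once created, the job runs continuously. It can be monitored, paused for maintenance, and resumed without losing its Kafka offsets:

```sql
-- Check consumed offsets, lag, and error-row counts for the job
SHOW ROUTINE LOAD FOR events_kafka_load;

-- Pause for maintenance, then resume; committed offsets are retained
PAUSE ROUTINE LOAD FOR events_kafka_load;
RESUME ROUTINE LOAD FOR events_kafka_load;
```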
05
Create Async Materialized Views for High-Concurrency BI
For PB-scale tables, create MVs on common aggregation patterns. Queries touching the base table are automatically rewritten by the optimizer to hit the MV, often yielding orders-of-magnitude speedups for dashboard queries without changing application code.
SQL CREATE MATERIALIZED VIEW mv_daily_events
DISTRIBUTED BY HASH(event_date) BUCKETS 32
REFRESH ASYNC EVERY(INTERVAL 5 MINUTE)
AS
SELECT
  event_date,
  event_type,
  COUNT(*) AS event_count,
  SUM(amount) AS total_amount,
  COUNT(DISTINCT user_id) AS dau
FROM events
GROUP BY event_date, event_type;
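Whether the optimizer actually rewrote a query against the MV can be verified from the query plan; a sketch:

```sql
-- If transparent rewrite succeeded, the plan's scan node should reference
-- mv_daily_events rather than the base events table
EXPLAIN
SELECT event_date, SUM(amount) AS total_amount
FROM events
GROUP BY event_date;
```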
06
Configure Resource Groups & Query Queuing
At PB scale with hundreds of concurrent users, use Resource Groups to isolate workloads. Assign ETL/batch jobs to low-priority groups and BI dashboards to high-priority groups. Configure concurrency limits and CPU quotas per group to prevent noisy-neighbor issues.
SQL CREATE RESOURCE GROUP bi_group
TO (user='bi_user', role='analyst')
WITH (
  "cpu_core_limit" = "32",
  "mem_limit" = "40%",
  "concurrency_limit" = "50",
  "type" = "normal"
);

CREATE RESOURCE GROUP etl_group
TO (user='etl_user')
WITH (
  "cpu_core_limit" = "8",
  "mem_limit" = "20%",
  "type" = "normal"
);
☁️ Shared-Data Mode (Cloud Native): In StarRocks 3.0+, the shared-data deployment separates Compute Nodes (CN) from storage. CNs are stateless and auto-scalable. Data lives on S3/GCS/HDFS. Local disk is used only as Datacache. This is ideal for cloud Kubernetes deployments where compute scales independently of storage.

🌐 Open-Source Ecosystem Fit

StarRocks does not operate in isolation — it's most powerful as the query and serving layer in a broader big data platform. Below are the best open-source frameworks for each layer of the data stack.

BATCH PROCESSING
Apache Spark
Batch ETL Engine
Best fit for large-scale batch transformations (ELT). Use StarRocks Spark Connector to write transformed results directly into StarRocks tables. Spark handles the heavy data shuffling; StarRocks serves the results.
ETL/ELT · PySpark
dbt (data build tool)
SQL Transformation
dbt-starrocks adapter allows SQL-based data modeling, testing, and documentation directly on StarRocks. Use dbt for in-warehouse ELT transformations with version-controlled models and lineage tracking.
SQL Models · Lineage
Apache Airflow
Workflow Orchestration
Orchestrates batch pipelines: trigger Spark jobs, wait for completion, load into StarRocks, run dbt models. The StarRocksHook connects directly via MySQL protocol. Use DAGs to schedule periodic bulk loads.
Scheduling · DAGs
STREAMING
Apache Flink
Stream Processing
The best pairing for StarRocks. The flink-connector-starrocks sink uses the Stream Load protocol to write data in micro-batches with exactly-once semantics. Flink handles complex CEP, windowing, and enrichment; StarRocks stores and serves the output.
Exactly-Once · CEP
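A Flink SQL sketch of such a sink, using the option names from the flink-connector-starrocks documentation (hosts, database, and credentials are placeholders, and kafka_source is an assumed upstream table):

```sql
-- Flink SQL: declare a StarRocks sink table backed by Stream Load
CREATE TABLE starrocks_sink (
  event_date DATE,
  user_id BIGINT,
  amount DECIMAL(18, 4)
) WITH (
  'connector' = 'starrocks',
  'jdbc-url' = 'jdbc:mysql://fe-host:9030',
  'load-url' = 'fe-host:8030',
  'database-name' = 'analytics',
  'table-name' = 'events',
  'username' = 'etl_user',
  'password' = '***'
);

-- Continuously sink the enriched stream into StarRocks
INSERT INTO starrocks_sink
SELECT event_date, user_id, amount FROM kafka_source;
```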
Apache Kafka
Message Bus
Kafka acts as the streaming backbone. StarRocks consumes directly from Kafka via Routine Load (built-in consumer, no extra infra). Kafka decouples producers from StarRocks and provides buffering for burst traffic.
Pub/Sub · Routine Load
Debezium + Kafka
CDC (Change Data Capture)
Debezium captures row-level changes from MySQL/PostgreSQL/MongoDB and publishes CDC events to Kafka. Flink or StarRocks Routine Load consumes the CDC stream into StarRocks Primary Key tables for real-time sync.
CDC · Upsert
STORAGE / LAKE
Apache Iceberg
Table Format (Lake)
Preferred lake table format for StarRocks. StarRocks can query Iceberg tables via External Catalog with full partition pruning and predicate pushdown. Supports time-travel queries. Spark/Flink write to Iceberg; StarRocks queries it directly.
ACID · Time-travel
Apache Hive / HDFS
Legacy Data Lake
StarRocks supports Hive Metastore and HDFS natively via Hive External Catalog. Query existing Hive ORC/Parquet tables without migration. Ideal for gradually migrating from Hive to StarRocks as the query engine.
ORC/Parquet · Federation
Apache Hudi
Incremental Table Format
Hudi's MOR (Merge-On-Read) tables provide efficient upsert/delete in the data lake. StarRocks can query Hudi tables via its External Catalog, enabling unified lakehouse analytics across historical Hudi data and real-time StarRocks data.
MOR/COW · Incremental
BI / VISUALIZATION
Apache Superset
Open-Source BI
The best open-source BI pairing for StarRocks. Superset connects via the MySQL dialect or a SQLAlchemy connector and supports interactive dashboards, SQL Lab ad-hoc queries, alerts, and a wide range of chart types. A dedicated StarRocks SQLAlchemy dialect is also available.
Dashboards · SQL Lab
Grafana
Metrics & Monitoring
Grafana connects to StarRocks via MySQL data source plugin. Ideal for operational dashboards and real-time metric monitoring. Works well for time-series metrics stored in StarRocks from Kafka streaming pipelines.
Time-series · Alerts
Metabase
Self-serve Analytics
User-friendly self-serve BI tool. Connects via MySQL protocol. Non-technical users can build dashboards with drag-and-drop. Metabase's automatic query generation works well with StarRocks' fast response times.
Self-serve · No-code
CATALOG / GOVERNANCE
Apache Atlas
Data Catalog & Lineage
Tracks metadata, data lineage, and governance policies. Integrates with Hive Metastore (shared by StarRocks External Catalog). Enables end-to-end lineage from source systems through StarRocks to BI dashboards.
Apache Ranger
Security & Access Control
Ranger provides centralized security policy management. StarRocks can integrate with Ranger for row/column-level security policies, RBAC, and audit logging — essential for enterprise PB-scale deployments.
OpenMetadata
Modern Data Catalog
Modern open-source data catalog with native StarRocks connector. Discovers tables, columns, and generates data quality profiling reports. Enables self-serve data discovery for analysts.

🗺️ Full Big Data Architecture with StarRocks

This is the recommended full-stack open-source big data architecture with StarRocks as the core analytical engine — covering data sources, ingestion, processing, storage, query, and BI layers.

🏛️ Complete Open-Source Big Data Platform Architecture
① Data Sources
🗄️ RDBMS (MySQL/PG) · 📱 App Events (Mobile/Web) · 📡 IoT Sensors (Devices) · 📂 File Uploads (CSV/JSON) · 🌐 APIs / SaaS (3rd Party)
↓ Ingestion
② Ingestion Layer
🔄 Debezium (CDC) · 📨 Apache Kafka (Stream Bus) · 🚀 Stream Load (HTTP Direct) · 📤 Broker Load (Batch Files)
↓ Processing
③ Processing Layer
Apache Flink (Streaming ETL) · 🌊 Apache Spark (Batch ETL) · 🔧 dbt (SQL Transform) · 📅 Airflow (Orchestration)
↓ Store / Serve
④ Storage Layer
🧊 Apache Iceberg (Data Lake) · ☁️ S3 / HDFS (Object Store) · 🗂️ Apache Hive (Legacy Lake) · 🌊 Apache Hudi (Incremental)
↓ Query & Serve
⑤ ⭐ CORE QUERY ENGINE (StarRocks)
🧠 FE Nodes (CBO + Metadata) · BE Nodes (Vectorized Exec) · 📊 Materialized Views (Auto-rewrite) · 🌐 External Catalogs (Iceberg/Hive/Delta)
↓ Consume
⑥ BI & Consumption Layer
📊 Apache Superset (Dashboards) · 📈 Grafana (Metrics) · 💻 Jupyter / Python (Data Science) · 🔌 APIs / Apps (Downstream)
↓ Govern
⑦ Governance & Security Layer
🗺️ Apache Atlas (Data Catalog) · 🔐 Apache Ranger (Security) · 🎯 OpenMetadata (Discovery) · 📋 Audit Logs (Compliance)
🏆 Why this architecture wins: StarRocks sits at the center as a unified OLAP engine — serving fresh streaming data from Kafka via Routine Load, querying historical data in Iceberg/Hive via External Catalogs, and serving hundreds of concurrent BI users via Materialized View auto-rewrite. Flink handles complex CEP streaming; Spark handles heavy batch; Airflow orchestrates; Superset/Grafana visualize — all without data silos or extra serving layers.
Framework Role Summary
Layer      | Framework       | Role in Platform                                 | Interface with StarRocks
Batch      | Apache Spark    | Large-scale data transformation, ML feature prep | spark-connector-starrocks (Sink/Source)
Batch      | dbt             | SQL-based in-warehouse ELT transformations       | dbt-starrocks adapter (MySQL dialect)
Batch      | Apache Airflow  | Pipeline orchestration, scheduling               | StarRocksHook, MySQLOperator
Streaming  | Apache Flink    | Real-time ETL, CEP, windowed aggregation         | flink-connector-starrocks (Stream Load)
Streaming  | Apache Kafka    | Message bus, event streaming backbone            | Routine Load (native consumer)
Streaming  | Debezium        | CDC from RDBMS to Kafka                          | Via Kafka → Routine Load / Flink
Storage    | Apache Iceberg  | Open table format for data lake                  | External Catalog (Iceberg Catalog)
Storage    | S3 / HDFS       | Raw file storage, cold data tier                 | Broker Load, Shared-data mode
Storage    | Apache Hudi     | Incremental upsert table format                  | External Catalog (Hudi Catalog)
Query      | StarRocks FE    | CBO optimization, metadata, SQL parsing          | Core engine
Query      | StarRocks BE    | Vectorized execution, columnar storage           | Core engine
BI         | Apache Superset | Self-serve dashboards, SQL Lab                   | MySQL connector / SQLAlchemy
BI         | Grafana         | Real-time operational metrics dashboards         | MySQL data source plugin
Governance | Apache Atlas    | Data lineage and catalog                         | Via Hive Metastore integration
Governance | Apache Ranger   | Centralized security and RBAC                    | Ranger StarRocks plugin
⭐ StarRocks Big Data Platform Guide
Built for engineers designing petabyte-scale analytics platforms.
StarRocks is licensed under Apache 2.0 — github.com/StarRocks/starrocks