The Lakehouse Query Engine — how it works, why it's powerful, how to deploy it at petabyte scale, and how to architect a complete modern data platform around it.
Dremio is an open-source, high-performance lakehouse query engine that lets you run blazing-fast SQL queries directly on your data lake — no ETL, no data movement, no proprietary silos.
Dremio is a distributed SQL query engine that runs directly against your data lake storage (S3, ADLS, GCS, HDFS). It uses Apache Arrow for in-memory columnar processing and Apache Iceberg as its open table format — giving you data warehouse performance without a data warehouse.
Traditional ETL pipelines are slow, expensive, and leave stale, divergent copies of the same data. Dremio eliminates the "copy data to analyze it" anti-pattern: connect Dremio to your existing data lake, define virtual datasets, and BI tools query them directly, as if against a high-speed data warehouse.
Founded in 2015 by former MapR and Hortonworks engineers. The core engine is built on Apache Arrow, with Arrow Flight SQL as its high-speed client protocol. The community edition is open source (Apache 2.0); the enterprise edition adds security, governance, and a managed cloud offering (Dremio Cloud).
Dremio's Reflections system (intelligent, automatic, transparent acceleration via materialized views) combined with a vectorized execution engine means that queries taking minutes on Spark or Presto return in sub-seconds for BI workloads, with no caching complexity exposed to end users.
One-liner: Dremio is to your data lake what Snowflake is to a cloud data warehouse — but open, cheaper, and you control your data. It's a SQL interface over your existing lake, with acceleration magic built in.
Dremio's query engine is a distributed, multi-stage pipeline from SQL parse to columnar result delivery. Understanding the flow helps you tune it correctly.
Clients: Tableau · Power BI · Looker · Python · JDBC/ODBC · Arrow Flight
Coordinator: SQL Parse → Semantic Analysis → Logical Plan → Physical Plan → Fragment Distribution
Reflection Matcher: auto-matches queries to pre-built accelerations (Raw / Agg Reflections)
Executors (×N): Vectorized Arrow execution · Local spill · Fragment work
Sources: S3 · ADLS · GCS · HDFS · Delta Lake · Iceberg · Hudi · Hive · PostgreSQL · MongoDB · Kafka
Formats: Parquet · ORC · Avro · JSON · CSV · Iceberg Tables · Delta Tables
The Coordinator receives the SQL query. The Calcite-based parser validates syntax, resolves table and column names against the metadata store (Nessie catalog or Dremio's internal catalog), and checks permissions.
The query is converted to a relational algebra logical plan, then the cost-based optimizer (CBO) produces an optimized physical plan. It decides join ordering, aggregation pushdowns, partition pruning, and predicate pushdown to the source.
Before execution, the engine checks if any Reflection (pre-computed materialized view) can satisfy or partially satisfy the query. If yes, it rewrites the plan to use the acceleration instead of the raw data — transparently, without changing the SQL.
The physical plan is split into query fragments. The coordinator assigns fragments to executor nodes. Each executor node scans its assigned data splits (S3 file ranges / Iceberg manifest entries) in parallel using Apache Arrow columnar vectorized processing.
Intermediate results flow between executor nodes over Arrow Flight RPC for zero-copy data exchange (no serialization overhead). Hash joins, sort-merge operations, and aggregations happen in Arrow memory buffers.
Final results stream back to the coordinator and are returned to the client via Arrow Flight SQL, JDBC, ODBC, or REST. BI tools receive columnar data natively — Tableau and Power BI get sub-second responses even on billions of rows when Reflections are active.
Dremio's strength comes from a combination of architectural innovations that work together to deliver warehouse-speed performance on open lake storage.
The entire execution pipeline is built on Apache Arrow's in-memory columnar format. SIMD vectorized operations process thousands of rows per CPU clock. No row-by-row iteration. No Java object overhead. Data stays in Arrow format from scan to result delivery.
Reflections are like automatic materialized views. Administrators define them; queries automatically rewrite to use them — even partial matches work. Raw Reflections (sorted/partitioned pre-reads) and Aggregation Reflections (pre-computed rollups) reduce query time from minutes to milliseconds.
Dremio is built on Apache Iceberg, Apache Arrow, Apache Parquet, and Nessie (Git for Data). Your data stays in open formats on your own storage. Swap Dremio out anytime; your data is always yours and readable by any tool.
With Apache Iceberg support, Dremio supports INSERT, UPDATE, DELETE, MERGE INTO — full ACID transactions directly on S3/GCS/ADLS. No need for a traditional database for mutable data. Time-travel queries (AT SNAPSHOT, AT BRANCH) are native.
Project Nessie (open-source, integrated by default) provides Git-like branching for your data catalog. Create a branch, run ETL, test data quality, then merge to production — all without affecting live queries. Supports multi-table transactions across Iceberg tables.
Dremio's Virtual Datasets and Spaces act as a built-in semantic layer. Business users query curated, governed, documented virtual tables. Column masking, row-level security, and data lineage are all native — no separate semantic layer product needed.
| Feature | Dremio | Snowflake | Presto / Trino | Spark SQL | BigQuery |
|---|---|---|---|---|---|
| Open Source | Community Edition | Closed | Fully OSS | Fully OSS | Closed |
| Data on Your Storage | Yes — S3/GCS/ADLS | Snowflake-managed | Yes | Yes | GCS-only |
| BI Query Speed | Sub-second (Reflections) | Fast | Moderate | Slow for BI | Fast |
| Iceberg Support | Native + DML | Good | Good | Via catalog | Limited |
| Transparent Acceleration | Reflections (unique) | Auto-clustering | Manual only | No | BI Engine |
| Cost Model | Free OSS / flat license | Credit-based (expensive) | Free OSS | Free OSS | Pay per query |
| Git-for-Data | Nessie built-in | No | No | No | No |
Dremio's engine is a carefully assembled stack of open-source components, each chosen for performance and interoperability.
All in-memory data is stored in Arrow columnar format. Arrow Flight RPC handles zero-copy data transfer between nodes. Arrow's C++ libraries power the vectorized execution kernels (SIMD-accelerated).
Dremio extends Apache Calcite for SQL parsing, validation, logical planning, and cost-based optimization. The planner supports predicate pushdown, column pruning, join reordering, and subquery elimination.
Iceberg provides ACID transactions, schema evolution, time-travel, partition evolution, and row-level deletes on object storage. Dremio uses Iceberg as the primary format for managed tables and Reflections.
Nessie is a transactional catalog with Git-like branching semantics. It stores table metadata, schema versions, and snapshot pointers. Enables data experimentation without risk to production.
Parquet is the primary on-disk file format. Its columnar layout + compression (Snappy/Zstd) + predicate pushdown statistics make it ideal for analytical queries. Dremio also supports ORC, Avro, JSON, CSV.
Dremio uses Gandiva (part of Arrow) to JIT-compile expression trees to native LLVM code at runtime. Filter expressions, projections, and aggregations are compiled to CPU-native code — faster than interpreted evaluation.
A Raw Reflection stores a physically re-sorted, re-partitioned, columnar copy of a dataset. When a query matches, Dremio reads from this pre-optimized version instead of the original files. Example: re-sorting a 500 GB Parquet dataset by customer_id can make customer-filtered queries up to 100× faster.
An Aggregation Reflection pre-computes GROUP BY rollups. When a BI tool queries total sales by region/month, Dremio matches it to a pre-computed aggregate — returning results in <1s from a table of billions of rows. The reflection updates on a refresh schedule or triggered by new data arrivals.
Reflection Matching: Uses substitution rules + constraint-based rewriting. Even a partial match (e.g., reflection has 90% of the needed columns) is used with a join to raw data for the missing 10%. Users never know — the SQL stays unchanged.
You can define a Reflection that points to an existing Iceberg table or Parquet dataset built by Spark or dbt. Dremio reuses your existing transformation outputs as acceleration sources — no duplicating ETL work.
Running Dremio in production with PB-scale data requires careful cluster sizing, Kubernetes orchestration, storage strategy, and tuning. Here is a battle-tested blueprint.
PB-scale strategy: Keep data in S3/GCS/ADLS as Iceberg tables in Parquet format. Dremio executors are compute — they scale horizontally and are stateless. Never put PB of data on executor disks. Use NVMe SSD on executors for spill-to-disk and Reflection caching only.
| Role | Count | CPU | RAM | Storage | Purpose |
|---|---|---|---|---|---|
| Coordinator | 2 (HA) | 16 vCPU | 128 GB | 500 GB SSD | Query planning, metadata, Reflection scheduling, HA failover |
| Executor (Standard) | 20–50 | 32 vCPU | 256 GB | 2 TB NVMe SSD | Query execution, Arrow vectorized processing, spill-to-disk |
| Executor (Reflection) | 5–10 | 32 vCPU | 256 GB | 4 TB NVMe SSD | Dedicated Reflection refresh processing (separate engine pools) |
| Zookeeper | 3 | 4 vCPU | 16 GB | 100 GB SSD | Coordinator HA, distributed coordination |
| Nessie / Catalog | 3 | 8 vCPU | 32 GB | 500 GB SSD + PostgreSQL | Iceberg catalog, table metadata, branching |
```yaml
# dremio-values.yaml — Production PB-scale configuration
coordinator:
  count: 2
  cpu: "16"
  memory: "131072"          # 128 GB in MB
  volumeClaimStorageClass: "fast-ssd"
  volumeSize: "500Gi"
  web:
    port: 9047
executor:
  count: 30
  cpu: "32"
  memory: "262144"          # 256 GB in MB
  volumeClaimStorageClass: "nvme-local"
  volumeSize: "2Ti"
  engineOverride:
    name: "default"
distStorage:
  type: "aws"               # or gcs / azureStorage
  aws:
    bucketName: "dremio-dist-storage-prod"
    path: "/dremio"
    region: "ap-southeast-1"
zookeeper:
  count: 3
  cpu: "4"
  memory: "16384"
image:
  repository: "dremio/dremio-oss"
  tag: "25.0.0"
```
```hocon
# /opt/dremio/conf/dremio.conf
paths: {
  local: "/var/lib/dremio"
  dist: "s3a://dremio-dist-storage-prod/dremio"
}

services: {
  coordinator.enabled: true
  executor.enabled: true
  executor: {
    # Memory allocation: 80% of memory to the Arrow off-heap pool
    memory.reserve.in.bytes: 4294967296   # 4 GB OS reserve
    heap.memory.percent: 20               # 20% JVM heap
    direct.memory.percent: 80             # 80% Arrow off-heap
    # NVMe spill path for large joins / aggregations
    spill.disk: "/mnt/nvme/dremio-spill"
  }
}

services.fabric.memory.jvm: 8192          # MB

# S3 parallelism — critical for PB data scanning
store.s3.threadpool.size: 64
store.s3.multipart.threshold: 67108864    # 64 MB

# Iceberg manifest caching
iceberg.manifest.cache.size: 10000
```
Use hidden partitioning on Iceberg tables — partition by days(event_time), bucket(customer_id, 256). Dremio prunes partitions automatically. A PB table partitioned well means most queries scan <0.1% of data.
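The pruning effect can be sketched in a few lines; the manifest and file names are hypothetical, and this is an illustration rather than Iceberg's metadata logic:

```python
# Toy sketch of partition pruning (not Iceberg's implementation): each
# data file carries its partition value, so a predicate on the partition
# column lets the planner skip non-matching files before reading anything.
from datetime import date

# Hypothetical manifest: file path -> day partition, as produced by an
# Iceberg days(event_time) transform
manifest = {
    "s3://lake/f1.parquet": date(2024, 5, 1),
    "s3://lake/f2.parquet": date(2024, 5, 2),
    "s3://lake/f3.parquet": date(2024, 5, 3),
}

def prune(manifest, day):
    """Return only the files whose partition matches the predicate day."""
    return [f for f, d in manifest.items() if d == day]

print(prune(manifest, date(2024, 5, 2)))  # → ['s3://lake/f2.parquet']
```

At PB scale the same principle operates through Iceberg manifests and column statistics, which is why well-chosen partition keys keep most scans tiny.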
Small files (many <10 MB) destroy S3 throughput. Run periodic Iceberg compaction (via Spark or Dremio's OPTIMIZE TABLE) to merge small files. Dremio reads PB efficiently when files are 256 MB–1 GB each.
Profile your top 20 BI queries. Create Aggregation Reflections pre-computing those rollups. Set refresh schedules tied to your data arrival SLA. Use Incremental Refresh for Reflections on append-only Iceberg tables.
Create separate executor engine pools: one for interactive BI queries (smaller, fast-response), one for Reflection refresh jobs (larger, batch-friendly), one for data scientists (Spark-like heavy scans). Use WLM rules to route workloads.
Expose Dremio metrics via Prometheus JMX exporter. Dashboards in Grafana for query latency, executor memory, Reflection freshness, S3 request rates. Alert on high spill-to-disk (indicates memory pressure) and stale Reflections.
Enable TLS on all endpoints (web UI, JDBC, Arrow Flight). Integrate with LDAP/SAML/OAuth2 for SSO. Use column masking and row-level security policies via CREATE POLICY. IAM role-based S3 access — no static credentials.
Dremio shines brightest in the Lakehouse Architecture — specifically the Open Lakehouse pattern built on Apache Iceberg. Here is how to structure the full system.
Bronze (Raw): Raw ingestion. Original data as-is. Iceberg tables, append-only. Schema-on-read. Full history retained. Never overwritten.
Silver (Cleansed): Cleansed, deduplicated, validated, conformed. Iceberg with MERGE INTO for upserts. Column-typed, nulls handled, PII masked.
Gold (Curated): Business-domain aggregates. Star schema / wide tables. Optimized for BI queries. Dremio Reflections built on Gold + Silver. Exposed via Virtual Datasets.
Query Layer (Dremio): Virtual Datasets · Reflections · SQL Engine · Semantic Layer · Access Control · Arrow Flight
Storage Layer: S3 / GCS / ADLS — Parquet + Iceberg. Nessie catalog. Unlimited scale. $0.023/GB/month.
Best For Dremio: Lakehouse with Iceberg + Medallion + Nessie. Dremio becomes the single query interface across all layers, eliminating dedicated data marts.
Data Modeling: Use wide denormalized tables in Gold layer (Kimball-inspired). Avoid too many joins — Dremio handles them, but pre-joining in Gold + Reflections gives best BI speed.
Dremio is the query & serving layer — but a complete big data platform needs tools for ingestion, streaming, transformation, cataloging, orchestration, and BI. Here is the best-fit open-source stack.
Ingest: Batch-load structured & semi-structured data into the Bronze Iceberg layer.
Stream: Stream events into Iceberg Bronze tables with low-latency micro-batch or true streaming.
Transform: Transform, clean, aggregate, and model data across the medallion layers.
Storage: Object storage with open ACID table formats — the foundation of the lakehouse.
Catalog: Track table metadata, lineage, schemas, and policies. Nessie is Dremio's native catalog; supplement with external tools for broader governance.
Orchestrate: Orchestrate batch Spark jobs, dbt runs, Flink deployments, and Reflection refresh triggers.
Query (Dremio): SQL on everything. Reflections for BI acceleration. Arrow Flight for fast client delivery. Virtual Datasets as semantic layer. Multi-source federation.
BI / Viz: Connect BI tools directly to Dremio via JDBC/ODBC or Arrow Flight. Reflections ensure sub-second dashboard loads.
A production-ready, fully open-source big data platform with Dremio at the center — covering ingestion, streaming, transformation, storage, query, catalog, orchestration, and BI.
| Layer | Function | Recommended Tools | Notes |
|---|---|---|---|
| Ingest | Batch loading from RDBMS/files | Airbyte, Apache NiFi (Sqoop is now retired) | Load to Bronze Iceberg tables via Spark |
| Stream | Real-time event ingestion | Kafka + Flink, Debezium CDC | Flink writes Iceberg with exactly-once |
| Transform | Bronze → Silver → Gold ETL | Apache Spark, dbt (SQL) | Spark for heavy compute; dbt for SQL transforms on Gold |
| Storage | Lake storage + table format | S3/MinIO + Apache Iceberg + Parquet | Iceberg = ACID, time-travel, schema evolution |
| Catalog | Metadata + branching + lineage | Nessie + OpenMetadata / DataHub | Nessie = native Dremio catalog; OpenMetadata for governance UI |
| Orchestrate | DAG scheduling & workflow | Apache Airflow, Dagster | Trigger Spark jobs, dbt runs, Reflection refreshes |
| Query | SQL engine + acceleration + federation | Dremio | Reflections for BI speed; Arrow Flight for Python/BI clients |
| BI / Viz | Dashboards & reports | Apache Superset, Metabase, Redash | Connect via Dremio JDBC or Arrow Flight; Reflections = sub-second |
Why this stack wins: Every component is open-source and Apache-licensed. Data never leaves your storage. Swappable at every layer. Dremio sits at the center as the intelligence & serving layer — consuming Iceberg tables produced by Spark/Flink, and serving BI tools with Reflection-accelerated speed. Total TCO is a fraction of Snowflake or Databricks.
Dremio + Apache Iceberg + Kafka + Flink + Spark + Airflow + Superset is the most powerful, fully open-source big data platform stack available today — scalable from GB to PB with no vendor lock-in.