Definition
MinIO is a high-performance, S3-compatible object storage system written in Go. It is designed to run on commodity hardware and cloud infrastructure, storing any amount of unstructured data — from a few gigabytes to exabytes — as flat objects (files) in buckets.
Unlike traditional file systems (POSIX) or block storage, MinIO treats every piece of data — logs, images, ML models, Parquet files, videos — as an object with a key, a value (the data), and metadata. There is no directory hierarchy at the OS level; the flat namespace is infinitely scalable.
S3-Native API
100% compatible with the AWS S3 API. Any application, SDK, or tool built for S3 (Boto3, AWS CLI, Spark, etc.) works with MinIO without code changes.
Speed First
Written in Go for low-latency, high-concurrency IO. Single-binary deployment with zero external dependencies. Saturates 100 GbE NICs on commodity NVMe hardware.
Cloud Native
First-class Kubernetes operator, Helm chart, and Operator Console. Runs on bare metal, VMs, or any K8s distribution — on-premises or in the cloud.
Open Source (AGPL-3)
Community edition is AGPL-3.0. A commercial license is available for products that cannot meet AGPL obligations. Full source code on GitHub with >47 k stars.
Why "object" storage?
Object storage decouples metadata from data, enabling virtually unlimited scale — unlike file systems limited by inode counts or block stores limited by block size. Each object is addressed by a globally unique key (exposed as a URL over the S3 API), making it addressable across distributed systems without a central namespace server.
1 · Erasure Coding (Reed-Solomon)
MinIO does not use simple replication. Instead, it applies Reed-Solomon erasure coding to split every object into data shards and parity shards across the drives/nodes in an erasure set.
- Default EC:4 — 12 data shards + 4 parity shards across 16 drives.
- Tolerates the loss of as many drives as there are parity shards (up to N/2 with maximum parity) without any data loss.
- Storage overhead is just 33% (vs 200% for 3× replication).
- On read, MinIO reconstructs an object from any combination of shards, as long as at least the data-shard count survives (any 12 of 16 in the default layout).
- Background healing automatically recomputes missing shards when drives return.
```shell
# Erasure set = drives used for one coding group
# Standard: 16 drives per set (12 data + 4 parity)
minio server \
  http://minio{1...4}/data{1...4}   # 4 nodes × 4 drives = 16 drives / set

# Or specify explicitly:
MINIO_ERASURE_SET_DRIVE_COUNT=16

# Verify protection level at runtime:
mc admin info myminio | grep "EC:"
```
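The arithmetic behind these numbers can be sketched in a few lines of shell; the shard counts match the default EC:4 layout described above:

```shell
# Sketch: protection math for the default layout (12 data + 4 parity shards).
data=12; parity=4
total=$((data + parity))            # 16 drives per erasure set

# Any 12 of the 16 shards suffice to rebuild an object, so up to
# "parity" drives can fail with no data loss.
tolerated=$parity

# Storage overhead: parity bytes relative to data bytes.
ec_overhead=$((parity * 100 / data))   # 33 (%), vs 200% for 3x replication

echo "shards=${total} tolerated_failures=${tolerated} overhead=${ec_overhead}%"
```

Raising parity (e.g. EC:8 on a 16-drive set) trades more overhead for more tolerated failures; the same arithmetic applies.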
2 · Inline Bitrot Detection
Every shard is checksummed using HighwayHash-256 at write time. On every read, checksums are verified. Silent data corruption (bitrot) is detected instantly and the corrupted shard is healed from parity — without operator intervention.
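The write-then-verify idea can be sketched in shell. This uses sha256sum as a stand-in for HighwayHash-256 (MinIO's actual hash), and the file paths are illustrative:

```shell
# Sketch: checksum-on-write, verify-on-read (sha256sum stands in for HighwayHash).
shard="/tmp/shard.bin"
printf 'object data' > "$shard"

# Write path: store the shard's checksum alongside it.
sha256sum "$shard" | awk '{print $1}' > "$shard.sum"

# Read path: verify before serving; a mismatch would trigger healing from parity.
verify() {
  [ "$(sha256sum "$shard" | awk '{print $1}')" = "$(cat "$shard.sum")" ] \
    && echo "ok" || echo "bitrot-detected"
}

verify                               # prints: ok
printf 'corrupted bits' > "$shard"   # simulate silent corruption on disk
verify                               # prints: bitrot-detected
```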
3 · Distributed Mode & Server Pools
In production, MinIO runs as a distributed cluster of nodes. All nodes are equal peers — there is no master. The cluster is composed of one or more Server Pools, each a homogeneous group of nodes+drives forming their own erasure sets.
- Horizontal scaling: Add a new Server Pool to expand capacity non-disruptively.
- Object placement: Objects are placed on pools based on available free space (weighted).
- Quorum writes: A PUT requires N/2 + 1 drives to acknowledge before confirming success.
- Read quorum: Only data shards needed — no parity required for reads under normal conditions.
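A minimal sketch of the quorum sizes these rules imply for the standard 16-drive set, following the N/2 + 1 write rule and data-shards-only read rule stated above:

```shell
# Sketch: quorum sizes for a 16-drive erasure set with EC:4 parity.
drives=16; parity=4
data=$((drives - parity))

write_quorum=$((drives / 2 + 1))   # 9 drives must acknowledge before a PUT succeeds
read_quorum=$data                  # a GET needs only the 12 data shards

echo "write_quorum=${write_quorum} read_quorum=${read_quorum}"
```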
4 · Active-Active Site Replication
For disaster recovery and geo-distribution, MinIO supports Active-Active replication across multiple independent MinIO deployments (sites). Every write to any site propagates to all peers in near real-time via internal queuing.
- All sites remain fully writable — no primary/secondary model.
- Conflict resolution uses last-writer-wins semantics.
- Policies, users, groups, and IAM settings also replicate automatically.
- Typical RPO: <1 second on a 10 GbE WAN link.
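Last-writer-wins can be illustrated with a trivial timestamp comparison; the epoch values below are made up for the example:

```shell
# Sketch: last-writer-wins between two sites' copies of the same key,
# comparing modification times in epoch seconds (illustrative values).
mtime_site_a=1717000000
mtime_site_b=1717000042   # site B wrote last

if [ "$mtime_site_b" -gt "$mtime_site_a" ]; then
  winner="site-b"
else
  winner="site-a"
fi
echo "winner=${winner}"   # site B's version propagates to all peers
```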
Bucket Replication vs. Site Replication
Bucket Replication (S3-compatible) copies objects from one bucket to another, even across vendors (e.g., MinIO → AWS S3). Site Replication replicates the entire namespace including IAM, policies, and all buckets — recommended for DR at PB scale.
Metadata Management
MinIO stores object metadata alongside data as XL meta files within the same erasure set. This eliminates a separate metadata database and keeps metadata access local, reducing latency. For bucket-level metadata and IAM, MinIO uses an internal etcd-free distributed KV store backed by the same drives.
Lifecycle & Tiering
MinIO supports ILM (Information Lifecycle Management) — objects automatically transition between storage tiers (hot NVMe → warm HDD → cold cloud) based on age or access patterns. The remote tier can be another MinIO, AWS S3, GCS, or Azure Blob.
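The decision an age-based ILM rule encodes can be sketched as a small shell function; the 30-day and 90-day thresholds and tier names are assumptions for illustration:

```shell
# Sketch: the tier decision behind an age-based lifecycle rule
# (thresholds and tier names are hypothetical).
tier_for_age() {
  local age_days=$1
  if   [ "$age_days" -ge 90 ]; then echo "cold-cloud"
  elif [ "$age_days" -ge 30 ]; then echo "warm-hdd"
  else                              echo "hot-nvme"
  fi
}

tier_for_age 5     # prints: hot-nvme
tier_for_age 45    # prints: warm-hdd
tier_for_age 180   # prints: cold-cloud
```

In a real deployment this logic lives in the lifecycle rule itself; MinIO scans and transitions objects in the background.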
Networking Requirements for PB-Scale
- Minimum: 10 GbE between all nodes in a server pool. Erasure coding requires all drives to be written in parallel — network is often the bottleneck, not disk.
- Recommended: 25 GbE or 100 GbE for high-throughput workloads (ML training data, large-scale ETL).
- Topology: All nodes in a pool should be on the same L2 segment (single rack or spine-leaf) to minimize latency variance.
- Load balancer: Deploy NGINX, HAProxy, or F5 in front of MinIO for health-checking and TLS termination. Use round-robin or least-connection strategies.
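To see why the network bottlenecks first, consider the write amplification erasure coding adds: a PUT of S bytes fans out as roughly S × (data+parity)/data bytes of shard traffic. A back-of-envelope sketch for the EC:4 layout (the 10 GB object size is illustrative):

```shell
# Sketch: wire traffic for one PUT under EC:4 (12 data + 4 parity shards).
put_gb=10; data=12; parity=4
wire_gb=$((put_gb * (data + parity) / data))   # 10 * 16/12 = 13 GB on the wire

# Seconds to move that over a fully utilized 10 GbE link (10 Gbit/s):
secs=$((wire_gb * 8 / 10))
echo "wire_gb=${wire_gb} seconds_at_10GbE=~${secs}"
```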
Fastest Object Store
Benchmarked at 325 GB/s GET and 165 GB/s PUT on a 32-node NVMe cluster. Outperforms Ceph and all major cloud vendors in raw throughput at equal hardware cost.
Single Binary
The entire MinIO server is one statically compiled binary (~120 MB). No JVM, no package manager, no runtime dependencies. Runs anywhere Go runs — including ARM and s390x.
True S3 Parity
Supports all S3 features: multipart upload, pre-signed URLs, bucket versioning, object locking (WORM), object tagging, lifecycle policies, server-side encryption, and event notifications.
70–90% Cheaper
Running on your own hardware (or spot VMs) vs. AWS S3 for PB-scale workloads typically yields 70–90% cost savings. No egress fees for on-prem deployments.
Enterprise-Grade Security
TLS everywhere, SSE-S3 / SSE-KMS / SSE-C encryption, LDAP/AD integration, OpenID Connect, attribute-based access control (ABAC), and audit logging built-in.
Works with Everything
Native integrations: Apache Spark (Hadoop S3A), Flink, Trino, Hive, Presto, DeltaLake, Apache Iceberg, Hudi, MLflow, Kubeflow, Airflow, dbt, and more.
Kubernetes Native
Official MinIO Operator auto-manages tenant lifecycle, auto-healing, certificate rotation, upgrades, and scaling on any K8s. Operator Console provides a unified management UI.
Prometheus + Grafana
Exposes 100+ Prometheus metrics out of the box. Pre-built Grafana dashboards for throughput, capacity, errors, healing status, and replication lag.
MinIO is best for…
AI/ML training data lakes, data lake-house architectures (Iceberg/Delta), log aggregation, time-series data stores, media/CDN backends, container registry storage (Harbor), backup targets, and any workload needing S3-compatible storage at massive scale without cloud vendor lock-in.
Reference Hardware Config (32 nodes × 32 NVMe)
Benchmark published by MinIO on 32-node cluster, each with dual AMD EPYC, 512 GB RAM, 32× 2 TB NVMe, 2× 100 GbE NICs.
GET Throughput: 325 GB/s
PUT Throughput: 165 GB/s
ML/AI Workload Tip
For large model checkpoints and training datasets (10–500 GB objects), use MinIO's multipart upload (128 MB part size) to saturate network bandwidth. Enable MINIO_STORAGE_CLASS_STANDARD=EC:2 for hot training data to reduce parity overhead and maximize IOPS.
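The part-count arithmetic for the suggested 128 MB part size (object size here is illustrative):

```shell
# Sketch: multipart part count for a 50 GB checkpoint at 128 MB parts.
object_mb=51200          # 50 GB model checkpoint
part_mb=128
parts=$(( (object_mb + part_mb - 1) / part_mb ))   # ceiling division

echo "parts=${parts}"    # prints: parts=400
```

All 400 parts can upload in parallel. Note the S3 protocol caps multipart uploads at 10,000 parts, so at 128 MB parts the maximum object size is about 1.25 TB; use larger parts beyond that.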
Benchmark Tool
Use warp (MinIO's official benchmark tool) to validate your hardware before going to production. It tests GET, PUT, DELETE, and mixed workloads with configurable concurrency and object sizes across your actual cluster topology.
Phase 0 — Hardware Planning
- Nodes: Minimum 4 nodes; recommended 8–32 for PB workloads. Always use multiples of 4.
- Drives per node: 4, 8, or 16 drives. Prefer NVMe for hot; SATA SSD or HDD for warm/cold.
- RAM: 32–128 GB per node. MinIO caches drive metadata in RAM.
- Network: 25 GbE minimum for production; 100 GbE for high-throughput ML/analytics.
- OS: RHEL 8/9, Ubuntu 22.04 LTS, Rocky Linux 9. XFS filesystem on all data drives.
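Usable capacity follows from the EC:4 layout (12 of 16 shards carry data). A sizing sketch for one point in the recommended range; the 8 × 8 × 4 TB configuration is a hypothetical example:

```shell
# Sketch: raw vs usable capacity for 8 nodes × 8 drives × 4 TB under EC:4.
nodes=8; drives_per_node=8; drive_tb=4
raw_tb=$((nodes * drives_per_node * drive_tb))   # 256 TB raw
usable_tb=$((raw_tb * 12 / 16))                  # 12 of 16 shards are data

echo "raw=${raw_tb}TB usable=${usable_tb}TB"     # prints: raw=256TB usable=192TB
```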
Phase 1 — OS & Disk Preparation
```shell
# Format each drive as XFS (faster than ext4 for object workloads).
# Note: XFS labels are limited to 12 characters, so use short names.
i=1
for disk in /dev/nvme{0..3}n1; do
  mkfs.xfs -L "minio-data$i" -f "$disk"
  i=$((i + 1))
done

# Mount with noatime,nodiratime for performance
mkdir -p /data{1..4}
cat >> /etc/fstab <<EOF
LABEL=minio-data1 /data1 xfs defaults,noatime,nodiratime 0 2
LABEL=minio-data2 /data2 xfs defaults,noatime,nodiratime 0 2
LABEL=minio-data3 /data3 xfs defaults,noatime,nodiratime 0 2
LABEL=minio-data4 /data4 xfs defaults,noatime,nodiratime 0 2
EOF
mount -a

# Verify
df -h | grep /data
```
```shell
# Network stack tuning
cat >> /etc/sysctl.d/99-minio.conf <<EOF
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.core.netdev_max_backlog = 250000
net.ipv4.tcp_congestion_control = bbr
vm.swappiness = 1
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
EOF
sysctl -p /etc/sysctl.d/99-minio.conf

# Set I/O scheduler to none (pass-through) for NVMe
for dev in /sys/block/nvme*/queue/scheduler; do
  echo none > "$dev"
done
```
Phase 2 — Install MinIO Binary
```shell
# Download latest MinIO (amd64)
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
mv minio /usr/local/bin/

# Create minio user (never run as root in production)
useradd -r -s /sbin/nologin minio-user
chown -R minio-user:minio-user /data{1..4}

# Install mc (MinIO client)
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc && mv mc /usr/local/bin/
```
Phase 3 — Environment Configuration
```shell
# /etc/default/minio

# Credentials (use Vault or K8s secrets in real deployments)
MINIO_ROOT_USER="admin"
MINIO_ROOT_PASSWORD="SuperSecretPassw0rd123!"

# Cluster topology: 4 nodes × 4 drives = 16 drives (EC:4 parity)
MINIO_VOLUMES="https://minio{1...4}.internal:9000/data{1...4}"

# API port, console port, and TLS certs
# (MinIO loads public.crt / private.key from the --certs-dir directory)
MINIO_OPTS="--address :9000 --console-address :9001 --certs-dir /etc/minio/certs"

# Storage class: EC:4 standard, EC:2 reduced redundancy
MINIO_STORAGE_CLASS_STANDARD="EC:4"
MINIO_STORAGE_CLASS_RRS="EC:2"

# Transparent compression for compressible formats (cold data)
MINIO_COMPRESS_ALLOW_ENCRYPTION="on"
MINIO_COMPRESS_EXTENSIONS=".log,.txt,.csv,.json"
MINIO_COMPRESS_MIME_TYPES="text/plain,application/json"

# Prometheus metrics (scrape :9000/minio/v2/metrics/cluster)
MINIO_PROMETHEUS_AUTH_TYPE="public"

# Audit logging via HTTP webhook (for Kafka, use the MINIO_AUDIT_KAFKA_*
# settings instead; Kafka brokers do not accept HTTP on 9092)
MINIO_AUDIT_WEBHOOK_ENABLE_collector="on"
MINIO_AUDIT_WEBHOOK_ENDPOINT_collector="http://audit-collector:8080/minio-audit"
```
```ini
[Unit]
Description=MinIO Object Storage
After=network-online.target
Wants=network-online.target

[Service]
WorkingDirectory=/usr/local
EnvironmentFile=/etc/default/minio
ExecStart=/usr/local/bin/minio server $MINIO_OPTS $MINIO_VOLUMES
User=minio-user
Group=minio-user
Restart=always
RestartSec=5s
LimitNOFILE=1048576
TasksMax=infinity
TimeoutStopSec=120
SendSIGKILL=no

[Install]
WantedBy=multi-user.target
```
```shell
# Enable and start on ALL nodes
systemctl daemon-reload
systemctl enable --now minio

# Verify cluster health
mc alias set myminio https://minio1.internal:9000 admin SuperSecretPassw0rd123!
mc admin info myminio

# Expected output excerpt:
#   Servers: 4  Drives: 16  Online: 16  Offline: 0
#   Status: 16 online, 0 offline drives
#   Used: 0 B / 64 TB total
```
Phase 4 — Multi-Site Replication (DR)
```shell
# Register aliases for both sites
mc alias set site-a https://minio-site-a.internal:9000 admin pass1
mc alias set site-b https://minio-site-b.internal:9000 admin pass2

# Enable site replication (run once from either site)
mc admin replicate add site-a site-b

# Verify replication status
mc admin replicate info site-a

# Add a third DR site later:
mc admin replicate add site-a site-b site-c
```
Phase 5 — Kubernetes Deployment (Operator)
```shell
# Install MinIO Operator
helm repo add minio-operator https://operator.min.io
helm install --namespace minio-operator --create-namespace \
  operator minio-operator/operator

# Deploy a MinIO tenant (4 servers × 4 drives)
helm install --namespace minio-tenant --create-namespace \
  tenant minio-operator/tenant \
  --set "tenant.pools[0].servers=4" \
  --set "tenant.pools[0].volumesPerServer=4" \
  --set "tenant.pools[0].size=2Ti" \
  --set "tenant.pools[0].storageClassName=local-nvme"
```
Hardware → XFS format drives, tune OS kernel (sysctl, scheduler)
Foundation phase: every data drive formatted as XFS with noatime, network stack tuned for large transfers, I/O scheduler set to none for NVMe.
Install MinIO binary + configure /etc/default/minio
Single binary, no package dependencies. Configure MINIO_VOLUMES with your node expansion syntax to define the erasure set topology.
TLS everywhere + load balancer
Use Let's Encrypt or internal CA. Place NGINX/HAProxy in front for client-facing TLS termination and health-check routing. MinIO nodes communicate over TLS internally.
Observability: Prometheus + Grafana + alerting
Scrape /minio/v2/metrics/cluster. Import MinIO's official Grafana dashboards (IDs: 13502, 15305). Set alerts on drive offline, healing rate, and replication lag.
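A minimal Prometheus scrape job for that endpoint can be generated with a heredoc, matching the style used in the setup phases above; the job name, target host, and output path are assumptions:

```shell
# Sketch: minimal Prometheus scrape config for the cluster metrics endpoint
# (job name, target, and file path are hypothetical).
cat > /tmp/minio-scrape.yml <<'EOF'
scrape_configs:
  - job_name: minio-cluster
    metrics_path: /minio/v2/metrics/cluster
    scheme: https
    static_configs:
      - targets: ['minio1.internal:9000']
EOF
grep metrics_path /tmp/minio-scrape.yml
```

This assumes MINIO_PROMETHEUS_AUTH_TYPE is set to public; otherwise generate a bearer token with `mc admin prometheus generate` and add it to the job.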
Multi-site replication for DR
Enable mc admin replicate add across geographically separated sites. Test failover quarterly by simulating a site outage and verifying zero data loss.
ILM policies + storage tiering
Set up lifecycle rules to move cold data to HDD or cloud (AWS S3, GCS) after N days. This keeps hot NVMe headroom ≥ 20% for write performance.
MinIO
- ✓ 325 GB/s GET throughput
- ✓ Full S3 API parity
- ✓ Single binary, easy ops
- ✓ Active-active replication
- ✓ Kubernetes native
- ✓ Inline bitrot protection
- ✓ Open source (AGPL)
- ✓ Sub-ms metadata latency

Ceph
- ✓ S3-compatible (RGW)
- ✓ Block + file + object
- ✓ Strong community
- ✗ Complex to operate
- ✗ Lower raw throughput
- ✗ Large footprint (many daemons)
- ✗ Slow metadata (RADOS)
- ✗ Hard K8s integration

AWS S3
- ✓ The S3 API standard
- ✓ Global availability
- ✓ Managed (zero ops)
- ✓ Massive ecosystem
- ✗ Expensive at PB scale
- ✗ High egress fees
- ✗ No on-prem option
- ✗ Vendor lock-in
| Feature | MinIO | Ceph RGW | AWS S3 | HDFS |
|---|---|---|---|---|
| S3 API | ✓ Full parity | ✓ Most features | ✓ Native | ✗ Not S3 |
| Max throughput | 325 GB/s | ~100 GB/s | ~50 GB/s* | ~150 GB/s |
| Operational complexity | Low (1 binary) | Very High | None (managed) | High (NN) |
| On-premises | ✓ | ✓ | ✗ | ✓ |
| Kubernetes Native | ✓ Operator | Partial (Rook) | ✓ EKS | ✗ |
| Iceberg / Delta | ✓ Native | ✓ Via S3 | ✓ Native | ✓ Partial |
| Egress cost @ 1 PB/mo | $0 (on-prem) | $0 (on-prem) | ~$90,000 | $0 |
* AWS S3 throughput is per-prefix limited; aggregate across prefixes is higher.
Keep free space ≥ 20%
MinIO write performance degrades sharply above 80% capacity. Use ILM tiering rules to automatically push cold data to cheaper storage before hitting this threshold.
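A simple headroom guard illustrating the 80% threshold; the mount path is an assumption, so point it at your /dataN mounts in practice:

```shell
# Sketch: warn when a data mount crosses the 80% usage threshold
# (requires GNU df; the checked path is illustrative).
check_headroom() {
  local mount=$1 limit=80
  local used_pct
  used_pct=$(df --output=pcent "$mount" | tail -1 | tr -dc '0-9')
  if [ "$used_pct" -ge "$limit" ]; then
    echo "WARN: ${mount} at ${used_pct}%, tier cold data now"
  else
    echo "OK: ${mount} at ${used_pct}%"
  fi
}
check_headroom /
```

Wire a check like this into your alerting alongside the Prometheus capacity metrics rather than running it ad hoc.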
Dedicated storage VLAN
Isolate MinIO inter-node traffic from client traffic using separate NICs or VLANs. This prevents noisy-neighbor bandwidth contention on shared 10 GbE switches.
IAM per service account
Never share root credentials. Create a dedicated MinIO service account per application with least-privilege bucket policies. Rotate credentials via Vault or K8s Secrets.
Test healing regularly
Run mc admin heal -r against your cluster alias monthly. Simulate drive failures in staging to measure MTTR and validate that parity reconstruction keeps pace with load.
Use server pools, not drive expansion
When adding capacity, add full Server Pools (new nodes + drives) rather than adding drives to existing nodes. Pools are the safe, non-disruptive scale-out unit.
S3 multipart for large objects
For objects > 128 MB, always use multipart upload (128–256 MB parts). This enables parallel upload across drives, dramatically increasing throughput for ML datasets and backups.
Encryption Strategy for PB Deployments
Use SSE-KMS (Server-Side Encryption with KMS) backed by HashiCorp Vault or AWS KMS. Every object gets a unique data encryption key (DEK) derived from a master key — so compromising one object never exposes others. Enable it by setting MINIO_KMS_KES_ENDPOINT and MINIO_KMS_KES_KEY_NAME environment variables.