📊 Big Data Architecture · PB-Scale Analysis

Amazon S3 vs Apache HDFS

A comprehensive technical comparison for architects and engineers designing petabyte-scale data systems

Object Storage Distributed File System Cloud-Native vs On-Premises Petabyte-Scale Systems

System Overview

☁️ Amazon S3

Simple Storage Service

Amazon S3 is a massively scalable object storage service launched in 2006. It decouples compute from storage, enabling independent scaling of each. S3 stores data as flat objects with metadata, accessible via HTTP REST APIs. At petabyte scale, S3 becomes the backbone of nearly every cloud-native data lakehouse architecture, with durability guarantees of 99.999999999% (11 nines) achieved through automatic replication across ≥3 AZs.
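To make the 11-nines figure concrete, a quick expected-loss estimate (a sketch: durability is an annual, per-object design target, and independence between objects is assumed for illustration):

```python
# Expected annual object loss at 99.999999999% (11 nines) durability.
# Assumption: losses are independent per object (illustrative only).
annual_durability = 0.99999999999
annual_loss_prob = 1 - annual_durability      # ~1e-11

objects_stored = 10_000_000_000               # 10 billion objects
expected_losses_per_year = objects_stored * annual_loss_prob

print(f"{expected_losses_per_year:.1f}")      # ~0.1 objects lost per year
```

In other words, storing ten billion objects, you would expect to lose roughly one object per decade.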

  • Scale Ceiling: virtually unlimited
  • Durability: 11 nines
  • Typical Latency: ~200ms
  • Cost Model: Pay-as-you-go
🏗️ Apache HDFS

Hadoop Distributed File System

HDFS is the fault-tolerant, distributed file system at the core of the Hadoop ecosystem, created in 2006 and inspired by Google's GFS paper. It stores data in large blocks (128MB–256MB) distributed across DataNodes, with a centralized NameNode for metadata. HDFS co-locates compute and storage (data locality), enabling the high-throughput sequential reads ideal for large-scale batch processing workloads. It performs best in managed cluster environments.
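The block figures above translate directly into NameNode and capacity numbers; a back-of-envelope sketch (256MB blocks and default 3× replication assumed, per the text):

```python
# Back-of-envelope: blocks and raw capacity for 1 PB of file data on HDFS.
# Assumptions: 256 MB blocks, default 3x replication.
PB = 1024 ** 5
block_size = 256 * 1024 ** 2           # 256 MB
replication = 3

logical_bytes = 1 * PB
blocks = logical_bytes // block_size   # block objects the NameNode tracks
raw_bytes = logical_bytes * replication

print(blocks)                          # 4,194,304 blocks for 1 PB
print(raw_bytes // PB)                 # 3 PB of raw disk required
```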

  • Practical Scale: 10s of PB
  • Replication Factor: 3× (default)
  • Local Latency: ~1ms
  • Cost Model: CapEx
💡

Context: At petabyte scale, the choice between S3 and HDFS is rarely binary. Modern architectures increasingly adopt a hybrid or lakehouse model — storing cold/warm data in S3 while using HDFS-compatible APIs (via EMR, Databricks, or open-source clusters) for hot compute-intensive workloads. The decision hinges on access patterns, cost structures, team capability, and cloud strategy.

Architectural Deep Dive

☁️ S3 Architecture

🌐  Client / Application Layer (SDK, REST API, S3 Select)
🔐  IAM / Bucket Policies / ACLs / S3 Object Lock
🗂️  Bucket + Prefix Namespace (Flat Object Model)
💾  Distributed Object Store (AWS-managed, opaque)
🔄  Cross-AZ Replication (≥3 zones automatically)
🧊  Tiering: S3 Standard → IA → Glacier → Deep Archive
⚠️

Key trait: Eventual consistency was the default until 2020. Since Dec 2020, S3 offers strong read-after-write consistency for all operations — a major architectural improvement for stream and incremental workloads.

🏗️ HDFS Architecture

💻  Client Layer (HDFS Client, WebHDFS, HttpFS)
🧠  NameNode (Master) — Metadata & Namespace FSImage + EditLog
🔁  Secondary NameNode / Standby HA NameNode (ZKFC)
🗃️  DataNodes — Block Storage (128MB/256MB blocks × 3 replicas)
📡  Block Reports + Heartbeats (3s interval) → NameNode
⚡  Data Locality Scheduler (Rack-aware pipeline writes)
🔑

Key trait: The NameNode is a single point of metadata bottleneck. In large deployments, the in-memory metadata (~150 bytes/file) limits namespace to ~500M–1B files without HDFS Federation. HA NameNode (ZKFC + JournalNodes) mitigates SPOF but not the memory ceiling.
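The ~150 bytes/file figure makes the memory ceiling easy to estimate (a rough sketch; real heap usage also grows with block and directory objects):

```python
# NameNode heap estimate from the ~150 bytes per metadata object figure.
# Simplification: count files only; blocks and directories add more.
bytes_per_object = 150
files = 1_000_000_000                   # 1 billion files

heap_gb = files * bytes_per_object / 1e9  # decimal GB
print(round(heap_gb))                   # ~150 GB of NameNode heap
```

This is why namespaces approaching a billion files push deployments toward HDFS Federation.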

Feature-by-Feature Comparison

Each dimension below compares ☁️ Amazon S3 and 🏗️ Apache HDFS:

Storage Model
  • S3 — Object store. Flat key-value namespace; no real directories (hierarchy is simulated via prefixes). Objects up to 5TB each.
  • HDFS — Block store. POSIX-like hierarchical file system; files are split into 128/256MB blocks distributed across DataNodes.

Scalability
  • S3 — Virtually unlimited. No capacity ceiling; AWS manages sharding transparently. Scales to exabytes with zero operator effort.
  • HDFS — Bounded by NameNode. Memory-limited metadata (~150B/file); HDFS Federation partially alleviates this. Practical upper bound: ~1B files per NameNode.

Durability
  • S3 — 99.999999999% (11 nines). Auto-replication across ≥3 AZs. S3 One Zone-IA: same 11-nines design target within a single AZ.
  • HDFS — Configurable. Default 3× replication; Erasure Coding (EC) available (3+2 / 6+3). EC reduces storage overhead by ~50% vs triple replication.

Consistency
  • S3 — Strong (since Dec 2020). Strong read-after-write consistency for PUTs, DELETEs, and LIST operations on all objects.
  • HDFS — Strong (POSIX-like). Native strong consistency for all operations; HDFS leases provide exclusive write locks per file.

Latency
  • S3 — Higher. ~100–500ms per GET (first byte); high request throughput compensates. S3 Express One Zone: ~10ms (newer).
  • HDFS — Lower. Local reads: sub-millisecond to low-ms; rack-local: 1–5ms; network-hop: 5–30ms. Ideal for shuffle-heavy jobs.

Throughput
  • S3 — Massive. Per-prefix: 5,500 GET/s and 3,500 PUT/s; multiple prefixes multiply this. S3 Transfer Acceleration via CloudFront edge.
  • HDFS — High (sequential). Optimized for sequential large-block reads (MapReduce, Spark); random I/O is poor. DataNode bandwidth-bound.

Data Locality
  • S3 — None. Compute and storage are fully decoupled; network I/O is always required. Mitigated by co-locating EC2/EMR in the same AZ.
  • HDFS — Native. Rack-aware scheduling: MapReduce/YARN places tasks on nodes holding the data blocks, significantly reducing network I/O.

Metadata Operations
  • S3 — Slow. LIST is expensive (up to 1,000 keys per call, billed per request); S3 Inventory + Athena recommended for large namespaces. No rename (copy+delete).
  • HDFS — Fast. NameNode holds all metadata in memory; rename/move is atomic O(1). HDFS supports snapshots natively.

ACID / Transactions
  • S3 — Via table formats. Native S3 has no ACID; Delta Lake, Apache Iceberg, and Apache Hudi add ACID on top with optimistic concurrency + transaction logs.
  • HDFS — Limited. File-level, append-only; ACID requires table-format layers (Iceberg, Hudi) just as on S3. Hive ACID on ORC is supported.

Security
  • S3 — Comprehensive. IAM, bucket policies, ACLs, KMS encryption at rest (SSE-S3/SSE-KMS/SSE-C), S3 Object Lock (WORM), Macie, VPC endpoints.
  • HDFS — Kerberos-based. Kerberos authentication, HDFS encryption zones (KMS), POSIX permissions, Ranger/Sentry for fine-grained policies.

Cost Structure
  • S3 — OpEx / pay-as-you-go. Storage ~$0.023/GB-month (Standard); egress expensive (~$0.09/GB); requests billed. No upfront investment; lifecycle policies automate tiering.
  • HDFS — CapEx-heavy. Large upfront hardware investment plus ongoing power, cooling, rack space, and ops staff. 3× replication needs 3× raw capacity; EC reduces this to ~1.5×.

Operational Burden
  • S3 — Minimal. Fully managed AWS service: no cluster to operate; auto-scaling, patching, and availability handled. 99.99% availability SLA.
  • HDFS — High. Requires a dedicated ops team; NameNode HA, DataNode expansion, rebalancing, decommissioning, and upgrades are non-trivial at PB scale.

Append / Streaming
  • S3 — Limited. Multipart upload (MPU) for large objects; no true append semantics. Kinesis Firehose or MSK buffers streaming writes before the S3 sink.
  • HDFS — Native. Supports append (hflush/hsync); HBase WAL and Kafka log segments can live on HDFS. Direct streaming write support.

Multi-Cloud / Portability
  • S3 — Vendor lock-in risk. The S3 API is a de facto standard (GCS/Azure Blob offer S3 compatibility), but native features (Glacier, Lambda triggers) are AWS-specific.
  • HDFS — Open source. Apache-licensed; runs on commodity hardware and any cloud (EMR, HDInsight, Dataproc). No vendor lock-in.

Performance Analysis

☁️ S3 — Relative Performance Scores

Sequential Read Throughput: 85/100
Random I/O: 55/100
Metadata Operations (LIST): 30/100
Concurrent Access / TPS: 95/100
First-Byte Latency: 45/100
Scalability Ceiling: 100/100

🏗️ HDFS — Relative Performance Scores

Sequential Read Throughput: 90/100
Random I/O: 40/100
Metadata Operations: 85/100
Concurrent Access / TPS: 65/100
First-Byte Latency: 92/100
Scalability Ceiling: 70/100
📈

PB-Scale Performance Note: At petabyte scale with 10,000+ concurrent Spark tasks, S3's aggregate throughput (5,500 GET/s per prefix × N prefixes) often exceeds HDFS total cluster bandwidth. However, S3 network egress costs and inter-request latency become dominant cost/latency factors. Solutions: S3 Express One Zone (~10ms latency), partition-aware prefix design, and intelligent tiering via S3 Intelligent-Tiering.
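The prefix math in the note can be sketched directly (the 256-prefix layout is an illustrative assumption, not an AWS figure):

```python
# Aggregate S3 GET capacity from the documented per-prefix baseline.
# Assumption: keys are hash-bucketed so load spreads evenly over prefixes.
get_per_prefix = 5_500       # GET/s per prefix (figure quoted in the text)
prefixes = 256               # e.g., 256 hash buckets in the key layout

gets_per_s = get_per_prefix * prefixes
print(f"{gets_per_s:,} GET/s")   # request ceiling before network limits
```

At that request rate, cluster network bandwidth and per-request latency, not S3 itself, become the binding constraints, which is exactly the point the note makes.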

Strengths & Weaknesses

☁️ Amazon S3

✅ Strengths
🚀
Infinite horizontal scalability — No storage capacity planning. Grows seamlessly from GB to EB. AWS provisions capacity on demand, transparently.
💰
OpEx economics at scale — Zero capital expenditure. Lifecycle policies auto-tier cold data to Glacier Deep Archive ($0.00099/GB-month). Total cost drops dramatically vs HDFS at lower data temperatures.
🔌
Ecosystem integration — Native connectors for EMR, Glue, Athena, Redshift Spectrum, SageMaker, Lambda, EventBridge. Enables serverless, event-driven architectures.
🔒
Enterprise security — IAM, KMS, WORM, Macie, VPC endpoints, Object Lock. Meets HIPAA, PCI-DSS, SOC 2, FedRAMP, ISO 27001 requirements out of the box.
🌎
Global multi-region replication — Cross-Region Replication (CRR) for disaster recovery. S3 Multi-Region Access Points for latency-optimized global reads.
⚙️
Zero operational overhead — No cluster management. Engineers focus on data products, not infrastructure plumbing. SLA of 99.99% availability guaranteed by AWS.
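The tiering economics above are easy to sanity-check (list prices as quoted in this document; real bills add request, retrieval, and regional variation):

```python
# Monthly storage cost for 1 PB at the per-GB prices quoted above.
# Illustrative only: excludes requests, retrieval fees, and regional pricing.
gb = 1024 ** 2                 # 1 PB expressed in (binary) GB
standard = 0.023               # $/GB-month, S3 Standard
deep_archive = 0.00099         # $/GB-month, Glacier Deep Archive

print(f"Standard:     ${gb * standard:>10,.0f}/month")      # ~$24,000
print(f"Deep Archive: ${gb * deep_archive:>10,.0f}/month")  # ~$1,000
```

A roughly 23× gap per month is why lifecycle tiering dominates cost planning for cold petabytes.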
❌ Weaknesses
⏱️
High first-byte latency — 100–500ms typical GET latency makes S3 unsuitable for real-time, sub-second query engines without a caching layer (ElastiCache, FSx, Alluxio).
💸
Egress cost shock — Data egress to internet: ~$0.09/GB. At PB scale this is catastrophic. Mitigation: S3 Transfer Acceleration, VPC endpoints, same-region services.
📂
Expensive and slow LIST operations — Listing millions of objects is slow and billed per 1,000 API calls. Rename is impossible (copy+delete). Small-file problem amplified.
🔗
No data locality — Every compute job crosses the network. Spark on EMR reading S3 pays network I/O costs on every shuffle and read, degrading performance vs local HDFS.
🏢
Vendor lock-in risk — AWS-specific features (Glacier, Lambda integration, Object Lock) create dependency. Migrating PBs off S3 is costly and time-consuming.
✏️
No true POSIX append semantics — Streaming writes require workarounds (MPU, Kinesis buffering). WAL-based systems (HBase, Kafka) cannot use S3 directly without abstraction.
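The egress weakness is simple to quantify from the ~$0.09/GB figure above:

```python
# Egress "cost shock": moving 1 PB out of AWS once at list price.
egress_per_gb = 0.09           # $/GB to internet (figure from the text)
pb_gb = 1024 ** 2              # 1 PB in GB

cost = pb_gb * egress_per_gb
print(f"${cost:,.0f}")         # roughly $94,000 for a single PB of egress
```

Repeated PB-scale reads from outside the region multiply this, which is why same-region compute and VPC endpoints are non-negotiable at this scale.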

🏗️ Apache HDFS

✅ Strengths
Data locality = zero network I/O — YARN schedules Spark/MapReduce tasks on nodes co-located with data blocks. Eliminates network bottleneck for shuffle-heavy workloads. Massive throughput gains.
📝
Native append & streaming — Supports hflush/hsync append semantics. HBase Write-Ahead Log (WAL), Kafka log segments, and Flink checkpoints can write directly to HDFS reliably.
📁
Fast metadata operations — NameNode holds full namespace in RAM. Rename/move is O(1) atomic. Snapshots, access time tracking, and POSIX-like ops work natively.
🆓
No egress cost — Internal cluster I/O has no per-GB billing. At PB scale with heavy compute, this can mean millions of dollars saved vs S3 egress.
🔓
Open-source & vendor-neutral — Apache-licensed. Runs on commodity hardware, bare metal, or any cloud. No vendor dependency. Full control over data and cluster configuration.
📉
Erasure Coding for storage efficiency — EC (e.g., Reed-Solomon 6+3) reduces storage overhead to ~1.5× vs 3× replication. Significant savings at PB scale.
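The erasure-coding savings follow directly from the policy geometry; a small helper makes the comparison explicit:

```python
# Raw bytes stored per logical byte under a Reed-Solomon EC policy,
# compared with plain replication.
def overhead(data_units: int, parity_units: int) -> float:
    """Storage multiplier for an RS(data, parity) erasure-coding policy."""
    return (data_units + parity_units) / data_units

print(overhead(6, 3))   # RS-6-3 -> 1.5x raw storage
print(overhead(3, 2))   # RS-3-2 -> ~1.67x
print(3.0)              # 3x replication, for comparison
```

At 1 PB logical, RS-6-3 needs 1.5 PB of raw disk where 3× replication needs 3 PB, the ~50% saving cited above.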
❌ Weaknesses
🧠
NameNode memory bottleneck — All metadata lives in NameNode RAM (~150 bytes/file). 1 billion files ≈ 150GB RAM. HDFS Federation required for very large namespaces, adding operational complexity.
💸
High CapEx & OpEx — Requires significant hardware investment. Dedicated ops team for cluster management, capacity planning, upgrades. Total cost often exceeds S3 for infrequently accessed data.
📦
Small-file problem is severe — Millions of small files saturate the NameNode and degrade performance dramatically. Requires periodic compaction (CombineFileInputFormat, Avro/ORC compaction jobs).
📉
Poor elasticity — Adding/removing DataNodes triggers rebalancing (hdfs balancer), which is slow and resource-intensive. Cannot scale compute independently from storage.
🌐
No multi-region durability — HDFS replication is rack-aware within a cluster but not multi-datacenter by default. Disaster recovery requires scheduled DistCp-based mirroring to a standby cluster in another datacenter.
☁️
Cloud-native integration friction — Integrating HDFS with cloud-native services (serverless compute, managed ML) requires significant engineering. S3A connector is the standard bridge but adds latency.

Recommended Frameworks

☁️ Frameworks Optimized for S3

Apache Spark (EMR / Databricks)

Batch + Streaming + ML

The dominant S3 compute engine. Spark's S3A connector (with magic committer) enables efficient large-scale reads/writes. AWS EMR uses EMRFS for S3 optimizations including consistent view and optimized list operations. Databricks AutoOptimize auto-compacts small files on S3.

Batch Streaming ML
# Spark S3 config best practices
spark.conf.set("fs.s3a.committer.magic.enabled", "true")
spark.conf.set("fs.s3a.fast.upload", "true")
spark.conf.set("fs.s3a.multipart.size", "128M")
🗃️

Apache Iceberg

Open Table Format (ACID on S3)

The recommended table format for S3-based lakehouses. Iceberg adds hidden partitioning, schema evolution, time-travel, ACID transactions, and partition pruning on top of S3 object storage. Netflix, Apple, and LinkedIn use Iceberg at exabyte scale on S3. Works seamlessly with Spark, Flink, Trino, and Athena.

ACID Time Travel Schema Evolve
-- Iceberg ACID merge on S3
MERGE INTO s3_catalog.orders t
USING updates u ON t.id = u.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
🔍

Amazon Athena + Glue

Serverless SQL on S3

Athena enables serverless SQL directly on S3 data with zero cluster management. Glue Data Catalog acts as the Hive Metastore. Athena v3 (Trino-based) supports Iceberg, Hudi, and Delta Lake natively. At $5/TB scanned, partition pruning via columnar formats (Parquet/ORC) is essential for cost control.

Serverless Pay-per-query
-- Optimize Athena cost: partition pruning
SELECT user_id, SUM(revenue)
FROM events
WHERE dt = '2024-01-15'      -- partition prune
  AND region = 'us-east-1'
GROUP BY 1;
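Given the $5/TB-scanned price above, the value of pruning is plain arithmetic (the table sizes below are hypothetical):

```python
# Athena bills per TB scanned; pruning to one partition changes the bill.
# Table sizes are hypothetical, for illustration.
price_per_tb = 5.0
table_tb = 100.0             # full table in Parquet
pruned_tb = 0.4              # one day's partition

full_scan = table_tb * price_per_tb
pruned_scan = pruned_tb * price_per_tb
print(full_scan, pruned_scan)    # $500 vs $2 for the same question
```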
🌊

Apache Flink + Kinesis

Real-time Streaming → S3 Sink

Flink's StreamingFileSink (now FileSink) writes to S3 with exactly-once semantics via checkpoint-committed multipart uploads; alternatively, Kinesis Data Firehose buffers records and batches them into S3, handling Parquet/ORC format conversion. Micro-batch compaction on S3 with Flink + Iceberg enables a near-real-time lakehouse.

Exactly-Once Streaming
🦆

DuckDB + Trino / Presto

Interactive SQL on S3

DuckDB can query Parquet files on S3 directly from a laptop — ideal for data exploration at scale without spinning up clusters. Trino (formerly PrestoSQL) enables distributed interactive SQL on S3 at PB scale with sub-minute query times on Parquet/ORC Iceberg tables.

Interactive Ad-hoc SQL
-- DuckDB: query S3 Parquet directly
SELECT *
FROM read_parquet('s3://bucket/data/**/*.parquet')
WHERE year = 2024;
🏔️

Apache Hudi

Incremental Data Lake on S3

Hudi is optimized for incremental data ingestion and CDC (Change Data Capture) workloads on S3. Copy-on-Write (CoW) for read-heavy and Merge-on-Read (MoR) for write-heavy pipelines. Uber originally built Hudi to handle millions of trips/day updating S3-backed tables.

CDC Incremental Upserts
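A typical Spark writer configuration for a Hudi MoR upsert table might look like the sketch below; the option keys are standard Hudi write configs, while the table and field names are invented for illustration:

```python
# Hypothetical Hudi upsert options for a Spark DataFrame writer on S3.
# Keys are standard Hudi configs; table/field names are made up.
hudi_options = {
    "hoodie.table.name": "orders_cdc",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",          # CDC upserts
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",  # write-heavy
}

# Applied in a (hypothetical) Spark job:
# df.write.format("hudi").options(**hudi_options) \
#   .mode("append").save("s3a://bucket/lake/orders_cdc")
```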

🏗️ Frameworks Optimized for HDFS

🗺️

Apache MapReduce / YARN

Batch Processing (Native)

The original HDFS compute paradigm. YARN's locality-aware scheduler ensures map tasks run on DataNodes holding the input splits, achieving near-zero network I/O for reads. Still used for ETL workloads where predictable, stable throughput outweighs Spark's in-memory speed.

Locality-aware Batch ETL
🐝

Apache HBase

NoSQL Wide-Column on HDFS

HBase is tightly coupled to HDFS — its WAL (Write-Ahead Log) and HFile storage require HDFS append semantics and atomic renames. Provides millisecond random reads via its LSM-tree architecture. Facebook's Messages used HBase on HDFS at hundreds of PB. Not feasible on raw S3.

Low-latency Append-dependent
# HBase WAL requires HDFS append
hbase.rootdir = hdfs://namenode:8020/hbase
hbase.wal.provider = filesystem
📨

Apache Kafka (on HDFS)

Event Streaming with HDFS Storage

Kafka log segments can be tiered to HDFS via Confluent Tiered Storage or the Kafka HDFS Connector (Kafka Connect). Enables long-term log retention on HDFS with Kafka's sequential I/O pattern matching HDFS block-sequential reads perfectly.

Streaming Tiered Storage
🐘

Apache Hive (LLAP)

SQL Analytics on HDFS

Hive with LLAP (Live Long and Process) daemon caches data in HDFS-adjacent memory, enabling sub-second interactive queries. HDFS's fast metadata enables efficient partition management at scale. Hive ACID on ORC provides row-level insert/update/delete on HDFS-backed tables.

Interactive SQL ACID (ORC)
🌩️

Apache Spark (Local HDFS)

In-cluster Batch + Streaming

Spark on HDFS achieves maximum performance when co-located with DataNodes. YARN's node-local, rack-local scheduling minimizes data shuffle distance. For iterative ML workloads (MLlib), HDFS-backed RDD caching on local disks outperforms S3-backed Spark significantly.

Locality MLlib
🔄

Apache Flink (Stateful)

Stateful Stream Processing

Flink's state backends (RocksDB) checkpoint directly to HDFS, leveraging append semantics and reliable block storage. HDFS-backed checkpoints are faster to write and restore than S3 for large state sizes (100s of GB). Ideal for stateful CEP and windowed aggregation pipelines.

Checkpointing Stateful

Most Suitable Ecosystems

☁️ S3 Ideal Ecosystems

🏠 Cloud-Native Data Lakehouse

  • Storage Layer S3 Standard / Intelligent-Tiering
  • Table Format Apache Iceberg or Delta Lake
  • Catalog AWS Glue / Project Nessie
  • Query Engine Trino, Amazon Athena, Spark SQL
  • Orchestration Apache Airflow / AWS Step Functions
  • Compute EMR Serverless / Databricks on AWS
  • BI / Viz Redshift Spectrum, QuickSight, Superset

🌊 Streaming & Event-Driven Analytics

  • Ingestion Kinesis Data Streams / MSK (Kafka)
  • Processing Kinesis Data Analytics (Flink)
  • Sink Kinesis Firehose → S3 (Parquet/ORC)
  • Serving DynamoDB, Elasticsearch / OpenSearch
  • Cold Query Athena on S3

🤖 MLOps & AI Platform

  • Feature Store SageMaker Feature Store → S3
  • Model Registry MLflow / SageMaker Model Registry
  • Data Versioning DVC + S3 backend
  • Training Data S3 → SageMaker Training Jobs
  • LLM Fine-tuning S3 datasets → Bedrock / SageMaker

🏗️ HDFS Ideal Ecosystems

🏭 On-Premises Hadoop Data Platform

  • Storage HDFS (Erasure Coded, HA NameNode)
  • Resource Mgmt YARN + Capacity Scheduler
  • SQL Layer Hive LLAP / Impala / Presto
  • Batch Compute Spark, MapReduce
  • NoSQL Store HBase (WAL on HDFS)
  • Security Kerberos + Apache Ranger
  • Governance Apache Atlas (lineage on HDFS)

⚙️ Real-Time Operational Systems

  • Message Bus Apache Kafka (HDFS tiered storage)
  • Stream Proc. Apache Flink (HDFS checkpoints)
  • Serving DB HBase (HDFS-backed LSM)
  • OLAP Cache Apache Druid (deep store: HDFS)
  • Search Apache Solr (index on HDFS)

🔬 Scientific / HPC Workloads

  • Storage HDFS (large sequential files)
  • Compute Spark MLlib, H2O.ai on YARN
  • Genomics GATK / Adam (Spark + HDFS)
  • Graph Apache Giraph / GraphX on HDFS
  • Formats Avro, Parquet, ORC, SequenceFile

Decision Guide

☁️ Choose S3 When...

✔ Building a greenfield cloud-native architecture with no on-premises constraints.

✔ Your data access is primarily read-heavy and batch, tolerating 100–500ms latency.

✔ You want zero operational overhead — no cluster team, no capacity planning.

✔ Your data temperature varies — some data is hot, most is warm/cold. S3 tiering saves dramatically.

✔ You need global multi-region durability (CRR) and compliance (WORM, Object Lock).

✔ You're building a data lakehouse with Iceberg/Delta + Spark/Trino/Athena stack.

✔ Your compute is elastic and bursty — you pay only for active processing cycles.

✔ Team expertise is in cloud-native tools rather than Hadoop ecosystem administration.

🏗️ Choose HDFS When...

✔ You have significant on-premises infrastructure investment and regulatory constraints preventing cloud adoption.

✔ Your workloads are shuffle-intensive and latency-sensitive — e.g., iterative ML, graph processing where data locality eliminates network I/O.

✔ You run HBase or other systems requiring POSIX append semantics and atomic renames.

✔ Your egress costs on cloud would be catastrophic — heavy compute jobs reading PBs repeatedly.

✔ You require sub-10ms read latency from local DataNode reads (e.g., real-time ML inference backed by Hive LLAP).

✔ You already operate a mature Hadoop platform with a dedicated ops team and want incremental migration.

✔ Your data consists of large sequential files accessed in batch patterns — HDFS block streaming is optimally tuned for this.

✔ You need vendor-neutral open-source with full data sovereignty.

🔀 Consider Hybrid Architecture When...

Most large-scale enterprises end up here. Use HDFS for hot/operational data (HBase, Kafka, real-time Flink) while simultaneously using S3 as the canonical data lake for historical, analytical, and cold data. DistCp or Flink pipelines replicate from HDFS to S3 for long-term retention and cloud analytics. This pattern is common at LinkedIn, Twitter/X, Alibaba, and Tencent. The S3A connector in Hadoop 3.x enables Spark to read both HDFS and S3 transparently, enabling gradual migration. Platforms like Cloudera CDP and AWS EMR explicitly support this hybrid model.

PB-Scale Design Patterns

🏗️

S3 Lakehouse — Medallion Architecture

Multi-layered approach to data quality and reliability on S3:

🥉 Bronze: Raw ingest — Parquet/Avro, no transform
🥈 Silver: Cleaned, deduplicated, schema-enforced
🥇 Gold: Aggregated, business-ready, Iceberg ACID

Tip: Use S3 Lifecycle to auto-move Bronze → Glacier after 90 days. Only Gold tables need Standard tier.
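The lifecycle tip can be expressed as a rule document; the shape below follows the S3 lifecycle-configuration API, with a hypothetical bucket prefix standing in for the Bronze layer:

```python
# S3 Lifecycle rule: transition the Bronze prefix to Glacier after 90 days.
# Structure matches the S3 PutBucketLifecycleConfiguration request shape;
# the "bronze/" prefix is a hypothetical layout choice.
lifecycle = {
    "Rules": [
        {
            "ID": "bronze-to-glacier",
            "Filter": {"Prefix": "bronze/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"}
            ],
        }
    ]
}

# Applied via boto3 (not executed here):
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-lake", LifecycleConfiguration=lifecycle)
```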

HDFS Lambda Architecture

Classic pattern for combining batch accuracy with real-time speed:

📥 Speed Layer: Flink/Storm → HDFS/HBase (minutes)
🔄 Batch Layer: Spark → HDFS (hourly/daily)
🔍 Serving Layer: Hive LLAP / Impala merge views

Note: Kappa architecture (Flink only) is increasingly preferred for simplicity on HDFS/S3.

🌉

HDFS → S3 Migration Pattern

Phased migration from on-prem HDFS to cloud S3:

Phase 1: Cold data migration via DistCp
Phase 2: Dual-write to HDFS + S3 (S3A connector)
Phase 3: Read from S3, retire HDFS DataNodes
# Parallel DistCp PB transfer
hadoop distcp -m 1000 -bandwidth 100 \
  -strategy dynamic \
  hdfs://src/data s3a://bucket/data

🔧 PB-Scale Tuning Reference

S3 Optimization Checklist
# 1. Partition prefix strategy
s3://bucket/year=2024/month=01/day=15/*.parquet
# 2. Enable S3 Transfer Acceleration
endpoint = bucket.s3-accelerate.amazonaws.com
# 3. Magic committer (Spark)
spark.hadoop.fs.s3a.committer.name = magic
# 4. Parallel multipart uploads
fs.s3a.multipart.size = 128M
fs.s3a.connection.maximum = 200
# 5. S3 Intelligent-Tiering (auto)
PUT s3://bucket/data --storage-class INTELLIGENT_TIERING
HDFS Optimization Checklist
# 1. Block size for large sequential files
dfs.blocksize = 256m
# 2. Erasure Coding (saves ~50% storage)
hdfs ec -setPolicy -policy RS-6-3-1024k -path /cold
# 3. NameNode heap for 1B files
HADOOP_NAMENODE_OPTS="-Xmx200g"
# 4. Short-circuit local reads
dfs.client.read.shortcircuit = true
# 5. HDFS Federation for namespace scale
dfs.nameservices = ns1,ns2
dfs.namenode.rpc-address.ns1 = nn1:8020

Executive Summary

Scalability — S3: 10/10 · HDFS: 7/10
Latency — S3: 4/10 · HDFS: 9/10
Cost Efficiency — S3: 8/10 · HDFS: 6/10
Ops Simplicity — S3: 10/10 · HDFS: 3/10
Throughput — S3: 9/10 · HDFS: 9/10
Ecosystem — S3: 9/10 · HDFS: 9/10
🎯

Architect's Verdict: In 2024–2025, the industry has decisively shifted toward S3 as the long-term storage layer for new big data systems — driven by lakehouse formats (Iceberg, Delta), serverless query engines (Athena, Snowflake on S3), and the elimination of the data locality advantage through high-bandwidth networking. HDFS remains irreplaceable for HBase, stateful Flink checkpoints, and on-premises regulated environments. For PB-scale greenfield systems: adopt the S3 lakehouse pattern with Iceberg + Spark on EMR Serverless or Databricks. For existing HDFS investments: plan a phased migration using DistCp for cold data and S3A connector for hybrid reads, targeting a 3–5 year full migration horizon.