📦 Storage Systems Deep Dive

File · Object · Block Storage Explained

A comprehensive technical reference covering internal structure, tradeoffs, and architecture decision guides for the three fundamental storage paradigms.

🗂 File Storage
🪣 Object Storage
🧱 Block Storage

Storage Types — Structure & Internals

🗂 File Storage

Hierarchical filesystem abstraction. Organizes data as files inside a tree of directories/folders — the most familiar model to humans. Used by NAS, NFS, SMB/CIFS, and local OS filesystems.

📐 Hierarchical Structure
/ (root)
├── 📁 home/
│   └── 📁 alice/
│       ├── 📄 resume.pdf → inode:4821, 245KB, rwxr-xr-x
│       └── 📄 notes.txt → inode:4822, 12KB
├── 📁 var/
│   └── 📁 log/
│       └── 📄 syslog → inode:1100, appended daily
└── 📁 mnt/
    └── 📁 nas-share/ ← NFS mount point
Metadata layer: inode → permissions, timestamps, owner, size, data block pointers
Access path: namespace/path/filename → traverse directory tree → locate inode → read blocks
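The path-to-inode lookup above can be sketched with Python's `os.stat`. This is an illustrative sketch, not a filesystem internal: the file names are borrowed from the diagram, and a throwaway temp directory stands in for a real mount.

```python
import os
import stat
import tempfile

# Build a tiny directory tree, then resolve a file's inode metadata
# the way a POSIX filesystem does on open()/stat().
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "home", "alice"))
    path = os.path.join(root, "home", "alice", "notes.txt")

    with open(path, "w") as f:
        f.write("first line\n")

    # In-place append: file storage is fully mutable.
    with open(path, "a") as f:
        f.write("second line\n")

    info = os.stat(path)                          # path -> inode lookup
    print("inode:", info.st_ino)                  # inode number
    print("size:", info.st_size, "bytes")         # 23 bytes here
    print("perms:", stat.filemode(info.st_mode))  # e.g. -rw-r--r--
```

Note that the same `os.stat` call works unchanged whether `path` lives on a local ext4 disk or an NFS mount: the POSIX abstraction hides the transport.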
⚙️ Key Attributes
Protocol: NFS, SMB/CIFS, AFP
Addressing: Path-based
Mutability: Fully mutable (in-place)
Metadata: POSIX (inode-based)
Scale: TB range (NAS)
Access: Shared / concurrent
✓ Pros
  • Human-readable, familiar hierarchy
  • POSIX semantics (rename, lock, append)
  • Works out-of-the-box with OS & apps
  • Fine-grained permissions (ACLs)
  • Good for shared team access (NAS)
  • Supports symbolic links & hard links
✗ Cons
  • Difficult to scale into the petabyte range
  • Deep nesting creates performance issues
  • Namespace is single point of contention
  • Not ideal for cloud-native / distributed
  • Limited metadata (only POSIX fields)
  • Challenging cross-region replication
🏷 Common Implementations
ext4 · XFS · NTFS · ZFS · NFS · SMB · AWS EFS · Azure Files · GCP Filestore · NetApp ONTAP
🪣 Object Storage

Flat namespace of data objects. Each object = data payload + rich, custom metadata + unique key. No hierarchy — everything lives in a bucket. Purpose-built for massive scale, durability, and HTTP access.

📐 Flat Namespace Structure
🪣 media-assets
  photos/2024/hero.jpg · 4.2 MB · ETag: a3f1...
  videos/intro.mp4 · 1.1 GB · custom: cdn=true
🪣 backups-prod
  db/2024-03-01.sql.gz · 550 MB · tier: archive
  logs/app-2024-03.tar · 2.1 GB · ttl: 90d
Access: HTTP GET/PUT/DELETE | Addressed by unique object key | Immutable writes (overwrite = new version)
Anatomy: [bucket-name] + [object-key] + [data bytes] + [system metadata] + [user-defined metadata]
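These semantics can be sketched with a toy in-memory store. The `ObjectStore` class below is hypothetical (not a real S3 client): a flat namespace keyed by bucket + object key, where every PUT appends a new immutable version rather than editing in place.

```python
import hashlib
from dataclasses import dataclass

# Toy in-memory object store (hypothetical API, not a real S3 client).
# Flat namespace: (bucket, key) -> versions. A PUT never edits in place;
# it appends a new immutable version, mirroring "overwrite = new version".

@dataclass
class ObjectVersion:
    data: bytes       # the payload bytes
    metadata: dict    # user-defined key-value metadata
    etag: str         # content hash, like an S3 ETag

class ObjectStore:
    def __init__(self):
        self.buckets = {}  # bucket -> {object key -> [versions, newest last]}

    def put(self, bucket, key, data, **metadata):
        etag = hashlib.md5(data).hexdigest()
        versions = self.buckets.setdefault(bucket, {}).setdefault(key, [])
        versions.append(ObjectVersion(data, dict(metadata), etag))
        return etag

    def get(self, bucket, key):
        return self.buckets[bucket][key][-1]  # latest version wins

store = ObjectStore()
store.put("media-assets", "photos/2024/hero.jpg", b"...jpeg bytes...", cdn="true")
store.put("media-assets", "photos/2024/hero.jpg", b"...new bytes...")  # edit = full re-upload

obj = store.get("media-assets", "photos/2024/hero.jpg")
print(obj.etag)  # hash of the newest version's bytes
print(len(store.buckets["media-assets"]["photos/2024/hero.jpg"]))  # 2 versions kept
```

The key `photos/2024/hero.jpg` looks like a path, but the slashes are only a prefix convention: there is no directory tree to traverse, which is what lets the namespace scale horizontally.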
⚙️ Key Attributes
Protocol: HTTP/S3 REST API
Addressing: Key-based (URL)
Mutability: Immutable (versioned)
Metadata: Unlimited custom KV
Scale: Exabytes (unlimited)
Durability: 11 nines (S3)
✓ Pros
  • Infinite horizontal scalability
  • Extremely durable (multi-region replication)
  • Rich, queryable metadata on every object
  • HTTP-native (CDN-friendly, global access)
  • Cost-effective for large cold data
  • Built-in versioning & lifecycle policies
✗ Cons
  • Not POSIX — no append, no random write
  • Higher latency vs block (ms not µs)
  • Object immutability: full re-upload to edit
  • Not suitable for databases or OS volumes
  • Eventual consistency (in some configs)
  • No directory locking or file locking
🏷 Common Implementations
AWS S3 · GCS · Azure Blob · MinIO · Ceph RADOS · Cloudflare R2 · Wasabi · OpenStack Swift
🧱 Block Storage

Low-level raw storage volumes split into fixed-size blocks. No intrinsic structure — a filesystem, database, or OS is layered on top. The closest abstraction to physical hard drives; delivers highest performance.

📐 Block-Level Structure (512B – 4KB blocks)
Volume (raw block layout):
  Block 0: MBR/GPT · Block 1: Superblock · Block 2: Inode table · Block 3+: Data
Filesystem layer on top (e.g. ext4 / XFS):
  Journal (writes) · Free space (unallocated) · File data (chunks)
Or a database directly on the volume (e.g. Postgres):
  Page 0 · WAL log (buffer) · Index (b-tree) · Heap (tuples)
Access: iSCSI, NVMe-oF, Fibre Channel, virtio-blk | Block addressing (LBA: Logical Block Address)
No metadata in storage layer — structure is entirely defined by whatever is layered on top
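LBA addressing can be simulated in Python by treating an ordinary temp file as the raw device. The `write_block`/`read_block` helpers below are illustrative sketches, not a real iSCSI or NVMe client: the point is that the storage layer sees only fixed-size blocks at numeric addresses.

```python
import tempfile

BLOCK_SIZE = 4096  # 4 KB blocks, as in the diagram above

# The "volume" is just an array of fixed-size blocks addressed by
# Logical Block Address (LBA). A regular temp file stands in for the
# raw device (e.g. /dev/sdX) here.
def write_block(dev, lba, payload):
    assert len(payload) <= BLOCK_SIZE
    dev.seek(lba * BLOCK_SIZE)                     # LBA -> byte offset
    dev.write(payload.ljust(BLOCK_SIZE, b"\x00"))  # pad to block boundary

def read_block(dev, lba):
    dev.seek(lba * BLOCK_SIZE)
    return dev.read(BLOCK_SIZE)

with tempfile.TemporaryFile() as dev:
    write_block(dev, 0, b"MBR/GPT header")   # block 0: partition table
    write_block(dev, 7, b"file data chunk")  # random write at any LBA
    chunk = read_block(dev, 7)               # random read, no tree walk

print(chunk[:15])  # b'file data chunk'
```

Note there is no notion of a file name, owner, or timestamp anywhere in this layer; that metadata only appears once a filesystem or database formats the blocks.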
⚙️ Key Attributes
Protocol: iSCSI, NVMe-oF, FC
Addressing: Block/LBA address
Mutability: Random read/write
Metadata: None (external)
Latency: Sub-millisecond (µs)
Attach: Single instance
✓ Pros
  • Lowest latency of any storage type
  • Highest IOPS — ideal for databases
  • Supports any filesystem (OS formats it)
  • Full random access at byte level
  • Predictable, consistent performance
  • Snapshots & cloning at block level
✗ Cons
  • Typically attached to only one instance
  • No built-in sharing across multiple hosts
  • Higher cost per GB vs object storage
  • No intrinsic data redundancy (you manage)
  • No metadata or search capabilities
  • Not globally accessible (region-locked)
🏷 Common Implementations
AWS EBS · GCP Persistent Disk · Azure Disk · iSCSI SAN · Ceph RBD · LVM · SAN (NetApp, Pure) · NVMe SSD

Primary Use Cases & Workloads

🗂 File Storage

Best for shared, human-accessed data
  • Team file shares & NAS environments
  • Home directories & user files
  • Collaborative document editing
  • Media production (video editing, DAM)
  • Legacy apps requiring POSIX filesystem
  • Lift-and-shift enterprise workloads
  • Log aggregation on a shared server
🪣 Object Storage

Best for large-scale, cloud-native data
  • Static website assets & CDN origin
  • Media uploads: images, video, audio
  • Data lake & big data analytics
  • Database & VM backups / snapshots
  • Machine learning training datasets
  • Long-term archival & compliance
  • Software artifact & container registry
🧱 Block Storage

Best for performance-sensitive workloads
  • Relational databases (PostgreSQL, MySQL)
  • NoSQL databases (MongoDB, Cassandra)
  • OS boot volumes & system disks
  • Virtual machine persistent volumes
  • High-frequency transaction systems
  • Email servers (Exchange, Dovecot)
  • Kubernetes persistent volumes (PVCs)

Detailed Comparison Matrix

| Dimension | 🗂 File Storage | 🪣 Object Storage | 🧱 Block Storage |
|---|---|---|---|
| Data Unit | File in directory tree | Object with metadata + key | Fixed-size block (512B–4KB) |
| Access Method | open() / POSIX path | HTTP REST (GET/PUT/DELETE) | LBA address via iSCSI/NVMe |
| Hierarchy | Tree of directories | Flat (prefix-simulated) | None (raw blocks) |
| Mutability | In-place edit | Full overwrite | Random write |
| Scalability | Medium | Unlimited | Per-instance |
| Latency | Low–Medium (ms) | Medium–High (ms) | Ultra-low (µs) |
| Throughput | — | (parallel) | — |
| IOPS | Moderate | Low | Extreme (100K+) |
| Cost / GB | Medium | Very Low | High |
| Durability | RAID-dependent | 11 nines | Replication |
| Multi-host Access | Yes (NFS/SMB) | Yes (HTTP) | No (single attach) |
| Metadata | POSIX only (owner, perms, timestamps) | Unlimited custom key-value pairs | None (storage-level) |
| Versioning | Manual / FS-level | Built-in | Snapshots |
| Global Access | Network-limited | Global (HTTP/CDN) | Region-locked |
| Consistency Model | Strong (POSIX) | Strong (S3 2020+) / Eventual | Strong (per-volume) |
| Encryption | At rest / LUKS | SSE, CSE built-in | At-rest encryption |
| Best For | Shared workloads | Web/cloud scale | Databases/VMs |

Choosing the Right Storage by Scenario

Scenario 01 · 🌐 Web / SaaS Application

A typical 3-tier web app serving millions of users with media uploads, user data, and a relational database.

  • 🧱 Block Storage (Primary DB): database volumes (PostgreSQL/MySQL) for max IOPS, low latency
  • 🪣 Object Storage (Media/Assets): user uploads, media assets, static files served via CDN
  • 🗂 File Storage (optional): app config, shared logs across instances
Scenario 02 · 🤖 ML / Data Platform

Training pipelines, feature stores, and model registries at petabyte scale with multiple compute workers.

  • 🪣 Object Storage (Data Lake): training datasets, checkpoints, model artifacts
  • 🧱 Block Storage (Hot Cache): fast NVMe scratch disks on GPU workers for hot data
  • 🗂 File Storage (Multi-node): shared POSIX access for multi-node training jobs (EFS/Lustre)
Scenario 03 · ☸️ Kubernetes / Cloud-Native

Stateful microservices on K8s with PersistentVolumes, sidecars, and operator-managed databases.

  • 🧱 Block Storage (PVC, RWO): PersistentVolumeClaims for StatefulSets, databases
  • 🗂 File Storage (PVC, RWX): RWX volumes for shared configs, CMS content between pods
  • 🪣 Object Storage (Artifacts): container registry, build artifacts, log archival
Scenario 04 · 🏢 Enterprise / Hybrid NAS

On-premise enterprise with shared drives, legal archives, and ERP/database systems.

  • 🗂 File Storage (NAS/SMB): department shares, home drives, project folders via SMB
  • 🧱 Block Storage (SAN/DB): Oracle/SQL Server databases on SAN (Fibre Channel)
  • 🪣 Object Storage (Archive): regulatory archival, email backup, cold data tier
Scenario 05 · 🎬 Media & Streaming Platform

Video ingestion, transcoding pipeline, and global streaming with petabytes of content.

  • 🪣 Object Storage (Content Store): video library, HLS segments, thumbnails behind CDN
  • 🗂 File Storage (Edit/Ingest): shared NAS for editors & transcoding workers (high-bandwidth)
  • 🧱 Block Storage (Metadata DB): content catalog and user-activity database
Scenario 06 · 💳 Fintech / High-Frequency Trading

Ultra-low latency transaction processing, compliance archival, and real-time analytics.

  • 🧱 Block Storage (Core DB): local NVMe + SAN for transaction engines (µs latency)
  • 🪣 Object Storage (Compliance): regulatory compliance archives (7+ year retention)
  • 🗂 File Storage (Ops): shared audit logs, config distribution

Which Storage Should You Use?

New storage requirement
├─ Need low-latency random read/write?
│    YES → 🧱 Block Storage: databases, VM disks, OS volumes, high-IOPS workloads
└─ NO → Do many users / services need concurrent shared access?
     ├─ YES, over POSIX → 🗂 File Storage: NAS, team shares, legacy apps
     └─ YES, over HTTP → 🪣 Object Storage: cloud-native, CDN, data lake, backups
Quick Rules of Thumb
🗂 Use File when…
You need multiple machines to share files over a network with directory semantics (NFS/SMB)
🪣 Use Object when…
You're storing unstructured data at scale (GBs to EBs), or need HTTP-accessible, globally distributed storage
🧱 Use Block when…
You're running a database, VM, or any workload that needs raw IOPS, byte-addressable random access
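The rules of thumb above can be condensed into a small decision helper. The `choose_storage` function below is a hypothetical sketch for illustration, not an exhaustive sizing guide:

```python
# Sketch of the decision tree as code (hypothetical helper, illustrative only).
def choose_storage(low_latency_random_io, shared_access, posix_needed):
    if low_latency_random_io:
        return "block"   # databases, VM disks, high-IOPS workloads
    if shared_access and posix_needed:
        return "file"    # NAS, team shares, legacy apps
    return "object"      # cloud-native, CDN origin, data lake, backups

assert choose_storage(True, False, False) == "block"   # latency wins first
assert choose_storage(False, True, True) == "file"     # shared + POSIX
assert choose_storage(False, True, False) == "object"  # shared over HTTP
```

Real systems usually combine all three, as the scenarios above show; the function only picks the primary tier for a single workload.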
🗂 File Storage · The Human-Friendly Layer

Think of it as a shared filing cabinet. Best when people or legacy apps need to browse and edit files in a familiar folder hierarchy. Simple, intuitive, but limited in scale.

🪣 Object Storage · The Internet-Scale Layer

Think of it as a massive flat warehouse with infinite shelving. Perfect for cloud-native apps, data lakes, and anything that needs to be globally accessible at massive scale.

🧱 Block Storage · The Performance Layer

Think of it as a raw hard drive in the cloud. Essential for databases and VMs that demand the lowest possible latency and highest IOPS — no abstractions in the way.