📦 Storage Systems Deep Dive

File · Object · Block Storage Explained

A comprehensive technical reference covering internal structure, tradeoffs, and architecture decision guides for the three fundamental storage paradigms.

🗂 File Storage
🪣 Object Storage
🧱 Block Storage

Storage Types — Structure & Internals

🗂 File Storage

Hierarchical filesystem abstraction. Organizes data as files inside a tree of directories/folders — the most familiar model to humans. Used by NAS, NFS, SMB/CIFS, and local OS filesystems.

📐 Hierarchical Structure
/ (root)
├── 📁 home/
│   └── 📁 alice/
│       ├── 📄 resume.pdf → inode:4821, 245KB, rwxr-xr-x
│       └── 📄 notes.txt → inode:4822, 12KB
├── 📁 var/
│   └── 📁 log/
│       └── 📄 syslog → inode:1100, appended daily
└── 📁 mnt/
    └── 📁 nas-share/ ← NFS mount point
Metadata layer: inode → permissions, timestamps, owner, size, data block pointers
Access path: namespace/path/filename → traverse directory tree → locate inode → read blocks
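The path-to-inode lookup above can be sketched with Python's `os.stat`. This is an illustrative sketch, not a filesystem internal: the file names are borrowed from the diagram, and a throwaway temp directory stands in for a real mount.

```python
import os
import stat
import tempfile

# Build a tiny directory tree, then resolve a file's inode metadata
# the way a POSIX filesystem does on open()/stat().
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "home", "alice"))
    path = os.path.join(root, "home", "alice", "notes.txt")

    with open(path, "w") as f:
        f.write("first line\n")

    # In-place append: file storage is fully mutable.
    with open(path, "a") as f:
        f.write("second line\n")

    info = os.stat(path)                          # path -> inode lookup
    print("inode:", info.st_ino)                  # inode number
    print("size:", info.st_size, "bytes")         # 23 bytes here
    print("perms:", stat.filemode(info.st_mode))  # e.g. -rw-r--r--
```

Note that the same `os.stat` call works unchanged whether `path` lives on a local ext4 disk or an NFS mount: the POSIX abstraction hides the transport.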
⚙️ Key Attributes
Protocol: NFS, SMB/CIFS, AFP
Addressing: Path-based
Mutability: Fully mutable (in-place)
Metadata: POSIX (inode-based)
Scale: TB range (NAS)
Access: Shared / concurrent
✓ Pros
  • Human-readable, familiar hierarchy
  • POSIX semantics (rename, lock, append)
  • Works out-of-the-box with OS & apps
  • Fine-grained permissions (ACLs)
  • Good for shared team access (NAS)
  • Supports symbolic links & hard links
✗ Cons
  • Difficult to scale into the petabyte range
  • Deep nesting creates performance issues
  • Namespace is single point of contention
  • Not ideal for cloud-native / distributed
  • Limited metadata (only POSIX fields)
  • Challenging cross-region replication
🏷 Common Implementations
ext4 · XFS · NTFS · ZFS · NFS · SMB · AWS EFS · Azure Files · GCP Filestore · NetApp ONTAP
🪣 Object Storage

Flat namespace of data objects. Each object = data payload + rich, custom metadata + unique key. No hierarchy — everything lives in a bucket. Purpose-built for massive scale, durability, and HTTP access.

📐 Flat Namespace Structure
🪣 media-assets
  photos/2024/hero.jpg · 4.2 MB · ETag: a3f1...
  videos/intro.mp4 · 1.1 GB · custom: cdn=true
🪣 backups-prod
  db/2024-03-01.sql.gz · 550 MB · tier: archive
  logs/app-2024-03.tar · 2.1 GB · ttl: 90d
Access: HTTP GET/PUT/DELETE | Addressed by unique object key | Immutable writes (overwrite = new version)
Anatomy: [bucket-name] + [object-key] + [data bytes] + [system metadata] + [user-defined metadata]
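These semantics can be sketched with a toy in-memory store. The `ObjectStore` class below is hypothetical (not a real S3 client): a flat namespace keyed by bucket + object key, where every PUT appends a new immutable version rather than editing in place.

```python
import hashlib
from dataclasses import dataclass

# Toy in-memory object store (hypothetical API, not a real S3 client).
# Flat namespace: (bucket, key) -> versions. A PUT never edits in place;
# it appends a new immutable version, mirroring "overwrite = new version".

@dataclass
class ObjectVersion:
    data: bytes       # the payload bytes
    metadata: dict    # user-defined key-value metadata
    etag: str         # content hash, like an S3 ETag

class ObjectStore:
    def __init__(self):
        self.buckets = {}  # bucket -> {object key -> [versions, newest last]}

    def put(self, bucket, key, data, **metadata):
        etag = hashlib.md5(data).hexdigest()
        versions = self.buckets.setdefault(bucket, {}).setdefault(key, [])
        versions.append(ObjectVersion(data, dict(metadata), etag))
        return etag

    def get(self, bucket, key):
        return self.buckets[bucket][key][-1]  # latest version wins

store = ObjectStore()
store.put("media-assets", "photos/2024/hero.jpg", b"...jpeg bytes...", cdn="true")
store.put("media-assets", "photos/2024/hero.jpg", b"...new bytes...")  # edit = full re-upload

obj = store.get("media-assets", "photos/2024/hero.jpg")
print(obj.etag)  # hash of the newest version's bytes
print(len(store.buckets["media-assets"]["photos/2024/hero.jpg"]))  # 2 versions kept
```

The key `photos/2024/hero.jpg` looks like a path, but the slashes are only a prefix convention: there is no directory tree to traverse, which is what lets the namespace scale horizontally.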
⚙️ Key Attributes
Protocol: HTTP/S3 REST API
Addressing: Key-based (URL)
Mutability: Immutable (versioned)
Metadata: Unlimited custom KV
Scale: Exabytes (unlimited)
Durability: 11 nines (S3)
✓ Pros
  • Infinite horizontal scalability
  • Extremely durable (multi-region replication)
  • Rich, queryable metadata on every object
  • HTTP-native (CDN-friendly, global access)
  • Cost-effective for large cold data
  • Built-in versioning & lifecycle policies
✗ Cons
  • Not POSIX — no append, no random write
  • Higher latency vs block (ms not µs)
  • Object immutability: full re-upload to edit
  • Not suitable for databases or OS volumes
  • Eventual consistency (in some configs)
  • No directory locking or file locking
🏷 Common Implementations
AWS S3 · GCS · Azure Blob · MinIO · Ceph RADOS · Cloudflare R2 · Wasabi · OpenStack Swift
🧱 Block Storage

Low-level raw storage volumes split into fixed-size blocks. No intrinsic structure — a filesystem, database, or OS is layered on top. The closest abstraction to physical hard drives; delivers highest performance.

📐 Block-Level Structure (512B – 4KB blocks)
Volume (raw block layout):
  Block 0: MBR/GPT · Block 1: Superblock · Block 2: Inode table · Block 3+: Data
Filesystem layer on top (e.g. ext4 / XFS):
  Journal (writes) · Free space (unallocated) · File data (chunks)
Or a database directly on the volume (e.g. Postgres):
  Page 0 · WAL log (buffer) · Index (b-tree) · Heap (tuples)
Access: iSCSI, NVMe-oF, Fibre Channel, virtio-blk | Block addressing (LBA: Logical Block Address)
No metadata in storage layer — structure is entirely defined by whatever is layered on top
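LBA addressing can be simulated in Python by treating an ordinary temp file as the raw device. The `write_block`/`read_block` helpers below are illustrative sketches, not a real iSCSI or NVMe client: the point is that the storage layer sees only fixed-size blocks at numeric addresses.

```python
import tempfile

BLOCK_SIZE = 4096  # 4 KB blocks, as in the diagram above

# The "volume" is just an array of fixed-size blocks addressed by
# Logical Block Address (LBA). A regular temp file stands in for the
# raw device (e.g. /dev/sdX) here.
def write_block(dev, lba, payload):
    assert len(payload) <= BLOCK_SIZE
    dev.seek(lba * BLOCK_SIZE)                     # LBA -> byte offset
    dev.write(payload.ljust(BLOCK_SIZE, b"\x00"))  # pad to block boundary

def read_block(dev, lba):
    dev.seek(lba * BLOCK_SIZE)
    return dev.read(BLOCK_SIZE)

with tempfile.TemporaryFile() as dev:
    write_block(dev, 0, b"MBR/GPT header")   # block 0: partition table
    write_block(dev, 7, b"file data chunk")  # random write at any LBA
    chunk = read_block(dev, 7)               # random read, no tree walk

print(chunk[:15])  # b'file data chunk'
```

Note there is no notion of a file name, owner, or timestamp anywhere in this layer; that metadata only appears once a filesystem or database formats the blocks.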
⚙️ Key Attributes
Protocol: iSCSI, NVMe-oF, FC
Addressing: Block/LBA address
Mutability: Random read/write
Metadata: None (external)
Latency: Sub-millisecond (µs)
Attach: Single instance
✓ Pros
  • Lowest latency of any storage type
  • Highest IOPS — ideal for databases
  • Supports any filesystem (OS formats it)
  • Full random access at byte level
  • Predictable, consistent performance
  • Snapshots & cloning at block level
✗ Cons
  • Typically attached to only one instance
  • No built-in sharing across multiple hosts
  • Higher cost per GB vs object storage
  • No intrinsic data redundancy (you manage)
  • No metadata or search capabilities
  • Not globally accessible (region-locked)
🏷 Common Implementations
AWS EBS · GCP Persistent Disk · Azure Disk · iSCSI SAN · Ceph RBD · LVM · SAN (NetApp, Pure) · NVMe SSD

Primary Use Cases & Workloads

🗂 File Storage

Best for shared, human-accessed data
  • Team file shares & NAS environments
  • Home directories & user files
  • Collaborative document editing
  • Media production (video editing, DAM)
  • Legacy apps requiring POSIX filesystem
  • Lift-and-shift enterprise workloads
  • Log aggregation on a shared server
🪣 Object Storage

Best for large-scale, cloud-native data
  • Static website assets & CDN origin
  • Media uploads: images, video, audio
  • Data lake & big data analytics
  • Database & VM backups / snapshots
  • Machine learning training datasets
  • Long-term archival & compliance
  • Software artifact & container registry
🧱 Block Storage

Best for performance-sensitive workloads
  • Relational databases (PostgreSQL, MySQL)
  • NoSQL databases (MongoDB, Cassandra)
  • OS boot volumes & system disks
  • Virtual machine persistent volumes
  • High-frequency transaction systems
  • Email servers (Exchange, Dovecot)
  • Kubernetes persistent volumes (PVCs)

Detailed Comparison Matrix

| Dimension | 🗂 File Storage | 🪣 Object Storage | 🧱 Block Storage |
|---|---|---|---|
| Data Unit | File in directory tree | Object with metadata + key | Fixed-size block (512B–4KB) |
| Access Method | open() / POSIX path | HTTP REST (GET/PUT/DELETE) | LBA address via iSCSI/NVMe |
| Hierarchy | Tree of directories | Flat (prefix-simulated) | None (raw blocks) |
| Mutability | In-place edit | Full overwrite | Random write |
| Scalability | Medium | Unlimited | Per-instance |
| Latency | Low–Medium (ms) | Medium–High (ms) | Ultra-low (µs) |
| Throughput | — | (parallel) | — |
| IOPS | Moderate | Low | Extreme (100K+) |
| Cost / GB | Medium | Very Low | High |
| Durability | RAID-dependent | 11 nines | Replication |
| Multi-host Access | Yes (NFS/SMB) | Yes (HTTP) | No (single attach) |
| Metadata | POSIX only (owner, perms, timestamps) | Unlimited custom key-value pairs | None (storage-level) |
| Versioning | Manual / FS-level | Built-in | Snapshots |
| Global Access | Network-limited | Global (HTTP/CDN) | Region-locked |
| Consistency Model | Strong (POSIX) | Strong (S3 2020+) / Eventual | Strong (per-volume) |
| Encryption | At rest / LUKS | SSE, CSE built-in | At-rest encryption |
| Best For | Shared workloads | Web/cloud scale | Databases/VMs |

Choosing the Right Storage by Scenario

Scenario 01 · 🌐 Web / SaaS Application

A typical 3-tier web app serving millions of users with media uploads, user data, and a relational database.

  • 🧱 Block Storage (Primary DB): database volumes (PostgreSQL/MySQL) for max IOPS, low latency
  • 🪣 Object Storage (Media/Assets): user uploads, media assets, static files served via CDN
  • 🗂 File Storage (optional): app config, shared logs across instances
Scenario 02 · 🤖 ML / Data Platform

Training pipelines, feature stores, and model registries at petabyte scale with multiple compute workers.

  • 🪣 Object Storage (Data Lake): training datasets, checkpoints, model artifacts
  • 🧱 Block Storage (Hot Cache): fast NVMe scratch disks on GPU workers for hot data
  • 🗂 File Storage (Multi-node): shared POSIX access for multi-node training jobs (EFS/Lustre)
Scenario 03 · ☸️ Kubernetes / Cloud-Native

Stateful microservices on K8s with PersistentVolumes, sidecars, and operator-managed databases.

  • 🧱 Block Storage (PVC, RWO): PersistentVolumeClaims for StatefulSets, databases
  • 🗂 File Storage (PVC, RWX): RWX volumes for shared configs, CMS content between pods
  • 🪣 Object Storage (Artifacts): container registry, build artifacts, log archival
Scenario 04 · 🏢 Enterprise / Hybrid NAS

On-premise enterprise with shared drives, legal archives, and ERP/database systems.

  • 🗂 File Storage (NAS/SMB): department shares, home drives, project folders via SMB
  • 🧱 Block Storage (SAN/DB): Oracle/SQL Server databases on SAN (Fibre Channel)
  • 🪣 Object Storage (Archive): regulatory archival, email backup, cold data tier
Scenario 05 · 🎬 Media & Streaming Platform

Video ingestion, transcoding pipeline, and global streaming with petabytes of content.

  • 🪣 Object Storage (Content Store): video library, HLS segments, thumbnails behind CDN
  • 🗂 File Storage (Edit/Ingest): shared NAS for editors & transcoding workers (high-bandwidth)
  • 🧱 Block Storage (Metadata DB): content catalog and user-activity database
Scenario 06 · 💳 Fintech / High-Frequency Trading

Ultra-low latency transaction processing, compliance archival, and real-time analytics.

  • 🧱 Block Storage (Core DB): local NVMe + SAN for transaction engines (µs latency)
  • 🪣 Object Storage (Compliance): regulatory compliance archives (7+ year retention)
  • 🗂 File Storage (Ops): shared audit logs, config distribution

Which Storage Should You Use?

New storage requirement
├─ Need low-latency random read/write?
│    YES → 🧱 Block Storage: databases, VM disks, OS volumes, high-IOPS workloads
└─ NO → Do many users / services need concurrent shared access?
     ├─ YES, over POSIX → 🗂 File Storage: NAS, team shares, legacy apps
     └─ YES, over HTTP → 🪣 Object Storage: cloud-native, CDN, data lake, backups
Quick Rules of Thumb
🗂 Use File when…
You need multiple machines to share files over a network with directory semantics (NFS/SMB)
🪣 Use Object when…
You're storing unstructured data at scale (GBs to EBs), or need HTTP-accessible, globally distributed storage
🧱 Use Block when…
You're running a database, VM, or any workload that needs raw IOPS, byte-addressable random access
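The rules of thumb above can be condensed into a small decision helper. The `choose_storage` function below is a hypothetical sketch for illustration, not an exhaustive sizing guide:

```python
# Sketch of the decision tree as code (hypothetical helper, illustrative only).
def choose_storage(low_latency_random_io, shared_access, posix_needed):
    if low_latency_random_io:
        return "block"   # databases, VM disks, high-IOPS workloads
    if shared_access and posix_needed:
        return "file"    # NAS, team shares, legacy apps
    return "object"      # cloud-native, CDN origin, data lake, backups

assert choose_storage(True, False, False) == "block"   # latency wins first
assert choose_storage(False, True, True) == "file"     # shared + POSIX
assert choose_storage(False, True, False) == "object"  # shared over HTTP
```

Real systems usually combine all three, as the scenarios above show; the function only picks the primary tier for a single workload.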
🗂 File Storage · The Human-Friendly Layer

Think of it as a shared filing cabinet. Best when people or legacy apps need to browse and edit files in a familiar folder hierarchy. Simple, intuitive, but limited in scale.

🪣 Object Storage · The Internet-Scale Layer

Think of it as a massive flat warehouse with infinite shelving. Perfect for cloud-native apps, data lakes, and anything that needs to be globally accessible at massive scale.

🧱 Block Storage · The Performance Layer

Think of it as a raw hard drive in the cloud. Essential for databases and VMs that demand the lowest possible latency and highest IOPS — no abstractions in the way.