Storage throughput is the volume of data transferred between a host and storage system per unit of time, measured in GB/s or MB/s — the key metric for large sequential I/O workloads like video streaming, AI training, and data analytics.
Throughput and IOPS measure different dimensions of storage performance. IOPS counts operations per second regardless of size — a 4K read and a 1MB read each count as one operation. Throughput measures bytes per second — that 4K read contributes 4 KB/s at 1 IOPS, while the 1MB read contributes 1 MB/s. For small random I/O (databases), IOPS is the binding constraint. For large sequential I/O (video ingest, AI training data loading, data warehouse scans), throughput is the binding constraint, and IOPS becomes almost irrelevant.
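The relationship is simple arithmetic: throughput = IOPS × block size. A quick sketch (the IOPS figures below are illustrative, not from the text) shows how a workload with far fewer operations can still move far more data:

```python
# Throughput implied by an IOPS figure at a given block size.
def throughput_mbps(iops, block_size_bytes):
    """Throughput in MB/s = operations/s x bytes/operation."""
    return iops * block_size_bytes / 1e6

# Hypothetical OLTP workload: many small random 4K reads.
oltp = throughput_mbps(200_000, 4 * 1024)
# Hypothetical sequential scan: far fewer, much larger 1M reads.
scan = throughput_mbps(3_000, 1024 * 1024)

print(f"OLTP: 200k IOPS at 4K -> {oltp:.0f} MB/s")  # ~819 MB/s
print(f"Scan:   3k IOPS at 1M -> {scan:.0f} MB/s")  # ~3146 MB/s
```

Despite a 66× IOPS advantage, the OLTP workload moves roughly a quarter of the bytes — which is why the two metrics bind different workload classes.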
Storage throughput is ultimately bounded by the slowest element in the I/O path. For networked storage over NVMe/TCP, the network link is typically the bottleneck: a 25 GbE link provides approximately 3.0 GB/s of usable throughput after accounting for TCP/IP overhead; a 100 GbE link provides approximately 11–12 GB/s. NVMe SSDs can deliver roughly 5–7 GB/s of sequential read throughput on PCIe 4.0 ×4, and up to ~14 GB/s on PCIe 5.0 ×4, so a single 25 GbE NVMe/TCP path will be network-bound long before the SSD is saturated. 100 GbE or multi-path 25 GbE configurations are required to approach SSD sequential throughput limits.
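The "slowest element wins" rule can be modeled as a min over the stages in the path. A minimal sketch, assuming ~8% protocol overhead for TCP/IP framing (consistent with the ~3 GB/s usable on 25 GbE quoted above) and a 7 GB/s PCIe 4.0 SSD:

```python
# Bottleneck model: effective path throughput is the minimum of its stages.
def usable_link_gbs(link_gbps, overhead=0.08):
    """Usable GB/s on an Ethernet link after assumed TCP/IP overhead."""
    return link_gbps / 8 * (1 - overhead)

def path_throughput_gbs(link_gbps, ssd_gbs):
    """Whichever stage is slower bounds the whole path."""
    return min(usable_link_gbs(link_gbps), ssd_gbs)

print(f"25 GbE  + 7 GB/s SSD: {path_throughput_gbs(25, 7.0):.2f} GB/s")   # network-bound
print(f"100 GbE + 7 GB/s SSD: {path_throughput_gbs(100, 7.0):.2f} GB/s")  # SSD-bound
```

At 25 GbE the link caps the path below 3 GB/s; at 100 GbE the same SSD becomes the limiting stage.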
Throughput is closely linked to queue depth for sequential workloads. With a single outstanding I/O request, throughput equals (block size / latency). At 100 µs latency with 128K blocks, that yields 1.28 GB/s maximum with a queue depth of 1. Increasing queue depth allows the storage path to prefetch and pipeline sequential blocks, increasing utilization of the available network and storage bandwidth. For NVMe/TCP over 100 GbE, a queue depth of 8–16 with 128K block sizes is typically sufficient to saturate the link.
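The queue-depth relationship above can be written directly: with QD outstanding requests, throughput ≈ QD × block size / latency, capped by the available link bandwidth. A sketch using the section's own figures (128K blocks, 100 µs latency, ~11.5 GB/s assumed usable on 100 GbE):

```python
# Pipelining model: queue depth multiplies single-request throughput,
# up to the point where the link itself is saturated.
def qd_throughput_gbs(queue_depth, block_bytes, latency_s, link_gbs):
    pipelined = queue_depth * block_bytes / latency_s / 1e9
    return min(pipelined, link_gbs)

LINK = 11.5        # assumed usable GB/s on 100 GbE
BLOCK = 128_000    # 128K block (decimal, matching the 1.28 GB/s figure)
LAT = 100e-6       # 100 microseconds

for qd in (1, 4, 8, 16):
    print(f"QD={qd:>2}: {qd_throughput_gbs(qd, BLOCK, LAT, LINK):.2f} GB/s")
# QD= 1: 1.28 GB/s   QD= 4: 5.12 GB/s   QD= 8: 10.24 GB/s   QD=16: 11.50 GB/s
```

The model reproduces the text's numbers: QD 1 yields 1.28 GB/s, and somewhere between QD 8 and 16 the link, not the queue, becomes the limit.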
NVMe/TCP scales throughput linearly with network bandwidth and the number of parallel data flows. Unlike iSCSI, which in practice runs a single TCP connection per session (the standard's multiple-connections-per-session feature, MC/S, is rarely deployed) and is therefore bounded by single-stream TCP throughput, NVMe/TCP creates one TCP connection per NVMe I/O queue, enabling multiple parallel streams that each fill a portion of the available network bandwidth. This multi-stream architecture matches the behavior of modern network offload engines and achieves near wire-rate utilization on high-speed Ethernet links.
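The multi-stream scaling argument can be sketched the same way: if a single TCP flow tops out at some per-stream ceiling (here an assumed 3 GB/s, e.g. from per-flow processing on one CPU core), aggregate throughput grows with the number of I/O queues until the link itself is full:

```python
# Multi-queue scaling sketch: each NVMe I/O queue maps to its own TCP
# connection, so aggregate throughput is n_queues x per-stream ceiling,
# capped by usable link bandwidth. Both ceilings below are assumptions.
def aggregate_gbs(n_queues, per_stream_gbs, link_gbs):
    return min(n_queues * per_stream_gbs, link_gbs)

PER_STREAM = 3.0   # assumed single-flow ceiling, GB/s
LINK = 11.5        # assumed usable 100 GbE bandwidth, GB/s

for n in (1, 2, 4, 8):
    print(f"{n} queue(s): {aggregate_gbs(n, PER_STREAM, LINK):.1f} GB/s")
```

A single stream would leave most of a 100 GbE link idle; four queues already reach wire rate in this model, which is the behavior iSCSI's single-connection sessions cannot match.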
| Workload | Primary Metric | Typical Block Size | Access Pattern |
|---|---|---|---|
| OLTP Database | IOPS | 4K–8K | Random |
| AI/ML Training | Throughput | 256K–1M | Sequential |
| Video Ingest | Throughput | 512K–4M | Sequential |
| Message Queue | IOPS + Latency | 4K–64K | Mixed |