Performance

What is Throughput?

Storage throughput is the volume of data transferred between a host and storage system per unit of time, measured in GB/s or MB/s — the key metric for large sequential I/O workloads like video streaming, AI training, and data analytics.

Technical Overview

Throughput and IOPS measure different dimensions of storage performance. IOPS counts operations per second regardless of size — a 4K read and a 1MB read each count as one operation. Throughput measures bytes per second — that 4K read contributes 4 KB/s at 1 IOPS, while the 1MB read contributes 1 MB/s. For small random I/O (databases), IOPS is the binding constraint. For large sequential I/O (video ingest, AI training data loading, data warehouse scans), throughput is the binding constraint, and IOPS becomes almost irrelevant.
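The relationship above reduces to throughput = IOPS × block size. A minimal sketch (function name and example figures are illustrative, not from a specific product):

```python
# Sketch: throughput is IOPS multiplied by block size.
def throughput_mb_s(iops: int, block_size_kb: float) -> float:
    """Throughput in MB/s for a given IOPS rate and block size in KB."""
    return iops * block_size_kb / 1024

# A 4K random workload needs 250,000 IOPS to move ~977 MB/s,
# while a 1M sequential workload moves ~1 GB/s at only 1,000 IOPS.
print(throughput_mb_s(250_000, 4))    # 976.5625
print(throughput_mb_s(1_000, 1024))   # 1000.0
```

This is why IOPS dominates for small random I/O and becomes almost irrelevant once blocks are large.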

Storage throughput is ultimately bounded by the slowest element in the I/O path. For networked storage over NVMe/TCP, the network link is typically the bottleneck: a 25 GbE link provides approximately 3.0 GB/s of usable throughput after accounting for TCP/IP overhead; a 100 GbE link provides approximately 11–12 GB/s. NVMe SSDs can deliver roughly 5–7 GB/s sequential read throughput over PCIe 4.0 ×4, and up to about 14 GB/s over PCIe 5.0 ×4, so a single 25 GbE NVMe/TCP path will be network-bound well before the SSD is saturated. 100 GbE or multi-path 25 GbE configurations are required to approach SSD sequential throughput limits.
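The link-rate figures above can be sketched as a conversion from Gbit/s to usable GB/s. The ~5% overhead factor here is an assumption chosen to reproduce the approximate numbers in the text, not a figure from a specification:

```python
# Sketch, assuming ~5% combined TCP/IP + NVMe/TCP framing overhead
# (the overhead fraction is an illustrative assumption).
def usable_gb_s(link_gbit: float, overhead: float = 0.05) -> float:
    """Approximate usable GB/s for an Ethernet link rate in Gbit/s."""
    return link_gbit / 8 * (1 - overhead)

print(usable_gb_s(25))    # ~2.97 GB/s -- below one NVMe SSD's sequential limit
print(usable_gb_s(100))   # ~11.9 GB/s
```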

Throughput is closely linked to queue depth for sequential workloads. With a single outstanding I/O request, throughput equals (block size / latency). At 100 µs latency with 128K blocks, that yields 1.28 GB/s maximum with a queue depth of 1. Increasing queue depth allows the storage path to prefetch and pipeline sequential blocks, increasing utilization of the available network and storage bandwidth. For NVMe/TCP over 100 GbE, a queue depth of 8–16 with 128K block sizes is typically sufficient to saturate the link.
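The queue-depth arithmetic above can be made concrete. This sketch uses decimal units (1 KB = 1000 bytes) to match the text's 1.28 GB/s figure, and caps the result at the link's usable bandwidth:

```python
# Sketch: with QD outstanding requests pipelined, throughput is roughly
# QD * block_size / latency, capped by the link's usable bandwidth.
def pipelined_gb_s(queue_depth: int, block_kb: int, latency_us: float,
                   link_cap_gb_s: float) -> float:
    bytes_per_io = block_kb * 1000                 # decimal KB, matching the text
    ios_per_s = queue_depth / (latency_us * 1e-6)  # completions per second
    return min(bytes_per_io * ios_per_s / 1e9, link_cap_gb_s)

# QD=1, 128K blocks, 100 us latency -> 1.28 GB/s, as in the text.
print(pipelined_gb_s(1, 128, 100, 12.0))    # 1.28
# QD=16 is more than enough to saturate a ~12 GB/s 100 GbE link.
print(pipelined_gb_s(16, 128, 100, 12.0))   # 12.0
```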

How It Relates to NVMe/TCP

NVMe/TCP scales throughput linearly with network bandwidth and the number of parallel data flows. Unlike iSCSI, which in typical deployments uses a single TCP connection per session (and is thus bound by single-stream TCP throughput; multiple connections per session exist in the standard but are rarely deployed), NVMe/TCP creates one TCP connection per NVMe I/O queue, enabling multiple parallel streams that each fill a portion of the available network bandwidth. This multi-stream design lets NIC features such as receive-side scaling spread flows across CPU cores, achieving near wire-rate utilization on high-speed Ethernet links.
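The benefit of one connection per I/O queue can be sketched with a simple aggregation model. The ~3 GB/s per-stream cap below is an illustrative assumption (e.g., a flow pinned to one CPU core), not a measured figure:

```python
# Sketch: each TCP stream is individually capped, so aggregate throughput
# grows with the number of queue connections until the link is full.
def aggregate_gb_s(num_queues: int, per_stream_cap_gb_s: float,
                   link_cap_gb_s: float) -> float:
    return min(num_queues * per_stream_cap_gb_s, link_cap_gb_s)

# One connection (iSCSI-style session) vs eight NVMe/TCP queue connections
# on a 100 GbE link (~12 GB/s usable):
print(aggregate_gb_s(1, 3.0, 12.0))   # 3.0  -- single-stream bound
print(aggregate_gb_s(8, 3.0, 12.0))   # 12.0 -- link bound
```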

Key Characteristics

  • Unit: GB/s (gigabytes per second) or MB/s
  • Block size dependency: Larger blocks = higher throughput at same IOPS
  • Network limit: 25 GbE ≈ 3 GB/s usable; 100 GbE ≈ 12 GB/s usable
  • NVMe SSD sequential: Up to 14 GB/s (PCIe 5.0 ×4)
  • Optimal block size: 128K–1M for sequential throughput testing
  • Multi-path benefit: Aggregating multiple NVMe/TCP paths scales throughput

Throughput vs IOPS — When Each Matters

Workload         Primary Metric   Typical Block Size   Access Pattern
OLTP Database    IOPS             4K–8K                Random
AI/ML Training   Throughput       256K–1M              Sequential
Video Ingest     Throughput       512K–4M              Sequential
Message Queue    IOPS + Latency   4K–64K               Mixed