Queue depth is the number of I/O commands that can be submitted and outstanding simultaneously to a storage device or controller — higher queue depths enable greater parallelism and throughput for modern NVMe storage.
Queue depth, also called outstanding I/O or I/O depth, is the number of I/O commands a host can have simultaneously in flight to a storage target. When an application issues a read or write, that command enters a submission queue; the storage device processes it and returns a completion. If the device processes commands faster than they arrive, the queue sits mostly empty (low queue depth). If the host keeps the queue full, the device's internal parallelism is fully utilized. Modern NVMe SSDs are designed to exploit deep queues: their controllers drive many NAND dies in parallel, so a queue depth of 1 delivers only a fraction of the device's potential IOPS.
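The in-flight concept can be sketched from userspace. This is an illustrative Python sketch, not real NVMe queue plumbing: a thread pool of size `QUEUE_DEPTH` stands in for the submission queue, capping how many reads are outstanding at once (the file, block size, and I/O count are made up for the example):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Illustration only: "queue depth" here is simply how many reads we
# allow in flight at once. Real NVMe queues live in the driver/device;
# a pool of QUEUE_DEPTH workers approximates the same concurrency.
QUEUE_DEPTH = 8
BLOCK = 4096
NUM_IOS = 64

# Scratch file large enough for our fixed-offset reads.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(BLOCK * NUM_IOS))
    path = f.name

fd = os.open(path, os.O_RDONLY)

def read_block(i):
    # pread (POSIX) reads at an explicit offset, so workers
    # don't fight over a shared file position.
    return len(os.pread(fd, BLOCK, i * BLOCK))

# At most QUEUE_DEPTH reads are outstanding at any moment.
with ThreadPoolExecutor(max_workers=QUEUE_DEPTH) as pool:
    results = list(pool.map(read_block, range(NUM_IOS)))

os.close(fd)
os.unlink(path)
print(sum(results))  # 262144 bytes read in total
```

Raising `QUEUE_DEPTH` from 1 to 8 lets eight reads overlap instead of serializing on each completion, which is exactly the effect deep hardware queues provide.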
The NVMe protocol was architected from the ground up for deep-queue workloads. The specification allows up to 65,535 I/O queues per controller and up to 65,535 commands per queue, for a theoretical maximum of over 4 billion outstanding commands. In practice, deployments use far fewer queues (typically one per CPU core), but even one queue per core at queue depth 64–256 provides dramatically more parallelism than legacy protocols.
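The spec arithmetic above checks out directly; the "realistic deployment" core count and per-queue depth below are assumptions for illustration:

```python
# NVMe spec limits quoted in the text.
MAX_QUEUES = 65_535        # I/O queues per controller
MAX_CMDS_PER_QUEUE = 65_535  # commands per queue

theoretical_max = MAX_QUEUES * MAX_CMDS_PER_QUEUE
print(f"{theoretical_max:,}")  # 4,294,836,225 -- "over 4 billion"

# Assumed realistic deployment: one queue per core, moderate depth.
cores, per_queue_depth = 32, 128
print(cores * per_queue_depth)  # 4096 outstanding commands
```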
Queue depth interacts with latency in a predictable way described by Little's Law: mean queue depth = throughput × mean latency, or equivalently, average latency = queue depth / throughput. At low queue depths, throughput is limited by the number of concurrent operations, not by device speed. At very high queue depths, queueing delay begins to dominate and average latency climbs. The optimal operating point, where IOPS are maximized while latency remains acceptable, is the saturation point of the storage path, and where it lies depends heavily on how much queue depth the path supports.
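A worked example of Little's Law with assumed numbers (a hypothetical device with a 100 µs mean service latency):

```python
# Little's Law: mean queue depth L = arrival rate λ × mean latency W.
# Rearranged for storage in steady state: IOPS = queue_depth / latency.
def iops(queue_depth, latency_s):
    return queue_depth / latency_s

latency = 100e-6  # assumed 100 µs mean latency, for illustration

print(iops(1, latency))   # QD=1  -> ~10,000 IOPS: latency-bound
print(iops(64, latency))  # QD=64 -> ~640,000 IOPS: parallelism-bound
```

At queue depth 1, each command must complete before the next is issued, so throughput is capped by latency alone; 64 in-flight commands multiply the ceiling 64-fold until the device itself saturates.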
Queue depth is one of the primary reasons NVMe/TCP outperforms iSCSI and Fibre Channel at scale. Both iSCSI and traditional FC expose only a single command queue per session or LUN, creating a fundamental concurrency bottleneck. NVMe/TCP inherits NVMe's multi-queue architecture, letting each host CPU core submit I/O independently without contention. For workloads like database OLTP, which demand high concurrency of small random I/Os, this multi-queue advantage can translate into 3–5× higher throughput and lower latency at the same network bandwidth.
| Protocol | Queues | Commands per Queue | Total Max Outstanding |
|---|---|---|---|
| NVMe/TCP | Up to 65,535 | Up to 65,535 | ~4 billion |
| iSCSI | 1 per session | 128 (typical CmdSN window) | 128 |
| Fibre Channel | 1 per LUN | 256 (typical task set size) | 256 |
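Plugging the table's ceilings back into Little's Law shows why the single-queue limits bite; the 500 µs round-trip latency and the 32-queue NVMe/TCP configuration below are assumptions chosen for illustration:

```python
# Max sustainable IOPS ≈ max outstanding commands / round-trip latency
# (Little's Law again, with the queue always full).
def max_iops(outstanding, latency_s):
    return outstanding / latency_s

rtt = 500e-6  # assumed 500 µs network round trip

print(max_iops(128, rtt))       # iSCSI session ceiling:  ~256k IOPS
print(max_iops(256, rtt))       # FC per-LUN ceiling:     ~512k IOPS
print(max_iops(32 * 128, rtt))  # NVMe/TCP, 32 queues at QD 128: ~8.2M
```

The single-queue protocols hit a hard IOPS ceiling set by command-window size and latency, while NVMe/TCP scales the ceiling with the number of queues.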