TCP/IP (Transmission Control Protocol/Internet Protocol) is the foundational suite of communication protocols used to interconnect network devices, providing reliable ordered data delivery that NVMe/TCP uses as its transport layer.
TCP/IP refers collectively to two distinct protocols that work in tandem. IP (Internet Protocol, RFC 791) is a Layer-3 connectionless protocol responsible for addressing and routing packets between hosts — it provides no reliability guarantees and packets may arrive out of order, be duplicated, or be dropped entirely. TCP (Transmission Control Protocol, RFC 793) is a Layer-4 connection-oriented protocol that sits on top of IP and provides reliable, ordered, error-checked byte-stream delivery through mechanisms including sequence numbers, acknowledgments, retransmission timers, and sliding-window flow control.
TCP establishes connections via the three-way handshake (SYN, SYN-ACK, ACK) and tears them down with a FIN/ACK exchange in each direction. Within an established connection, TCP's congestion-control algorithms (CUBIC, BBR, DCTCP) manage the sending rate to avoid overwhelming routers and switches. For storage workloads, Nagle's algorithm, which is enabled by default and delays small writes so they can be batched into larger segments, should typically be disabled (via the TCP_NODELAY socket option) to avoid adding unnecessary latency to small NVMe command PDUs.
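As a minimal sketch, disabling Nagle's algorithm on a socket is a one-line option on POSIX systems (the host and port below are purely illustrative; 4420 is the IANA-assigned NVMe/TCP port):

```python
import socket

# Create a TCP socket as an NVMe/TCP initiator would before connecting
# to a target (e.g. some_target:4420 — illustrative, not a real endpoint).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm so small command PDUs are transmitted
# immediately instead of being coalesced into larger segments.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Verify the option took effect (nonzero means Nagle is disabled).
assert sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

The Linux kernel NVMe/TCP driver sets this option on its own sockets for the same reason.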
Modern TCP implementations include numerous performance enhancements relevant to storage networking: TCP segmentation offload (TSO) and large receive offload (LRO) move segmentation on transmit and segment aggregation on receive into NIC hardware, reducing CPU overhead. Receive-side scaling (RSS) distributes incoming TCP connections across multiple CPU cores using a hash of the 4-tuple (source IP, source port, destination IP, destination port), enabling NVMe/TCP to scale throughput nearly linearly with CPU core count.
TCP/IP is the transport layer that makes NVMe/TCP possible on commodity infrastructure. The NVMe/TCP specification defines a PDU framing layer that sits between NVMe and TCP: NVMe commands and data are encapsulated into NVMe/TCP PDUs, which are then handed to the TCP socket layer for reliable delivery. TCP's reliability guarantees mean NVMe/TCP does not need to implement its own retransmission or ordering logic — it inherits these properties from the transport. The tradeoff is that TCP's acknowledgment and congestion-control machinery adds latency compared to RDMA's credit-based lossless transport, but this is acceptable for the vast majority of storage workloads.
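To make the framing concrete, the sketch below packs the 8-byte NVMe/TCP common PDU header as I understand it from the transport specification (PDU type, flags, header length, PDU data offset, and 32-bit total PDU length, little-endian); the specific field values for a command capsule are an assumption for illustration:

```python
import struct

# Sketch of the NVMe/TCP 8-byte common PDU header (little-endian):
# PDU-Type (1B), FLAGS (1B), HLEN (1B), PDO (1B), PLEN (4B).
def pack_common_header(pdu_type: int, flags: int,
                       hlen: int, pdo: int, plen: int) -> bytes:
    return struct.pack("<BBBBI", pdu_type, flags, hlen, pdo, plen)

# Illustrative command-capsule PDU: an 8-byte common header followed by
# a 64-byte submission queue entry, 72 bytes total, no in-capsule data.
# (Type value 0x04 and lengths are assumptions based on the spec.)
hdr = pack_common_header(0x04, 0x00, 72, 0, 72)
assert len(hdr) == 8
assert struct.unpack("<BBBBI", hdr) == (0x04, 0x00, 72, 0, 72)
```

The resulting bytes, followed by the capsule payload, are simply written to the TCP socket; TCP's byte-stream semantics mean the receiver must use HLEN/PLEN to re-find PDU boundaries, since one `recv()` may return a fraction of a PDU or several PDUs.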
UDP has lower overhead than TCP, but its lack of reliability guarantees would require NVMe/TCP to re-implement retransmission, ordering, and flow control — duplicating TCP's work with no benefit. TCP's widespread hardware offload support (TSO, LRO, RSS) is a significant practical advantage: these offloads are available on virtually every commodity 10 GbE or faster NIC and dramatically reduce the CPU cost of NVMe/TCP. In practice, NVMe/TCP implementations lean on these offloads to achieve IOPS and throughput that rival RDMA-based transports on modern hardware.