InfiniBand vs RoCE: How to Choose
As the two main solutions for implementing RDMA, InfiniBand and RoCE play a key role in high-performance computing and data center networks thanks to their low latency and high bandwidth. How should you choose between them when building a network? This article explains what the two technologies are, their characteristics, and how they compare.
Introduction and Principles of InfiniBand
An InfiniBand network achieves low-latency, high-bandwidth data transmission by deploying InfiniBand components, generally including InfiniBand switches, cables, network cards, modules, and a subnet manager. It adopts a point-to-point communication model and transfers data directly between host memories through remote direct memory access (RDMA), reducing CPU intervention and the overhead of data copies. QSFPTEK provides InfiniBand modules characterized by stable transmission and low power consumption.
InfiniBand Technical Solution Features
An InfiniBand network adopts a credit-based flow control mechanism to ensure reliable, efficient data transmission. Before transmitting, the sender confirms, based on the credit limit advertised by the receiver, that the receiver has enough buffer space to accept the corresponding number of packets. Each link is equipped with a predefined buffer, and the sender's transmission is limited by the size of the receiver's available buffer. Once received data has been consumed, the receiver releases the buffer and feeds the newly available buffer size back to the sender as fresh credits. This link-level flow control effectively prevents buffer overflow and packet loss, ensuring the stability and high performance of the network.
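The credit cycle above can be sketched as a toy simulation. This is illustrative only: the class names and slot-based accounting are simplifications for clarity, not the actual InfiniBand wire protocol.

```python
# Toy model of link-level credit-based flow control: the sender may only
# transmit while it holds credits granted by the receiver's free buffer.
from collections import deque

class Receiver:
    def __init__(self, buffer_slots):
        self.free_slots = buffer_slots      # predefined per-link buffer
        self.queue = deque()

    def grant_credits(self):
        return self.free_slots              # advertise available buffer as credits

    def accept(self, packet):
        self.free_slots -= 1
        self.queue.append(packet)

    def drain(self, n):
        for _ in range(min(n, len(self.queue))):
            self.queue.popleft()
            self.free_slots += 1            # freed buffer becomes new credit

class Sender:
    def __init__(self, receiver):
        self.receiver = receiver
        self.credits = 0
        self.sent = 0

    def send(self, packets):
        for p in packets:
            if self.credits == 0:
                self.credits = self.receiver.grant_credits()
                if self.credits == 0:
                    return                  # no buffer at receiver: stall, never drop
            self.receiver.accept(p)
            self.credits -= 1
            self.sent += 1

rx = Receiver(buffer_slots=4)
tx = Sender(rx)
tx.send(range(10))      # only 4 slots exist: sender stalls instead of causing loss
print(tx.sent)          # 4
rx.drain(4)             # receiver consumes packets, returning credits
tx.send(range(10))
print(tx.sent)          # 8
```

The key property the sketch demonstrates is that overflow is impossible by construction: the sender stalls when credits run out rather than pushing packets the receiver cannot buffer.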
In terms of bandwidth, latency, and scalability, InfiniBand provides up to 400 Gbps of bandwidth and end-to-end latency as low as 1.6 microseconds, meeting the needs of large-scale data transmission and real-time computing. InfiniBand is also highly scalable: it supports large clusters connecting tens of thousands of servers, making it well suited to ultra-large-scale computing environments.
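As a quick sanity check on those figures, the back-of-envelope calculation below estimates how long moving 1 GiB takes at 400 Gbps, ignoring protocol overhead (an idealized bound, not a measured result):

```python
# Idealized transfer time for 1 GiB over a 400 Gbps link, using the
# bandwidth and latency figures quoted above.
link_bps = 400e9                 # 400 Gbps line rate
payload_bits = 1 * 2**30 * 8     # 1 GiB expressed in bits
e2e_latency_s = 1.6e-6           # quoted end-to-end latency

serialization_s = payload_bits / link_bps
total_s = serialization_s + e2e_latency_s

print(f"serialization: {serialization_s * 1e3:.2f} ms")   # ~21.47 ms
print(f"latency share of total: {e2e_latency_s / total_s:.6%}")
```

For bulk transfers the per-message latency is negligible next to serialization time; the 1.6 µs figure matters most for small-message, latency-sensitive workloads such as MPI collectives.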
However, implementing InfiniBand requires purchasing dedicated InfiniBand network cards, switches, and other equipment, which are expensive.
Introduction and Principle of RoCEv2
RoCEv2 is an Ethernet-based RDMA protocol that enables remote direct memory access over ordinary Ethernet, so it can provide high-performance RDMA communication on existing infrastructure. Unlike InfiniBand, which relies on a subnet manager for centralized management, a RoCEv2 network operates as a fully distributed network, usually with a two-tier architecture. QSFPTEK's RoCE switches support both RoCEv1 and RoCEv2, making them compatible with the devices in your network.
RoCEv2 Technical Solution Features
RoCEv2 is characterized by high versatility and lower cost than InfiniBand. RDMA traffic and traditional Ethernet traffic can share the same network without additional dedicated hardware or wholesale changes to network devices. However, it requires configuring parameters such as headroom, PFC (Priority Flow Control), and ECN (Explicit Congestion Notification) on the switches, which adds complexity. In large-scale deployments, especially with a large number of network cards, its throughput is somewhat lower than that of an InfiniBand network.
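To give a feel for what ECN tuning involves, here is a hypothetical WRED-style marking function of the kind RoCEv2 switches implement: packets are marked (not dropped) with a probability that ramps up as the egress queue fills, so senders can slow down before loss occurs. The Kmin/Kmax thresholds below are invented for illustration; real switches express them in platform-specific units.

```python
# Hypothetical ECN marking curve (WRED-style): probability of marking a
# packet as a function of current egress queue depth. Threshold values
# are made up for illustration only.
def ecn_mark(queue_depth_cells, kmin=100, kmax=400):
    """Return the marking probability for the current queue depth."""
    if queue_depth_cells <= kmin:
        return 0.0                       # below Kmin: never mark
    if queue_depth_cells >= kmax:
        return 1.0                       # at or above Kmax: mark every packet
    # linear ramp between Kmin and Kmax
    return (queue_depth_cells - kmin) / (kmax - kmin)

print(ecn_mark(50))    # 0.0
print(ecn_mark(250))   # 0.5
print(ecn_mark(500))   # 1.0
```

Choosing Kmin, Kmax, and the PFC headroom so that ECN reacts before PFC pauses the link is exactly the kind of fine-tuning that makes RoCEv2 deployment more complex than InfiniBand's credit-based scheme, which needs no such per-switch tuning.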
InfiniBand vs RoCEv2
Bandwidth and latency: Compared with RoCEv2, InfiniBand offers higher bandwidth and lower latency, suiting applications with extremely demanding network performance requirements. Although RoCEv2 is slightly inferior, it is sufficient for most high-performance computing needs.
Compatibility and cost: InfiniBand hardware is generally expensive, since building an InfiniBand network requires a full set of dedicated InfiniBand equipment. RoCEv2 is compatible with existing Ethernet hardware; you only need to add devices that support RoCEv2, which reduces deployment and maintenance costs. InfiniBand hardware is effectively available from a single manufacturer, while RoCEv2 hardware is supported by multiple manufacturers.
Scale: InfiniBand supports tens of thousands of servers and is suitable for larger computing environments. RoCEv2 is suitable for small and medium-sized data centers and clusters.
Configuration complexity: RoCEv2 configuration is relatively complex, and the switches must be finely tuned to ensure performance and reliability. InfiniBand configuration is comparatively simple and easy to manage.
Conclusion
InfiniBand and RoCEv2 each have their own advantages, and you should decide based on your application scenario. InfiniBand is suitable for ultra-large-scale data centers with tens of thousands of servers: it has lower latency and higher bandwidth and is easy to configure during deployment, but its cost is higher. RoCEv2 is suitable for small and medium-sized data centers with thousands of servers: it costs less, but its configuration is more complicated.