Towards the Rubin Architecture: The Leap of AI Data Center Networks from 800G to 1.6T

Date 04/02/2026

This article explains that the 1.6T module provides a significant boost to the Rubin architecture, and that upgrading the network to 1.6T can maximize its performance.

NVIDIA's next-generation Rubin GPU platform is driving a fundamental shift in AI infrastructure design. As AI workloads grow from billions of parameters to trillions, the challenges extend far beyond computational performance. What truly determines system efficiency is often the ability to efficiently move data between thousands of GPUs. In this context, the network is no longer just a support layer, but a core component determining AI performance. These systems are inherently designed for massive parallelism, with continuous and intensive communication between nodes, thus reshaping the design logic of data centers.

What is Rubin Architecture

NVIDIA's Rubin architecture is an upgrade from the Blackwell architecture. It's a full-stack collaborative computing platform specifically designed for ultra-large-scale AI model training and low-latency inference. Composed of multiple new chips, it boasts higher performance and energy efficiency, supporting the efficient operation of large-scale AI tasks and serving as a core infrastructure driving the large-scale development of AI.

Unlike previous GPUs that primarily focused on improving single-chip performance, Rubin was designed from the outset for large-scale distributed AI systems. Its ultimate goal is to achieve efficient training on clusters containing tens of thousands of GPUs.

In this environment, GPUs need to continuously exchange gradients, parameters, and intermediate data. The communication load is extremely heavy, and data transmission efficiency directly impacts training speed. While technologies like InfiniBand and RDMA can reduce latency and increase throughput, they still rely on sufficient bandwidth at the physical layer.

The Role of 800G in AI Infrastructure

The 800G interconnect has gradually become the de facto standard for high-performance AI infrastructure. Compared to 400G solutions, 800G achieves double the bandwidth while striking a reasonable balance between power consumption and deployment complexity at this stage.

In actual deployments, 800G solutions exhibit a clear division of labor based on application scenarios:

DACs are used for ultra-short-distance connections between GPUs and ToR switches within a server rack. Copper cabling introduces almost no additional power consumption and has the lowest transmission latency, making it the preferred medium for high-density interconnects within server racks.

AOCs are suitable for short-to-medium-distance connections between server racks. AOCs integrate optical modules and cables, simplifying the on-site deployment process while ensuring signal integrity and reducing the cleaning and maintenance requirements of fiber optic interfaces.

Optical modules are used in the core backbone layer of structured networks, supporting long-distance interconnects in Spine-Leaf architectures. Optical modules, in conjunction with fiber optic cabling systems, enable network expansion across PODs and regions.

This layered deployment approach reflects the actual engineering needs of large-scale AI clusters. Taking QSFPTEK as an example, its 800G product line covers the full range of DACs, AOCs and optical modules. Its core value lies in ensuring that users obtain consistent compatibility performance in a three-layer network architecture. In high-density, high-throughput AI cluster environments, stable interoperability across scenarios is often more meaningful than a single performance indicator.

The Limitations of 800G in the Rubin Era

While 800G solutions have been well-proven in current AI clusters, their limitations are becoming increasingly apparent when facing the next-generation computing density represented by the NVIDIA Rubin architecture.

The main sources of pressure are the following three aspects:

Soaring GPU Density: The number of GPUs per rack is evolving from the current mainstream 8 to 16 or 32, and the aggregated bandwidth demand within the rack is leaping from the Tbps level to tens of Tbps, posing a severe challenge to the convergence ratio of ToR switch ports.

Cluster Size Expansion: Cluster nodes are expanding from thousands to tens of thousands or even hundreds of thousands of nodes, and the network congestion pattern is evolving from local hotspots to globally distributed congestion. Simply increasing the number of links will lead to a superlinear increase in cabling complexity, power consumption, and maintenance costs;

Evolution of Communication Modes: New model parallelism strategies, represented by Expert Parallelism (MoE), significantly increase the synchronization frequency between GPUs compared to the traditional Transformer architecture, requiring the network system to simultaneously meet the combined requirements of high bandwidth and low latency.

From the perspective of the matching relationship between computing and network, the single-card computing throughput of the Rubin platform is expected to be more than twice that of the Blackwell architecture. If network bandwidth remains unchanged, the proportion of idle time for computing resources waiting for data will increase significantly, directly limiting the effective utilization of GPU computing power.

Why Are 1.6T Modules Needed

The advancement of 1.6T interconnect is largely driven by the rigid demand for network bandwidth from next-generation GPU architectures. Its significance lies not only in doubling port speeds but also in providing a viable technical path to address the aforementioned structural bottlenecks.

The core value of 1.6T is reflected in the following dimensions:

Increased Bandwidth Density: A single port provides 200GB/s of data throughput. For example, a 48-port switch can provide nearly 10Tbps of aggregate bandwidth, reducing the number of ports and rack space occupied by about half compared to 800G solutions;

Simplified Topology: At the same cluster size, 1.6T supports a flatter network topology, such as simplifying a three-level Clos architecture to a two-level architecture, reducing the number of Spine layer devices, and simultaneously reducing latency accumulation caused by cross-layer forwarding.

Reduced Operational Complexity: The reduction in the number of links directly leads to a simultaneous decrease in the number of fiber cores, optical modules, and fault points. In large-scale clusters, this effect can translate into a considerable improvement in operational efficiency.

QSFPTEK has already begun its technology deployment in the 1.6T product line, offering several 1.6T modules for AI data centers. This means they can proactively plan for next-generation network integration on existing 800G infrastructure, rather than facing the pressure of reactive upgrades during full deployment of the Rubin architecture.

Recommended Actual Upgrade Path

The evolution from 800G to 1.6T in real-world data center environments typically follows a gradual path, rather than a one-time complete replacement.

You can refer to the following upgrade path:

Access Layer and Short-Range Interconnects: Continue using the 800G DAC and AOC solutions. The advantages of copper cabling and active optical cables in terms of power consumption and cost remain difficult to replace in the short term for short-range scenarios within and between racks;

Spine Layer and Core Backbone: Introduce 1.6T optical modules and switches first to carry aggregated traffic across PODs, reserving sufficient bandwidth margin for the core layer to accommodate future expansion needs;

Hybrid Architecture Transition: Achieve hybrid operation of 800G and 1.6T equipment through link aggregation or rate negotiation mechanisms, gradually completing the generational replacement from the access layer to the core layer, avoiding one-time capital expenditure and service interruption risks.

For data centers that have already deployed 800G infrastructure, a smooth transition period of 2-3 years is typically adopted. During this period, it is necessary to simultaneously pay attention to the backward compatibility of optical modules, the reserved capacity of the cabling system, and the multi-rate coordination capabilities of the network operating system.

Conclusion

The shift from 800G to 1.6T optical modules is essential to support NVIDIA’s Rubin architecture. 1.6T delivers greater bandwidth density, simpler network structure, and lower latency to meet the demands of next‑generation data centers. With a phased upgrade approach, users can smoothly transition their infrastructure while taking full advantage of QSFPTEK’s 1.6T optical solutions for stable, high‑performance cluster connectivity.