How Many 400G OSFP SiPh LPOs Are in the Huawei AI CloudMatrix 384 Super-node?
On May 14, 2025, the "2025 Chip and Optical Forum", hosted by HiSilicon and organized by ICC, was held at the Crowne Plaza Wuhan Optics Valley. The conference focused on intelligent optical interconnection technology, sharing cutting-edge achievements and exploring industry trends. At the forum, Huawei Cloud, HiSilicon, iFlytek, and others all mentioned the AI CloudMatrix 384 computing-power super-node.
What Is the AI CloudMatrix 384 Super-node?
The Huawei CloudMatrix 384 super-node is a key technological breakthrough in Huawei's AI computing infrastructure, designed mainly to solve the communication-efficiency problem of large-scale AI clusters. It was released on April 10, 2025, and has been deployed at scale in the Wuhu Data Center. The "384" means the super-node contains 384 computing chips, namely 384 Ascend 910C chips; the system is referred to as CM384.
As artificial intelligence (AI) has become the key force driving industry transformation, moving AI out of the laboratory and into industry has become a question the era demands an answer to. The CloudMatrix 384 super-node is Huawei Cloud's answer.
What Makes CM384 Different?
The CloudMatrix 384 super-node uses 6,912 x 400G OSFP silicon photonic (SiPh) Linear-drive Pluggable Optics (LPO) modules and 3,168 optical fibers to connect 384 Ascend 910C computing chips in a fully meshed interconnection architecture. Unlike the all-electrical interconnect adopted by NVIDIA's NVL72 super-node, Huawei fully exploits the high bandwidth, low latency, and longer reach of optical transceiver technology, breaking through the physical limits of traditional electrical links and achieving 1.7 times the computing power and 3.6 times the HBM capacity of NVL72. The ratio of computing chips to optical transceivers reaches 1:18.
How Many 400G OSFP SiPh LPOs Does the AI CloudMatrix 384 Super-node Use?
3,072 x 400G OSFP SiPh LPOs Are Deployed in Computing Servers
iFlytek's presentation cited SemiAnalysis's analysis of CM384; we extracted the optical module information from that data.
The CM384 super-node contains 48 computing servers (chassis). Each server carries 8 x 910C computing chips, 56 x 400G silicon photonic LPOs for scale-up, and 8 x 400G silicon photonic LPOs for scale-out. Across the 48 servers, that is 2,688 scale-up 400G SiPh LPOs and 384 scale-out 400G SiPh LPOs, for a total of 3,072 x 400G OSFP silicon photonic LPO modules. The table below also lists the QSFP112 200G optical modules.
Table 1 - 400G OSFP SiPh LPOs and other components deployed in computing server chassis for CM384 super-node
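As a quick sanity check, a minimal Python sketch of the server-side tally using the per-chassis figures from Table 1:

```python
# Server-side 400G OSFP SiPh LPO tally for the CM384 super-node,
# using the per-chassis figures cited above (Table 1).
SERVERS = 48                  # computing server chassis per super-node
SCALE_UP_PER_SERVER = 56      # 400G SiPh LPOs used for scale-up per chassis
SCALE_OUT_PER_SERVER = 8      # 400G SiPh LPOs used for scale-out per chassis

scale_up = SERVERS * SCALE_UP_PER_SERVER      # 2,688
scale_out = SERVERS * SCALE_OUT_PER_SERVER    # 384
print(scale_up, scale_out, scale_up + scale_out)   # 2688 384 3072
```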
3,840 x 400G OSFP SiPh LPOs Are Deployed in Scale-up and Scale-out Switches
The CM384 super-node also contains scale-up and scale-out switches. The scale-up switches carry 2,688 scale-up 400G OSFP SiPh LPOs and the scale-out switches carry 1,152 scale-out 400G OSFP SiPh LPOs, for a total of 3,840 x 400G OSFP silicon photonic modules.
Table 2 - 400G OSFP SiPh LPOs and other components deployed in scale-up and out switches for CM384 super-node
A Total of 6,912 x 400G OSFP SiPh LPOs Are Deployed in the CloudMatrix 384 Super-node
Figure 1 - CM384, with 6,912 x 400G OSFP silicon photonic LPO optical modules, a chip-to-module ratio of 1:18
Adding up all the 400G OSFP optical modules involved, the CM384 super-node includes 384 x 910C computing chips and 6,912 x 400G silicon photonic LPO optical modules (3,072 in the servers plus 3,840 in the switches). That is 18 400G OSFP SiPh optical modules per computing chip, a chip-to-module ratio of 1:18.
Table 3 - 384 x 910C computing chips to 6,912 x 400G OSFP SiPh LPOs = 1:18
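A minimal Python sketch that reproduces the overall tally and the 1:18 ratio from the server-side and switch-side counts above:

```python
# Overall 400G OSFP SiPh LPO count and chip-to-module ratio for CM384,
# using the server-side and switch-side figures cited above.
server_side = 2688 + 384      # LPOs in the 48 computing servers  -> 3,072
switch_side = 2688 + 1152     # LPOs in the scale-up/out switches -> 3,840
total_lpos = server_side + switch_side    # 6,912
chips = 384                               # Ascend 910C computing chips

print(total_lpos)              # 6912
print(total_lpos // chips)     # 18 -> a chip-to-module ratio of 1:18
```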
Optical Transceivers Account for 68.2% of Failures in iFlytek's Ten-Thousand-Card Cluster
The above concerns a single super-node built from 48 computing server chassis (384 chips). iFlytek also provided failure-rate data from its ten-thousand-card-scale cluster, which has been running for one year. The absolute values on the axes were hidden, so let's look at the relative data.
Figure 2 - After one year of operation of iFlytek's ten-thousand-card-scale cluster, optical modules show the highest failure rate.
Figure 3 - Optical module failures account for 68.2% of all failures, the primary failure source.
What iFlytek provides is operational data from a ten-thousand-card-scale cluster. Huawei Cloud has estimated the impact of optical transceiver reliability on training in an even larger network.
If the computing cluster is expanded further from the ten-thousand-card level to the hundred-thousand-card level, the number of 400G SiPh LPO optical modules required for a full configuration reaches roughly 2.36 million (2,359,296). At the failure rate observed on the existing ten-thousand-card cluster, there would be seven link flaps (transient disconnections) per hour, and each flap forces the training run to be extended, increasing the training cost.
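A back-of-the-envelope check of that module count, assuming the 1:18 chip-to-module ratio above and a power-of-two build-out of 131,072 (128K) chips at the hundred-thousand-card level; the exact chip count is our assumption, not stated in the talk:

```python
# Rough check of the hundred-thousand-card module count.
# ASSUMPTION: a fully configured build-out of 131,072 (2**17) chips;
# the talk only gives the "hundred-thousand-card" scale, not this figure.
MODULES_PER_CHIP = 18          # the 1:18 chip-to-module ratio from CM384
chips = 131_072                # assumed full configuration
print(chips * MODULES_PER_CHIP)   # 2359296 -> about 2.36 million 400G SiPh LPOs
```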
Contamination at Optical Connections Is the Primary Cause of the High Transceiver Failure Rate
According to Huawei's statistics, failures caused by the transceiver itself are relatively rare; failures caused by contaminated optical connections are the primary contributor. By using optical time-domain reflectometry (OTDR) to locate Fresnel reflection peaks, the contaminated position of an active connector can be pinpointed, which can reduce the failure rate by 70% to 80%.
Figure 4 - Optical module failures caused by dirty connections account for 64.7%.
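As a rough illustration of the idea (not Huawei's implementation), locating a strong Fresnel reflection peak in an OTDR trace and converting its round-trip delay to a distance along the fiber might look like the sketch below; the sample trace and the group index are assumptions:

```python
# Illustrative OTDR peak location: the event distance follows from the
# round-trip delay of the reflected pulse, d = c * t / (2 * n_group).
C = 299_792_458.0        # speed of light in vacuum, m/s
N_GROUP = 1.468          # assumed group index of standard single-mode fiber

def event_distance_m(round_trip_time_s: float) -> float:
    """Distance to a reflective event from its round-trip delay."""
    return C * round_trip_time_s / (2 * N_GROUP)

# Assumed toy trace: (time in nanoseconds, reflected power in dB above backscatter)
trace = [(10, 0.2), (55, 0.3), (120, 14.8), (180, 0.1)]

# A strong Fresnel peak (e.g. a dirty or open connector) stands well above the
# Rayleigh backscatter level; pick the strongest sample as the suspected event.
t_ns, peak_db = max(trace, key=lambda sample: sample[1])
print(f"Suspected connector event ~{event_distance_m(t_ns * 1e-9):.1f} m away, {peak_db} dB peak")
```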
In an even larger-scale computing network, the reliability of optical transceivers is vital.
A Case Study - QSFPTEK Transceivers' Failure Rate in a Three-Year Medium-Sized Data Center Deployment
The optical module failure rate discussed at the forum above is the proportion of failures attributable to optical modules in ultra-large-scale cloud training networks. Now let us look at the failure breakdown of various components in enterprise medium-sized data center deployments.
QSFPTEK has 10+ years of R&D and industry-leading experience supporting global enterprise success with optical transceivers. QSFPTEK has fulfilled 31,000+ orders across 1,000+ successful projects and is favored by 300+ SMBs/Telcos/MNOs/DCs in over 200 countries and regions. According to average statistics collected from our medium-sized data center clients over three years, QSFPTEK 100G/200G optical transceivers account for 19.4% of component failures, and 40G and other lower-speed optical modules account for 16.3%.
Figure 5 - QSFPTEK 100G/200G high-speed optical module failures account for 19.4% of component failures in a three-year medium-sized data center deployment
QSFPTEK Reduces Your 100G Optics Expense by 18.57% to 69.82%
The table below lists the per-piece prices of the main 100G transceiver models from QSFPTEK, FS.COM, Fluxlight, and Naddod, some of the leading compatible optical transceiver brands in Google search results. As the table shows, QSFPTEK has a significant price advantage.
If you take the lowest and highest prices from the three brands other than QSFPTEK as the benchmarks and calculate the discount of QSFPTEK's price against them, you will find that QSFPTEK can save you at least 18.57% and up to 69.82% on your 100G module expenses, as sketched after the table below.
Table 4 - The 100G optics price per piece by QSFPTEK, FS.COM, Fluxlight and Naddod
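A minimal sketch of that savings calculation; the prices below are placeholders chosen only to roughly reproduce the 18.57% / 69.82% range quoted above, not the actual figures from Table 4:

```python
# Savings of the QSFPTEK price relative to a competitor benchmark:
# savings % = (benchmark - qsfptek) / benchmark * 100
def savings_pct(qsfptek_price: float, benchmark_price: float) -> float:
    return (benchmark_price - qsfptek_price) / benchmark_price * 100

# Placeholder prices for one hypothetical 100G model (USD per piece);
# substitute the real per-piece prices from Table 4.
qsfptek = 20.0
competitors = [24.56, 39.0, 66.26]

lowest, highest = min(competitors), max(competitors)
print(f"{savings_pct(qsfptek, lowest):.2f}%")   # savings vs the lowest competitor price
print(f"{savings_pct(qsfptek, highest):.2f}%")  # savings vs the highest competitor price
```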
While cost is important, the reliability of optical transceivers is equally vital. Every QSFPTEK module undergoes a complete testing process in our lab, from a standardized production line and rigorous performance testing to on-site compatibility testing. Welcome to check out our quality assurance program.