Huawei Ascend 910C Delivers 60% of Nvidia H100 Inference Performance, Study Shows

The HiSilicon Ascend 910C is Huawei’s variant of the Ascend 910 AI training processor it introduced in 2019. Its performance is now barely sufficient for cost-effective training of large AI models, yet in inference it achieves about 60% of the performance of Nvidia’s H100, according to DeepSeek researchers. Although it is not a performance leader, the Ascend 910C plays a key role in reducing China’s dependence on Nvidia GPUs.
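To put the 60% figure in concrete terms, here is a minimal arithmetic sketch. The 0.6 performance ratio is the figure reported above. The 0.5 price ratio is a purely hypothetical placeholder, not a reported price. The function names are illustrative, and the calculation ignores power, memory, interconnect, and multi-chip scaling losses.

```python
def chips_to_match(perf_ratio: float) -> float:
    """Number of slower chips needed to equal one baseline chip's throughput."""
    if perf_ratio <= 0:
        raise ValueError("perf_ratio must be positive")
    return 1.0 / perf_ratio


def relative_cost_per_throughput(price_ratio: float, perf_ratio: float) -> float:
    """Cost per unit of throughput relative to the baseline (1.0 means parity)."""
    if perf_ratio <= 0:
        raise ValueError("perf_ratio must be positive")
    return price_ratio / perf_ratio


if __name__ == "__main__":
    PERF_RATIO = 0.6   # 910C inference throughput relative to H100 (from the article)
    PRICE_RATIO = 0.5  # hypothetical price relative to H100 (assumption, not reported)

    print(f"Chips needed to match one H100: {chips_to_match(PERF_RATIO):.2f}")
    print(
        "Relative cost per unit of throughput: "
        f"{relative_cost_per_throughput(PRICE_RATIO, PERF_RATIO):.2f}"
    )
```

Under these simplified assumptions, a chip with 60% of the baseline's throughput reaches cost parity only when it is priced at 60% of the baseline or less.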

DeepSeek’s evaluations show that the 910C exceeds initial expectations in inference tasks, and that performance can be improved further by hand-tuning CANN kernels (CANN, the Compute Architecture for Neural Networks, is Huawei’s counterpart to Nvidia’s CUDA). DeepSeek’s support for Ascend processors and its PyTorch repository ease the transition from CUDA to CANN, simplifying the integration of Huawei’s hardware into AI workflows.
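As a rough illustration of what a PyTorch-level transition between Nvidia and Ascend hardware can look like, the sketch below selects a device at runtime. It is not DeepSeek’s code. It assumes Huawei’s `torch_npu` extension package, which is expected to register an `npu` device type with PyTorch when imported. The helper names `pick_device` and `probe_backends` are hypothetical.

```python
import importlib.util


def pick_device(has_npu: bool, has_cuda: bool) -> str:
    """Return a PyTorch device string, preferring Ascend NPUs, then CUDA GPUs."""
    if has_npu:
        return "npu"
    if has_cuda:
        return "cuda"
    return "cpu"


def probe_backends() -> tuple:
    """Detect available accelerators without failing when packages are missing."""
    if importlib.util.find_spec("torch") is None:
        return False, False
    import torch

    has_npu = False
    if importlib.util.find_spec("torch_npu") is not None:
        try:
            # Assumption: importing torch_npu registers the "npu" backend.
            import torch_npu  # noqa: F401
            has_npu = bool(torch.npu.is_available())
        except Exception:
            # Treat any driver or runtime problem as "no NPU available".
            has_npu = False
    return has_npu, bool(torch.cuda.is_available())


if __name__ == "__main__":
    device = pick_device(*probe_backends())
    print(f"Selected device: {device}")
```

Model code that moves tensors with `.to(device)` rather than hard-coding `"cuda"` can then run on either vendor’s hardware. Kernel-level tuning of the kind described above is a separate, more involved step.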

These developments indicate rapid progress in the capabilities of Huawei’s AI processors, despite U.S. sanctions and restricted access to TSMC’s leading-edge process technologies.

While Huawei and SMIC have partly caught up to TSMC’s 2019–2020 technology level and produce chips that compete with Nvidia’s A100 and H100, the Ascend 910C still lags in AI training, an area Nvidia continues to dominate.

According to Yuchen Jin from DeepSeek, the main limitation of Chinese processors is long-term training reliability. Nvidia’s lead here comes largely from the comprehensive, mature integration of its hardware and software, which has evolved over two decades. Although inference performance can be improved through optimization, consistent performance under sustained training workloads will require further refinement of both the hardware and the software in Huawei’s offerings.

Like its predecessor, the Ascend 910C uses a chiplet architecture, and its primary compute SoC contains approximately 53 billion transistors. The original Ascend 910’s compute chiplet was manufactured by TSMC on its N7+ process (7nm-class with EUV), whereas the Ascend 910C’s compute chiplet is produced by SMIC on its second-generation 7nm-class process, known as N+2.

Some experts predict that as AI models increasingly adopt Transformer architectures, Nvidia’s software ecosystem may become less important. DeepSeek’s proficiency in optimizing both hardware and software could also reduce reliance on Nvidia and give AI companies a cheaper option, especially for inference. To compete globally, however, China must still resolve its training stability issues and continue to improve its AI computing technologies.
