ZTE Full Stack Computing Infrastructure Solution Builds Digital and Intelligent Foundation for the Whole Industry

2024-02-27

As digital, networked, and intelligent technologies converge, AI-generated content (AIGC), represented by ChatGPT, has demonstrated powerful capabilities and profoundly changed lives and society. In addition, the incubation of models with hundreds of billions or even trillions of parameters, together with the massive data generated by massive connections, has created enormous demand for computing power, which has become an important driving force for the digital economy.

An interview with Mr. Chen Xinyu, VP and GM of Computing & Core Network Products, ZTE Corporation

Question:

Computing power has a positive impact on our lives and society: it improves information retrieval and dissemination and promotes service innovation. How can we meet the ever-growing need for computing power?

Mr. Chen: ZTE has released an end-to-end computing infrastructure solution that provides a complete set of computing, storage, network, and IDC solutions, achieves integrated deployment of full-stack software and hardware, and accelerates service deployment in the cloud. In terms of hardware, a full series of servers provides high-quality heterogeneous computing power, and high-performance distributed all-flash storage achieves fast reads and writes of massive data, improving the training speed of large models. Lossless networks achieve zero packet loss and microsecond-level latency. Fully liquid-cooled, modular, prefabricated IDCs achieve a PUE as low as 1.13. In terms of software, ZTE has built the AI Booster intelligent computing platform, which maximizes GPU utilization through automatic parallel training and greatly lowers the barrier to development through visual development and adaptive parameter optimization.
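
For readers unfamiliar with the PUE figure quoted above: power usage effectiveness is the ratio of a facility's total energy use to the energy used by its IT equipment, so a PUE of 1.13 means roughly 0.13 kWh of cooling and power-delivery overhead for every 1 kWh delivered to servers. The short Python sketch below only illustrates that arithmetic. The function names and the load figures are hypothetical and are not taken from ZTE.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_kwh(it_equipment_kwh: float, pue_value: float) -> float:
    """Non-IT energy (cooling, power delivery, etc.) implied by a PUE value."""
    return it_equipment_kwh * (pue_value - 1.0)


# Hypothetical figures, chosen only to illustrate the arithmetic.
it_load_kwh = 1000.0
print(pue(1130.0, it_load_kwh))         # PUE for 1,130 kWh total / 1,000 kWh IT
print(overhead_kwh(it_load_kwh, 1.13))  # overhead at the quoted PUE
print(overhead_kwh(it_load_kwh, 1.50))  # overhead at an assumed PUE of 1.5
```

Under these assumed numbers, a 1,000 kWh IT load carries about 130 kWh of overhead at a PUE of 1.13, compared with about 500 kWh at a PUE of 1.5.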

Question:

We know that different applications require different kinds of computing power, such as CPU-based general-purpose computing and GPU-based intelligent computing. How does ZTE provide differentiated computing power solutions for different scenarios?

Mr. Chen: ZTE provides differentiated solutions for different application scenarios. In the general computing field, for general-purpose scenarios such as Internet cloud and telecom cloud, ZTE provides a series of general-purpose servers that support liquid cooling and give customers highly cost-effective, expandable general computing power. For mass video storage, ZTE has launched large-capacity storage servers. For key application scenarios such as finance and scientific computing, ZTE has launched an industry-leading 4-socket server. In addition, for high-density computing scenarios, ZTE has launched the IceCube liquid-cooled cabinet solution to promote the green and efficient development of data centers. In the field of intelligent computing, ZTE provides a series of training servers, inference servers, and the all-in-one AiCube training cabinet to meet the requirements of the central 10,000-card training pool, the integrated regional intelligent computing/inference pool, and all-in-one edge training.

Question:

The rapid development of generative AI, represented by ChatGPT, has brought explosive growth in intelligent computing, but it also poses a huge challenge for operators and vertical industries. How can intelligent computing be deployed efficiently for customers?

Mr. Chen: ZTE has launched a full-stack intelligent computing infrastructure to meet the requirements of the central 10,000-card large-scale training pool, the integrated regional intelligent computing/inference pool, and all-in-one edge training. Based on practical experience in developing and applying large-scale models, ZTE has built the AI Booster platform, a large-scale GPU training pool, and an all-in-one training cabinet, aiming to maximize GPU utilization and greatly lower the barrier to development.

In terms of maximizing GPU utilization, high-performance AI servers support eight SXM/OAM GPUs per server. Through end-network collaboration, ZTE provides a lossless network with zero packet loss and microsecond-level latency, reducing GPU waiting time. High-performance distributed all-flash file storage meets the high-speed read and write requirements of large-model intermediate files, checkpoints, and training datasets.

In terms of lowering the barrier to development, a guided, pipeline-based workflow enables zero-code development, and automatic hyperparameter optimization removes the need to configure parameters manually. AI Booster supports GPUs from multiple vendors, multiple AI frameworks, and multiple models, and it also supports cross-platform migration.

ZTE has carried out in-depth cooperation with global operators and industry partners to jointly promote the transformation and upgrade of computing infrastructure, provide high-quality and efficient computing power for the entire industry, promote continuous technological innovation and commercial use, and accelerate the digital and intelligent transformation of the entire industry.