With two exaflops of performance, the Intel-powered Aurora supercomputer is expected to beat the AMD-powered Frontier supercomputer, currently the fastest in the world, and take the lead on the Top 500 list of the fastest supercomputers. However, due to Intel’s continued delays in delivering the hardware, Aurora has not yet submitted a benchmark to the Top 500 committee, so it didn’t make the list announced today. Intel shared new details about the system today and announced at the ISC conference that it has delivered ‘over’ 10,000 operational blades for the Aurora supercomputer, with the caveat that these aren’t all the exact blades needed for full deployment. We’ll cover the details below.
Nonetheless, Intel says the system will be fully operational later this year and shared benchmarks with Aurora going head-to-head against AMD- and Nvidia-powered supercomputers, claiming a 2X performance advantage over AMD’s MI250X GPUs and a 20% gain over Nvidia’s H100 GPUs.
Intel says it has delivered the silicon for ‘over’ 10,000 blades, both the fourth-gen Sapphire Rapids Xeon chips and the Ponte Vecchio GPUs, to the Argonne Leadership Computing Facility (ALCF).
However, Aurora is designed to operate with Intel’s HBM-equipped Sapphire Rapids “Xeon Max” chips, which have been perpetually delayed. Due to these delays, Intel initially began shipping ALCF the non-HBM Sapphire Rapids chips, and the facility began populating Aurora with the standard non-HBM Sapphire Rapids as a stop-gap measure.
Intel is now providing the faster HBM-equipped Xeon Max chips to ALCF, but not all of the 10,000 blades it promotes as being delivered have the Max chips under the hood. We inquired with Intel, and company representatives confirmed that not all of the blades are equipped with the final Xeon Max silicon. The company tells us that roughly 75% of the blades contain the final Xeon Max revision of the silicon. Presumably, that’s the bottleneck that’s holding the system back from submitting a benchmark for the Top 500 list.
The system consists of 166 racks with 64 blades per rack, for a total of 10,624 blades, so the ‘over’ 10,000 delivered blades are likely enough for the system to be operational, just not at full performance.
Intel also shared more specifications for the Aurora supercomputer, which you can see in the slide above. With 21,248 CPUs and 63,744 Ponte Vecchio GPUs, Aurora will either meet or exceed two exaflops of performance when it comes fully online before the end of the year. The system also features 10.9 petabytes (PB) of DDR5 memory, 1.36 PB of HBM attached to the CPUs, 8.16 PB of GPU memory, and 230 PB of storage capacity that delivers 31 TB/s of bandwidth (other interesting details are included in the slide above).
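The headline figures above can be cross-checked with some simple arithmetic. The short Python sketch below is only a sanity check built from the numbers quoted in this article. The per-blade and per-device values it derives are back-of-the-envelope estimates, not specifications confirmed by Intel, and it assumes decimal units (1 PB = 1,000,000 GB).

```python
# Figures quoted in the article.
RACKS = 166
BLADES_PER_RACK = 64
TOTAL_CPUS = 21_248
TOTAL_GPUS = 63_744
DDR5_PB = 10.9
CPU_HBM_PB = 1.36
GPU_MEM_PB = 8.16
DELIVERED_BLADES = 10_000  # Intel says 'over' 10,000, so this is a lower bound

GB_PER_PB = 1_000_000  # assumption: decimal petabytes and gigabytes

total_blades = RACKS * BLADES_PER_RACK

# divmod exposes any remainder, so a mismatch in the quoted totals would show up.
cpus_per_blade, cpu_remainder = divmod(TOTAL_CPUS, total_blades)
gpus_per_blade, gpu_remainder = divmod(TOTAL_GPUS, total_blades)

# Derived estimates (not stated in the article).
ddr5_gb_per_blade = DDR5_PB * GB_PER_PB / total_blades
hbm_gb_per_cpu = CPU_HBM_PB * GB_PER_PB / TOTAL_CPUS
mem_gb_per_gpu = GPU_MEM_PB * GB_PER_PB / TOTAL_GPUS
delivered_fraction = DELIVERED_BLADES / total_blades

print(f"Total blades:       {total_blades:,}")
print(f"CPUs per blade:     {cpus_per_blade} (remainder {cpu_remainder})")
print(f"GPUs per blade:     {gpus_per_blade} (remainder {gpu_remainder})")
print(f"DDR5 per blade:     ~{ddr5_gb_per_blade:.0f} GB")
print(f"HBM per CPU:        ~{hbm_gb_per_cpu:.0f} GB")
print(f"Memory per GPU:     ~{mem_gb_per_gpu:.0f} GB")
print(f"Blades delivered:   at least {delivered_fraction:.1%} of the total")
```

The quoted totals are consistent with each other: they work out to two CPUs and six GPUs per blade, with no remainder.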
Intel also revealed that Aurora would begin running several generative AI workloads. The ‘Aurora GPT’ large language model will be science-oriented and have 1 trillion parameters with Megatron and DeepSpeed underpinnings. Intel provided the following summary of the project:
“These generative AI models for science will be trained on general text, code, scientific texts and structured scientific data from biology, chemistry, materials science, physics, medicine and other sources. The resulting models (with as many as 1 trillion parameters) will be used in a variety of scientific applications, from the design of molecules and materials to the synthesis of knowledge across millions of sources to suggest new and interesting experiments in systems biology, polymer chemistry and energy materials, climate science and cosmology. The model will also be used to accelerate the identification of biological processes related to cancer and other diseases and suggest targets for drug design.”
Intel also teased a few benchmarks from the Sunspot system, a smaller two-rack version of Aurora with 128 total nodes. Intel compared Sunspot’s performance against extrapolated numbers that represent the ‘similarly-sized’ Polaris supercomputer with Nvidia A100 GPUs, and the Crusher supercomputer that’s powered by AMD’s MI250X GPUs. Unfortunately, Intel didn’t provide test notes or details of these configurations, so take the results with more than the usual grain of salt.
In a test of a single node in a reactor prediction workload, Intel claims its system is 45% faster than the Nvidia contender and 12% faster than the AMD system. Turning to scalability metrics, Intel claims that by normalizing the total number of GPUs used in the test systems to 96 GPUs (the AMD and Nvidia nodes have four GPUs apiece, while the Intel system has six per node), Sunspot delivers more than twice the performance of both the AMD and Nvidia systems in the Monte Carlo workload. For 90 nodes in the NWChemEx workload, Intel claims it’s 72% faster than a 90-node Nvidia-powered Polaris system.
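Node-level and GPU-level comparisons can tell different stories when the nodes hold different numbers of GPUs. The Python sketch below works through that arithmetic using only the GPU-per-node counts and the 72% claim quoted above. The helper names are ours, and the per-GPU figure is an illustrative estimate that assumes the node counts are the only difference between the systems; Intel did not share its test notes.

```python
def nodes_for_gpu_count(target_gpus: int, gpus_per_node: int) -> int:
    """Return how many whole nodes are needed to reach exactly target_gpus."""
    nodes, remainder = divmod(target_gpus, gpus_per_node)
    if remainder:
        raise ValueError("target GPU count is not a whole number of nodes")
    return nodes


def per_gpu_ratio(per_node_ratio: float, gpus_per_node_a: int, gpus_per_node_b: int) -> float:
    """Convert an A-vs-B speedup at equal node counts into a per-GPU ratio."""
    return per_node_ratio * gpus_per_node_b / gpus_per_node_a


# GPUs per node as stated in the article.
GPUS_PER_NODE = {"Sunspot (Intel)": 6, "Polaris (Nvidia)": 4, "Crusher (AMD)": 4}

# Normalizing every system to 96 GPUs implies different node counts.
nodes_at_96_gpus = {
    name: nodes_for_gpu_count(96, gpus) for name, gpus in GPUS_PER_NODE.items()
}

# The NWChemEx claim compares 90 nodes against 90 nodes, not GPU against GPU.
sunspot_gpus = 90 * GPUS_PER_NODE["Sunspot (Intel)"]
polaris_gpus = 90 * GPUS_PER_NODE["Polaris (Nvidia)"]
nwchemex_per_gpu = per_gpu_ratio(1.72, 6, 4)

print(nodes_at_96_gpus)
print(f"90 nodes: {sunspot_gpus} Intel GPUs vs. {polaris_gpus} Nvidia GPUs")
print(f"Implied per-GPU ratio for NWChemEx: ~{nwchemex_per_gpu:.2f}x")
```

Under these assumptions, the 96-GPU test would use 16 Sunspot nodes against 24 nodes of the other systems, while the 90-node NWChemEx result pits 540 Intel GPUs against 360 Nvidia GPUs, so the per-GPU advantage there would be considerably smaller than 72%.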
The Aurora supercomputer was first announced in 2015, with a projected completion date of 2018. Back then, the system was designed to use the Knights Hill processors that were later canceled. The system has seen numerous redesigns and reschedules in the years since, with the new Aurora being announced in 2019 with one exaflop of performance to be delivered in 2021. Yet another rescheduling in late 2021 claimed the system would deliver two exaflops upon completion, which is now slated for later this year.
The long and winding road continues, but it does finally appear that the end is at least in sight. Intel tells us it will deliver all of the Xeon Max processors needed to finish the system soon, and that the system will be complete and submit its first Top 500 benchmark before the end of the year.