It must be said that, for the moment, what Intel has shown of its next “Ponte Vecchio” server GPUs are preliminary data and, in fact, the product shown is still a mere prototype. However, and as we are going to tell you now, the performance figures that the silicon giant has achieved are, to say the least, overwhelming. Let’s see it.
Up to 45 TFLOPS on Intel Ponte Vecchio Server GPUs
As you can see in the slide above, Intel claims that its new GPU (based on A0 silicon as indicated) in its current state is capable of generating an output of more than 45 TFLOPS in FP32 calculations; To put this in perspective, you should know that NVIDIA’s A100 “Ampere” accelerator cards have “only” 19.5 TFLOPS under the same conditions, so literally Intel has achieved more than twice the performance of NVIDIA, and that, we repeat, we are talking about preliminary data for now because at this moment the A0 model is nothing more than a prototype.
It should also be mentioned that Intel is not only leaving NVIDIA (very) behind with these Ponte Vecchio GPUs, but also AMD– Their Instinct MI100 processors only offer 23.1 TFLOPS of FP32 performance.
It must be taken into account, on the other hand, that the prototype of this GPU for Intel servers was running at only 1.37 GHz, and also on a machine with only one of these accelerator cards and accompanied by an Intel Xeon processor « Sapphire Rapids’, also in the preliminary phase.
This is just a prototype, the business model will be better
The silicon with number A0 It is always the first batch of chips that leaves the foundry in which they are manufactured, and for this reason we are treating it as a simple prototype; Manufacturers typically use the first batch of chips produced to perform preliminary tests internally, and although they can sometimes send these samples to partners so that they can also carry out their own, it is clear that they are preliminary models and therefore also the performance figures they show are.
It is quite common for these chips to have a noticeably slower operating speed than commercial models will have, so while the performance numbers that Intel has shown now are quite impressive, they are lower than what they will eventually be. In fact, before we have told you that the prototype worked at 1.37 GHz during the test, but this figure is nothing more than a calculation (quite exact but not definitive) that has been carried out as follows:
During the presentation, Intel said that each of its Ponte Vecchio (OAM) server GPU packs generates 32,768 FP32 operations per clock cycle, and also says that each of the two stacks equates to 128 Xe cores. If each of the GPU’s vector engines offers 256 operations for each clock cycle, this adds up to precisely 32,768 FP32 operations for each stack of two. With these figures, we can calculate that 45,000 GLOPS divided by 32,768 FP32 operations for each clock cycle equates to a speed of 1,373 MHz, that is, 1.37 GHz of speed on the GPU.
With all this we want to tell you that in all probability the data that Intel is handling at the moment is much lower than what we will see once they have the product ready for the consumer market, so if at this time is leaving both NVIDIA and AMD far behind, in a few months when it is released these figures could be even more outrageous. And, remember, enterprise / server products from companies like Intel ultimately end up being reflected (at least their architecture and technology) in the consumer market as well.