It is no secret that one of the most important factors in CPU performance is its interaction with RAM. Not all workloads have the same bandwidth and latency requirements, but in the high-performance computing market this relationship is critical.
The Roofline model
To understand the relationship between memory use and different algorithms, we have to talk about the so-called Roofline model, which is based on measuring the arithmetic intensity of an algorithm: the number of floating-point operations needed to execute it divided by the number of bytes of data moved to and from RAM.
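The idea behind the model can be sketched in a few lines. This is a minimal illustration, not a profiling tool; the peak compute and bandwidth figures below are assumptions chosen for the example, not the specs of any particular CPU.

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved

def attainable_gflops(ai, peak_gflops, bandwidth_gbs):
    """Roofline ceiling: the minimum of peak compute and AI times bandwidth."""
    return min(peak_gflops, ai * bandwidth_gbs)

# Example: a dot product of two float64 vectors does 2 FLOPs (multiply + add)
# per element while reading 16 bytes, giving AI = 0.125 FLOPs/byte.
ai = arithmetic_intensity(flops=2, bytes_moved=16)

# Illustrative machine: 1000 GFLOPS peak, 200 GB/s of memory bandwidth.
print(attainable_gflops(ai, peak_gflops=1000, bandwidth_gbs=200))  # 25.0
```

With such a low arithmetic intensity the kernel only reaches 25 of the 1000 GFLOPS available: it sits under the bandwidth "roof", which is exactly the situation the solutions below try to address.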
This is where we come to the two possible solutions for algorithms that move large amounts of data and are therefore limited by memory bandwidth:
- The use of high-bandwidth RAM, HBM2E being the best suited thanks to its very high bandwidth. This solution is used by Fujitsu's ARM-based A64FX CPU, as well as Intel's upcoming Xeon Sapphire Rapids.
- A substantial increase in the last-level cache, which is the level closest to RAM and the largest in size. In AMD's case this is achieved by stacking an SRAM die connected with through-silicon vias (TSVs), which they have dubbed V-Cache.
The objective, therefore, is simply to keep the data flow high enough to get the best possible performance.
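Both approaches raise the same roofline ceiling. The sketch below compares a bandwidth-bound kernel on two illustrative memory systems; the bandwidth and peak-compute numbers are assumed for the example and do not correspond to any specific product.

```python
def attainable_gflops(ai, peak_gflops, bandwidth_gbs):
    """Roofline ceiling: min(peak compute, arithmetic intensity * bandwidth)."""
    return min(peak_gflops, ai * bandwidth_gbs)

# A bandwidth-bound kernel with AI = 0.25 FLOPs/byte on a CPU with an
# assumed 2000 GFLOPS peak, fed by two hypothetical memory systems:
for name, bw in [("conventional DRAM (~200 GB/s)", 200),
                 ("HBM-class memory (~1000 GB/s)", 1000)]:
    print(name, attainable_gflops(0.25, peak_gflops=2000, bandwidth_gbs=bw))
```

Quintupling the bandwidth quintuples the attainable performance of this kernel (50 vs. 250 GFLOPS), which is why both stacked cache and HBM2E target exactly this class of algorithm.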
Why are there no AMD CPUs that use HBM2E?
You will have noticed that no EPYC or Threadripper processor in the roadmap makes use of HBM2E memory. This is because AMD has opted for V-Cache to raise the effective bandwidth available to certain algorithms commonly run in the world of high-performance computing, and it is very likely that future CPUs will carry not a single stacked SRAM die like today's V-Cache but several.
However, both ideas can be combined with each other: on the one hand a 3DIC structure, on the other a 2.5DIC with an interposer underneath. So we cannot rule out AMD using HBM2E memory at the server level in the future, but we cannot confirm it as long as it is absent from the roadmap.
Building a CPU around an interposer makes it extremely expensive to manufacture, since few customers buy such specialized solutions. Adding more steps to the manufacturing process raises costs further, and once the price of HBM2E memory itself is factored in, AMD's solution ends up much cheaper. In any case, we believe Intel will do the same in its future CPUs, since it has Foveros technology for it.