If we look at the artificial intelligence hardware on the market, what stands out most at the moment are high-performance compute cards such as NVIDIA's Tesla line, whose current models are based on the A100 GPU, and AMD's Instinct line built on its CDNA architecture. To these we can add Intel's AMX units in its upcoming Xeon Sapphire Rapids CPUs. The common point between all of them? They use HBM memory, which tells us that artificial intelligence, like graphics rendering, is an application that demands very high bandwidth.
Why does artificial intelligence require so much bandwidth?
Artificial intelligence algorithms, like any other algorithm, operate on a series of input data, but they add a new type of data: the weights. To understand why, we must bear in mind that AI processors are systolic arrays that aim to emulate the functioning of biological neurons.
The neurons in the brain are made up of dendrites, the soma, and the axon. The dendrites capture the nerve impulses emitted by other neurons; these impulses are processed in the soma and transmitted through the axon, which emits a nerve impulse towards the neighboring neurons. A systolic array mirrors this: the input data of each ALU or processor, depending on the complexity, is the output data of the previous one.
At first glance this suggests that, since the processing elements communicate internally, far less external memory traffic should be needed. But the arrays need a large amount of data just to start working: the ALUs or processors at the edges have to be fed from memory outside the chip, and the final results have to be written back to that same memory.
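To make this concrete, below is a minimal sketch in Python of a weight-stationary systolic array computing a matrix-vector product. Everything in it is an illustrative assumption rather than a real accelerator API; the point is that multiply-accumulate results travel from processing element to processing element, and only the edges of the array read from or write to external memory.

```python
# Minimal sketch of a weight-stationary systolic array computing y = W x.
# Real hardware clocks values between physical processing elements (PEs);
# this loop just models the data flow and counts the memory traffic.

def systolic_matvec(W, x):
    n = len(W)
    ext_transfers = n + n   # x streamed in from DRAM, y streamed back out
    macs = 0
    y = []
    for r in range(n):
        acc = 0                    # partial sum entering the row's first PE
        for c in range(n):
            acc += W[r][c] * x[c]  # each PE multiplies by its resident
            macs += 1              # weight and passes acc to its neighbor
        y.append(acc)              # only the edge PE writes back to memory
    return y, macs, ext_transfers

y, macs, ext = systolic_matvec([[1, 2], [3, 4]], [5, 6])
print(y, macs, ext)   # [17, 39] 4 4
```

For an n-by-n array this performs n² multiply-accumulates against only 2n external transfers once the weights are resident, which is why internal communication reduces, but never eliminates, the demand on external memory.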
How is bandwidth calculated for AI?
Well, the answer is that it depends on the case. On the one hand, we may be dealing with an algorithm with a great deal of internal recurrence that barely accesses external memory. It also depends on the type of neural network implemented, but in general the ideal for any processor is to have at least one byte of bandwidth available per operation it performs, a target that is very difficult to achieve.
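As a back-of-the-envelope illustration of how far real hardware sits from that one-byte-per-operation ideal, consider an accelerator in the class of the cards mentioned above. The throughput and bandwidth figures below are round approximations used as assumptions, not official specifications:

```python
# Bytes of memory bandwidth available per operation, for assumed round
# figures in the range of current AI accelerators (not official specs).
ops_per_second = 300e12    # ~300 TOPS of tensor throughput (assumption)
bytes_per_second = 2e12    # ~2 TB/s of HBM bandwidth (assumption)

print(f"{bytes_per_second / ops_per_second:.4f} bytes per operation")
# -> 0.0067, two orders of magnitude below the 1 byte/op ideal
```

This is why algorithms with heavy internal reuse run well, while algorithms that must stream fresh data for every operation quickly become bandwidth-bound.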
The fact is that there are applications in which the required bandwidth is very small, but in others a bandwidth close to that theoretical ideal is needed. Remember that AI handles an immense amount of data, which translates into bandwidth requirements that conventional DDR memory cannot meet.
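To put numbers on that gap, the comparison below uses typical bus widths and per-pin transfer rates for DDR4-3200 and first-generation HBM2; treat the exact figures as assumptions, since both standards ship at several speed grades:

```python
# Rough DDR4 vs HBM2 bandwidth comparison using typical speed grades.
ddr4_channel = 3200e6 * 8        # DDR4-3200, 64-bit channel  -> 25.6 GB/s
hbm2_stack = 2.0e9 * 1024 / 8    # HBM2, 2 Gbps/pin, 1024-bit -> 256 GB/s

print(f"DDR4 channel: {ddr4_channel / 1e9:.1f} GB/s")
print(f"HBM2 stack:   {hbm2_stack / 1e9:.1f} GB/s")
print(f"One HBM2 stack = {hbm2_stack / ddr4_channel:.0f} DDR4 channels")
```

A GPU with several such stacks reaches bandwidths beyond 1 TB/s, a figure that no practical number of DDR channels can match.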