A processor, whether a CPU or a GPU, does nothing but process data, so it needs memory that can keep it fed. Unfortunately, the gap between memory speed and processor speed has grown over time, which has led to techniques such as cache memory. Latency matters as well: it appears when the interface between the RAM and the processor cannot deliver or update data fast enough.
However, performance cannot be measured in a general way, since each program, or rather each algorithm within a program, has a different computational load. This is where the term arithmetic intensity comes in. Let's see what it is, along with other elements that affect performance on a computer.
What is arithmetic intensity?
Arithmetic intensity is a performance measure that relates the floating-point operations a section of code executes to the data it moves. To obtain it, the number of floating-point operations is divided by the number of bytes the algorithm transfers to and from memory while executing, so it is expressed in floating-point operations per byte.
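As a minimal sketch of the division described above, the following Python snippet computes the arithmetic intensity of a simple vector update, y = a * x + y. The function names and the operation counts are illustrative assumptions: the kernel is assumed to use 4-byte floats, to perform one multiplication and one addition per element, and to read x, read y and write y once each, with no reuse from cache.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations divided by bytes moved (FLOPs per byte)."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def vector_update_counts(n: int, bytes_per_element: int = 4) -> tuple[int, int]:
    """Hypothetical counts for y = a * x + y over n elements.

    Assumption: 2 FLOPs per element (one multiply, one add) and
    3 memory accesses per element (read x, read y, write y).
    """
    flops = 2 * n
    bytes_moved = 3 * n * bytes_per_element
    return flops, bytes_moved


if __name__ == "__main__":
    flops, bytes_moved = vector_update_counts(1_000_000)
    print(f"{arithmetic_intensity(flops, bytes_moved):.3f} FLOPs per byte")
```

Under these assumptions the result does not depend on n, because both the operation count and the byte count grow at the same rate.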
How useful is it? In fields of computing where very powerful computers are needed for specific tasks, it helps to choose the hardware best suited to running the algorithms under the best conditions. This model is used mainly in scientific computing, although it also serves to optimize performance in closed systems such as video game consoles.
A highly parallel hardware architecture needs code with high arithmetic intensity, that is, code that demands little bandwidth relative to the computation it performs, because in such processors the ratio between computing capacity and available memory bandwidth is high. This suits many applications, graphics in particular, where the same data is processed several times and the computational power required is therefore large compared with the data moved.
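One way to make this relationship concrete is to compare the arithmetic intensity of a piece of code with the ratio between a processor's computing capacity and its memory bandwidth. The sketch below follows the idea behind what is commonly called the roofline model, which the text above does not name. The hardware figures are invented for illustration and do not describe any real processor.

```python
def machine_balance(peak_flops: float, bandwidth_bytes: float) -> float:
    """Ratio of computing capacity to memory bandwidth, in FLOPs per byte."""
    return peak_flops / bandwidth_bytes


def attainable_flops(peak_flops: float, bandwidth_bytes: float,
                     intensity: float) -> float:
    """Upper bound on throughput for code with the given arithmetic intensity.

    The code can run no faster than the processor's peak, and no faster
    than memory can feed it (bandwidth multiplied by FLOPs per byte).
    """
    return min(peak_flops, bandwidth_bytes * intensity)


if __name__ == "__main__":
    # Hypothetical processor: 10 TFLOP/s peak and 500 GB/s of bandwidth.
    peak, bandwidth = 10e12, 500e9
    balance = machine_balance(peak, bandwidth)
    for intensity in (0.25, 5.0, 40.0):
        bound = attainable_flops(peak, bandwidth, intensity)
        limit = "memory" if intensity < balance else "compute"
        print(f"intensity {intensity}: limited by {limit}, at most {bound:.3g} FLOP/s")
```

Code whose arithmetic intensity falls below the machine balance is limited by memory, which is why a processor with a lot of computing capacity per byte of bandwidth only pays off when the intensity is high.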
Algorithm performance and relationship with arithmetic intensity
When writing an algorithm, programmers take its performance into account. This is described with Big O notation, which expresses how the number of operations grows with the size of the input data. Big O is not measured with a benchmark; programmers work it out by hand to get a rough idea of a program's workload:
- O(1): The algorithm does not depend on the size of the data to be processed. An algorithm with O(1) performance is considered ideal, since it cannot be improved upon.
- O(n): The execution time is directly proportional to the size of the data, so the workload grows linearly.
- O(log n): It occurs in algorithms that repeatedly halve the problem and discard the part they do not need, such as binary search.
- O(n log n): It builds on the previous one: the problem is divided into parts and every part is still processed, as in efficient sorting algorithms such as merge sort.
- O(n²): These algorithms pass over the data multiple times, typically with nested loops. They are highly repetitive, and their computational load grows quadratically.
- O(n!): An algorithm with this complexity becomes impractical for all but very small inputs and should be rewritten wherever a better approach exists.
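To give a feel for how differently these classes grow, the following sketch counts the basic steps performed by three small routines instead of timing them. The function names are hypothetical and the routines are simple textbook examples chosen for illustration: a linear search for O(n), a binary search for O(log n) and a loop over all pairs for O(n²).

```python
def linear_search_steps(data: list[int], target: int) -> int:
    """Count elements examined by a linear search: O(n) in the worst case."""
    steps = 0
    for value in data:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_data: list[int], target: int) -> int:
    """Count halving steps of a binary search on sorted data: O(log n)."""
    steps = 0
    low, high = 0, len(sorted_data) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_data[mid] == target:
            break
        if sorted_data[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


def all_pairs_steps(data: list[int]) -> int:
    """Count iterations of a nested loop over every pair of elements: O(n^2)."""
    steps = 0
    for _ in data:
        for _ in data:
            steps += 1
    return steps


if __name__ == "__main__":
    for n in (16, 256, 1024):
        data = list(range(n))
        print(n,
              linear_search_steps(data, n - 1),
              binary_search_steps(data, n - 1),
              all_pairs_steps(data))
```

Searching for the last element is the worst case for the linear search, so its count equals n, while the binary search count grows by roughly one each time n doubles.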
Not all algorithms can reach O(1) complexity, and some perform much better on one type of hardware than on another. This is why domain-specific accelerators, processors that speed up one type of algorithm over others, have been developed in recent years. The general idea is to divide the algorithms into parts and run each part on the processing unit best suited to its arithmetic intensity.
Ratio between communication and computing
The opposite measure is the ratio between communication and computation. It is the inverse of arithmetic intensity, so it is obtained by dividing the number of bytes by the number of floating-point operations, and it is used to estimate the bandwidth required to execute that part of the code. The difficulty in measuring it is that the data is not always in the same place, so RAM bandwidth is used as the reference.
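The following sketch shows how such a ratio could be turned into a bandwidth estimate. The function names and the figures are assumptions for illustration, and the estimate treats every byte as coming from RAM, which is the simplification the paragraph above describes.

```python
def comm_to_compute_ratio(bytes_moved: float, flops: float) -> float:
    """Bytes moved divided by floating-point operations (bytes per FLOP)."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return bytes_moved / flops


def required_bandwidth(ratio_bytes_per_flop: float, flop_rate: float) -> float:
    """Bandwidth in bytes per second needed to sustain flop_rate FLOP/s."""
    return ratio_bytes_per_flop * flop_rate


if __name__ == "__main__":
    # Hypothetical kernel: 12 bytes moved for every 2 floating-point operations.
    ratio = comm_to_compute_ratio(bytes_moved=12, flops=2)
    # Hypothetical target rate of 1 GFLOP/s for that kernel.
    needed = required_bandwidth(ratio, flop_rate=1e9)
    print(f"{ratio} bytes per FLOP, {needed / 1e9} GB/s required")
```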
This is not a totally reliable measure, not only because the cache system brings the data closer to the processor, but also because of latency: each type of RAM has different advantages and disadvantages, so the result can vary depending on the type of memory used.
Today, when choosing memory for a system, energy consumption is taken into account as well as bandwidth, since the energy cost of moving data is overtaking the cost of processing it. As a result, specific types of memory are chosen for certain applications, always within the cost constraints of building the system, which are not the same for a supercomputer as for a home PC.