Why does the cache affect the performance of CPUs and GPUs so much?

The first PC CPU to integrate a cache on the chip itself was the Intel 80486. Since then, all processors have incorporated cache memory organized in a hierarchy of several levels, and this applies to GPUs as well as CPUs.

The main purposes of a cache are well known. First, it reduces the enormous latency between the processor and RAM. Second, it reduces the power consumed by each instruction. Third, it reduces contention for memory access, which can lead to higher latency, especially when several cores access the same memory address.

There is also a way to measure cache performance, and it is taken into account when designing new processors. Read on to find out how cache affects performance on both your CPU and GPU.

Cache Summary

[Figure: split first-level cache]

First of all, bear in mind that the cache is not a space that the CPU or GPU can address. By addressable space we mean that the CPU or GPU can point to a specific memory address that holds the next data or instruction to process. The cache cannot be pointed to by the processor and therefore cannot be managed directly by it.

The cache mechanisms work automatically. When a CPU core looks for a piece of data, it goes through the different caches until it finds it. When a core changes that data, all the copies held in the other caches are automatically updated (or invalidated) as well. In the same way, it is the cache mechanisms themselves that decide which copies of the contents of RAM are kept in cache and which are not.

What the cache does is store copies of the memory closest to the lines of code being executed at that moment. Code is sequential most of the time, so the next instruction to be processed is usually the one immediately following the current one.
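The effect of sequential access can be illustrated with a toy simulation. This is only a minimal sketch, assuming a direct-mapped cache with 64 lines of 64 bytes each; the function name `hit_rate` and both access patterns are hypothetical and do not model any real processor.

```python
def hit_rate(addresses, num_lines=64, line_size=64):
    """Simulate a tiny direct-mapped cache and return the fraction of hits."""
    lines = [None] * num_lines          # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // line_size       # memory block this address belongs to
        index = block % num_lines       # the single cache line that block can use
        if lines[index] == block:
            hits += 1                   # the block is already cached
        else:
            lines[index] = block        # miss: load the block, evicting the old one
    return hits / len(addresses)


# Sequential walk: 4096 consecutive byte addresses.
sequential = list(range(4096))
# Strided walk: the same number of accesses, each 4096 bytes apart.
strided = [i * 4096 for i in range(4096)]

sequential_rate = hit_rate(sequential)
strided_rate = hit_rate(strided)
print(f"sequential: {sequential_rate:.3f}, strided: {strided_rate:.3f}")
```

In this toy model the sequential walk misses only on the first access to each 64-byte line, while the strided walk keeps evicting the same line and never hits.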

How is cache performance measured?

When the CPU or GPU needs to access data that is in memory, it first checks the cache levels that sit in front of that memory. It searches the first-level cache first; if the data is not there, it goes down to the next level, and so on until it finds the data it is looking for.

When the data is not in a cache level, what we call a “Cache Miss” occurs and it is necessary to go down to the next level of the hierarchy, which adds latency. On the other hand, if the data is found at that level, we have a “Cache Hit”.

To understand it, suppose we have a CPU or a GPU with 3 cache levels and we have found the data in the third level. So the search time in that case can be summarized as follows:

Cached data search time = 1st level cache seek time + 2nd level cache jump time + 2nd level cache seek time + 3rd level cache jump time + 3rd level cache seek time

Keep in mind that if the cache search time exceeds the time of a direct access to RAM, the cache design of that processor is poorly implemented, since there is no justification for cached data taking longer to find than data in memory. This is why we do not usually see additional cache levels: the lookup time of each extra level adds further latency to the access time.
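The sum for the three-level case can be sketched in a few lines. All the cycle counts below are hypothetical values chosen for illustration, not measurements of any real CPU or GPU, and `lookup_time` is an illustrative helper name.

```python
def lookup_time(seek_times, jump_times, found_level):
    """Total time until the data is found at `found_level` (1-based).

    seek_times[i] is the time to search cache level i + 1.
    jump_times[i] is the time to move from level i + 1 down to level i + 2.
    """
    total = 0
    for level in range(found_level):
        total += seek_times[level]          # search this level
        if level < found_level - 1:
            total += jump_times[level]      # missed here, go down one level
    return total


# Hypothetical latencies in clock cycles (assumptions, not real figures).
SEEK = [4, 10, 30]     # L1, L2, L3 seek times
JUMP = [1, 2]          # L1 -> L2 and L2 -> L3 jump times
RAM_LATENCY = 200      # direct access to RAM

l3_time = lookup_time(SEEK, JUMP, found_level=3)   # 4 + 1 + 10 + 2 + 30
print(l3_time, l3_time < RAM_LATENCY)
```

With these assumed numbers, finding the data in the third level still costs far less than going to RAM, which is the condition a sensible cache hierarchy has to satisfy.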

Measuring the AMAT


AMAT stands for Average Memory Access Time. It is an average because not all instructions in a CPU or a GPU have the same latency or depend on memory in the same way. Even so, it helps us measure the performance of the cache of a CPU or a GPU.

To calculate the AMAT of any CPU or GPU the following formula is used:

AMAT = Hit Time + Miss Rate * Miss Penalty

What interests us is a low AMAT, since it measures how long the processor takes to access data and therefore the latency involved in finding it. The values in the AMAT formula are the following:

  • The first value is the Hit Time, which is the time the CPU or GPU takes to find the data in the cache. Here it is important that the cache is small so that it can be searched quickly: the larger the cache, the longer the search takes. This is why the cache levels closest to the core are very small.
  • The second value is the Miss Rate, which is the percentage of accesses for which the data is not in the cache. This is at odds with the Hit Time, since the best way to make it likely that data is found in the cache is to increase its storage capacity. The cache must also have mechanisms to decide which data to keep and which to evict, making room for data that the CPU or GPU is more likely to access in the short term.
  • The third value is the Miss Penalty, which is the time it takes to access the data when it is in RAM and not in the cache. This is a huge amount of time in terms of clock cycles, because when the data is in RAM, the time already spent searching the cache hierarchy in front of RAM must be added.
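The three values above plug directly into the AMAT formula. The sketch below uses made-up numbers in clock cycles. The second part treats a second cache level as the miss penalty of the first, which is a common way of extending the formula to a hierarchy but is an assumption here, not something stated in this article.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty."""
    return hit_time + miss_rate * miss_penalty


# Single cache level in front of RAM (hypothetical values, in clock cycles).
single_level = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)

# Two levels: the miss penalty of L1 is taken to be the AMAT of L2 (assumption).
l2_amat = amat(hit_time=10, miss_rate=0.20, miss_penalty=100)
two_level = amat(hit_time=1, miss_rate=0.05, miss_penalty=l2_amat)

print(f"single level: {single_level:.1f} cycles")
print(f"two levels:   {two_level:.1f} cycles")
```

With these assumed figures, adding the second level lowers the average from about 6 cycles to about 2.5, because most first-level misses no longer pay the full trip to RAM.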

The performance of the cache therefore depends on how the Hit Time and the Miss Rate are optimized. Since improving one means worsening the other, architects have to decide which value matters more when designing a new CPU or a new GPU.
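This trade-off can be made concrete by comparing two hypothetical designs under the AMAT formula: a small cache with a fast hit but more misses, and a large cache with a slower hit but fewer misses. Every number below is an assumption chosen for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical designs: (hit time in cycles, miss rate as a fraction).
SMALL_FAST = (1, 0.10)
LARGE_SLOW = (3, 0.05)

results = {}
for miss_penalty in (100, 20):              # assumed cost of a miss, in cycles
    small = amat(*SMALL_FAST, miss_penalty)
    large = amat(*LARGE_SLOW, miss_penalty)
    winner = "small" if small < large else "large"
    results[miss_penalty] = (small, large, winner)
    print(f"penalty {miss_penalty}: small={small:.1f}, large={large:.1f}, best={winner}")
```

Under these assumptions the larger cache wins when a miss is expensive, and the smaller one wins when a miss is cheap, so the right balance depends on what sits behind the cache.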
