GPUs are processors just like CPUs, but they differ in that their parallelism is thread oriented. That is why a graphics processor is said to exploit thread-level parallelism (TLP), while a central processor mainly exploits instruction-level parallelism (ILP). In addition, the execution threads on a GPU are tied not to processes but to graphical primitives or data.
In other words, each triangle, pixel or fragment is processed by a shader program at the corresponding stage of the 3D pipeline. These programs are written in a high-level language, for example HLSL for DirectX or GLSL for OpenGL and Vulkan. Some languages are exclusive to certain systems, such as the PSSL used on SONY’s PlayStation consoles. Like any other program, shaders need to be compiled, that is, translated from source code to binary.
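As a rough illustration of this data-oriented parallelism, the sketch below treats a shader as a small function that runs once per pixel, independently of every other pixel. It is a CPU-side analogy written in Python, not real shader code, and the names `fragment_shader` and `render` are made up for the example.

```python
def fragment_shader(x, y, width, height):
    """Hypothetical per-pixel program: returns an (r, g, b) gradient colour."""
    r = x / (width - 1)
    g = y / (height - 1)
    return (r, g, 0.5)


def render(width, height):
    """Run the same program once per pixel.

    No invocation depends on another, which is what lets a GPU spread
    the work across many threads instead of running one long loop.
    """
    return [
        [fragment_shader(x, y, width, height) for x in range(width)]
        for y in range(height)
    ]


image = render(4, 3)  # 3 rows of 4 pixels
```

Here the loop runs sequentially; on a GPU each call would be a separate thread working on its own piece of data.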
What is Shader Cache?
The problem is that, unlike PC CPUs, which all share the x86 instruction set, GPUs have no common set of registers and instructions, not only between brands but even between architectures of the same brand. This means that shader code has to be compiled while we play, which causes stuttering and performance problems. To avoid repeating that work, the generated binary is stored in a file that is loaded into the graphics card's video memory the next time the game starts. That file is what we know as the shader cache.
So if we change our graphics card, the cache will have to be generated again for each of our games, and the same happens if we delete it from the hard drive or SSD. This is also one of the reasons why it is not a good idea to judge a graphics card the first time we play a game, especially if we want to measure its regular performance.
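The behaviour described above can be sketched as a small on-disk cache. This is only an illustration under stated assumptions: `fake_compile` stands in for the driver's real compiler, the `ShaderCache` class and the "arch-A" / "driver-1" labels are hypothetical, and including the driver version in the key is an assumption of this sketch rather than something the text states.

```python
import hashlib
import tempfile
from pathlib import Path


def fake_compile(source, gpu_arch):
    """Stand-in for the driver's compiler (hypothetical): returns fake binary bytes."""
    return f"{gpu_arch}:{source}".encode()


class ShaderCache:
    def __init__(self, directory, gpu_arch, driver_version):
        self.directory = Path(directory)
        self.gpu_arch = gpu_arch
        self.driver_version = driver_version
        self.compilations = 0  # how many times we had to compile

    def _key(self, source):
        # A binary is only valid for one GPU architecture (and, by assumption,
        # one driver version), so both go into the key next to the source.
        material = "\0".join([self.gpu_arch, self.driver_version, source])
        return hashlib.sha256(material.encode()).hexdigest()

    def get(self, source):
        path = self.directory / (self._key(source) + ".bin")
        if path.exists():
            return path.read_bytes()  # cache hit: nothing to compile
        binary = fake_compile(source, self.gpu_arch)  # cache miss: compile now
        self.compilations += 1
        path.write_bytes(binary)
        return binary


cache_dir = tempfile.mkdtemp()
source = "void main() { /* ... */ }"

first_run = ShaderCache(cache_dir, "arch-A", "driver-1")
first_run.get(source)   # compiled during play
first_run.get(source)   # served from disk

second_run = ShaderCache(cache_dir, "arch-A", "driver-1")
second_run.get(source)  # game reloaded on the same GPU: served from disk

new_gpu = ShaderCache(cache_dir, "arch-B", "driver-1")
new_gpu.get(source)     # different architecture: compiled again
```

Deleting the files in `cache_dir` would have the same effect as changing the GPU: every lookup would miss and the shaders would be compiled again.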
It also takes up space in VRAM
The GPU of your graphics card can only access system RAM through the PCI Express interface, using its DMA unit, so the shader cache is loaded together with the visual data needed to compose the scene: first from the storage unit to RAM, and then from RAM to video memory. With DirectStorage, however, the graphics hardware is expected to be able to access the SSD directly, reducing access latency and the dependence on the CPU.
In many well-developed games the shader cache takes up little space, but in others it ends up occupying a large amount of graphics memory. This means that if the driver or the game itself imposes a size limit, new compilations suddenly start replacing old ones and the initial performance problems return.
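A minimal sketch of that effect, assuming a cache with a fixed byte limit that discards the least recently used binary first. The eviction policy, the `LimitedShaderCache` class and the sizes are assumptions made for illustration; real drivers and games may behave differently.

```python
from collections import OrderedDict


class LimitedShaderCache:
    """Hypothetical in-memory cache holding at most `limit_bytes` of binaries."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # shader name -> binary, oldest first
        self.compilations = 0

    def get(self, name, binary_size):
        if name in self.entries:
            self.entries.move_to_end(name)  # mark as recently used
            return self.entries[name]
        # Miss: "compile" the shader (faked here as a block of zero bytes).
        binary = bytes(binary_size)
        self.compilations += 1
        # Evict the least recently used binaries until the new one fits.
        while self.entries and self.used_bytes + binary_size > self.limit_bytes:
            _, evicted = self.entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        self.entries[name] = binary
        self.used_bytes += binary_size
        return binary


cache = LimitedShaderCache(limit_bytes=100)
cache.get("water", 60)
cache.get("grass", 60)  # does not fit next to "water", so "water" is evicted
cache.get("water", 60)  # has to be compiled a second time
```

With a limit of 100 bytes and two 60-byte shaders, every request ends in a compilation, which is the situation in which the stutter comes back.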
In any case, bear in mind that if the game needs space in video RAM, it will give priority to rendering the current frame, so the shader cache will temporarily be kept in system memory and retrieved from there. Also remember that the cache is written to the storage unit continuously, since once the shader for a graphical primitive has been compiled, it does not need to be compiled again.
Ray tracing and the shader cache
In ray tracing, the question the GPU asks most often is a Boolean one: does this ray intersect this graphics primitive? The answer is not always yes, so there is a condition that will or will not be met depending on the circumstances. As a result, when this shader is executed for the first time, it has to be compiled and its shader cache entry generated for every possible outcome.
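To make the Boolean question concrete, here is a CPU-side sketch of one well-known ray-triangle test, the Möller–Trumbore algorithm. It is offered as an illustration of the yes/no nature of the query, not as a description of what any particular GPU or game does; the helper names are chosen for this example.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller–Trumbore test: True if the ray intersects the triangle."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    p = cross(direction, edge2)
    det = dot(edge1, p)
    if abs(det) < eps:
        return False  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return False
    q = cross(t_vec, edge1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return False
    t = dot(edge2, q) * inv_det  # distance along the ray
    return t > eps  # the hit must lie in front of the ray origin


triangle = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
hit = ray_hits_triangle((0.25, 0.25, 1), (0, 0, -1), *triangle)
miss = ray_hits_triangle((2, 2, 1), (0, 0, -1), *triangle)
```

Each early `return False` is one of the conditions that may or may not be met, depending on where the ray and the triangle are.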
Shader Cache on Consoles
A video game console always has the same components, from the first unit that leaves the factory to the last, and its specifications do not change even after several years on the market. For example, today you can find console models that are older than the oldest graphics cards still in use. One way to take advantage of this hardware immutability is to ship shaders already compiled with the game installation. This makes it unnecessary to generate a shader cache for each game, but it is also a double-edged sword.
For example, if you want to make a more powerful, backward-compatible console, you have the following options:
- Carry the ISA over to the new system’s GPU, making it work as a superset of the old one. This is what AMD has done in PlayStation 5 and Xbox Series with the RDNA 2 architecture, so that previous-generation games designed for GCN run smoothly.
- Integrate all the graphics circuitry of the previous system into the new one. This is what Nintendo did on its consoles up to the 3DS, and what SONY did with the second PlayStation and some models of the third.
At the emulator level, what is done in many cases is to take the already compiled shaders and reverse the process, producing an intermediate code from which a shader cache that works with your graphics card can be generated. However, it is not an exact science, and you may encounter graphical glitches when running the games.
Optimizations from GPU manufacturers
Both NVIDIA and AMD usually include optimizations for the newest games in their latest drivers. These include not only the optimal settings for higher performance, but also a shader cache already optimized for each game, which performs better than the one shipped with the base game. Both manufacturers are interested in their most advanced graphics card achieving the highest possible performance in the most demanding games available today.
To that end, they provide funding to the different development studios, which optimize the shader programs for certain graphics architectures and generate a shader cache that works better than the one that comes with the base game. This is done to encourage the purchase of new models and to create a planned obsolescence, which is necessary to maintain the flow of consumption that sustains the industry. That is why many games with hardly any visual changes compared to their previous installments end up performing worse.