Having the most powerful CPU or GPU is no longer enough for games that use ray tracing to run smoothly. Even today, with the technique still used in a fairly primitive way, ray tracing performance is not what we would expect. Which phenomena affect it?
How does the CPU affect Ray Tracing performance?
We have already talked about bounding volumes, or BVHs, more than once in various articles, but today these structures are not generated by the GPU; instead, games maintain a general BVH for the level, which lets them position the different elements of the scene relative to the player. Although it is the GPU that renders the scene, in each frame the CPU has to calculate the state of each of those elements.
That is why games usually use spatial data structures as a map of the level, which locate the position of each element on screen relative to the player and to the structure of the level. In this way, all the computational effort is concentrated on what happens close to the player's avatar and on what the player can actually see at that moment.
There are tricks in various games that let us move beyond the edges of a map or even watch everything happening in the level at once; when we do, the game's performance plummets. The spatial data structures are generated by the CPU, since they are the tool used to discard from the scene everything the player cannot see and that would therefore be a waste of resources to compute.
Therefore, when the CPU creates the display list for the GPU to render, it includes in that frame only the objects within the player's view, regardless of whether this is mathematically correct, since elements beyond the viewing space also affect the scene.
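The idea above can be sketched in a few lines of Python (a toy 2D view cone of our own invention, not the code of any real engine): objects outside the player's field of view are simply left out of the display list.

```python
import math

def visible(player_pos, view_dir, fov_deg, max_dist, obj_pos):
    """Return True if obj_pos lies inside the player's view cone."""
    dx, dy = obj_pos[0] - player_pos[0], obj_pos[1] - player_pos[1]
    dist = math.hypot(dx, dy)
    if dist > max_dist:
        return False          # too far away to matter this frame
    if dist == 0:
        return True           # the player's own position
    # angle between the view direction and the direction to the object
    cos_angle = (dx * view_dir[0] + dy * view_dir[1]) / dist
    return cos_angle >= math.cos(math.radians(fov_deg / 2))

# player at the origin looking along +x with a 90-degree field of view
objects = [(5, 0), (0, 5), (-5, 0), (3, 1)]
draw_list = [o for o in objects if visible((0, 0), (1, 0), 90, 20, o)]
print(draw_list)   # only the objects in front of the player survive
```

Everything the test rejects never reaches the GPU, which is exactly why "see the whole map" tricks make performance collapse: the discard step stops doing its job.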
At the same time, the CPU is also in charge of updating the BVH in each frame and delivering the refreshed version to the GPU. The static elements of the level do not move within the BVH, but mobile ones such as NPCs, characters or the player himself do, and the CPU updates the state of each of them in every frame.
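A minimal sketch of that per-frame update, assuming a tiny hand-built tree (the `refit` helper and the tuple layout are ours, purely illustrative): the CPU takes the fresh bounding boxes of the moved objects and recomputes every parent box bottom-up.

```python
def union(a, b):
    """Merge two AABBs given as (min_x, min_y, max_x, max_y)."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def refit(node, boxes):
    """Recompute every node's AABB bottom-up after objects moved.
    A node is either a leaf index into `boxes` or a (left, right) pair."""
    if isinstance(node, int):              # leaf: the object's fresh box
        return boxes[node]
    left, right = node
    return union(refit(left, boxes), refit(right, boxes))

# two leaves under one inner node, plus a third leaf at the root
tree = ((0, 1), 2)
boxes = [(0, 0, 1, 1), (2, 2, 3, 3), (5, 5, 6, 6)]  # this frame's boxes
root_box = refit(tree, boxes)
print(root_box)   # (0, 0, 6, 6)
```

Refitting keeps the tree's shape and only rewrites the boxes, which is cheap enough to do every frame; rebuilding the tree from scratch would not be.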
GPUs and Ray Tracing performance
GPUs are the parallel processors par excellence, so much so that high-performance computing, or HPC, has been using them for years to speed up the parallel parts of different programs. Ray tracing is one of the applications that benefits most from parallelization, and therefore, on paper, from the power of GPUs. However, we are still far from the ideal scenario in which GPUs as we know them today are the perfect hardware for ray tracing.
In ray tracing, as in rasterization, the other algorithm for generating 3D graphics, we take a collection of objects in a three-dimensional space and represent them in a two-dimensional space made up of a matrix of pixels: the screen. However, no GPU has ever managed to render a ray-traced scene in real time at the same speed and with the same performance as with rasterization.
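That 3D-to-2D step can be illustrated with a toy pinhole projection (our own simplification, not any API's actual code): a point in camera space is mapped to a pixel on the screen.

```python
def project(point, width, height, focal=1.0):
    """Perspective-project a 3D point (camera at the origin, looking
    down +z) onto a width x height pixel grid; None if behind camera."""
    x, y, z = point
    if z <= 0:
        return None
    # normalized device coordinates in [-1, 1], then pixel coordinates
    ndc_x, ndc_y = focal * x / z, focal * y / z
    px = int((ndc_x + 1) * 0.5 * width)
    py = int((1 - ndc_y) * 0.5 * height)   # screen y grows downwards
    return px, py

print(project((0, 0, 2), 640, 480))   # a point dead ahead -> (320, 240)
```

Rasterization and ray tracing both end at this pixel grid; they differ in how they decide what each pixel contains.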
Ray Tracing is not resolution dependent
Saying this may sound bold to many; after all, the performance when rendering at Full HD is not the same as at QHD or 4K. The variations in performance are considerable, but once you understand what we mean, you will see why we say it does not depend on resolution.
The original ray tracing algorithm traces the primary and secondary rays of the scene for each pixel on screen. This requires a huge amount of power, so instead of applying the original algorithm directly, spatial data structures such as the BVH are used to speed it up; without them, ray tracing would require enormous computing power.
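To see why such a structure saves so much work, here is a toy one-dimensional stand-in (intervals instead of real bounding volumes; the layout is ours): any subtree whose bounding interval misses the query is discarded without ever testing the leaves inside it.

```python
def query(node, lo, hi):
    """Collect the leaves overlapping [lo, hi]; a subtree whose
    bounding interval misses the query is pruned without visiting
    any of its leaves."""
    node_lo, node_hi, children = node
    if hi < node_lo or lo > node_hi:
        return []                        # prune the whole subtree
    if children is None:
        return [(node_lo, node_hi)]      # leaf reached
    hits = []
    for child in children:
        hits += query(child, lo, hi)
    return hits

# four leaves, grouped under two inner nodes, under one root
leaves = [(0, 1, None), (2, 3, None), (6, 7, None), (8, 9, None)]
tree = (0, 9, [(0, 3, leaves[:2]), (6, 9, leaves[2:])])
print(query(tree, 0.5, 0.6))   # [(0, 1)] -- right half never examined
```

With millions of triangles the pruning is what turns an impossible per-ray cost into a logarithmic one, which is the whole point of the BVH.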
Hence the use of tree-shaped spatial data structures to accelerate GPU performance in ray tracing. However, today's games have extremely complex scenes whose intersection data structure cannot fit in the GPU's cache because of its size. What is done instead is to copy fragments of the structure from VRAM into the GPU's cache as they are needed.
The other trick, as you will know by now, is the use of dedicated hardware in charge of calculating the intersection of each ray with the data structure. This calculation runs continuously and recursively and would otherwise consume a large amount of GPU resources, so with these units we see performance increases of between five times and an order of magnitude.
The problem of data locality
In reality we are not representing the path of light, since at the speeds involved light effectively reaches every part of the scene at once. So the data structure does not represent the path of a ray of light through the scene, but rather the parts of the scene that are affected by it.
What we do is organize the objects in the scene into sets called bounding volumes, in such a way that if a ray does not pass through a bounding volume, then it cannot affect the objects, or the other bounding volumes, contained within it.
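That skip test is usually a ray-versus-box "slab" test. A minimal 2D version (simplified from the usual 3D form; the names are ours): if the ray's entry and exit intervals along each axis do not overlap, the box, and everything inside it, can be discarded.

```python
def ray_hits_aabb(origin, direction, box):
    """2D slab test: True if the ray origin + t * direction (t >= 0)
    enters the box given as (min_x, min_y, max_x, max_y)."""
    t_min, t_max = 0.0, float("inf")
    for axis in range(2):
        o, d = origin[axis], direction[axis]
        lo, hi = box[axis], box[axis + 2]
        if abs(d) < 1e-12:
            if o < lo or o > hi:
                return False            # parallel to and outside the slab
        else:
            t1, t2 = (lo - o) / d, (hi - o) / d
            t_min = max(t_min, min(t1, t2))
            t_max = min(t_max, max(t1, t2))
            if t_min > t_max:
                return False            # the slab intervals do not overlap
    return True

print(ray_hits_aabb((0, 0), (1, 1), (2, 1, 4, 3)))   # True: ray enters box
print(ray_hits_aabb((0, 0), (1, -1), (2, 1, 4, 3)))  # False: ray misses it
```

One cheap test per box replaces an intersection test against every object inside it, which is the saving the bounding volumes exist to provide.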
The first thing we might think of is dividing the space of the scene regularly, so that one part of the GPU handles one part of the scene and another part handles another. In other words, it would be like taking a map and dividing it into quadrants, with the information of each quadrant, or even sub-quadrant, stored in the different levels of the cache. This is a method that works very well with rasterization.
In ray tracing, on the other hand, if we divide a scene into several subspaces, a single ray of light may pass through several of them, which may be assigned to different processing elements in the GPU. That is why a totally different path is taken: the scene is not divided according to the pixels affected by each ray; instead, each ray is cataloged according to the areas of the scene it crosses.
The vehicle and map example
To understand the situation better, suppose we have to mark the route of a delivery vehicle on a map. How can we do it? The first way is to take the map and draw the route on it. The second is simply to make a list of the areas of the map the vehicle has passed through, and then reconstruct its route from that list.
In the second case, we can reconstruct the route completely, turning it into a linear list. At the end of the day, we are only interested in where the vehicle has been, not where it has not, so with a full map much of the information would be superfluous and we could discard it, since it contributes nothing.
But creating an ordered list does not tell us how many kilometers the driver has to travel to make the deliveries; that information is simply not recorded. The same happens with ray tracing performance: each ray has a different computational cost, not only because it follows a different route, but also because we do not know in advance the memory jumps we will have to make.