What can we expect from the future NVIDIA RTX 50?

There are still almost two years to go before the RTX 50, so talking about performance figures and the like is pointless, since at this point not even NVIDIA knows them. However, by analyzing the architecture of the RTX 40, the elements from the H100 chip that could not be added to it, and certain upcoming technologies, we can get a logical and realistic idea of what these cards may look like.

Ampere Next Next

Long before Lovelace was ever heard of as the code name for the RTX 40, NVIDIA published a rather curious roadmap in which the RTX 30 appeared as ‘Ampere’, the RTX 40 as ‘Ampere Next’ and the RTX 50 as ‘Ampere Next Next’. Joking aside, it is not that they ran out of scientists to name the new architectures after; the launch of the RTX 40 has confirmed what that slide implied: these are iterations of the same design.

What defines every GPU are its cores, which NVIDIA calls SMs and which are in turn made up of sub-cores. Well, there is a bigger difference between the RTX 20 and the RTX 30 than between the RTX 30 and the RTX 40, where the only changes were a handful of tweaks to the Tensor Cores and a new RT Core with more capabilities. The rest? It has stayed the same, and the same is expected when comparing the RTX 50 against the RTX 40.

This, together with the jump from Samsung’s 8 nm to TSMC’s 4 nm, is what makes possible an RTX 4090 with 144 SMs on the chip, 128 of them active. Could the number of cores have been higher? Yes, but we have to bear in mind that the RTX 4090 uses the same memory configuration as the RTX 3090 Ti and therefore has less bandwidth available per core (see the quick numbers below).
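To put a rough figure on that, here is a minimal back-of-the-envelope sketch in plain C++ using the published specifications (both cards ship 21 Gbps GDDR6X on a 384-bit bus, with 84 active SMs on the RTX 3090 Ti and 128 on the RTX 4090); the per-SM figure is only a crude proxy for how the same bandwidth is now spread over more cores:

#include <cstdio>

int main() {
    // Both cards: 384-bit bus with 21 Gbps GDDR6X -> ~1008 GB/s of peak bandwidth.
    const double bandwidth_gbs = 384.0 / 8.0 * 21.0;

    const int sm_rtx3090ti = 84;   // active SMs on the RTX 3090 Ti
    const int sm_rtx4090   = 128;  // active SMs on the RTX 4090

    printf("RTX 3090 Ti: %.1f GB/s per SM\n", bandwidth_gbs / sm_rtx3090ti);  // ~12.0
    printf("RTX 4090:    %.1f GB/s per SM\n", bandwidth_gbs / sm_rtx4090);    // ~7.9
    return 0;
}

Each SM of the RTX 4090 therefore has roughly a third less raw bandwidth to itself than on the RTX 3090 Ti, which is precisely the pressure that the enlarged L2 cache discussed below is meant to relieve.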

NVIDIA Ampere Next-Next Roadmap

What changes do we expect to see in the SM of the RTX 50?

Apart from the elements listed below, everything else is expected to carry over unchanged.

  • Tensor Cores
  • RT Core
  • The scheduler or ‘Warp Scheduler’, which has not been renewed since the RTX 30; it will be updated to adopt certain CUDA9 changes.
    • This implies changes to the first-level caches (data and instructions), as well as to the local memory of each SM.

Rumors speak of the first overhaul of the SM in a long time. We do not believe so; rather, they are mistaking for an overhaul the adoption of concepts already seen in the H100 chip that never made it into the AD10x chips of the various RTX 40 cards.

NVIDIA H100 Hopper GPU

Will the RTX 50 have as much cache as the current ones?

No, we have not changed the subject, nor has another article slipped in here; to understand what the NVIDIA RTX 50 can look like, we have to understand this change in the current generation, what caused it, and whether there are signs of it carrying over to the next one. The reasons behind it are the following:

  • To begin with, the memory bus was made narrower than would otherwise be needed in order to avoid an even larger chip: the cost per wafer has gone up, fewer mm² are affordable than in the previous generation, and unfortunately memory interfaces do not shrink with new fabrication nodes (see the quick numbers after this list).
  • The second reason is that cutting the bus leaves the bandwidth anemic. The fix? Enlarging the L2 cache so the cores need to go out to video RAM far less often, although that comes at a cost in both chip area and latency.
  • The third reason is that adding more cores increases the demands on memory. GDDR6X is already available at 23 and 24 Gbps; if NVIDIA has not used it yet, it is because for now it does not dare to launch a graphics card with a 600 W TDP, most likely because it does not fully trust its cooling solutions at such high power draw.
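As a rough illustration of that trade-off, here is a small C++ sketch using the usual approximation of peak bandwidth as bus width divided by 8 times the data rate; the bus widths and GDDR6X speeds are public figures, the pairings are ours purely for illustration:

#include <cstdio>

// Approximate peak memory bandwidth in GB/s: (bus width in bits / 8) * data rate in Gbps.
static double bandwidth_gbs(int bus_bits, double gbps) {
    return bus_bits / 8.0 * gbps;
}

int main() {
    // The full 384-bit bus of the RTX 3090 Ti / RTX 4090 with 21 Gbps GDDR6X.
    printf("384-bit @ 21 Gbps: %.0f GB/s\n", bandwidth_gbs(384, 21.0));  // ~1008

    // The narrower buses used on the smaller AD10x chips.
    printf("256-bit @ 21 Gbps: %.0f GB/s\n", bandwidth_gbs(256, 21.0));  // ~672
    printf("192-bit @ 21 Gbps: %.0f GB/s\n", bandwidth_gbs(192, 21.0));  // ~504

    // What the faster, still unused 24 Gbps GDDR6X would buy back on a full bus.
    printf("384-bit @ 24 Gbps: %.0f GB/s\n", bandwidth_gbs(384, 24.0));  // ~1152
    return 0;
}

Every step down in bus width translates directly into the bandwidth ‘anemia’ described above, and the oversized L2 cache of the RTX 40 is there to absorb part of the difference.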

RTX 4090 Render

So, can we know what the RTX 50 will be like?

At this point only NVIDIA knows, but we can make a number of predictions about the changes we are going to see. The main one will be the use of GDDR7 memory, and believe us: even re-releasing an RTX 4090 with an interface for that memory would bring a considerable performance increase in the various areas that are currently limited by bandwidth (see the sketch below).
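To get a feel for the size of that jump, here is the same bandwidth formula as before, now with an assumed GDDR7 data rate; 32 Gbps is one of the early figures being talked about, but final speeds are not confirmed, so treat it strictly as an assumption:

#include <cstdio>

int main() {
    const double bus_bits    = 384.0;  // RTX 4090-class 384-bit interface
    const double gddr6x_gbps = 21.0;   // what the RTX 4090 ships with today
    const double gddr7_gbps  = 32.0;   // assumed early GDDR7 speed, not confirmed

    const double now  = bus_bits / 8.0 * gddr6x_gbps;  // ~1008 GB/s
    const double next = bus_bits / 8.0 * gddr7_gbps;   // ~1536 GB/s

    printf("GDDR6X: %.0f GB/s, GDDR7 (assumed): %.0f GB/s, uplift: ~%.0f%%\n",
           now, next, (next / now - 1.0) * 100.0);
    return 0;
}

That is roughly 50% more bandwidth on the same bus width, without touching a single SM.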

Although the 96 MB of L2 cache in the full configuration of the AD102 chip may seem impressive, in the end it is rather anemic for the enormous amount of data that a GPU of the RTX 4090’s caliber handles. That is why it is quite possible that an eventual high-end RTX 50 will not carry the same amount of L2 on the chip, but rather less, in order to fit a greater number of SMs. What interests us more, though, is the internal communication, which we do know will change.
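A toy model shows why the L2 matters so much despite its size: every request that hits in the L2 never touches the GDDR6X, so the memory bus only has to serve the misses. The hit rates below are invented purely for illustration; real figures depend entirely on the workload:

#include <cstdio>

int main() {
    const double dram_gbs = 1008.0;  // RTX 4090 GDDR6X bandwidth

    // Hypothetical L2 hit rates, purely illustrative.
    const double hit_rates[] = { 0.30, 0.50, 0.70 };

    for (double h : hit_rates) {
        // Only misses reach VRAM, so the cores can generate this much demand
        // before the memory bus becomes the bottleneck.
        const double sustainable_demand = dram_gbs / (1.0 - h);
        printf("L2 hit rate %.0f%% -> up to ~%.0f GB/s of on-chip demand\n",
               h * 100.0, sustainable_demand);
    }
    return 0;
}

The better the hit rate the L2 can sustain, the less the raw GDDR bandwidth matters; trimming the L2 only becomes viable if GDDR7 makes up the difference, which is exactly the trade-off described here.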

CUDA9

One of the biggest differences between the H100 and the AD102 is that the latter is a CUDA compute capability 8.9 GPU: it does not integrate a series of changes that NVIDIA added to its high-performance computing GPU. And no, we are not referring to the SM cores, which will always differ between the two markets, but to the way in which they communicate with each other.

  • The Tensor Memory Accelerator or TMA, a unit that accelerates data transfers into the local memories of the SMs and allows them to communicate directly. These local memories should not be confused with the caches, nor with the shared global memory, which is also inside the chip but is not a cache either.
  • Thread Block Clusters, in which several SMs share both their L1 caches and their local memories in order to work as a group (see the sketch after this list).
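Both features already exist on Hopper and can be targeted today with CUDA 12 on compute capability 9.0 hardware; the expectation is simply that something equivalent reaches the gaming SMs. As a reference point, here is a minimal CUDA C++ sketch of a Thread Block Cluster using distributed shared memory; the kernel, its name and the sizes are ours, purely for illustration (TMA has its own, more involved API and is not shown):

#include <cooperative_groups.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Requires CUDA 12 and -arch=sm_90 (Hopper): two thread blocks form one cluster
// and can read each other's shared memory directly.
__global__ void __cluster_dims__(2, 1, 1) exchange_kernel(const float* in, float* out)
{
    __shared__ float smem[256];
    cg::cluster_group cluster = cg::this_cluster();

    const int gid = blockIdx.x * blockDim.x + threadIdx.x;
    smem[threadIdx.x] = in[gid];
    cluster.sync();  // make every block's shared memory visible cluster-wide

    // Map the shared memory of the other block in the cluster (rank 0 <-> 1).
    const unsigned peer = cluster.block_rank() ^ 1u;
    const float* peer_smem = cluster.map_shared_rank(smem, peer);

    out[gid] = smem[threadIdx.x] + peer_smem[threadIdx.x];
    cluster.sync();  // keep the peer's shared memory alive until everyone is done
}

int main()
{
    const int blocks = 4, threads = 256;  // grid size must be a multiple of the cluster size (2)
    float *in, *out;
    cudaMallocManaged(&in,  blocks * threads * sizeof(float));
    cudaMallocManaged(&out, blocks * threads * sizeof(float));
    for (int i = 0; i < blocks * threads; ++i) in[i] = static_cast<float>(i);

    exchange_kernel<<<blocks, threads>>>(in, out);
    cudaDeviceSynchronize();

    printf("out[0] = %.0f\n", out[0]);  // in[0] + in[256] = 256
    cudaFree(in);
    cudaFree(out);
    return 0;
}

On the RTX 40, data exchanged between SMs has to go through the L2 cache; the whole point of these features is to shorten that path.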

These changes, minor as they are, will appear in the SM of the RTX 50 without a doubt, simply because those graphics cards will have to support that version of CUDA.
Tensor Memory Accelerator NVIDIA GH100

And what about the latest rumors?

There is a lot of talk that we will see a 512-bit bus on the RTX 50, which we doubt, given that it would make the chip even bigger and more expensive than it already threatens to be. The justification given for it? Disaggregating the GPU the way AMD has done. However, we believe that only makes sense if a single-chip design would exceed the maximum size that can be manufactured, something NVIDIA has managed to avoid for years; the real problem nowadays is not that limit but the cost of making such a large chip.

COPA-GPU

To this day, not even NVIDIA itself dares to make chips as large as it once did. What is the most logical scenario? Moving to a dual-chip configuration in which both chips have access to the same memory. That would mean adding an L3 cache and a separate memory controller, in a configuration very similar to that of RDNA 3, although it would not be a copy, since NVIDIA itself already explored the idea in its COPA-GPU paper. In any case, the different elements would have to sit on top of an interposer to reduce the energy cost of moving data, packaging technology that NVIDIA has access to as a TSMC customer.

Conclusions for now

In short, things would remain, for the moment, as follows:

  • The RTX 50 will add the features of the H100 that are useful for gaming and that were not added in the current generation.
  • NVIDIA will seek to make the architecture more efficient in order to reach higher clock speeds.
  • The biggest boost will come from GDDR7, which will bring higher capacity and bandwidth, two areas where the RTX 40 falls short.

Beyond all this we are not going to see any major change. What is more, it is possible that the final core count will barely grow, or not grow at all, and that the performance gains will come from the elements we have described. The reason is that even if AMD ends up releasing a graphics card with a higher number of Compute Units, it is very unlikely to exceed the core count of the RTX 4090, so NVIDIA has no need to go further.
