The RTX 4090 would be crazy: 512-bit bus and 128 MB cache?

Talking about a 512-bit bus for the RTX 4090 sounds like a bold claim. However, with a configuration of 144 SMs distributed across 12 GPCs, a 384-bit bus falls short, even when paired with the fastest VRAM available on the market. Let's not forget that the RTX 3090 and its Ti version, which draws up to 450 W through the 12-pin connector, are nothing more than a GeForce Titan under another name.
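To put the bus width argument in numbers, here is a rough sketch of peak memory bandwidth for both bus widths. The formula (bus width in bits times per-pin data rate, divided by 8) is the standard one. The 21 Gbps figure is an assumption taken from the GDDR6X on the RTX 3090 Ti, not from the leak, and the function name is purely illustrative.

```python
# Assumption: 21 Gbps per pin, the GDDR6X rate used on the RTX 3090 Ti.
# Nothing in the leak states the memory speed of the RTX 4090.
ASSUMED_GBPS_PER_PIN = 21.0

def peak_bandwidth_gb_s(bus_width_bits, gbps_per_pin=ASSUMED_GBPS_PER_PIN):
    """Theoretical peak bandwidth in GB/s: bits per transfer * rate / 8."""
    return bus_width_bits * gbps_per_pin / 8

for bus in (384, 512):
    print(f"{bus}-bit bus: {peak_bandwidth_gb_s(bus):.0f} GB/s")
```

At the same memory speed, the wider bus gives a third more bandwidth, which is the gap the article is pointing at.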

NVIDIA has led the market for several generations, leaving behind AMD, which, despite how excellent its RDNA 2 architecture is, still trails its rival in certain respects. NVIDIA is going to bet on a monolithic configuration rather than chiplets, but one that is no less aggressive for that.

512-bit bus on the NVIDIA RTX 4090?

Extraordinary claims require extraordinary evidence, and this is not mere talk: the information can be found in the files leaked from NVIDIA itself yesterday. Specifically, what points to a 512-bit bus on the RTX 4090 is the 16 MB of L2 cache referred to in the image above, in the part highlighted by the red box.

In GPUs, the amount of L2 cache is related to the bandwidth to the VRAM. In the current RTX 30, each 64-bit GDDR6 controller has 1 MB of second-level cache. Cache sizes normally come in powers of two, and dividing 16 MB among the six 64-bit controllers of a 384-bit bus does not give an exact result. Dividing it among eight controllers of the same type, and therefore a 512-bit bus, does: 2 MB each. This would mean that NVIDIA has doubled the amount of L2 per memory controller in the RTX 40 compared to the equivalent RTX 30 models.
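The division argument can be checked with a few lines of Python. This is only a sketch of the article's reasoning: the 16 MB total and the 64-bit controller width come from the text, while the function names and the "whole power of two" test are my own reading of what a clean split means.

```python
# Figures from the article: 16 MB of L2 in total, one slice per 64-bit controller.
L2_TOTAL_MB = 16
CONTROLLER_WIDTH_BITS = 64

def l2_per_controller(bus_width_bits, l2_total_mb=L2_TOTAL_MB):
    """Return (controller count, MB of L2 per controller) for a bus width."""
    controllers = bus_width_bits // CONTROLLER_WIDTH_BITS
    return controllers, l2_total_mb / controllers

def is_power_of_two_mb(value_mb):
    """True when value_mb is a whole number of MB and a power of two."""
    if value_mb < 1 or value_mb != int(value_mb):
        return False
    n = int(value_mb)
    return n & (n - 1) == 0

for bus in (384, 512):
    count, per_ctrl = l2_per_controller(bus)
    print(f"{bus}-bit: {count} controllers, {per_ctrl:.3f} MB each, "
          f"clean split: {is_power_of_two_mb(per_ctrl)}")
```

Only the 512-bit case yields a whole, power-of-two slice per controller, which is why the 16 MB figure is read as a hint at the wider bus.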

As for the amount of VRAM, the RTX 4090 would come in configurations of 32 GB in normal mode and 64 GB in shared bus mode, which would be the highest to date on a graphics card.
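The 32 GB and 64 GB figures follow from simple arithmetic if we make two assumptions that the leak does not state: each memory chip has a 32-bit interface, and each chip holds 2 GB (16 Gb). Under those assumptions, shared bus mode simply puts two chips on each 32-bit channel. The names below are illustrative only.

```python
# Assumptions (not from the leak): 32-bit interface and 2 GB (16 Gb) per chip.
CHIP_INTERFACE_BITS = 32
CHIP_CAPACITY_GB = 2

def vram_capacity_gb(bus_width_bits, chips_per_channel=1):
    """Total VRAM in GB; chips_per_channel=2 models shared bus mode."""
    channels = bus_width_bits // CHIP_INTERFACE_BITS
    return channels * chips_per_channel * CHIP_CAPACITY_GB

print("512-bit, normal mode:", vram_capacity_gb(512), "GB")
print("512-bit, shared bus mode:", vram_capacity_gb(512, chips_per_channel=2), "GB")
```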

NVIDIA's equivalent of the Infinity Cache?

The other interpretation of the information is that those 16 MB correspond to each memory controller, which with a 512-bit bus would mean 128 MB of L2 cache in total. That is not impossible, and it is reminiscent of the Infinity Cache in AMD's RX 6000 cards for PC. However, it must be taken into account that growing the L2 from a few MB to more than a hundred MB can add enormous latency to data lookups at that level, which can defeat the purpose of such a hierarchy.
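For comparison, this short sketch computes the total L2 when a given amount is attached to every 64-bit controller. The per-controller figures (1 MB for the RTX 30, 16 MB under this second reading) come from the article; the helper name is hypothetical.

```python
def total_l2_mb(bus_width_bits, l2_per_controller_mb, controller_width_bits=64):
    """Total L2 in MB when every memory controller carries the same slice."""
    controllers = bus_width_bits // controller_width_bits
    return controllers * l2_per_controller_mb

# RTX 30 as described in the article: 384-bit bus, 1 MB per controller.
print("RTX 30, 384-bit:", total_l2_mb(384, 1), "MB")
# Second reading of the leak: 16 MB per controller on a 512-bit bus.
print("RTX 4090 (speculative), 512-bit:", total_l2_mb(512, 16), "MB")
```

The jump in total size between the two cases is what raises the latency concern described above.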

In addition, AMD's Infinity Cache sits not at the second level but at the third, precisely to reduce the latency of data lookups. Speaking of third-level caches, not long ago we saw large caches mentioned in the Composable GPU paper, which is the source of the image above these lines. In any case, a large L2 cache cannot be ruled out either, since the A100 has 40 MB in total. One thing should be made clear, though: if the L2 were that size, it would mean that NVIDIA has added an intermediate cache level, an L1.5, but that falls into the realm of speculation and is not something we can confirm.