The trend in hardware is that motherboards are getting simpler than they were a few years ago. For some time now, people have been talking about technologies that would allow RAM and CPU to merge into a single chip. Will we see this in the future, or are there limitations that prevent it?
Is it possible to integrate RAM inside the processor?
One of the basic concepts of processor performance is that the closer the memory is to the units that execute the instructions, the better the performance, since each instruction takes less time to resolve. The reason? Simple: the electrical signal has less distance to travel. So the ideal would be for processor and memory to sit on the same chip.
However, the reality is that RAM, the working memory every processor uses to execute programs, sits outside the chip and therefore does not deliver that ideal performance. As if that were not enough, the distance also increases the energy consumed when transmitting data.
Bearing in mind that RAM is a fundamental piece of any architecture and cannot simply be removed, we have wondered what would happen if we took it off the motherboard altogether, that is, if we integrated it into the processor. Chips of this type already exist: microcontrollers carry the memory they work with inside the chip, but their capabilities are very limited.
Advantages of integrating RAM inside the processor
First of all, let's talk about the advantages of integrating RAM into the processor. We speak in the conditional because for now we are not taking into account the technical limitations that have so far prevented manufacturers from doing it; we will get to those later. For the moment we will limit ourselves to what the concept would look like in theory and what advantages it would bring.
The IMC would disappear
The integrated memory controller (IMC) would disappear, since its job is to manage accesses to external memory; its space would instead be occupied by the RAM inside the processor. Likewise, the physical interface in charge of communicating with the system's memory externally would also cease to exist, since that external memory would no longer be there.
Processor performance would increase
By drastically reducing the access time to data and instructions, each instruction would execute in less time, and processor performance is measured precisely by the number of instructions that can be executed in a given time. Why does this matter? Because it translates into running programs faster or, failing that, being able to handle several of them at the same time.
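The effect of memory latency on throughput can be illustrated with a toy CPI (cycles per instruction) model. All figures below, the clock, base CPI, miss rate and stall cycles, are assumptions chosen for illustration, not measurements of any real processor:

```python
# Hypothetical illustration: how memory latency affects instruction throughput.
# All numbers are invented for the example, not real CPU measurements.

def instructions_per_second(clock_hz, base_cpi, miss_rate, mem_latency_cycles):
    """Simple model: effective CPI = base CPI + (fraction of instructions
    that stall on memory) * (stall cycles per access)."""
    effective_cpi = base_cpi + miss_rate * mem_latency_cycles
    return clock_hz / effective_cpi

# External DRAM: assume a ~200-cycle stall on 2% of instructions
external = instructions_per_second(4e9, 1.0, 0.02, 200)
# On-die RAM: assume the stall shrinks to ~40 cycles
on_die = instructions_per_second(4e9, 1.0, 0.02, 40)

print(f"external DRAM: {external / 1e9:.2f} GIPS")
print(f"on-die RAM:    {on_die / 1e9:.2f} GIPS")
```

Even with these made-up numbers, cutting the stall from 200 to 40 cycles nearly triples the instructions executed per second, which is the intuition behind the claim above.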
Consumption would decrease
Transmitting a bit of data within the processor currently has an energy cost of about 0.1 pJ/bit, while sending it out to DDR5 costs about 7 pJ/bit. In other words, energy consumption would be reduced by up to 70 times as far as data communication is concerned.
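The 70x figure follows directly from the two per-bit costs quoted above; a quick back-of-the-envelope calculation for moving 1 GB of data makes it concrete:

```python
# Energy comparison using the per-bit figures cited in the text:
# ~0.1 pJ/bit on-die vs ~7 pJ/bit out to DDR5.
ON_DIE_PJ_PER_BIT = 0.1
DDR5_PJ_PER_BIT = 7.0

def transfer_energy_mj(gigabytes, pj_per_bit):
    """Energy in millijoules to move the given amount of data."""
    bits = gigabytes * 8e9          # 1 GB = 8e9 bits (decimal GB)
    return bits * pj_per_bit * 1e-12 * 1e3  # pJ -> J -> mJ

gb = 1  # moving 1 GB of data
print(f"on-die: {transfer_energy_mj(gb, ON_DIE_PJ_PER_BIT):.1f} mJ")
print(f"DDR5:   {transfer_energy_mj(gb, DDR5_PJ_PER_BIT):.1f} mJ")
print(f"ratio:  {DDR5_PJ_PER_BIT / ON_DIE_PJ_PER_BIT:.0f}x")
```

Moving 1 GB costs roughly 0.8 mJ on-die versus 56 mJ to DDR5, the 70x ratio the article mentions.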
The cache would not disappear
The job of the cache is to keep a temporary copy of the information in RAM, but inside the chip, so that it takes less time to reach it. If we put the random access memory itself inside the chip, we could reach it with far less latency, and the higher cache levels would become unnecessary. So in theory we would be doing away with the cache's usefulness. However, this is not entirely the case, and we would run into a problem.
Caches have a second job: reducing contention on memory accesses by keeping local copies in each core at the lowest levels of the hierarchy. Without any cache, we would have a congestion problem on the data bus due to the enormous number of requests. That is, with a single memory pool shared by all cores and no local backup, performance is dragged down by the excess of accesses.
So we would end up keeping at least one level of cache, the first level, which is usually split into data and instructions and has the lowest latency. An intermediate cache level, shared between several cores, could also be added, but designers would have to make sure its latency was lower than that of the processor's integrated RAM.
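Why the L1 survives can be sketched with the classic average memory access time (AMAT) formula. The latencies and hit rate below are assumed values for illustration, not figures from the article:

```python
# Sketch of why an L1 cache still pays off in front of on-die RAM.
# Latencies (cycles) and hit rate are assumptions, not real measurements.

def amat(l1_hit_cycles, l1_hit_rate, backing_latency_cycles):
    """Average memory access time: L1 hit cost plus the miss penalty
    weighted by how often we actually miss."""
    return l1_hit_cycles + (1 - l1_hit_rate) * backing_latency_cycles

# Assumed: L1 answers in 4 cycles and hits 95% of the time;
# the shared on-die RAM answers in 40 cycles.
with_l1 = amat(4, 0.95, 40)
without_l1 = 40  # every access goes straight to the shared on-die RAM

print(f"with L1:    {with_l1:.1f} cycles on average")
print(f"without L1: {without_l1:.1f} cycles on average")
```

On top of the raw latency gap, the L1 also absorbs most requests locally, which is the bus-congestion argument made above: without it, every one of those accesses would land on the shared memory pool.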
So why isn’t it done?
If you are wondering why we explained the cache memory to you, it is because one of the reasons is the lack of space on the chip. When manufacturing a processor, as you already know, whoever wants to sell it has to take into account how many wafers are available, how many processors fit per wafer, and at what cost. Take a DIMM or SO-DIMM of RAM and look at all the chips on it: do you really think all of that would fit inside the processor? No, it cannot be done.
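The wafer economics can be made concrete with the standard dies-per-wafer approximation. The die areas below are invented for the sake of the example; the point is only how quickly the chip count collapses as the die grows:

```python
# Back-of-the-envelope die economics: enlarging the die to hold RAM
# slashes the number of chips per wafer. Die areas here are invented.
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Classic approximation: wafer area over die area, minus an
    edge-loss term for partial dies around the wafer's rim."""
    radius = wafer_diameter_mm / 2
    return int(math.pi * radius**2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

cpu_only = dies_per_wafer(300, 150)      # assumed 150 mm^2 CPU die
cpu_plus_ram = dies_per_wafer(300, 600)  # assumed 4x the area with RAM on die

print(f"CPU only:      {cpu_only} dies per 300 mm wafer")
print(f"CPU + RAM die: {cpu_plus_ram} dies per 300 mm wafer")
```

Quadrupling the die area drops the yield from hundreds of chips per wafer to well under a hundred, before even counting the lower defect yield of a bigger die, which is why the per-chip cost explodes.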
That is why the manufacturing processes for RAM and for processors have diverged over time, to the point where today foundries like TSMC, which have specialized in processors, do not stand out when it comes to manufacturing memories, and vice versa. It is very difficult to see a process that allows combining both types of chips at the same time, unless there is a chip in development whose economics justify it, which is rare in home systems.
Alternatives to RAM in the processor
The alternatives are none other than the new methods of building integrated circuits based on through-silicon vias (TSVs): either placing the processor side by side with the RAM, both mounted on an interposer, which is known as 2.5DIC and which we have already seen in systems with HBM memory; or stacking the RAM chips on top of the processor, known as 3DIC. The problem? The extra cost of these methods is so high that they are not viable for the consumer market, although the memory offered in both cases has lower latency and consumption than conventional RAM.
In the server processor market we will soon see configurations with HBM memory; the problem is that they are fixed in size, and if a larger amount of RAM is needed, it will be necessary to fall back on conventional RDIMM sockets. That completely defeats the point of integrating RAM into the processor to save us from having to implement it in the system. In any case, these solutions do not deliver the latency or power savings of a full integration of RAM into the processor, since we are not really doing that, but merely bringing the RAM closer.