Why does it currently cost more to move data than to process it on CPU / GPU?

In recent years there has been a paradigm shift in processor design. The cost of processing data has become almost negligible thanks to the steadily growing power of CPUs and GPUs, but this continuous improvement has a counterpart: the cost of moving data has become comparatively high. Today it costs far more energy to move the data an instruction needs than to execute the instruction itself.

When Intel made the leap from single-core to multi-core processors in 2005, its reason was performance per watt. Power consumption became crucial in the design of new processors, which enabled the era of portable computers, which from that moment on were no longer a generation behind in architecture. At that time, however, the energy cost of data movement was not yet a problem; it has become apparent as time has passed, to the point of being designers' greatest nightmare.

The problem with data movement is in the wiring

If you look at the memory hierarchy of a processor, you will notice that bandwidth differs between cache levels: it is much higher at the first levels and drops at each successive level of the hierarchy until reaching RAM. Why don't we have processors whose memory bandwidth equals that of the L1 cache? For two reasons: the first is the interface itself, but the second is that as the length of the wiring connecting the processor to the different memories grows, so does the energy consumed.
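To put rough numbers on that intuition, here is a small Python sketch comparing ballpark per-operation energies of the kind often quoted in the literature (roughly in line with figures from Mark Horowitz's ISSCC 2014 keynote; exact values depend on the process node and are illustrative only):

```python
# Illustrative, order-of-magnitude energy costs per 32-bit operation.
# Exact values vary by process node; these are ballpark figures only.
ENERGY_PJ = {
    "32-bit integer add":  0.1,
    "32-bit float add":    0.9,
    "8 KB SRAM read":      5.0,
    "DRAM read":         640.0,
}

def relative_cost(op: str, baseline: str = "32-bit float add") -> float:
    """Energy of `op` expressed as a multiple of the baseline operation."""
    return ENERGY_PJ[op] / ENERGY_PJ[baseline]

for op, pj in ENERGY_PJ.items():
    print(f"{op:20s} {pj:7.1f} pJ  ({relative_cost(op):7.1f}x a float add)")
```

The striking part is the ratio: fetching a word from DRAM costs hundreds of times more energy than the arithmetic performed on it, which is exactly the imbalance the article describes.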

That is why the ideal processor would have its RAM attached directly to it, something that is impossible at today's level of integration and manufacturing: the processes involved are too expensive and time-consuming for mass production. They are expensive in that they add new manufacturing steps, which increase the margin of error and reduce the number of good units. For this reason, systems in development that try to solve this problem in "exotic" ways are focusing on the high-performance computing market, where the money from large government and corporate contracts pays for deploying this technology.

As for the consumer market, the biggest challenge is developing new types of memory whose energy cost per transmitted bit, measured in pJ/bit, is ever lower. And not only memory, but also the interconnects between processors. The solutions under study range from optical interfaces, whose cost does not grow with distance, to 3DIC interfaces, whether through silicon vias or silicon bridges. None of these is yet suitable for the consumer market, which could mean stagnating performance in the coming years.
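The pJ/bit metric translates directly into watts once you fix a bandwidth, which is why it matters so much for power-constrained designs. A minimal sketch of that arithmetic (the 100 GB/s and 5 pJ/bit figures here are hypothetical, chosen only for illustration):

```python
def transfer_power_watts(bandwidth_gb_s: float, energy_pj_per_bit: float) -> float:
    """Power drawn purely by moving data: bandwidth times energy per bit.

    watts = (GB/s * 8e9 bits/s) * (pJ/bit * 1e-12 J/pJ)
    """
    return bandwidth_gb_s * 8e9 * energy_pj_per_bit * 1e-12

# Hypothetical example: a 100 GB/s memory interface at 5 pJ/bit
# burns 4 W on data movement alone, before any computation happens.
print(transfer_power_watts(100, 5))  # -> 4.0
```

Note that power scales with both factors: halving the pJ/bit figure buys nothing if the bandwidth doubles at the same time.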

What parts of the hardware are affected the most?

Road Map GPUs 2021

GPUs are what we call stream processors: the performance of any contemporary graphics chip depends on the flow of data it receives to process. If the bottleneck today is the movement of data and the energy it consumes, then as we increase the computing capacity of a GPU we obviously need an ever-larger volume of data. The problem is that a graphics card has a limited power budget, shared between its VRAM and the GPU itself, supplied through the PCI Express slot and its power connectors.

To alleviate this problem, new ideas are being added to GPUs. One of them is a high-capacity last-level cache, able to capture the cache lines evicted from the L2 cache and spare the GPU from having to access VRAM. The first to do so was AMD with its Infinity Cache.
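Conceptually, this behaves like a victim cache: lines evicted from L2 are parked in a large on-die pool, and only a miss there costs a trip to VRAM. A toy LRU sketch of that idea in Python (a simplification for illustration only, not how Infinity Cache is actually implemented):

```python
from collections import OrderedDict

class VictimLLC:
    """Toy last-level cache that absorbs lines evicted from L2, so a
    later miss can be served on-die instead of going out to VRAM."""

    def __init__(self, capacity_lines: int):
        self.capacity = capacity_lines
        self.lines = OrderedDict()   # address -> data, kept in LRU order
        self.vram_accesses = 0       # counts expensive off-die trips

    def on_l2_eviction(self, addr: int, data) -> None:
        """L2 discarded this line; keep it on-die instead of dropping it."""
        self.lines[addr] = data
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # drop least-recently-used line

    def read(self, addr: int):
        if addr in self.lines:               # hit: stays on-die, cheap
            self.lines.move_to_end(addr)
            return self.lines[addr]
        self.vram_accesses += 1              # miss: costly trip to VRAM
        return None
```

Every hit in this pool is data movement that never leaves the die, which is precisely where the energy saving comes from.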


The other problem lies with very-low-power devices. LPDDR5, the latest standard, is beginning to be promoted as memory beyond those devices. Although its consumption has been cut by 20% compared to LPDDR4X, reaching an average of 4 pJ/bit, the higher bandwidths cancel out the energy savings on the memory side, which poses a challenge for the development of future mobile SoCs. This has led ARM to evolve its ISA toward markets beyond Post-PC devices, and it is why we have begun to see ARM cores in designs beyond what was usual until relatively recently.
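A back-of-the-envelope calculation shows why the higher bandwidth eats the per-bit savings. If the quoted ~4 pJ/bit for LPDDR5 is a 20% cut, LPDDR4X sits around 4 / 0.8 = 5 pJ/bit; the per-pin peak data rates below (4266 and 6400 MT/s) are the standard figures for each generation:

```python
def interface_power(pj_per_bit: float, data_rate_mt_s: float) -> float:
    """Relative power of a memory interface: energy per bit times bit rate.
    Units are arbitrary; only the ratio between two results matters."""
    return pj_per_bit * data_rate_mt_s

lpddr4x = interface_power(5.0, 4266)   # ~5 pJ/bit at 4266 MT/s per pin
lpddr5  = interface_power(4.0, 6400)   # ~4 pJ/bit at 6400 MT/s per pin

# Despite the 20% per-bit saving, pushing more bits per second means
# the interface as a whole draws more power, not less.
print(lpddr5 / lpddr4x)  # -> ~1.2
```

So at full speed the newer interface actually consumes around 20% more in total, which is exactly the challenge mobile SoC designers face.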
