
Hardware focused on a single task: is specialization the future?

As time goes by, programs need to run faster and faster, but at the same time CPUs and GPUs have become behemoths of complexity in which it is very difficult to keep increasing performance the traditional way. The future solution to this problem? Domain Specific Accelerators.

First of all, the concept of an accelerator

Since the dawn of computing, support chips have been needed to speed up certain tasks; originally these chips freed the CPU from performing a repetitive, recursive job. The clearest example is the graphics system, which spares the CPU from spending most of its time drawing on the screen.

An accelerator is a support chip that goes further: it not only frees the CPU from a task, it also speeds that task up. In other words, it completes the task in a fraction of the time the CPU would need, which means the task is accelerated and everything ends up running faster. Hence the name accelerator.

Accelerators come in many types and designs, and almost any kind of hardware can be one: a microcontroller, an FPGA, a combinational or sequential circuit, and so on. In recent years a type of accelerator has appeared that will dominate hardware in the years to come: the Domain Specific Accelerator.

Domain Specific Accelerators: a general definition


In hardware we have been using accelerators for a long time for specific jobs and applications, and therefore for a particular domain. Today those domains include graphics, deep learning, bioinformatics, and real-time voice and image processing. There are many specific domains in which a Domain Specific Accelerator can solve the problem better than a CPU, that is, in less time and with less power.

The first question that comes to mind: is a GPU a Domain Specific Accelerator? No, it is not. DSAs handle very specific tasks, so a GPU will contain several of these units. To make this clearer, keep in mind that any task can be divided into several smaller ones, each of which can be accelerated independently by this kind of processor.

However, Domain Specific Accelerators differ from the other options on the market, since their design exploits a series of characteristics that place them between general-purpose processors and conventional accelerators. In other words, they do not reach the complexity of a CPU, but they are far more complex than the classic solutions, especially fixed-function ones.

Specific domain, specific ISA


The first thing to bear in mind is that a Domain Specific Accelerator is not a CPU. Although it also executes a program, its design is optimized for a specific solution rather than a general one. For this, a completely exclusive ISA is built around the DSA, whose instructions, registers and data types are designed to carry out in a short time operations that would take a CPU many cycles.

Today's CPUs build the instructions of their ISAs from microinstructions that share a common data path through the instruction cycle. This means that, due to the complexity of the instruction set, a complex instruction takes many cycles to complete. In a DSA we can create instruction loops and dedicated data paths for certain instructions so that they run faster; we can even create parallel units that execute just that one instruction over and over.
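As a rough illustration in C (rather than in a hardware description language), the two functions below compute the same eight-element dot product. The first is what a general-purpose CPU actually runs: a loop of multiplies, adds and branch bookkeeping spread over many cycles. The second stands in for a single hypothetical DSA instruction whose dedicated data path of eight multipliers and an adder tree would produce the result in one pass; the names dot8_cpu and dsa_dot8 are invented for this sketch.

```c
#include <stdint.h>

/* General-purpose path: the CPU walks a loop, issuing separate
   multiply, add and loop-bookkeeping instructions for every element. */
static int32_t dot8_cpu(const int16_t a[8], const int16_t b[8])
{
    int32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Stand-in for a hypothetical DSA instruction: a dedicated data path
   with eight multipliers feeding an adder tree could return the same
   value as a single operation. Modeled in C purely for illustration. */
static int32_t dsa_dot8(const int16_t a[8], const int16_t b[8])
{
    return (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1]
         + (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3]
         + (int32_t)a[4] * b[4] + (int32_t)a[5] * b[5]
         + (int32_t)a[6] * b[6] + (int32_t)a[7] * b[7];
}
```

On real hardware the second version would not be C at all but a single opcode in the DSA's own ISA; the function is only there to make the contrast visible.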

But the biggest advantage of this is that, for certain applications, it lets us get rid of the instructions of a general-purpose unit that are useless for our specific application, instructions that in recent years have turned the register files and instruction sets of CPUs and GPUs into mastodons that occupy a great deal of space.

Domain Specific Accelerators and memory access


Another DSA improvement has to do with memory: like a microcontroller, they use memory inside the accelerator itself. This matters because the physical distance at which the memory sits influences the energy cost of each instruction.

This memory configuration is the main advantage of accelerators: each executed instruction consumes much less energy than in a CPU, and it also avoids the problem of memory contention. A DSA does not use system RAM to perform its calculations, so it can work in parallel all the time.

In addition, because of the way they work, we can place them in a SoC or a similar structure and have the CPU communicate with them directly, without having to go through RAM to exchange data.
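As a minimal sketch of that direct communication, assuming a DSA whose control registers are memory-mapped into the SoC's address space, the CPU could drive it roughly as follows; the base address, register layout and command bits are invented for illustration, not taken from any real device.

```c
#include <stdint.h>

/* Hypothetical register block of a DSA mapped into the SoC's address
   space. Base address, layout and bits are illustrative only. */
#define DSA_BASE        0x40010000u
#define DSA_REG(off)    (*(volatile uint32_t *)(uintptr_t)(DSA_BASE + (off)))

#define DSA_OPERAND_A   DSA_REG(0x00)   /* first input value                */
#define DSA_OPERAND_B   DSA_REG(0x04)   /* second input value               */
#define DSA_COMMAND     DSA_REG(0x08)   /* writing a command starts the run */
#define DSA_STATUS      DSA_REG(0x0C)   /* bit 0 stays set while busy       */
#define DSA_RESULT      DSA_REG(0x10)   /* result, valid once idle again    */

#define DSA_CMD_RUN     1u
#define DSA_STATUS_BUSY 1u

/* The CPU writes operands and a command straight to the accelerator's
   registers and reads the result back; no buffer in system RAM is
   involved, since the DSA computes out of its own internal memory. */
uint32_t dsa_run(uint32_t a, uint32_t b)
{
    DSA_OPERAND_A = a;
    DSA_OPERAND_B = b;
    DSA_COMMAND   = DSA_CMD_RUN;
    while (DSA_STATUS & DSA_STATUS_BUSY)
        ;                               /* busy-wait until the unit finishes */
    return DSA_RESULT;
}
```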

Hardware and Software go hand in hand in DSAs


Hardware is not normally designed for specific software; rather, it is the software that adapts to take advantage of the hardware. This is done through specialized APIs at the software level, where the software interacts with an abstraction of the hardware and a program, the driver, performs the translation between that abstraction and the hardware.
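A minimal sketch of that layering, assuming a POSIX-style driver exposed through a hypothetical /dev/dsa0 device node: the command structure and ioctl request number below are invented for illustration, and the point is only that the application talks to an abstraction while the driver does the translation to the real hardware.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Abstraction of the hardware as the application sees it: a generic
   command, no registers. Every name here is invented for the sketch. */
struct dsa_cmd {
    uint32_t opcode;       /* which accelerated operation to run  */
    uint64_t input_addr;   /* where the input data lives          */
    uint64_t output_addr;  /* where the result should be written  */
    uint32_t length;       /* number of elements to process       */
};

#define DSA_IOC_SUBMIT 0x4401   /* invented ioctl request number */

/* API-level entry point: the program hands the abstract command to the
   driver behind the hypothetical /dev/dsa0 node, and the driver is the
   one that translates it into the accelerator's actual registers. */
int dsa_submit(const struct dsa_cmd *cmd)
{
    int fd = open("/dev/dsa0", O_RDWR);
    if (fd < 0)
        return -1;

    int rc = ioctl(fd, DSA_IOC_SUBMIT, cmd);

    close(fd);
    return rc;
}
```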

In Domain Specific Accelerators, the idea is that they execute a program that runs on them as if they were a CPU, but with an instruction set specialized for a specific problem, so that programs run faster on a DSA than on a CPU thanks to its specialized ISA and architecture.


Many hardware designs in the future will be DSAs for specialized problems, created in-house by each company and institution to accelerate specific parts of one or more of the programs they have developed. They will be implemented as dedicated chips, as blocks inside SoCs, and even on FPGAs through languages such as Verilog or VHDL.

This completely reverses the relationship between hardware and software: we go from designing software to take advantage of specific hardware to designing hardware for specific software solutions.
