In the server world, RAM usually carries the acronym ECC and is often also registered memory, which comes in the form of RDIMM modules. This article therefore does not describe hardware that you will normally find in a PC, unless you have a workstation with an HEDT CPU or you work with servers every day.
What is Registered RAM or RDIMM?
Registered RAM differs from conventional RAM in that it has a register located between the memory chips on the module, called an RDIMM in this case, and the system memory controller. This register is a small buffer that holds the following signals:
- Memory addresses to access.
- Commands (such as read or write).
- The clock signal used to communicate with the RAM.
It differs from a conventional DIMM in that the data still travels over the usual data pins, while the address and command signals that control access to the memory pass through this register. This improves signal integrity and lowers the electrical load on the memory controller, and one of the resulting advantages is the ability to support more memory modules in the system.
However, RDIMM modules are somewhat slower than conventional modules, since passing through the register adds an extra stage to every access, which translates into slightly higher latency (typically about one extra clock cycle). Conventional DIMM modules, for their part, are intended for desktop PCs, which usually have no more than two memory channels, unlike workstations based on HEDT CPUs and servers.
What is ECC in RAM?
Keep in mind that RAM, being volatile memory, depends on electricity to keep the data it stores, and this makes it vulnerable to data loss. This is especially true because the memory in our PCs is DRAM, which holds each bit as an electrical charge that has to be refreshed periodically, and that makes it vulnerable to electrical and magnetic interference.
Normally the bits in the cells of a RAM chip do not change on their own, since the voltage gap between the two values is large enough to prevent a jump from one to the other. However, each new memory generation narrows that voltage gap, and this is where error correction methods, or ECC (Error Correction Code), become necessary to preserve the information.
To get an idea, take a simple binary value, 011101011, which is equivalent to 235 in decimal, and count its bits from the left.
- If the first bit changes, 111101011, then the value becomes 491.
- If it is the second bit that changes, 001101011, the value becomes 107.
- A change in the third bit makes the value 010101011 and therefore 171.
As you can see, a single flipped bit in RAM changes the value stored at a memory address. Keep in mind that RAM holds not only data but also instructions, and that servers are designed to run 24 hours a day, 7 days a week, that is, continuously. DRAM does not stay perfectly stable over that time, so errors accumulate as time goes by and can eventually lead to a system-wide crash.
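The bit flips in the example above can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `flip_bit` is made up for this article, and positions are counted from the left starting at 1, as in the list.

```python
def flip_bit(bits: str, position: int) -> str:
    """Return `bits` with the bit at the 1-based `position` (from the left) inverted."""
    i = position - 1
    flipped = "1" if bits[i] == "0" else "0"
    return bits[:i] + flipped + bits[i + 1:]


original = "011101011"
print(original, int(original, 2))

# Flip the first, second and third bit in turn and show the decimal value.
for position in (1, 2, 3):
    corrupted = flip_bit(original, position)
    print(position, corrupted, int(corrupted, 2))
```

The closer the flipped bit is to the left end of the string, the larger the change in the stored value.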
The method used to correct errors in ECC memory is called Hamming code. It is not an algorithm executed by the CPU in software; it is carried out in hardware by the memory controller itself, which sits inside processors that support this type of RAM. It is named after the American mathematician Richard W. Hamming, who developed it.
How does it work? The idea of the Hamming code is to add redundancy bits that allow an error to be located and corrected by checking the values of parity bits. To do this, the bits are numbered from left to right, starting at 1, according to their position and not according to the value they store.
The positions that are a power of 2 (1, 2, 4, 8…) are reserved for parity bits, while the remaining positions hold the data bits. Each parity bit covers the positions whose binary index contains that power of 2, and it is set so that the group it covers contains an even number of ones (the usual convention). They are also called redundancy bits because they carry no new information: they are computed entirely from the data. To check the integrity of what is stored in memory, the parity bits are recomputed on every read and compared with the stored ones; the positions of the parity bits that do not match add up to the position of the bit that has flipped.
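To make the scheme concrete, here is a minimal software sketch of a Hamming code with even parity. It assumes the textbook layout in which parity bits sit at the power-of-2 positions. The function names are invented for this example, and a real memory controller does this in hardware on much wider words.

```python
def hamming_encode(data_bits):
    """Place data bits at non-power-of-2 positions and compute the parity bits."""
    m = len(data_bits)
    r = 0
    while 2 ** r < m + r + 1:      # number of parity bits needed
        r += 1
    n = m + r
    code = [0] * (n + 1)           # index 0 unused so indices match 1-based positions
    remaining = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1) != 0:   # not a power of 2: data position
            code[pos] = next(remaining)
    for k in range(r):
        p = 2 ** k
        parity = 0
        for pos in range(1, n + 1):
            if pos & p and pos != p:   # parity bit p covers positions with bit p set
                parity ^= code[pos]
        code[p] = parity
    return code[1:]


def hamming_syndrome(code):
    """XOR of the positions of all 1 bits; 0 means no single-bit error was found."""
    syndrome = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= pos
    return syndrome


def hamming_correct(code):
    """Return (corrected codeword, position of the flipped bit or 0)."""
    fixed = list(code)
    syndrome = hamming_syndrome(fixed)
    if 0 < syndrome <= len(fixed):
        fixed[syndrome - 1] ^= 1   # the syndrome is the position of the bad bit
    return fixed, syndrome


stored = hamming_encode([1, 1, 1, 0, 1, 0, 1, 1])   # the byte 11101011
damaged = list(stored)
damaged[5] ^= 1                                      # flip position 6
repaired, where = hamming_correct(damaged)
print(where, repaired == stored)
```

A code like this corrects one flipped bit per word. Detecting (but not correcting) two flipped bits requires one additional overall parity bit, which this sketch leaves out.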
If the operation is a memory write:
- The CPU sends the memory controller the memory address it wants to modify and the data it wants to write.
- The controller generates the ECC code and sends it to memory together with the data.
- The data and its ECC code are stored in memory.
If, on the other hand, the operation is a memory read:
- The CPU requests a memory address from the controller, which forwards the request to RAM.
- The controller runs the Hamming check on what it receives. If the data is correct, it is sent on to the CPU; if it is not, the error is corrected automatically before the data is delivered.
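The write and read steps above can be modeled with a toy controller. Everything here is hypothetical and for illustration only: the class `ToyEccController`, the 8-bit word size and the dictionary that stands in for the DRAM are assumptions, and the check bits are kept next to the data word in the same way that an ECC module keeps them on an extra chip.

```python
DATA_BITS = 8  # toy word size; real ECC modules protect 64-bit words

# 1-based codeword positions that are not powers of 2 hold the data bits.
DATA_POSITIONS = [p for p in range(1, 64) if p & (p - 1)][:DATA_BITS]


def check_bits(word: int) -> int:
    """Hamming check value: XOR of the codeword positions of all set data bits."""
    check = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> i) & 1:
            check ^= pos
    return check


class ToyEccController:
    """Hypothetical memory controller; `dram` maps address -> (word, check bits)."""

    def __init__(self):
        self.dram = {}

    def write(self, address: int, word: int) -> None:
        # The controller generates the ECC code and stores it with the data.
        self.dram[address] = (word, check_bits(word))

    def read(self, address: int):
        word, stored_check = self.dram[address]
        syndrome = check_bits(word) ^ stored_check
        if syndrome == 0:
            return word, "ok"
        if syndrome in DATA_POSITIONS:
            # The syndrome names the position of the flipped data bit.
            word ^= 1 << DATA_POSITIONS.index(syndrome)
            return word, "corrected"
        if syndrome & (syndrome - 1) == 0:
            # A single check bit flipped; the data itself is intact.
            return word, "corrected"
        return word, "uncorrectable"


controller = ToyEccController()
controller.write(0x10, 0b11101011)

# Simulate a bit flipping inside the DRAM after the write.
word, check = controller.dram[0x10]
controller.dram[0x10] = (word ^ 0b00100000, check)

print(controller.read(0x10))
```

The CPU side never sees the corrupted value: the read returns the original word together with a status that, in a real system, would be reported to the operating system as a corrected error.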
RAM modules with ECC
ECC memory modules have more memory chips than non-ECC modules of the same storage capacity, since the extra chips hold the bits used for error correction. Note that in ECC memory the parity is not computed separately for each byte. The module stores 72 bits for every 64-bit word, which averages out to 9 bits per byte: 64 bits store data and 8 are check bits computed over the whole word.
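The figure of 8 check bits per 64 data bits can be derived from the Hamming condition that r check bits must satisfy 2^r >= m + r + 1 for m data bits. The sketch below assumes the common arrangement of a Hamming code plus one overall parity bit for double-error detection, which is not spelled out above. The function name is illustrative.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


for m in (8, 16, 32, 64):
    r = hamming_check_bits(m)
    print(f"{m} data bits: {r} check bits, {r + 1} with an overall parity bit")
```

Protecting each byte separately would cost far more than one extra bit per byte, which is why the code is computed over the full 64-bit word.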
This also means that the memory controller in the processor that communicates with the RAM must be able to generate the Hamming code. Since ECC memory is usually paired with CPUs that are more advanced than desktop CPUs, supporting it requires not only a suitable memory controller but also a motherboard that supports this type of memory. Since a home PC is not powered on perpetually and is rebooted regularly, ECC memory is generally not used there.
This brings us to the final part of the article: RDIMM modules with ECC, which combine the characteristics of the two types of RAM and, with them, all their advantages and disadvantages. Given that RDIMM memory is aimed at servers and HEDT systems, and that this market requires the Error Correction Code, in practice RDIMM modules are sold with ECC.