A crucial element here is the transmission of data from one part of a processor to another, between different processors, and between processor and memory. Although it may seem surprising, certain factors can distort a signal on its way from one place to another, so that a 0 in the binary code becomes a 1 or vice versa.
These changes can slightly alter the data being processed, or even change the instruction to be executed. That is why, when carrying out the physical implementation of a processor or memory, the certification team has to check point by point that nothing causes such variations when data is sent, and has to implement error correction mechanisms in the hardware.
The first place where it is important that no hardware errors occur when transmitting data is the storage media, since these are crucial for holding information and we want 100% accuracy when the data is stored in binary format.
The resulting binary, however, is not the same on all architectures, since the set of registers and instructions, the disk format and the other standards used differ from one system to another. In any case, we must start from the fact that every file we store on a hard drive or SSD, regardless of its nature, is always encoded in binary.
One way to correct errors is to use data redundancy, that is, to keep several copies of each piece of data so that if one of them changes, it can at least be checked automatically against the others. This is inefficient, however: keeping three copies, for example, requires three times the bandwidth and storage. So we have to find an alternative, and the best known is to add a series of additional bits at the end of each data block to certify that no errors have occurred. These extra bits are used by a series of algorithms to verify the integrity of the information.
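The redundancy approach described above can be sketched in a few lines. This is only an illustration, assuming exactly three stored copies and a bitwise majority vote; the function name `majority_vote` and the sample bytes are made up for the example.

```python
def majority_vote(copies):
    """Return the bitwise majority of three equal-length byte strings."""
    a, b, c = copies
    # A bit is 1 in the result only if it is 1 in at least two copies.
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


original = b"\x5a\x0f"

# Simulate one corrupted copy: a single bit of the first byte is flipped.
damaged = bytes([original[0] ^ 0b00001000, original[1]])

stored = [original, damaged, original]
recovered = majority_vote(stored)
print(recovered == original)
```

The vote recovers the data as long as no bit position is damaged in more than one copy, but it pays for that with three times the storage, which is why extra check bits are preferred.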
The hardware implementation
Hardware error correction systems are invisible not only to the user, but also to the application and even to the operating system. Everything happens at the hardware level, within the communication interfaces, which contain a series of mechanisms, usually fixed-function, that apply the corresponding algorithms to ensure that the data going from one point to another remains unchanged at all times.
Every error correction system needs two participants: on the one hand, the one who sends the data and, on the other, the one who receives it. Let's not forget that sending data also includes operating on it. The transmitter is in charge of sending the message together with a series of special bits used for error correction. With them, the information that was transmitted is compared with the information that was received, in order to maintain the integrity of the binary information at all times.
The hardware systems in charge of this task are not accessible to any software element, since they sit in the data transmission interfaces, continuously performing the same task over and over again. This does not require complex processors, only units that are as simple as they are efficient.
The easiest way to do this is with the parity bit, which consists of adding an additional bit to each byte and setting it to 0 or 1 depending on whether the number of 1s in the rest of the byte is odd or even. This does not fix errors. It only tells us that the binary has been modified, not how many bits have changed or where, and if an even number of bits is flipped the parity stays the same and the error goes unnoticed. It is therefore not the ideal solution for correcting errors, since correction requires locating where the error is. However, there are ways to perform error correction with parity bits, and they also help us understand how it works.
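As a minimal sketch of the parity bit, the snippet below assumes the even-parity convention (the bit is 1 when the byte contains an odd number of 1s, so the total number of 1s becomes even). The text does not say which convention is meant, and the function names are illustrative only.

```python
def parity_bit(byte):
    """Even parity: 1 if the byte holds an odd number of 1s, otherwise 0."""
    return bin(byte).count("1") % 2


def is_intact(byte, parity):
    """Receiver side: recompute the parity and compare it with the one sent."""
    return parity_bit(byte) == parity


data = 0b10110010            # contains four 1s
sent_parity = parity_bit(data)

one_flip = data ^ 0b00000100   # one bit changed in transit
two_flips = data ^ 0b00000110  # two bits changed in transit

print(is_intact(data, sent_parity))       # unchanged byte passes the check
print(is_intact(one_flip, sent_parity))   # single flip is detected
print(is_intact(two_flips, sent_parity))  # double flip slips through
```

The last case shows the limitation: two flipped bits cancel out in the count, and even when an error is detected the check gives no hint about which bit is wrong.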
Error correction, practical example
A more elegant approach is to use several parity bits within each data block and organize the information in an array. For example, a 16-bit (2-byte) code can be represented as a 4 x 4 array. At the beginning of each row we add a parity bit that records whether the number of 1s in that row is odd or even. At the end, an additional row of parity bits is added, which counts the number of 1s vertically, per column, rather than horizontally.
Thus, the receiving system receives a 16-bit data block, of which 12 bits carry information and 4 are parity bits. It compares the code that was sent with the one that has been received, first checking the first and third columns, then the first with the second, the second with the fourth, and finally the third with the fourth column. A similar process is then carried out with the rows.
The objective of this procedure is to find the exact position of the bit that has changed, since it pinpoints the column and the row where the error is located. Once the receiving system knows in which position the information has changed, correcting it is as easy as flipping that bit back. Today, more efficient error correction mechanisms than the one explained here are used (especially for performance), but it serves as a simple example of how hardware error correction works.
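The idea of locating an error by row and column can be sketched as follows. This is a simplified illustration, not the exact checking sequence described above: it assumes 12 data bits arranged as 4 rows of 3, one even-parity bit per row and one per column, and parity bits that themselves arrive undamaged. The names `grid_parities` and `correct_single_error` are hypothetical.

```python
def grid_parities(rows):
    """Return (row_parities, column_parities) for a grid of 0/1 values."""
    row_par = [sum(row) % 2 for row in rows]
    col_par = [sum(col) % 2 for col in zip(*rows)]
    return row_par, col_par


def correct_single_error(rows, row_par, col_par):
    """Flip back a single corrupted data bit in place.

    Returns the (row, column) that was fixed, or None if the mismatches
    do not point at exactly one cell.
    """
    new_row_par, new_col_par = grid_parities(rows)
    bad_rows = [i for i, (a, b) in enumerate(zip(row_par, new_row_par)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(col_par, new_col_par)) if a != b]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        r, c = bad_rows[0], bad_cols[0]
        rows[r][c] ^= 1  # flip the bit back to its original value
        return r, c
    return None


# Sender: 12 data bits plus their row and column parities.
sent = [[1, 0, 1],
        [0, 1, 1],
        [1, 1, 0],
        [0, 0, 1]]
row_par, col_par = grid_parities(sent)

# Channel: one data bit is flipped on the way.
received = [row[:] for row in sent]
received[2][1] ^= 1

print(correct_single_error(received, row_par, col_par))  # (2, 1)
print(received == sent)                                  # True
```

The failing row parity and the failing column parity intersect at exactly one cell, which is what lets the receiver correct the error instead of merely detecting it.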