This is how Intel CPUs distribute the workload between cores

Deepak Gupta

2 years ago

Windows 10 and current versions of Linux do not yet have a kernel adapted to Thread Director, Intel's new scheduling technology, which we have already covered in depth in its own article and which is closely tied to the subject of this one.

So if only Windows 11 is ready for Alder Lake, what about the other operating systems? As noted, they are not optimized and need deep changes to take advantage of what these CPUs offer. The responsibility lies with the operating system scheduler, since it is the one that decides which of the free logical processors will run the software thread waiting to be assigned.

In this case there are only two options:

The OS picks the free logical processor with the highest performance, because it will finish that kind of thread in the least time and the system is tuned for performance.
The OS picks a high-efficiency free logical processor, because it is the better fit for that kind of thread and the system is tuned to be as efficient as possible.
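The two options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LogicalProcessor` type, the `perf` and `efficiency` scores, and the `pick_processor` helper are hypothetical names invented for the example, not part of any Intel or Windows interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalProcessor:
    cpu_id: int
    perf: int        # hypothetical performance score, higher is faster
    efficiency: int  # hypothetical efficiency score, higher is more frugal
    free: bool = True


def pick_processor(processors, prefer="performance"):
    """Return the free logical processor that best fits the policy, or None."""
    candidates = [p for p in processors if p.free]
    if not candidates:
        return None
    if prefer == "performance":
        return max(candidates, key=lambda p: p.perf)
    if prefer == "efficiency":
        return max(candidates, key=lambda p: p.efficiency)
    raise ValueError(f"unknown policy: {prefer!r}")


# Made-up scores for illustration only.
cpus = [
    LogicalProcessor(0, perf=100, efficiency=40),              # performance core, free
    LogicalProcessor(1, perf=100, efficiency=40, free=False),  # performance core, busy
    LogicalProcessor(8, perf=60, efficiency=90),               # efficiency core, free
]

fast_choice = pick_processor(cpus, prefer="performance")
frugal_choice = pick_processor(cpus, prefer="efficiency")
print(fast_choice.cpu_id, frugal_choice.cpu_id)
```

With these made-up scores, a performance-oriented system picks logical processor 0 and an efficiency-oriented one picks logical processor 8; a busy processor is never considered.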

This is where Intel's explanation of the basic calculation comes in. When the OS scheduler has to choose between two logical processors of the CPU, call them i and j, for two software threads, k1 and k2, it compares how each thread behaves on each processor.

According to Intel, the performance ratio is calculated as $\mathrm{Perf}_{ijk_x} = \mathrm{Perf}_{ik_x} / \mathrm{Perf}_{jk_x}$, and the efficiency ratio as $\mathrm{Energy}_{ijk_x} = \mathrm{Energy}_{ik_x} / \mathrm{Energy}_{jk_x}$, where x is 1 or 2. The OS has a lot of say here: following the example above, the scheduler could find, in a hypothetical case, that $\mathrm{Perf}_{ijk_1} > \mathrm{Perf}_{ijk_2}$.

In that case software thread k1 would go to logical processor i, while k2 would go to logical processor j, because k1 gains more from running on i than k2 does. What if two software threads belong to the same ID? That is simple too, since the OS scheduler can choose among several performance or efficiency logical processors to carry out the assignments.
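The ratio comparison can be worked through with a small Python sketch. It assumes the scheduler already has a performance score for each (logical processor, thread) pair; the `assign_threads` helper and the numbers are hypothetical, and the same division applies to the energy ratio.

```python
def perf_ratio(perf, i, j, k):
    """Performance of thread k on processor i relative to processor j."""
    return perf[(i, k)] / perf[(j, k)]


def assign_threads(perf, i, j, k1, k2):
    """Give processor i to the thread that benefits more from it.

    Returns a dict mapping each thread to its logical processor.
    """
    r1 = perf_ratio(perf, i, j, k1)
    r2 = perf_ratio(perf, i, j, k2)
    if r1 >= r2:
        return {k1: i, k2: j}
    return {k1: j, k2: i}


# Made-up scores: k1 runs twice as fast on i, k2 only 20% faster.
perf = {
    ("i", "k1"): 2.0, ("j", "k1"): 1.0,  # ratio for k1 is 2.0
    ("i", "k2"): 1.2, ("j", "k2"): 1.0,  # ratio for k2 is 1.2
}

assignment = assign_threads(perf, "i", "j", "k1", "k2")
print(assignment)
```

Because the ratio for k1 is larger than the ratio for k2, the sketch places k1 on logical processor i and k2 on logical processor j.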

This is how Windows 11 balances software threads and distributes them correctly across the logical processors.

So what is EHFI?

EHFI stands for Enhanced Hardware Feedback Interface. It is the set of instructions that guides the kernel scheduler (for now, only in Windows 11) on where to place the workloads the OS distributes among the system's logical processors, that is, among cores and threads.

These instructions are loaded when the operating system starts and are held in non-paged memory. They are stored as a table placed in write-back memory, and only once the table has been loaded and enabled by the OS and by Intel's Thread Director does it come into play to distribute the OS workloads.

There is, of course, one more key factor: the thermal and power limits that each processor's SKU tables impose on EHFI and TD. Intel's microcode sets what can be mapped in these tables for EHFI and TD, so that if a value exceeds either of the two thresholds, the processor's protection mechanisms take over for those tasks.

But how do EHFI and TD know where to look in memory? According to the leaked patent, Thread Director contains a notification indicator that detects when the information has changed and has been written to memory, so the OS will not request or write to that region again until TD gives the go-ahead to clear the indicator.

The control to which EHFI is subject

As mentioned above, everything revolves around temperature and efficiency, which is after all one of the fundamental aspects of Alder Lake. The current sensors built into both types of cores are essential for telling the power controller how much current each core or thread is drawing.

According to the patents, the total power registered by the controller is compared against the processor's TDP, which cannot be exceeded. We also now know that Intel has set PL1 equal to PL2, so that single shared value is the limit that applies.

With that in mind, the power the controller oversees is divided into four domains:

Core domain
Graphics domain
Interconnect domain
Uncore domain

The sum of the four can therefore never exceed PL1 = PL2, but each domain can be managed independently, so the controller can assign more power to some than to others depending on how the load behaves. EHFI also has a set interval that fixes how often the table and the controller registers are updated, which keeps the assignments balanced.
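The budget rule can be illustrated with a short Python sketch. It assumes, purely for the example, that the controller scales every domain's demand proportionally when the total would exceed the limit; the real policy is not described in the source, and the `allocate_power` helper and the wattages are hypothetical.

```python
DOMAINS = ("cores", "graphics", "interconnect", "uncore")


def allocate_power(demand_w, limit_w):
    """Return per-domain power so that the total never exceeds limit_w.

    Assumption: demands are scaled down proportionally when over budget.
    """
    missing = set(DOMAINS) - set(demand_w)
    if missing:
        raise ValueError(f"missing domains: {sorted(missing)}")
    total = sum(demand_w[d] for d in DOMAINS)
    if total <= limit_w:
        # Everything fits, so every domain gets what it asked for.
        return {d: float(demand_w[d]) for d in DOMAINS}
    scale = limit_w / total
    return {d: demand_w[d] * scale for d in DOMAINS}


# Made-up demands in watts; the limit stands in for PL1 = PL2.
demand = {"cores": 120.0, "graphics": 40.0, "interconnect": 20.0, "uncore": 20.0}
budget = allocate_power(demand, limit_w=150.0)
print(budget, sum(budget.values()))
```

Here the domains ask for 200 W in total against a 150 W limit, so each request is scaled by 0.75: the cores still receive the largest share, and the sum stays at the limit.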

At this point, EHFI only has to communicate with TD to determine the type of core, its performance, and its performance and efficiency ratings; in this last step the Windows 11 scheduler is told where the software thread should be sent.