
Intel’s new CPU architecture in depth: +19% IPC and 16 cores

What Intel presented centers on two clearly differentiated parts: the so-called efficient cores, or E-Cores, and the performance cores, now called P-Cores. The company’s goal is to build a family of chips that can scale multicore workloads horizontally across all of these cores. In addition, the company shared a number of very interesting new details and figures worth mentioning.

Intel Alder Lake, an architecture for all desktop segments

Intel’s dilemma was clear: it had to face AMD and Apple in two very different areas, the first in high performance and the second in efficiency and performance per watt. The two fronts have little in common, so the company could either diversify or opt for what we are going to see next.

Efficient cores or E-Cores


The microarchitecture of the efficient cores, or E-Cores, is Gracemont, conceived and designed for efficiency. Its most striking features are the wide range of frequencies it supports and the low voltages that reduce total energy consumption.

Efficient cores use a variety of technical advances to prioritize workloads without wasting processing power and to directly improve performance with features that raise instructions per cycle (IPC):

  • A 5,000-entry branch target cache, which results in more accurate branch prediction.
  • A 64-kilobyte instruction cache that keeps useful instructions close by without spending power in the memory subsystem.
  • Alder Lake features Intel’s first on-demand instruction length decoder, which generates pre-decode information.
  • Intel’s clustered out-of-order decoder, which can decode up to six instructions per cycle while maintaining energy efficiency.
  • A wide back end with five-wide allocation and eight-wide retire, a 256-entry out-of-order window, and 17 execution ports.
  • Intel Control-flow Enforcement Technology and Intel Virtualization Technology Redirect Protection.
  • The AVX ISA, along with new extensions to support integer artificial intelligence (AI) operations.

Intel makes a curious comparison: the most logical thing would have been to measure Alder Lake against Rocket Lake, the architecture it replaces, but Intel chose Skylake instead. Against Skylake, it claims the new efficient cores deliver 40% more single-thread performance at the same power, or the same performance while consuming less than 40% of the power.

At the same time, Intel claims that four efficient cores offer 80% more performance while consuming less power than two Skylake cores running four threads, or the same performance while consuming 80% less power.
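Taken at face value, these claims can be converted into relative performance per watt. The sketch below is back-of-the-envelope arithmetic only: it normalizes the Skylake baseline to 1.0, plugs in Intel’s headline percentages and treats the "less than" figures as bounds. It does not measure anything, and the function name is just a label chosen here.

```python
def perf_per_watt(perf: float, power: float) -> float:
    """Relative performance per watt; both inputs are normalized to Skylake = 1.0."""
    return perf / power


# Single-thread claim: E-Core vs one Skylake core.
st_same_power = perf_per_watt(perf=1.40, power=1.00)  # 40% more performance at the same power
st_same_perf = perf_per_watt(perf=1.00, power=0.40)   # same performance at <40% power: a lower bound

# Multithread claim: E-Cores vs two Skylake cores running four threads.
mt_more_perf = perf_per_watt(perf=1.80, power=1.00)   # 80% more performance at *less* power: a lower bound
mt_same_perf = perf_per_watt(perf=1.00, power=0.20)   # same performance at 80% less power

for label, value in [
    ("single-thread, same power", st_same_power),
    ("single-thread, same performance (at least)", st_same_perf),
    ("multithread, more performance (at least)", mt_more_perf),
    ("multithread, same performance", mt_same_perf),
]:
    print(f"{label}: {value:.2f}x perf/W")
```

The two ways of stating each claim imply quite different efficiency gains (1.4x versus at least 2.5x for single thread), so the operating point matters when reading these numbers.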

Performance Cores or P-Cores


As for Intel’s P-Cores, or performance cores, they use the Golden Cove microarchitecture and are designed to achieve maximum performance, reduce latency and improve single-thread performance beyond what Rocket Lake already offered. The changes Intel cites are the following:

  • Wider: six decoders (up from four); eight-wide µop cache (up from six); six-wide allocation (up from five); 12 execution ports (up from 10).
  • Deeper: larger physical register files; a deeper reorder buffer with 512 entries.
  • Smarter: improved branch prediction accuracy; reduced effective L1 latency; full-write predictive bandwidth optimizations in L2.
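To put the width figures listed above in proportion, the relative increase of each structure is easy to compute. This is a trivial sketch using only those numbers (the dictionary labels are chosen here); a wider structure does not translate one-to-one into higher IPC, which is why the overall gain Intel quotes is around 19% rather than any of these percentages.

```python
def growth_pct(old: int, new: int) -> float:
    """Percentage increase from the previous value to the new one."""
    return (new - old) / old * 100


# (previous core, Golden Cove) pairs taken from the width figures above.
widths = {
    "decoders": (4, 6),
    "uop cache width": (6, 8),
    "allocation width": (5, 6),
    "execution ports": (10, 12),
}

for name, (old, new) in widths.items():
    print(f"{name}: {old} -> {new} (+{growth_pct(old, new):.0f}%)")
```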

P-Cores are the highest-performing CPU cores Intel has ever built, pushing the limits of low latency and single-threaded application performance with:

  • A ~19% geomean improvement across a wide range of workloads over the current 11th Gen core (Cypress Cove) at ISO frequency, for general-purpose performance.
  • Exposure of more parallelism and increased execution parallelism.
  • Intel Advanced Matrix Extensions (AMX), a breakthrough for next-generation AI acceleration in deep learning inference and training. It includes dedicated hardware and a new instruction set architecture to perform matrix multiplication operations significantly faster.
  • Reduced latency and better support for applications with large data and code footprints.
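A geomean (geometric mean) is the usual way to summarize per-workload speedups, because it gives consistent results regardless of which processor is used as the baseline. Measuring at ISO (the same) frequency means both cores run at the same clock, so the ratio reflects IPC. A minimal sketch of the calculation; the speedup values below are made up for illustration and are not Intel’s data.

```python
import math
from typing import Sequence


def geomean(ratios: Sequence[float]) -> float:
    """Geometric mean of per-workload speedup ratios (new score / old score)."""
    if not ratios:
        raise ValueError("need at least one ratio")
    # Averaging logarithms and exponentiating avoids overflow from a long product.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-workload speedups at the same clock speed (illustrative only).
speedups = [1.10, 1.25, 1.19, 1.30, 1.12]
print(f"geomean speedup: {geomean(speedups):.3f}")
```

Swapping the baseline simply inverts the result (the geomean of 1/r equals 1/geomean of r), which is the consistency property mentioned above.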

Intel Thread Director


This is probably the most remarkable technological innovation of the whole announcement. To ensure that the performance and efficient cores work smoothly with the operating system, Intel has developed an improved scheduling technology called Intel Thread Director.

Built directly into the hardware rather than software, Thread Director provides low-level telemetry about core state and each thread’s instruction mix, helping the operating system put the right thread on the right core at the right time.

Intel Thread Director is therefore dynamic and adaptive, adjusting scheduling decisions to real-time compute needs. This matters because until now the OS made these decisions on its own, based on statistics and usage times, whereas Thread Director changes the game:

  • Uses hardware telemetry to direct threads that need higher performance to the right P-Core at the right time.
  • Monitors instruction mix, core state and other relevant microarchitectural telemetry, helping the operating system make smarter scheduling decisions.
  • Optimizes Thread Director for best performance on Windows 11 in collaboration with Microsoft.
  • Expands the PowerThrottling API, which allows developers to explicitly specify quality-of-service attributes for their threads.
  • Applies a new EcoQoS classification that informs the scheduler if the thread prefers power efficiency (such threads are scheduled on E-Cores).
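The division of labor described above (hardware telemetry ranks threads, the OS decides, and an EcoQoS-style hint steers threads toward efficiency) can be illustrated with a deliberately simplified placement policy. This is a toy sketch: the `ThreadInfo` type, the `perf_class` score and the `place` function are hypothetical and are not the real Thread Director interface or the Windows scheduler, which are far more complex and also migrate threads over time.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class CoreType(Enum):
    P_CORE = "P-Core"
    E_CORE = "E-Core"


@dataclass
class ThreadInfo:
    name: str
    eco_qos: bool    # hypothetical: thread asked for power efficiency (EcoQoS-style hint)
    perf_class: int  # hypothetical hardware-derived score: higher = benefits more from a P-Core


def place(threads: List[ThreadInfo], free_p_cores: int) -> Dict[str, CoreType]:
    """Toy policy: EcoQoS threads go to E-Cores; the remaining threads are ranked
    by perf_class and the top ones get the available P-Cores."""
    placement: Dict[str, CoreType] = {}
    candidates = []
    for t in threads:
        if t.eco_qos:
            placement[t.name] = CoreType.E_CORE
        else:
            candidates.append(t)
    # Highest-scoring threads first; they claim the free P-Cores.
    candidates.sort(key=lambda t: t.perf_class, reverse=True)
    for i, t in enumerate(candidates):
        placement[t.name] = CoreType.P_CORE if i < free_p_cores else CoreType.E_CORE
    return placement


threads = [
    ThreadInfo("game_render", eco_qos=False, perf_class=3),
    ThreadInfo("background_sync", eco_qos=True, perf_class=0),
    ThreadInfo("compile_job", eco_qos=False, perf_class=2),
    ThreadInfo("ui_idle", eco_qos=False, perf_class=1),
]
for name, core in place(threads, free_p_cores=2).items():
    print(f"{name} -> {core.value}")
```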

Intel Alder Lake: Intel 7 process and other new features


As we already knew, Intel is introducing its Intel 7 process for this architecture, and efficiency changes accordingly. With Alder Lake, TDPs range from 9 to 125 watts, covering everything from ultrabooks to desktop PCs, each with its corresponding processors (which, logically, have not been presented yet).

In its maximum configuration, we are talking about a processor (presumably the i9-12900K) with 8 P-Cores and 8 E-Cores, where the curious detail is that only the P-Cores have Hyper-Threading. The thread count therefore reaches a maximum of 24.
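The 24-thread figure is simple arithmetic: Hyper-Threading gives each P-Core two hardware threads, while each E-Core runs one. A minimal sketch, with the core counts taken from the presumed top configuration above and a function name chosen here for illustration.

```python
P_THREADS_PER_CORE = 2  # P-Cores have Hyper-Threading
E_THREADS_PER_CORE = 1  # E-Cores do not


def hardware_threads(p_cores: int, e_cores: int) -> int:
    """Total hardware threads for a hybrid P-Core/E-Core processor."""
    return p_cores * P_THREADS_PER_CORE + e_cores * E_THREADS_PER_CORE


print(hardware_threads(p_cores=8, e_cores=8))  # 8*2 + 8*1 = 24
```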

As for the iGPU, it is the new Xe with no fewer than 96 EUs, so it should outperform AMD’s APUs on this point. What about IPC? Here Intel has compared Alder Lake with Rocket Lake and put the single-thread improvement at +19%, so Intel will again be ahead of AMD, at least until Zen 3+, when the two will face off once more.

Another new feature is its last-level cache (LLC), which reaches 30 MB and is non-inclusive. It works alongside a new fabric for the memory subsystem that reaches up to 204 GB/s, with dynamic bus width and frequency to optimize power. The so-called Compute Fabric, which connects the cores and the iGPU, reaches up to 1000 GB/s with real-time latency optimizations, and it is also dynamic to save power when required.

As for memory, Alder Lake supports DDR5-4800, LPDDR5-5200, DDR4-3200 and LPDDR4x-4266 as its maximum configurations without overclocking; anything beyond these speeds counts as overclocking the IMC. Changing the subject, and as we already knew, Alder Lake will support PCIe 5.0, making it the first desktop architecture to do so, but there is a catch.
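For a sense of scale, the theoretical peak bandwidth of each supported memory type is the transfer rate multiplied by the bus width in bytes. This sketch assumes a 128-bit total memory interface (for example, two 64-bit DDR4 channels) for every memory type; that width is an assumption for illustration, and real sustained bandwidth is always lower than these peaks.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: int, bus_width_bits: int = 128) -> float:
    """Theoretical peak bandwidth in GB/s: transfers per second x bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Maximum non-overclocked rates listed above, in MT/s.
# Assumption: 128-bit total interface width for every memory type.
max_rates = {"DDR5-4800": 4800, "LPDDR5-5200": 5200, "DDR4-3200": 3200, "LPDDR4x-4266": 4266}

for name, rate in max_rates.items():
    print(f"{name}: {peak_bandwidth_gbs(rate):.1f} GB/s")
```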

The PCIe 5.0 lanes are limited to 16, so M.2 NVMe SSDs will remain on PCIe 4.0 even as drives for the new interface are released. The next architecture may remedy this, but this one has that limitation, so the theoretical performance of SSDs will remain at around 8,000 MB/s at best.
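The roughly 8,000 MB/s ceiling follows from the PCIe 4.0 link itself. A minimal sketch of the arithmetic, assuming the usual x4 link for M.2 NVMe drives: lane rate times lane count, reduced by the 128b/130b line-encoding overhead, divided by eight bits per byte.

```python
def pcie_bandwidth_gbs(gigatransfers_per_s: float, lanes: int) -> float:
    """Per-direction link bandwidth in GB/s for PCIe 3.0+ (128b/130b encoding).

    Ignores packet and protocol overhead, so real throughput is somewhat lower.
    """
    encoding_efficiency = 128 / 130
    # One bit per transfer per lane, so this is in Gbit/s.
    gbits_per_s = gigatransfers_per_s * lanes * encoding_efficiency
    return gbits_per_s / 8


gen4_x4 = pcie_bandwidth_gbs(16, 4)  # PCIe 4.0: 16 GT/s per lane, typical M.2 x4 link
gen5_x4 = pcie_bandwidth_gbs(32, 4)  # PCIe 5.0: 32 GT/s per lane

print(f"PCIe 4.0 x4: {gen4_x4:.2f} GB/s")
print(f"PCIe 5.0 x4: {gen5_x4:.2f} GB/s")
```

A PCIe 4.0 x4 link works out to just under 7.9 GB/s before protocol overhead, which is why the fastest PCIe 4.0 SSDs top out at around 7,000 to 8,000 MB/s.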
