Date: Mon, 29 Dec 2025 17:56:58 -0300
On Monday, 29 December 2025 17:34:30 Brasilia Standard Time Andrey Semashev
via Std-Proposals wrote:
> > Also do note the next extension isn't documented: all Intel AVX512-capable
> > processors support 256-bit atomic loads and stores.
>
> This may be the case for existing CPUs, but is this guaranteed for
> future products? Was there some sort of official commitment to maintain
> this guarantee? Especially given that Intel E-cores are going to support
> AVX10, which has 512-bit vectors.
Neither is 128-bit atomicity architecturally guaranteed for the AVX2-capable
ones. The SDM section you've referred to says what current products do, but
does not promise that future ones will continue doing so.
However, knowing the microarchitectural details, it's likely to stay for good.
The issue here is that N-bit wide operations can be cracked into two ½N-bit
wide uops with acceptable performance, but cracking four ways into ¼N-bit uops
(4x128-bit loads, for example) is unlikely to be acceptable. It would introduce
an extra 3-cycle latency for some operations and keep ports busy for far too
long. Keeping it running while the machine takes exceptions is also difficult.
Two-way cracking is what was done for the Sandy Bridge/Ivy Bridge AVX1
generation and for the Gracemont & later E-cores when those got AVX1 & 2, and
I understand that's how AMD did both AVX1 & 2 at first, as well as AVX512 in
their newest before transitioning to full 512-bit wide data paths. The E-cores
on the upcoming AVX10-equipped Nova Lake are probably going to be the same,
but I haven't studied the E-core microarchitecture in detail.
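To illustrate why two-way cracking still leaves ½N-bit atomicity intact, here
is a toy C++ model. It is only a sketch, not a description of any real
pipeline: the names Vec256, copy_uop and cracked_load_with_interleaved_store
are made up for this example, and each "uop" is assumed to be indivisible by
construction. A 256-bit load is split into two 128-bit halves, and a
full-width store from another agent becomes visible between them.

```cpp
#include <array>
#include <cstdint>

// Toy model only: a 256-bit value represented as four 64-bit lanes.
using Vec256 = std::array<std::uint64_t, 4>;

// Copies `lanes` consecutive 64-bit lanes starting at lane `first`.
// Stands in for one uop of width 64 * lanes bits, assumed indivisible.
inline void copy_uop(Vec256& dst, const Vec256& src, int first, int lanes) {
    for (int i = first; i < first + lanes; ++i)
        dst[i] = src[i];
}

// Models a 256-bit load cracked into two 128-bit uops, with a full-width
// store (hypothetically from another core) landing between the two uops.
inline Vec256 cracked_load_with_interleaved_store(Vec256& mem,
                                                  const Vec256& incoming) {
    Vec256 seen{};
    copy_uop(seen, mem, 0, 2);  // low 128 bits: reads the old contents
    mem = incoming;             // the store becomes visible here
    copy_uop(seen, mem, 2, 2);  // high 128 bits: reads the new contents
    return seen;
}
```

In this model each 128-bit half of the result comes from a single version of
memory, but the 256-bit whole mixes old and new. That is the sense in which a
machine that cracks N-bit operations two ways can still offer ½N-bit atomic
loads and stores, and why four-way cracking would reduce that to ¼N.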
I don't see much need for 256-bit atomic loads and stores, seeing as we lack a
256-bit CAS.
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Data Center - Platform & Sys. Eng.
Received on 2025-12-29 20:57:09
