Date: Fri, 5 Sep 2025 18:17:00 -0400
On Fri, Sep 5, 2025 at 5:36 PM Paul Caprioli via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> It's one AVX512 instruction. (Oh, I see in my previous email I referred to the floating-point instructions movaps and movups out of habit, but same idea.)
> Yes, you are correct that there is a performance penalty when a load (or store) is split across cache lines, and it can be significant.
> At the microarchitectural level, the cache delivers 64B-aligned chunks to the memory pipeline and the circuitry handles requesting two cachelines and putting the right bits into the register.
> So, in that sense, it's not a single load/store.
> But in the sense of assembly language, it's one instruction.
If it always compiles down to the same assembly/machine-language instructions...
can't you just stick `alignas(64)` in front of it if you want that
performance? It'd be the same code output either way, so this would
just be a thing you could do as an optimization if you need it, right?
Obviously if this is happening at a library interface, then adding
that alignment to that interface after the fact could break ABI. But
there's plenty of stuff that doesn't happen in library interfaces.
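
To make the `alignas(64)` idea concrete, here is a minimal sketch, not anyone's actual code. `Block`, `AlignedBlock`, `kCacheLine`, `crosses_cache_line` and `object_crosses_cache_line` are names I made up for illustration, and the 64-byte cache line is an assumption (typical of current x86-64), not something queried from the hardware. It only shows that an over-aligned 64-byte object can never straddle two cache lines, while a 4-byte-aligned one can.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. This is hard-coded, not detected.
inline constexpr std::size_t kCacheLine = 64;

// Hypothetical 64-byte payload, the width of one AVX-512 register.
struct Block { float v[16]; };                              // alignment of float only
struct alignas(kCacheLine) AlignedBlock { float v[16]; };   // over-aligned variant

static_assert(sizeof(Block) == 64, "sketch assumes 4-byte float");
static_assert(sizeof(AlignedBlock) == 64);
static_assert(alignof(AlignedBlock) == 64);

// True if the byte range [addr, addr + size) touches more than one cache line.
constexpr bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
    return addr / kCacheLine != (addr + size - 1) / kCacheLine;
}

// Same check applied to a live object's address.
template <class T>
bool object_crosses_cache_line(const T& obj) {
    return crosses_cache_line(reinterpret_cast<std::uintptr_t>(&obj), sizeof(T));
}

// A 64-byte access at a 64-byte-aligned address stays within one line...
static_assert(!crosses_cache_line(0, 64));
static_assert(!crosses_cache_line(128, 64));
// ...but at any other offset it is split across two.
static_assert(crosses_cache_line(4, 64));
static_assert(crosses_cache_line(32, 64));
```

For an `AlignedBlock` with static or automatic storage, `object_crosses_cache_line` is always false. For a plain `Block` the answer depends on where the object happens to land. The instruction emitted for an unaligned load can be the same in both cases, so the alignment only affects whether the hardware has to fetch two lines.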
Received on 2025-09-05 22:17:14