ISOCPP std-proposals List: Re: [std-proposals] D3666R0 Bit-precise integers

From: Paul Caprioli <paul_at_[hidden]>
Date: Fri, 5 Sep 2025 23:10:05 +0000

From the "Arm Architecture Reference Manual for A-profile architecture", section C3.3, "Loads and stores -- SVE":

The unpredicated vector register load, LDR, and store, STR, instructions transfer a single vector register from or to memory locations that are specified by a scalar base register plus an immediate index value that is in the range -256 to 255, inclusive. When alignment checking is enabled for loads and stores, the base address must be 16-byte aligned.

For predicated contiguous load and store SVE instructions: When alignment checking is enabled for loads and stores, the value of the base address register must be aligned to the memory element access size. Element sizes are B, H, W, and D, for byte, halfword or half-precision floating point, word or single-precision, and doubleword or double-precision floating point.

-----Original message-----
From: Tiago Freire <tmiguelf_at_[hidden]>
Sent: Friday, September 5 2025, 3:49 pm
To: std-proposals_at_[hidden] <std-proposals_at_[hidden]>
Cc: Paul Caprioli <paul_at_[hidden]>
Subject: Re: [std-proposals] D3666R0 Bit-precise integers

Ok, let's not forget that arm and risc are also supported CPUs.

-----------
From: Std-Proposals <std-proposals-bounces_at_[hidden]> on behalf of Paul Caprioli via Std-Proposals <std-proposals_at_[hidden]>
Sent: Saturday, September 6, 2025 12:35:44 AM
To: std-proposals_at_[hidden] <std-proposals_at_[hidden]>
Cc: Paul Caprioli <paul_at_[hidden]>
Subject: Re: [std-proposals] D3666R0 Bit-precise integers

When I said it's one instruction, I meant that vmovdqu32 is one instruction and can handle loads that are split across cache lines. The instruction vmovdqa32 is a different instruction, and it causes an exception if the memory address is not 64B-aligned.

Similarly for floating point: vmovups is one instruction, and vmovaps is one instruction. The former handles loads that are split across cache lines. The latter causes an exception if the address is not 64B-aligned.

Yes, the compiler output I've seen (working in floating point) is always vmovups, even if the compiler knows the address is aligned. That's fine; there's no penalty for using vmovups when the address is aligned (on hardware less than a decade or so old).

Yes, it is a good thing for performance to align arrays by sticking `alignas(64)` in front. Or, use std::aligned_alloc() with a 64B alignment. The vmovups is faster and more efficient when the address is 64B-aligned.

-----Original message-----
From: Jason McKesson via Std-Proposals <std-proposals_at_[hidden]>
Sent: Friday, September 5 2025, 3:17 pm
To: std-proposals_at_[hidden] <std-proposals_at_[hidden]>
Cc: Jason McKesson <jmckesson_at_[hidden]>
Subject: Re: [std-proposals] D3666R0 Bit-precise integers

On Fri, Sep 5, 2025 at 5:36 PM Paul Caprioli via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> It's one AVX512 instruction. (Oh, I see in my previous email I referred to the floating-point instructions movaps and movups out of habit, but same idea.)
> Yes, you are correct that there is a performance penalty when a load (or store) is split across cache lines, and it can be significant.
> At the microarchitectural level, the cache delivers 64B-aligned chunks to the memory pipeline and the circuitry handles requesting two cachelines and putting the right bits into the register.
> So, in that sense, it's not a single load/store.
> But in the sense of assembly language, it's one instruction.

If it always compiles down to the same assembly/ML instructions... can't you just stick `alignas(64)` in front of it if you want that performance? It'd be the same code output either way, so this would just be a thing you could do as an optimization if you need it, right?

Obviously if this is happening at a library interface, then adding that alignment to that interface after the fact could break ABI. But there's plenty of stuff that doesn't happen in library interfaces.
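For illustration, the two alignment options mentioned in this thread (`alignas(64)` and `std::aligned_alloc()`) might look roughly like the following. This is only a sketch, assuming C++17, a 64-byte cache line, and a platform that provides `std::aligned_alloc`; the names `Block`, `is_aligned_64`, `alloc_floats_64`, and `check_alignment_demo` are hypothetical and do not come from the thread or the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: true if p sits on a 64-byte boundary.
inline bool is_aligned_64(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 64 == 0;
}

// Option 1: alignas(64) on the array. Sixteen floats are 64 bytes,
// i.e. one 512-bit vector, so a full-width load never straddles
// two cache lines (assuming 64-byte lines).
struct Block {
    alignas(64) float data[16];
};

// Option 2: heap storage via std::aligned_alloc. The size passed in
// should be a multiple of the alignment, so it is rounded up here.
// Overflow checking is omitted in this sketch.
inline float* alloc_floats_64(std::size_t count) {
    const std::size_t bytes = count * sizeof(float);
    const std::size_t rounded = (bytes + 63) / 64 * 64;
    return static_cast<float*>(std::aligned_alloc(64, rounded));
}

// Small self-check exercising both options.
inline void check_alignment_demo() {
    Block b{};
    assert(is_aligned_64(b.data));

    float* p = alloc_floats_64(100);
    assert(p != nullptr);
    assert(is_aligned_64(p));
    std::free(p);  // memory from std::aligned_alloc is released with std::free
}
```

Note that changing `alignof(Block)` this way is exactly the kind of change that could affect ABI if `Block` appeared in a library interface, which is the caveat raised above.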
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2025-09-05 23:10:12