std-discussion

Re: Architecture specific library extensions

From: Julien Villemure-Fréchette <julien.villemure_at_[hidden]>
Date: Tue, 02 Sep 2025 22:41:13 -0400
> Having a single way to do this identification should just be a part of the standard, period.

It seems too much: there are way too many assembly mnemonics and variants to maintain, and more of them get added over time. C++ is intended to abstract over assembly language, so adding a function that just wraps an assembly instruction or an intrinsic is contrary to this intent.

Since C++ is an abstract language, it could instead provide functions that implement an abstract operation that works across different platforms; each function's implementation could be intended to use a compiler intrinsic, inline assembly, or whatever is most efficient for the target platform (-march=... -mtune=...). AES computations are one example.
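
A minimal sketch of that idea, using CRC-32C rather than AES to keep it short. The names crc32c_step and crc32c are hypothetical, not a proposed interface; the hardware paths assume the compiler defines __SSE4_2__ or __ARM_FEATURE_CRC32 when the corresponding -march option enables those instructions, and every other target falls back to a bitwise loop.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE4_2__)
#  include <nmmintrin.h>   // _mm_crc32_u8
#elif defined(__ARM_FEATURE_CRC32)
#  include <arm_acle.h>    // __crc32cb
#endif

// Hypothetical abstract operation: fold one byte into a running CRC-32C state.
// The caller sees one function; the body is chosen per target at compile time.
inline std::uint32_t crc32c_step(std::uint32_t crc, std::uint8_t byte) {
#if defined(__SSE4_2__)
    return _mm_crc32_u8(crc, byte);          // single x86 instruction
#elif defined(__ARM_FEATURE_CRC32)
    return __crc32cb(crc, byte);             // single Arm instruction
#else
    // Portable fallback: reflected CRC-32C polynomial, one bit at a time.
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
#endif
}

// CRC-32C over a buffer, with the usual initial and final inversion.
inline std::uint32_t crc32c(const void* data, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < size; ++i) {
        crc = crc32c_step(crc, bytes[i]);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Calling crc32c("123456789", 9) should give the same value on every path, which is the point: user code names the operation, and the choice of instruction stays an implementation detail.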

On August 30, 2025 4:41:06 a.m. EDT, Tiago Freire via Std-Discussion <std-discussion_at_[hidden]> wrote:
>> What kind of compiler directives? Are we talking about preprocessor directives?
>
>Yes. I used the wrong term.
>#if defined(__ARM64__)
>
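
To illustrate why a single standard spelling is being asked for, here is a sketch of what portable code tends to do today. The PROJ_ARCH_* macros and target_arch_name are hypothetical project-local names; the tested macros are the ones GCC and Clang (__x86_64__, __aarch64__, __riscv) and MSVC (_M_X64, _M_ARM64) are known to predefine.

```cpp
// Hypothetical project-local normalisation of vendor-specific macros.
#if defined(__x86_64__) || defined(_M_X64)
#  define PROJ_ARCH_X86_64 1
#elif defined(__aarch64__) || defined(_M_ARM64)
#  define PROJ_ARCH_ARM64 1
#elif defined(__riscv)
#  define PROJ_ARCH_RISCV 1
#else
#  define PROJ_ARCH_UNKNOWN 1
#endif

// Returns a short name for the architecture this translation unit targets.
inline const char* target_arch_name() {
#if defined(PROJ_ARCH_X86_64)
    return "x86_64";
#elif defined(PROJ_ARCH_ARM64)
    return "arm64";
#elif defined(PROJ_ARCH_RISCV)
    return "riscv";
#else
    return "unknown";
#endif
}
```

Every project ends up maintaining its own variant of this table, and each one has to be extended by hand when a new compiler or architecture shows up.
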
>> Can you show some concrete example of the kind of code you want to write?
>
>Sure.
>This kind of thing pops up everywhere.
>For example, for one of my proposals on overflow arithmetic I want to be able to provide a sample implementation that users can play with, which looks like this:
>https://github.com/tmiguelf/std_prop_overflow/blob/master/reference/std_ovf/include/numeric_ovf.hpp#L10
>The issue is that some architectures have explicit support for what I want to do and can optimize it down to a single instruction; others don't, and I have to roll out a less efficient algorithm to do the same job.
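
A small sketch of that split, assuming nothing about the linked header: add_overflow is a hypothetical helper, not the interface from the proposal. On GCC and Clang it defers to __builtin_add_overflow, which the compiler can typically lower to an add plus a carry-flag check; elsewhere it uses a generic comparison.

```cpp
#include <cstdint>

// Hypothetical helper: stores a + b (mod 2^64) in `out` and
// returns true when the mathematical sum did not fit in 64 bits.
inline bool add_overflow(std::uint64_t a, std::uint64_t b, std::uint64_t& out) {
#if defined(__GNUC__) || defined(__clang__)
    // Compiler builtin; lets the backend use the hardware carry flag.
    return __builtin_add_overflow(a, b, &out);
#else
    // Portable fallback: unsigned wraparound is well defined,
    // and the wrapped sum is smaller than an operand exactly on overflow.
    out = a + b;
    return out < a;
#endif
}
```

Both branches have the same observable behaviour; only the generated code differs.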
>
>Here's another example: let's say I want to optimize a hashing algorithm: https://github.com/tmiguelf/crypt/blob/master/Crypt/src/hash/crc.cpp#L374
>Or if I'm providing an AES implementation.
>I want to be able to provide an implementation that is both efficient and works across multiple platforms.
>If a particular architecture has a hardware extension, I want to know that my target application is built for that architecture, and I want to be able to query the corresponding CPUID to test for the feature at runtime and decide whether to offer the hardware implementation or a software fallback.
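
A rough sketch of the runtime half of that, under stated assumptions: cpu_has_aes, aes_backend and select_aes_backend are hypothetical names, and the probe relies on the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports, which only exist for x86 targets here. Any other compiler or architecture would need its own probe and simply reports "no" in this sketch.

```cpp
// Hypothetical probe: does the CPU we are running on report AES instructions?
inline bool cpu_has_aes() {
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();                        // safe to call more than once
    return __builtin_cpu_supports("aes") != 0;   // CPUID-based check
#else
    return false;   // no probe written for this target: assume unsupported
#endif
}

// Hypothetical tag for the implementation a library would hand out.
enum class aes_backend { software, hardware };

// Picks the backend once, at runtime, based on the probe above.
inline aes_backend select_aes_backend() {
    return cpu_has_aes() ? aes_backend::hardware : aes_backend::software;
}
```

The compile-time architecture test decides which probe can even be compiled; the runtime test decides which implementation is actually used on the machine at hand.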
>
>Having a single way to do this identification should just be a part of the standard, period.
>
>
>> This seems a bit unreasonable. Keep in mind that compilers nowadays sometimes come with builtin vector types for SIMD, so this old style of __m256i or whatever is not a great fit for standardization; it's too low-level. AVX intrinsics often require you to reinterpret these vector types as something else, like when you do a bit-masking operation on what was a bunch of f64s just a moment ago. This doesn't fit well into C++'s strict aliasing.
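
For a single scalar, the reinterpret-then-mask pattern described above can be written without violating strict aliasing by copying the bytes instead of casting pointers. This is only a sketch: clear_sign_bit is a hypothetical helper, and it assumes double is a 64-bit IEEE 754 value.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: clears the sign bit of a double by operating on its bits.
inline double clear_sign_bit(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);    // well-defined byte copy
    bits &= 0x7FFFFFFFFFFFFFFFull;              // mask off the sign bit
    std::memcpy(&value, &bits, sizeof value);   // copy the bits back
    return value;
}
```

In C++20 the two copies can be spelled std::bit_cast; compilers generally turn either form into a plain register move.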
>
>It's not unreasonable. I want to do low-level stuff, and I should be able to do it.
>Every single serious project I've worked on has at one point or another needed something like this.
>Keeping only to the things that standard C++ dictates doesn't allow you to create very useful applications; it must be possible to support these things without someone saying "it's UB so monkeys can fly out of your computer case".
>
>
>> Isn't it obvious that it comes with a huge maintenance burden if you need to maintain a set of intrinsics for all sorts of architectures, many of which depend on various CPU extensions? We operate on a three-year cycle and this seems very difficult to keep up. Which ones do we maintain? RISC-V, x86_64, ARM, WASM at the very least, and all their possible extensions?
>> Remember that we're on a three-year cycle, so if Intel releases a new CPU extension in 2026, tough luck, you're waiting until 2029 until that intrinsic is in a C++ standard.
>
>Well, this wouldn't be part of the "C++ standard", it would be part of a "C++ standard extension". The "extension" word is doing a lot of the heavy lifting.
>It would be an appendage, so it doesn't have to follow the same release cycle. The different architecture subgroups could even work independently.
>
>Even "if <CPU vendor> releases a new CPU extension in <end of cycle>, tough luck, you're waiting until <next cycle> until that intrinsic is in a C++ standard [extension]". "Tough" is an acceptable solution, people can wait until the next cycle, and if they want to have it sooner, they can use a proprietary implementation while they wait for it to become standard.
>
>
>> If it's not simply one intrinsic per instruction, this sounds like an ungodly amount of bikeshedding over which instruction wrappers to [standardize]
>Possibly. It will take time. But it would allow for a much better playing field for low-level development.
>
>
>
>-----Original Message-----
>From: Jan Schultke <janschultke_at_[hidden]>
>Sent: Saturday, August 30, 2025 06:49
>To: std-discussion_at_[hidden]
>Cc: Tiago Freire <tmiguelf_at_[hidden]>
>Subject: Re: [std-discussion] Architecture specific library extensions
>
>> 1st pre-compiler directives for major cpu architectures should have a
>> standard name. I.e. pre-compiler directives that tells you if you are
>> emitting instructions for an Arm64, Risc-V, x86_64, etc…
>
>What kind of compiler directives? Are we talking about preprocessor directives?
>
>Can you show some concrete example of the kind of code you want to write?
>
>> 2nd All supported data types that it makes sense to perform operations on (or register types) should have a well-defined type. No “unsigned int” or “float”, more of a “uint16”, “ieee_fp32”, or “AVX_512_fp64”
>
>This seems a bit unreasonable. Keep in mind that compilers nowadays sometimes come with builtin vector types for SIMD, so this old style of __m256i or whatever is not a great fit for standardization; it's too low-level. AVX intrinsics often require you to reinterpret these vector types as something else, like when you do a bit-masking operation on what was a bunch of f64s just a moment ago. This doesn't fit well into C++'s strict aliasing.
>
>> 3rd All functions that make sense on a specific architecture (things that the CPU provides specific instructions for) should be put in a library collection that is only available when that specific architecture is being targeted.
>
>Isn't it obvious that it comes with a huge maintenance burden if you need to maintain a set of intrinsics for all sorts of architectures, many of which depend on various CPU extensions? We operate on a three-year cycle and this seems very difficult to keep up. Which ones do we maintain? RISC-V, x86_64, ARM, WASM at the very least, and all their possible extensions?
>
>Remember that we're on a three-year cycle, so if Intel releases a new CPU extension in 2026, tough luck, you're waiting until 2029 until that intrinsic is in a C++ standard.
>
>> Not necessarily every single instruction but most intrinsics should probably end up there. I’m talking stuff like “cpuid” (on arm or x86), AVX instructions, AES or CRC instructions.
>
>If it's not simply one intrinsic per instruction, this sounds like an ungodly amount of bikeshedding over which instruction wrappers to stan
>
>> The Intel intrinsics library does a very good job of making this more or less portable, but I’m talking about a similar thing for all major architectures. And if the functions do something deterministic, make them constexpr as well.
>
>This works much better in third-party libraries. We don't need this to be in the standard.
>--
>Std-Discussion mailing list
>Std-Discussion_at_[hidden]
>https://lists.isocpp.org/mailman/listinfo.cgi/std-discussion

Received on 2025-09-03 02:41:26