std-discussion

Re: Architecture specific library extensions

From: Tiago Freire <tmiguelf_at_[hidden]>
Date: Sat, 30 Aug 2025 08:41:06 +0000
> What kind of compiler directives? Are we talking about preprocessor directives?

Yes. I used the wrong term.
#if defined(__ARM64__)
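
To illustrate what such a check tends to look like today, here is a minimal sketch. It assumes the commonly documented compiler-specific macros (`__aarch64__`, `_M_ARM64`, `__x86_64__`, `_M_X64`, `__riscv`); the `TARGET_ARCH_*` names and `target_arch_name()` are hypothetical and exist only for this example, not in any standard or proposal.

```cpp
#include <string_view>

// Sketch only: fold the vendor-specific spellings into one hypothetical name.
// None of the TARGET_ARCH_* macros are standard; they are invented here.
#if defined(__aarch64__) || defined(_M_ARM64)
#  define TARGET_ARCH_ARM64 1
#elif defined(__x86_64__) || defined(_M_X64)
#  define TARGET_ARCH_X86_64 1
#elif defined(__riscv)
#  define TARGET_ARCH_RISCV 1
#else
#  define TARGET_ARCH_UNKNOWN 1
#endif

// Hypothetical helper: reports which branch above was taken at compile time.
constexpr std::string_view target_arch_name() noexcept
{
#if defined(TARGET_ARCH_ARM64)
    return "arm64";
#elif defined(TARGET_ARCH_X86_64)
    return "x86_64";
#elif defined(TARGET_ARCH_RISCV)
    return "riscv";
#else
    return "unknown";
#endif
}
```

Every project ends up writing its own variant of this mapping, which is the duplication a single agreed-upon name would remove.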

> Can you show some concrete example of the kind of code you want to write?

Sure.
This kind of thing pops up everywhere.
For example, for one of my proposals on overflow arithmetic I want to provide a sample implementation that users can play with, which looks like this:
https://github.com/tmiguelf/std_prop_overflow/blob/master/reference/std_ovf/include/numeric_ovf.hpp#L10
The issue is that some architectures have explicit support for what I want to do, so the operation can be reduced to a single instruction, while others don't, and there I have to roll out a less efficient algorithm to do the same job.
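
A rough sketch of that split, not taken from the linked file: on x86_64 it assumes the `_addcarry_u64` intrinsic from `<immintrin.h>`, and everywhere else it falls back to a portable comparison. The function name `add_overflow` is hypothetical.

```cpp
#include <cstdint>

#if defined(__x86_64__) || defined(_M_X64)
#  include <immintrin.h>
#  define SKETCH_HAVE_ADDCARRY 1
#endif

// Hypothetical helper: stores the wrapped sum in `result` and returns true
// if the unsigned addition a + b overflowed 64 bits.
inline bool add_overflow(std::uint64_t a, std::uint64_t b, std::uint64_t& result) noexcept
{
#if defined(SKETCH_HAVE_ADDCARRY)
    // x86_64 path: the intrinsic maps onto the add-with-carry instruction.
    unsigned long long sum = 0;
    const unsigned char carry = _addcarry_u64(0, a, b, &sum);
    result = sum;
    return carry != 0;
#else
    // Portable path: unsigned wraparound is well defined, so an overflow
    // shows up as a sum smaller than either operand.
    result = a + b;
    return result < a;
#endif
}
```

The architecture check at the top is the part that currently has to be spelled differently for each compiler.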

Here's another example: let's say I want to optimize a hashing algorithm: https://github.com/tmiguelf/crypt/blob/master/Crypt/src/hash/crc.cpp#L374
Or say I'm providing an AES implementation.
I want to be able to provide an implementation that is both efficient and works across multiple platforms.
If a particular architecture has a hardware extension, I want to know that my target application is being built for that architecture, and I want to be able to query the corresponding CPUID to test for the feature at runtime and decide whether to offer the hardware implementation or a software fallback.
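
As a sketch of that pattern (not the code in the linked repository), the following computes CRC-32C with a runtime choice between the SSE4.2 instruction and a bitwise software loop. It assumes GCC or Clang on x86 for the hardware path, using `__builtin_cpu_supports` and `_mm_crc32_u8`; the names `crc32c`, `crc32c_hw`, and `crc32c_soft` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#  include <nmmintrin.h>
#  define SKETCH_HAVE_X86_CRC32C 1
#endif

// Software fallback: bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
inline std::uint32_t crc32c_soft(const unsigned char* data, std::size_t len) noexcept
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Mask is all ones when the low bit is set, zero otherwise.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

#if defined(SKETCH_HAVE_X86_CRC32C)
// Hardware path: only this function is compiled with SSE4.2 enabled, so the
// rest of the program still runs on CPUs without the extension.
__attribute__((target("sse4.2")))
inline std::uint32_t crc32c_hw(const unsigned char* data, std::size_t len) noexcept
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc = _mm_crc32_u8(crc, data[i]);
    }
    return ~crc;
}
#endif

// Dispatcher: test for the feature at runtime, otherwise use the fallback.
inline std::uint32_t crc32c(const unsigned char* data, std::size_t len) noexcept
{
#if defined(SKETCH_HAVE_X86_CRC32C)
    if (__builtin_cpu_supports("sse4.2")) {
        return crc32c_hw(data, len);
    }
#endif
    return crc32c_soft(data, len);
}
```

Both the build-time check and the runtime feature query are compiler-specific here; an Arm build would need a different macro, a different header, and a different detection mechanism for the same algorithm.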

Having a single way to do this identification should just be a part of the standard, period.


> This seems a bit unreasonable. Keep in mind that compilers nowadays sometimes come with builtin vector types for SIMD, so this old style of __m256i or whatever is not a great fit for standardization; it's too low-level. AVX intrinsics often require you to reinterpret these vector types as something else, like when you do a bit-masking operation on what was a bunch of f64s just a moment ago. This doesn't fit well into C++'s strict aliasing.

It's not unreasonable. I want to do low-level stuff, and I should be able to do it.
Every single serious project I've worked on has, at one point or another, needed something like this.
Keeping only to what standard C++ dictates doesn't allow you to create very useful applications; the standard must allow for supporting these things without someone saying "it's UB so monkeys can fly out of your computer case".


> Isn't it obvious that it comes with a huge maintenance burden if you need to maintain a set of intrinsics for all sorts of architectures, many of which depend on various CPU extensions? We operate on a three-year cycle and this seems very difficult to keep up. Which ones do we maintain? RISC-V, x86_64, ARM, WASM at the very least, and all their possible extensions?
> Remember that we're on a three-year cycle, so if Intel releases a new CPU extension in 2026, tough luck, you're waiting until 2029 until that intrinsic is in a C++ standard.

Well, this wouldn't be part of the "C++ standard", it would be part of a "C++ standard extension". The "extension" word is doing a lot of the heavy lifting.
It would be an appendage; it doesn't have to follow the same release cycle. The different architecture subgroups could even work independently.

Even if "<CPU vendor> releases a new CPU extension in <end of cycle>, tough luck, you're waiting until <next cycle> until that intrinsic is in a C++ standard [extension]", "tough luck" is an acceptable answer: people can wait until the next cycle, and if they want it sooner, they can use a proprietary implementation while they wait for it to become standard.


> If it's not simply one intrinsic per instruction, this sounds like an ungodly amount of bikeshedding over which instruction wrappers to [standardize]
Possibly. It will take time. But it would allow for a much better playing field for low-level development.



-----Original Message-----
From: Jan Schultke <janschultke_at_googlemail.com>
Sent: Saturday, August 30, 2025 06:49
To: std-discussion_at_lists.isocpp.org
Cc: Tiago Freire <tmiguelf_at_[hidden]>
Subject: Re: [std-discussion] Architecture specific library extensions

> 1st pre-compiler directives for major cpu architectures should have a
> standard name. I.e. pre-compiler directives that tells you if you are
> emitting instructions for an Arm64, Risc-V, x86_64, etc…

What kind of compiler directives? Are we talking about preprocessor directives?

Can you show some concrete example of the kind of code you want to write?

> 2nd All supported data types that it makes sense to perform operations on (or register types) should have a well-defined type. No “unsigned int” or “float”, more of a “uint16”, “ieee_fp32”, or “AVX_512_fp64”

This seems a bit unreasonable. Keep in mind that compilers nowadays sometimes come with builtin vector types for SIMD, so this old style of __m256i or whatever is not a great fit for standardization; it's too low-level. AVX intrinsics often require you to reinterpret these vector types as something else, like when you do a bit-masking operation on what was a bunch of f64s just a moment ago. This doesn't fit well into C++'s strict aliasing.

> 3rd All functions that make sense on a specific architecture (things that the CPU provides specific instructions for) should be put in a library collection that is only available when that specific architecture is being targeted.

Isn't it obvious that it comes with a huge maintenance burden if you need to maintain a set of intrinsics for all sorts of architectures, many of which depend on various CPU extensions? We operate on a three-year cycle and this seems very difficult to keep up. Which ones do we maintain? RISC-V, x86_64, ARM, WASM at the very least, and all their possible extensions?

Remember that we're on a three-year cycle, so if Intel releases a new CPU extension in 2026, tough luck, you're waiting until 2029 until that intrinsic is in a C++ standard.

> Not necessarily every single instruction but most intrinsics should probably end up there. I’m talking stuff like “cpuid” (on arm or x86), AVX instructions, AES or CRC instructions.

If it's not simply one intrinsic per instruction, this sounds like an ungodly amount of bikeshedding over which instruction wrappers to [standardize]

> The intel intrinsics library does a very good job to make this more or less portable, but I’m talking a similar thing for all major architectures. And if the functions do something deterministic, make it constexpr as well.

This works much better in third-party libraries. We don't need this to be in the standard.

Received on 2025-08-30 08:41:13