Date: Thu, 11 Dec 2025 11:51:36 +0530
On 2025-12-09 17:27, Sebastian Wittmeier via Std-Proposals wrote:
> It is not so simple.
>
> You have to ensure that no side-channels are introduced in any of the
> layers.
>
> E.g. if you use a result as an index into memory, the CPU has to
> bypass the caches and avoid combining transactions, so that no timing
> side-channel is introduced.
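To make the memory-index point concrete: a common source-level mitigation is to read every table entry and pick the wanted one with a mask, so the addresses touched do not depend on the secret. Below is a minimal sketch, assuming C++17 and a small `std::array` table. `lookup_scan` is a hypothetical name. Nothing in the language stops the optimizer or the CPU from undoing this, which is exactly the concern raised above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns table[secret_index] (or 0 if the index is out
// of range) without using secret_index as an address. Every entry is loaded,
// so the sequence of memory accesses does not depend on the secret.
template <std::size_t N>
std::uint32_t lookup_scan(const std::array<std::uint32_t, N>& table,
                          std::size_t secret_index) noexcept {
    constexpr unsigned top_bit = std::numeric_limits<std::size_t>::digits - 1;
    std::uint32_t result = 0;
    for (std::size_t i = 0; i < N; ++i) {
        // diff is zero exactly when i == secret_index
        std::size_t diff = i ^ secret_index;
        // (diff | -diff) has its top bit set iff diff != 0, so this is 1 or 0
        std::size_t nonzero = (diff | (std::size_t{0} - diff)) >> top_bit;
        // all ones when i == secret_index, all zeros otherwise
        std::uint32_t mask = static_cast<std::uint32_t>(nonzero) - 1u;
        result |= table[i] & mask;
    }
    return result;
}
```

For example, with a table of `{10, 20, 30, 40}`, index 2 yields 30. An out-of-range index yields 0 instead of reading out of bounds.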
>
> The C++ code is compiled first into an intermediate representation
> (IR) and later into assembly code.
>
> And at each level, optimizations are allowed according to certain
> rules.
>
> To avoid timing side-channels, certain code transformations and
> optimizations have to be forbidden, and the information about which
> values and code paths are affected has to be kept.
>
> This is similar to a volatile value, which comes with rules about
> instruction reordering and requires that each memory access actually
> happens.
>
> Basically you have variables which can have certain values, and
> afterwards the optimizer may change the flow of the program depending
> on the values of those variables. And the translation into assembly
> has to make sure that the CPU cooperates too, e.g. that it bypasses
> the cache when accessing memory and uses a (barrel) shifter whose
> timing does not depend on the number of bits shifted.
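For the shifter case, there is a software workaround for cores where shift latency might depend on the count. The idea is to decompose a variable shift into fixed-count steps and select each step with a mask. Below is a minimal sketch, assuming 32-bit values where only the low five bits of the count are used. `shl_fixed_steps` is a hypothetical name. A compiler is in principle free to recognize the pattern and emit a variable shift again.

```cpp
#include <cstdint>

// Hypothetical helper: computes x << (n % 32) without a shift whose count
// depends on n. Step i shifts by the fixed amount 2^i (1, 2, 4, 8, 16) and
// keeps or discards that step with a mask built from bit i of n, like a
// logarithmic barrel shifter done in software.
inline std::uint32_t shl_fixed_steps(std::uint32_t x, std::uint32_t n) noexcept {
    for (unsigned i = 0; i < 5; ++i) {
        std::uint32_t bit = (n >> i) & 1u;      // shift count i is public
        std::uint32_t mask = 0u - bit;          // all ones if bit i of n is set
        std::uint32_t shifted = x << (1u << i); // fixed, public shift amount
        x = (shifted & mask) | (x & ~mask);     // keep shifted or original
    }
    return x;
}
```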
>
> This code block has inputs and outputs. At the inputs the compiler
> should forget all knowledge; after the outputs the restrictions are
> relaxed again. One also has to make sure that this code block is
> compiled into only one variant.
>
>> -----Original Message-----
>> From: David Brown via Std-Proposals <std-proposals_at_[hidden]>
>> Sent: Tue 09.12.2025 12:40
>> Subject: Re: [std-proposals] Constant-time selection primitive
>> following memset_explicit precedent
>> To: std-proposals_at_[hidden]; Jonathan Wakely
>> <cxx_at_[hidden]>;
>> CC: David Brown <david.brown_at_[hidden]>;
>> I'm wondering if there is a simpler solution that could be used to
>> implement any of these "explicit" functions, as well as related
>> uses:
>>
>> template<class T>
>> T hide_value(T x) noexcept;
>>
>> (The C version could be a _Generic macro rather than just supporting
>> uintmax_t.)
>>
>> The semantics of "hide_value" would be that it returns the argument
>> without changing its value, but the compiler is forbidden from
>> tracking anything about its value, other than that it is valid.
>> (Thus if "T" is "bool", the compiler can still assume that
>> "hide_value(x)" is either 0 or 1.)
>>
>> Your select_explicit then becomes:
>>
>> template<class T>
>> T select_explicit(bool cond, T a, T b) {
>>     T mask = -(T)(!!cond);
>>     return (a & hide_value(mask)) | (b & hide_value(~mask));
>> }
>>
>> The compiler knows that "mask" is either all zeros or all ones, and
>> that "~mask" is its complement, and how they depend on "cond". But it
>> does not know anything about "hide_value(mask)" or "hide_value(~mask)",
>> or how they relate, and therefore has to generate code that does the
>> "and" operations and the "or" operation.
>>
>> A test implementation (considering only small, simple types) of
>> "hide_value" for gcc and clang could be:
>>
>> template<class T>
>> T hide_value(T x)
>> {
>>     __asm__ volatile ("" : "+g" (x));
>>     return x;
>> }
>>
>> A more general, but less efficient, implementation could be:
>>
>> template<class T>
>> T hide_value(T x)
>> {
>>     volatile T y = x;
>>     return y;
>> }
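Putting the pieces together, below is a self-contained sketch of `select_explicit` built on the asm-based `hide_value`. It assumes GCC or Clang and supports unsigned integer types other than `bool`. It uses the `"+r"` constraint rather than `"+g"`, and adds explicit casts so that narrow types such as `std::uint8_t` come back as `T` after integer promotion. Both choices are assumptions of this sketch, not part of the proposal.

```cpp
#include <cstdint>
#include <type_traits>

// GCC/Clang sketch of hide_value for integer types. The empty asm statement
// claims to read and modify x in a register, so the optimizer can no longer
// assume anything about the returned value beyond its type.
template <class T>
T hide_value(T x) noexcept {
    static_assert(std::is_integral_v<T>, "sketch handles integers only");
    __asm__ volatile("" : "+r"(x));
    return x;
}

// select_explicit from the thread, restricted to unsigned non-bool types so
// that the mask is simply "all zeros" or "all ones".
template <class T>
T select_explicit(bool cond, T a, T b) noexcept {
    static_assert(std::is_unsigned_v<T> && !std::is_same_v<T, bool>,
                  "mask trick assumes an unsigned, non-bool T");
    const T mask = static_cast<T>(-static_cast<T>(cond)); // all ones iff cond
    const T inv = static_cast<T>(~mask);                   // complement of mask
    return static_cast<T>((a & hide_value(mask)) | (b & hide_value(inv)));
}
```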
>>
>> I'd still be in favour of having your suggested _explicit functions in
>> the standard library, but "hide_value" (or something along those
>> lines) might allow more generic implementations, and let people write
>> additional _explicit-style functions without having to wait for them
>> to be added in later standards.
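As an example of such an additional _explicit-style function, below is a hypothetical `equal_explicit`, a constant-time byte comparison in the spirit of a timing-safe memcmp. It is built on the portable volatile-based `hide_value`. It is only a sketch: it removes the data-dependent early exit at the source level, but whether the generated code really is free of secret-dependent branches still depends on the implementation and the CPU.

```cpp
#include <cstddef>

// Portable (but slower) hide_value: reading back through a volatile object
// forces the compiler to treat the value as unknown.
template <class T>
T hide_value(T x) noexcept {
    volatile T y = x;
    return y;
}

// Hypothetical equal_explicit: returns true iff the first n bytes of a and b
// are equal. The loop always runs to n and ORs every difference into acc, so
// the source has no data-dependent early exit. Hiding acc on each iteration
// keeps the optimizer from proving acc != 0 and cutting the loop short.
inline bool equal_explicit(const unsigned char* a, const unsigned char* b,
                           std::size_t n) noexcept {
    unsigned acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc = hide_value(acc | static_cast<unsigned>(a[i] ^ b[i]));
    }
    return acc == 0;
}
```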
Thanks for the discussion. I agree that `hide_value` doesn't solve
microarchitectural timing. Even if the compiler emits perfect code, CPUs
can still leak timing through variable-time instructions, cache
behavior, data-dependent prefetching and, yes, speculative execution.
But here is the thing: the C and C++ standards can't mandate CPU
behavior. What they can do is guarantee that the compiler won't
introduce timing dependencies, and specify the intent (just as
memset_explicit does) so that implementations know what's needed. That
leaves room for implementations to use hardware features such as Intel
DOIT (Data Operand Independent Timing) and ARM DIT (Data Independent
Timing).

AFAIK this is exactly what memset_explicit does: it states the intent
("make this data inaccessible") without defining "optimization" in the
abstract machine, and I think the same approach works here.
Received on 2025-12-11 06:21:40
