Date: Fri, 17 Apr 2026 08:37:56 -0700
On Friday, 17 April 2026 06:24:29 Pacific Daylight Time Jason McKesson via Std-
Proposals wrote:
> Put simply, whether `runtime_tuple<int, float>` should be 24 bytes
> instead of 8 bytes depends entirely on how a particular user uses
> it. One user might not index enough to make the extra 16 bytes of
> storage worth having, while the other person might. This decision must
> be made by the user; it cannot be made by the compiler, since the
> compiler must pick one or the other at instantiation time.
*Processors* like predictable access patterns.
If you always dereference this object at Top+0 and Top+4 bytes, the processor
will learn and predict properly (even assuming there are branches). If instead
you're accessing sometimes at Top+8 and at other times at Top+16, with no
overwhelming majority, the processor will not predict properly; its guesses
will probably be no better than random.
And in this particular example, an object of 24 bytes aligned at 8 may
cross a cacheline. The solution to that would be to force a 32-byte alignment,
making it 32 bytes wide, which is even more space dedicated to the object.
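To put rough numbers on the cacheline point, here is a minimal sketch. It
assumes 64-byte cachelines, a 4-byte int and float, and an array whose base
is cacheline-aligned. The struct names, the index_data member and the helper
functions are hypothetical stand-ins of mine, not anything from the proposal.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the "wide" runtime_tuple<int, float> layout:
// 8 bytes of payload plus 16 bytes of indexing data (layout is an assumption).
struct alignas(8) Wide24 { int i; float f; unsigned char index_data[16]; };
struct alignas(32) Wide32 { int i; float f; unsigned char index_data[16]; };

static_assert(sizeof(int) == 4 && sizeof(float) == 4, "sketch assumes 4-byte int and float");
static_assert(sizeof(Wide24) == 24, "24 bytes when aligned at 8");
static_assert(sizeof(Wide32) == 32, "alignas(32) pads the object to 32 bytes");

constexpr std::size_t kCacheLine = 64; // assumption: 64-byte cachelines

// True if an object of `size` bytes starting `offset` bytes past a
// cacheline-aligned base touches two different cachelines.
constexpr bool crosses_line(std::size_t offset, std::size_t size) {
    return offset / kCacheLine != (offset + size - 1) / kCacheLine;
}

// Number of elements, out of `count` stored back to back with the given
// stride, that straddle a cacheline boundary.
constexpr std::size_t count_straddlers(std::size_t stride, std::size_t count) {
    std::size_t n = 0;
    for (std::size_t k = 0; k < count; ++k)
        if (crosses_line(k * stride, stride))
            ++n;
    return n;
}
```

Under those assumptions, count_straddlers(sizeof(Wide24), 8) is 2 (the
elements at offsets 48 and 120), while count_straddlers(sizeof(Wide32), 8)
is 0, at the cost of 8 more bytes per object.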
Is this trade-off worth it? I agree with Jason that this is not knowable by the
compiler. Only profiling might tell.
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Data Center - Platform & Sys. Eng.
Received on 2026-04-17 15:38:04
