Date: Thu, 04 Dec 2025 09:15:17 -0800
On Thursday, 4 December 2025 05:29:25 Pacific Standard Time Jens Maurer via
Std-Proposals wrote:
> My understanding is that current CPUs have quite some machinery in their
> branch predictors to optimize call/return patterns, but given the
> dynamic nature of your problem, I'm not seeing how you can avoid
> the "call (or jmp) through function pointer" fundamental penalty.
They do, and keeping calls and returns paired is now enforced in hardware, as a
way to prevent Return-Oriented Programming (ROP) attacks. To "defeat" that, we
need the compiler to generate code *designed* to operate in a task list without
returning to a central dispatcher. That is, the compiler would emit an
unconditional, indirect jump at the end of the function instead of the regular
return instruction, and the entry from the caller would also need a different
ABI than a plain function call.
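To make the idea concrete, here is a minimal sketch of that shape of code,
written with Clang's existing [[clang::musttail]] extension rather than any
proposed feature; the Context, Task and run_next names are purely illustrative:

// A hedged sketch of "tail-jump to the next task instead of returning to
// a central dispatcher", using Clang's [[clang::musttail]] extension
// (not standard C++).  All names here are hypothetical.

#include <cstdio>

struct Context;
using Task = void (*)(Context&);

struct Context {
    const Task* next;   // remaining tasks, terminated by nullptr
    int         value;
};

// Fetch the next task and jump to it without growing the stack.
void run_next(Context& ctx) {
    Task t = *ctx.next++;
    if (!t)
        return;                          // end of the task list
    [[clang::musttail]] return t(ctx);   // indirect jump, no call/ret pair
}

void add_one(Context& ctx) {
    ctx.value += 1;
    [[clang::musttail]] return run_next(ctx);
}

void times_two(Context& ctx) {
    ctx.value *= 2;
    [[clang::musttail]] return run_next(ctx);
}

int main() {
    const Task list[] = { add_one, times_two, nullptr };
    Context ctx{ list, 20 };
    run_next(ctx);
    std::printf("%d\n", ctx.value);      // prints 42
}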
Jens' point here is that this may very well be slower than returning to the
dispatcher and making a new function call. Indirect jumps are expensive, and
the Branch Predictor Units may or may not track them well. They are expensive
because the target instructions may not be known sufficiently far ahead of time
for the processor to fetch them, begin decoding and speculatively start
executing them.
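For comparison, the conventional structure he is referring to looks roughly
like the following (again, all names are made up); the open question is which
of the two shapes the predictors handle better:

// A central dispatcher loop: one indirect call per task, and each task
// returns normally, so every call has a matching ret that the
// return-stack predictor can track.

#include <cstdio>

struct Context { int value; };
using Task = void (*)(Context&);

void add_one(Context& ctx)   { ctx.value += 1; }
void times_two(Context& ctx) { ctx.value *= 2; }

void dispatch(Context& ctx, const Task* tasks) {
    while (Task t = *tasks++)
        t(ctx);                       // indirect call + ordinary return
}

int main() {
    const Task list[] = { add_one, times_two, nullptr };
    Context ctx{20};
    dispatch(ctx, list);
    std::printf("%d\n", ctx.value);   // prints 42
}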
Can this be turned into a QoI (quality-of-implementation) problem? The way I
see it, this cannot exist without a set of library functions: can the
performance problems be hidden in those? That would also limit the number of
core language changes, possibly only to function boundaries.
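Purely as a thought experiment, the library surface might look something like
this; none of these names exist anywhere, and the dispatch strategy inside
run() would be exactly the QoI decision in question:

#include <vector>

// Hypothetical facility, invented for this message only.  The intent is
// that the tail-jump machinery (and its ABI) could hide behind run(),
// with the core language change confined to how task functions begin
// and end.
template <class Context>
class task_chain {
public:
    using task = void (*)(Context&);

    void push(task t) { tasks_.push_back(t); }

    // QoI: an implementation might lower this to a tail-jump chain;
    // written here as a plain indirect-call loop.
    void run(Context& ctx) const {
        for (task t : tasks_)
            t(ctx);
    }

private:
    std::vector<task> tasks_;
};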
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Data Center - Platform & Sys. Eng.
