Date: Thu, 4 Dec 2025 11:15:37 +0530
Hello Thiago,
Thank you for the thoughtful question — it touches on the central issue of
whether this idea belongs in the C++ Standard or should remain an
architecture-specific extension.
Let me clarify the motivation more precisely, because the original message
did not convey the full context.
*1. This proposal is not about prefetching or cache-control*
The draft text unfortunately used terminology that sounded
hardware-adjacent, but the intent is *not* to introduce anything analogous
to:
- prefetch intrinsics
- non-temporal loads/stores
- cache control hints
- pipeline fetch controls
Those are ISA-level optimizations and naturally belong in compiler
extensions or architecture-specific intrinsics.
The concept here is completely different and exists at the *semantic level*,
not the CPU-microarchitecture level.
*2. The actual concept: explicit sequencing of dynamically changing micro-tasks*
Many event-driven systems (HDL simulators are just one example) share a
common execution model:
- Thousands of *micro-tasks*
- The order of tasks is computed dynamically
- The order changes every cycle, or even more frequently
- Tasks themselves are small and often side-effect-free
- A dispatcher function repeatedly selects the next task to run
In C++ today, this typically results in a control-flow pattern like:
dispatcher → T1 → dispatcher → T2 → dispatcher → T3 → …
even when the *intended evaluation sequence* is conceptually:
T1 → T2 → T3 → T4 → …
The key issue is *expressiveness*: C++ currently has no mechanism to
express “evaluate these things in this dynamic order, without re-entering a
dispatcher between each step”.
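To make the pattern concrete, here is a minimal sketch (all names hypothetical, not proposal text) of the dispatcher-driven control flow described above: an indirect call and a return to the loop bracket every task, however small the task is.

```cpp
#include <cassert>
#include <vector>

// Records which tasks actually ran, so the order is observable.
std::vector<int> trace;

void t1() { trace.push_back(1); }
void t2() { trace.push_back(2); }
void t3() { trace.push_back(3); }

// dispatcher -> T -> dispatcher -> T -> ... even though the intended
// evaluation sequence is simply T1 -> T3 -> T2.
void run_cycle(const std::vector<void (*)()>& schedule) {
    for (auto task : schedule)
        task();   // indirect call, then a return to this loop
}
```

Calling `run_cycle({&t1, &t3, &t2})` yields the trace 1, 3, 2; the point is not the result but that control re-enters the loop between every pair of tasks.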
Coroutines, tasks, executors, thread pools, and dispatch loops all still
fundamentally operate through:
- repeated function calls, or
- repeated returns to a central controller, or
- runtime-managed schedulers
which means that as programs scale down to extremely fine-grained tasks,
*sequencing overhead becomes dominant*, even on architectures where
prefetching is not a concern.
*3. Why this is not architecture-specific*
The misunderstanding arises because “fetch-only operation” sounds like a
CPU fetch-stage mechanism.
The actual idea is:
a mechanism in the abstract machine that allows a program to express a
*sequence of evaluations* that does not pass through a central dispatcher
after each step.
This can be implemented portably in many ways:
- Using a software-run interpreter that consumes the sequencing structure
- Using an implementation-specific optimization when available
- Mapping to architecture-specific controls on processors that support such mechanisms
- Or completely lowering into normal calls on simpler hardware
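The first and last bullets can be sketched together (all names here are hypothetical): the program publishes the cycle's evaluation order as plain data, and a small software interpreter consumes it; a smarter implementation could hand the same structure to an architecture-specific backend instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The sequencing structure: a dynamically computed order of task indices,
// rebuilt by the program every cycle.
struct Sequence {
    std::vector<std::size_t> order;
};

int counters[3] = {0, 0, 0};
void bump0() { ++counters[0]; }
void bump1() { ++counters[1]; }
void bump2() { ++counters[2]; }

using Task = void (*)();
constexpr Task kTasks[] = {bump0, bump1, bump2};

// The portable fallback: "completely lowering into normal calls".
void interpret(const Sequence& s) {
    for (std::size_t i : s.order)
        kTasks[i]();
}
```

For example, `interpret(Sequence{{2, 2, 0}})` runs task 2 twice and task 0 once, in exactly that order.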
This is similar in spirit to:
- *coroutines*: abstract-machine semantics, many possible implementations
- *atomics*: abstract semantics; each architecture maps them to whatever operations it has
- *SIMD types*: portable semantics, mapped to different instructions per architecture
- *executors*: abstract relationships, many possible backends
So the goal is not to standardize “fetch-stage hardware instructions”, but
to explore whether C++ can expose a *portable semantic form of sequencing
that compilers may optimize very differently depending on platform
capabilities*.
*4. Why prefetch intrinsics are insufficient*
Prefetch intrinsics provide:
- data-locality hints
- non-temporal load/store hints
- per-architecture micro-optimizations
They do *not* provide:
- a semantic representation of evaluation order
- a way to represent a dynamically computed schedule
- a portable abstraction
- a way to eliminate dispatcher re-entry in the program model
Prefetching does not remove the repeated calls and returns between tasks.
This proposal focuses on *expressing intent*, not on cache behavior.
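For contrast, here is what a prefetch intrinsic actually gives you (the GCC/Clang builtin shown): a data-locality hint with no semantic effect. It can warm the cache for the data a task will touch, but control still returns to the dispatcher after every call, so the sequencing overhead is untouched.

```cpp
#include <cassert>

int sum_with_prefetch(const int* data, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        if (i + 8 < n)
            __builtin_prefetch(&data[i + 8]);  // hint only; may be a no-op
        sum += data[i];                         // semantics are unchanged
    }
    return sum;
}
```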
*5. Why this might belong in the Standard*
Because the idea is:
- semantic, not architectural
- portable across CPU, GPU, and FPGA-style systems
- potentially optimizable by compilers
- relevant to domains beyond HDL tools
- conceptually related to execution agents, coroutines, and sequencing in executors
- about exposing user intent (dynamic ordering), not hardware control
Many modern workloads — including simulation, actor frameworks, reactive
graph engines, and fine-grained schedulers — could benefit from a portable
way to express:
“Here is the next unit of evaluation; no need to return to a dispatcher.”
even if the implementation varies drastically per target platform.
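One hypothetical shape for that handoff (names illustrative, not proposal text): each task returns its successor directly, so control chains from task to task and the only central code is a trivial trampoline containing no scheduling decision at all.

```cpp
#include <cassert>
#include <vector>

struct Step;
using StepFn = const Step* (*)(std::vector<int>&);
struct Step { StepFn fn; };

extern const Step stepA, stepB;

// Each task performs its work, then names the next unit of evaluation.
const Step* runA(std::vector<int>& log) { log.push_back(1); return &stepB; }
const Step* runB(std::vector<int>& log) { log.push_back(2); return nullptr; }

const Step stepA{runA};
const Step stepB{runB};

std::vector<int> run_chain(const Step* s) {
    std::vector<int> log;
    while (s)                 // trampoline: no dispatcher logic here,
        s = s->fn(log);       // each task hands over its own successor
    return log;
}
```

Whether an implementation keeps this as indirect calls, tail-calls through the chain, or maps it to hardware sequencing support is exactly the per-platform freedom argued for above.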
*6. Early-stage nature*
This is still an early R0 exploratory draft.
I fully expect that the idea will require:
- reframing in abstract-machine terminology
- better examples
- clarification of how sequencing is expressed
- exploration of implementability on real compilers
I appreciate your question because it helps anchor the discussion in the
right conceptual layer.
Thank you again for engaging — your perspective is extremely valuable.
On Wed, Dec 3, 2025 at 10:22 PM Thiago Macieira via Std-Proposals <
std-proposals_at_[hidden]> wrote:
> On Wednesday, 3 December 2025 03:38:57 Pacific Standard Time Kamalesh
> Lakkampally via Std-Proposals wrote:
> > The goal is to support workloads where the execution order of micro-tasks
> > changes dynamically and unpredictably every cycle, such as *event-driven
> > HDL/SystemVerilog simulation*.
> > In such environments, conventional C++ mechanisms (threads, coroutines,
> > futures, indirect calls, executors) incur significant pipeline
> redirection
> > penalties. Fetch-only instructions aim to address this problem in a
> > structured, language-visible way.
> >
> > I would greatly appreciate *feedback, criticism, and suggestions* from
> the
> > community.
> > I am also *open to collaboration.*
>
> Can you explain why this should be in the Standard? Why are the prefetch
> intrinsics available as compiler extensions for a lot of architectures not
> enough? My first reaction is that this type of design is going to be very
> architecture-specific by definition, so using architecture-specific
> extensions
> should not be an impediment.
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
Received on 2025-12-04 05:46:16
