Kamalesh,
It’s not clear that you’re looking to change anything at all in the language. If you are, you haven’t said exactly what it is.
It seems more likely that you have an unusual, highly parallel hardware architecture, and that what you’re looking for is a very different compiler implementation, not a different or modified language.
For example, in C++ or most any procedural language, you can write some sequence of independent steps:
a = b + c;
d = e * f;
g = 3 + 5 / h;
The compiler (typically its backend) is free to reorder or dispatch these in whatever way the underlying hardware architecture allows, provided the program's observable behavior is unchanged.
If there are dependencies, say `k = a - 1;`, the compiler orders the operations to satisfy them, again in whatever way the hardware architecture allows.
So it seems already possible to express “micro-tasks”, whatever these might be, as simple independent C++ statements.
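To make that concrete, here is a minimal sketch that just wraps the statements above in a function. The names `Results` and `compute` are made up for illustration; nothing here is specific to any particular hardware.

```cpp
// Hypothetical wrapper around the statements discussed above.
struct Results {
    int a, d, g, k;
};

Results compute(int b, int c, int e, int f, int h) {
    Results r{};
    // These three statements have no data dependencies on each other,
    // so the compiler may reorder or overlap them as the target allows.
    r.a = b + c;
    r.d = e * f;
    r.g = 3 + 5 / h;  // integer division; h must be non-zero
    // This statement reads a, so it is ordered after the computation of a.
    r.k = r.a - 1;
    return r;
}
```

Whatever order the generated code uses, the caller sees the same four values, which is all the language requires.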
Assuming you have a novel computer hardware architecture and want compiler support for it, your time is very likely much better spent studying LLVM and how to port it to a new architecture
than proposing things to this group without understanding what you’re proposing.
You may also find different languages or language extensions (many supported by LLVM) that help target highly parallel hardware such as GPUs and that perhaps better fit your hardware architecture.
At least you might come out of that exercise with a much better understanding of the relationship between your hardware and compilers.
(My 2 cents.)
Marc
From: Std-Proposals <std-proposals-bounces@lists.isocpp.org>
On Behalf Of Kamalesh Lakkampally via Std-Proposals
Sent: Wednesday, December 3, 2025 21:46
To: std-proposals@lists.isocpp.org
Cc: Kamalesh Lakkampally <info@chipnadi.com>
Subject: Re: [std-proposals] Core-Language Extension for Fetch-Only Instruction Semantics
Hello Thiago,
Thank you for the thoughtful question — it touches on the central issue of whether this idea belongs in the C++ Standard or should remain an architecture-specific extension.
Let me clarify the motivation more precisely, because the original message did not convey the full context.
The draft text unfortunately used terminology that sounded hardware-adjacent, but the intent is not to introduce anything analogous to:
Those are ISA-level optimizations and naturally belong in compiler extensions or architecture-specific intrinsics.
The concept here is completely different and exists at the semantic level, not the CPU-microarchitecture level.
Many event-driven systems (HDL simulators are just one example) share a common execution model:
In C++ today, this typically results in a control-flow pattern like:
dispatcher → T1 → dispatcher → T2 → dispatcher → T3 → …
even when the intended evaluation sequence is conceptually:
T1 → T2 → T3 → T4 → …
The key issue is expressiveness: C++ currently has no mechanism to express “evaluate these things in this dynamic order, without re-entering a dispatcher between each step”.
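For illustration only, the dispatcher-centric pattern described above might look like the following sketch. All names (`Task`, `run_dispatcher`, `demo_dispatch`) are hypothetical, and the trace string exists only to make the control flow visible.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical task type: each task appends its name to a trace.
using Task = std::function<void(std::string&)>;

// Central dispatcher: control comes back here after every task.
std::string run_dispatcher(std::deque<Task> ready) {
    std::string trace;
    while (!ready.empty()) {
        trace += "D>";                      // we are in the dispatcher
        Task t = std::move(ready.front());
        ready.pop_front();
        t(trace);                           // indirect call, then return
    }
    return trace;
}

// Builds three trivial tasks and runs them through the dispatcher.
std::string demo_dispatch() {
    const char* names[] = {"T1", "T2", "T3"};
    std::deque<Task> ready;
    for (const char* name : names) {
        ready.push_back([name](std::string& tr) {
            tr += name;
            tr += ">";
        });
    }
    return run_dispatcher(std::move(ready));
}
```

The trace produced by `demo_dispatch()` interleaves a dispatcher step before every task, which is the dispatcher → T1 → dispatcher → T2 shape rather than T1 → T2 → T3.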
Coroutines, tasks, executors, thread pools, and dispatch loops all still fundamentally operate through:
which means that as programs scale down to extremely fine-grained tasks,
sequencing overhead becomes dominant, even on architectures where prefetching is not a concern.
The misunderstanding arises when “fetch-only operation” sounds like a CPU fetch-stage mechanism.
The actual idea is:
A mechanism in the abstract machine that allows a program to express a sequence of evaluations that does not pass through a central dispatcher after each step.
This can be implemented portably in many ways:
This is similar in spirit to:
So the goal is not to standardize “fetch-stage hardware instructions”,
but to explore whether C++ can expose a portable semantic form of sequencing that compilers may optimize very differently depending on platform capabilities.
Prefetch intrinsics provide:
They do not provide:
Prefetching does not remove the repeated calls/returns between tasks.
This proposal focuses on expressing intent, not on cache behavior.
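As a rough sketch of that distinction, the loop below uses the GCC/Clang builtin `__builtin_prefetch` as a cache hint for the next task record. The `MicroTask` type and the function names are assumptions made up for this example. The hint may warm the cache, but every task is still entered through an indirect call and left through a return.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical task record: a function pointer plus one operand.
struct MicroTask {
    void (*run)(const MicroTask&, long& acc);
    long operand;
};

void add_task(const MicroTask& t, long& acc) { acc += t.operand; }
void mul_task(const MicroTask& t, long& acc) { acc *= t.operand; }

// Runs all tasks in order and counts how many times the loop dispatched.
long run_with_prefetch(const std::vector<MicroTask>& tasks, long& dispatches) {
    long acc = 0;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
#if defined(__GNUC__)
        if (i + 1 < tasks.size()) {
            __builtin_prefetch(&tasks[i + 1]);  // cache hint only
        }
#endif
        tasks[i].run(tasks[i], acc);  // still one call and one return per task
        ++dispatches;
    }
    return acc;
}
```

The dispatch count equals the number of tasks with or without the hint, so the sequencing overhead is untouched.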
Because the idea is:
Many modern workloads — including simulation, actor frameworks, reactive graph engines, and fine-grained schedulers — could benefit from a portable way to express:
“Here is the next unit of evaluation; no need to return to a dispatcher.”
even if the implementation varies drastically per target platform.
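One way to approximate this in today's C++, offered only as a sketch and not as what the draft specifies, is for each unit to hand control directly to its successor. The `Node` type and function names are hypothetical. Note that the standard does not guarantee the tail call below is turned into a jump, so this expresses the intent without any portable guarantee.

```cpp
#include <string>

// Hypothetical unit of evaluation that knows its successor.
struct Node {
    const char* name;
    const Node* next;  // chosen at run time; nullptr ends the chain
};

// Each step continues straight into the next one; no dispatcher in between.
void run_chain(const Node* n, std::string& trace) {
    if (n == nullptr) {
        return;
    }
    trace += n->name;
    trace += ">";
    return run_chain(n->next, trace);  // tail position; elimination not guaranteed
}

// Builds T1 -> T2 -> T3 and runs the chain.
std::string demo_chain() {
    Node t3{"T3", nullptr};
    Node t2{"T2", &t3};
    Node t1{"T1", &t2};
    std::string trace;
    run_chain(&t1, trace);
    return trace;
}
```

Without guaranteed tail-call elimination the call depth can grow with the length of the chain, which is one reason this pattern is hard to rely on portably.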
This is still an early R0 exploratory draft.
I fully expect that the idea will require:
I appreciate your question because it helps anchor the discussion in the right conceptual layer.
Thank you again for engaging — your perspective is extremely valuable.
On Wed, Dec 3, 2025 at 10:22 PM Thiago Macieira via Std-Proposals <std-proposals@lists.isocpp.org> wrote:
On Wednesday, 3 December 2025 03:38:57 Pacific Standard Time Kamalesh
Lakkampally via Std-Proposals wrote:
> The goal is to support workloads where the execution order of micro-tasks
> changes dynamically and unpredictably every cycle, such as *event-driven
> HDL/SystemVerilog simulation*.
> In such environments, conventional C++ mechanisms (threads, coroutines,
> futures, indirect calls, executors) incur significant pipeline redirection
> penalties. Fetch-only instructions aim to address this problem in a
> structured, language-visible way.
>
> I would greatly appreciate *feedback, criticism, and suggestions* from the
> community.
> I am also *open to collaboration.*
Can you explain why this should be in the Standard? Why are the prefetch
intrinsics available as compiler extensions for a lot of architectures not
enough? My first reaction is that this type of design is going to be very
architecture-specific by definition, so using architecture-specific extensions
should not be an impediment.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Principal Engineer - Intel Data Center - Platform & Sys. Eng.
--
Std-Proposals mailing list
Std-Proposals@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals