Date: Fri, 5 Dec 2025 10:10:07 +0530
It appears that many optimization-related responsibilities are currently
expected to be handled entirely by the backend.
Maybe my understanding is wrong.
*Best Regards, Kamalesh Lakkampally,*
*Founder & CEO*
www.chipnadi.com
On Thu, Dec 4, 2025 at 11:43 PM Marc Edouard Gauthier via Std-Proposals <
std-proposals_at_[hidden]> wrote:
> I’m sorry, no decent person would answer this way. What I wrote was
> essentially ignored. What essentially everyone writes is being ignored.
>
> This smells extremely strongly of AI. (And if not, it’s entirely equivalent
> to AI: it is not forming a discussion, but rather a mostly one-way
> information regurgitation.)
>
> Given the non-responsiveness, it seems to me blocking this email is
> appropriate, if there’s any way to do so, to avoid wasting everyone’s time.
>
>
>
> My 2 cents.
>
>
>
> Marc
>
>
>
>
>
>
>
> *From:* Kamalesh Lakkampally <founder_at_[hidden]>
> *Sent:* Wednesday, December 3, 2025 22:43
> *To:* std-proposals_at_[hidden]
> *Cc:* Marc Edouard Gauthier <MarcEdouard.Gauthier_at_[hidden]>
> *Subject:* Re: [std-proposals] Core-Language Extension for Fetch-Only
> Instruction Semantics
>
>
>
> Hi Marc,
>
> Thank you for your comments. Since the mailing list strips attachments,
> you have not seen the core details of the idea, so let me restate the
> proposal from the beginning.
> 1. *What the proposal is NOT*
>
> It is *not*:
>
> - a new CPU instruction
> - a hardware prefetching mechanism
> - a cache hint
> - a pipeline control mechanism
> - an optimization directive
> - a request for compiler backend changes
> - a form of parallelism or GPU-style execution
>
> None of these describe the concept accurately.
>
> The proposal operates *purely at the C++ abstract-machine level*.
>
>
> 2. *What the proposal **is**: A new way to express dynamic sequencing of
> micro-tasks*
>
> Many event-driven systems maintain a queue of very small tasks
> ("micro-tasks") whose execution order changes *frequently* at runtime:
>
> At one moment: T1 → T2 → T3
> Later: T3 → T4 → T1 → T2
>
> In C++ today, these systems must route control through a dispatcher:
>
> dispatcher();
> → T1();
> dispatcher();
> → T2();
> dispatcher();
> → T3();
>
> Even though the *intended* program order is simply:
>
> T1 → T2 → T3
>
> This repeated dispatcher → task → dispatcher pattern:
>
> - is semantically unnecessary
> - consumes execution bandwidth
> - prevents compiler optimization
> - introduces unpredictable control flow
> - creates overhead for extremely fine-grained tasks
>
> The proposal asks:
>
> *Can C++ express dynamic sequencing declaratively, without requiring the
> program to re-enter the dispatcher between every micro-task?*
>
>
> 3. *The key idea: “Fetch-only operations”*
>
> A *fetch-only operation* is a C++ semantic construct that:
>
>    - does NOT compute
>    - does NOT read or write memory
>    - does NOT have observable side effects
>    - does NOT correspond to a function call or branch
>    - EXISTS ONLY to describe “what executes next”
>
> In other words, it is a *pure sequencing directive*, not an instruction
> or computation.
>
> For example (placeholder syntax):
>
> fad q[i] = next_address;
>
> fcd q[i] = thread_context;
>
> fed q[i] = exec_context;
>
> These operations place *sequencing metadata* into a dedicated structure.
> 4. *What metadata is being represented?*
>
> Each “micro-task” is associated with small context fields:
>
> - *8-bit thread-context*
> - *8-bit execution-context*
> - *an instruction address (or function entry)*
>
> These fields allow the program to encode:
>
> “After completing this task, the *next* task to evaluate is at address X,
> but only if its context matches Y.”
>
> This enables the expression of dynamic scheduling decisions *without
> returning through the dispatcher*.
>
>
> 5. *Fetch-Only Region: where this metadata lives*
>
> Just as C++ programs conceptually have:
>
> - a *stack region* (automatic storage)
> - a *heap region* (dynamic storage)
>
> the proposal introduces a *fetch-only region*:
>
> - memory that stores sequencing metadata
>    - strictly controlled by the implementation, conceptually analogous to
>    MMU (memory management unit) protection
> - with context validation
> - not accessible for ordinary loads/stores
> - used only for sequencing, not computation
>
> This region is *not hardware-specific*; it is an abstract-machine
> concept, much like the thread-local storage model or atomic synchronization
> regions.
>
>
> 6. *Why this belongs at the language level*
>
> This is not expressible today because C++ has *no construct to describe
> sequencing separately from execution*.
>
> Existing mechanisms:
>
>    - Threads / tasks / executors → require execution-path transitions
>    - Coroutines → maintain suspended execution frames
>    - Function calls → require call/return semantics
>    - Dispatch loops → centralize sequencing and exhibit overhead
>    - As-if rule → cannot remove dispatcher calls; they are semantically
>    required
>
> Dynamic sequencing is fundamentally:
>
> - *not parallelism*,
> - *not computation*,
> - *not scheduling*,
> - *not hardware control*.
>
> It is *language-level intent*:
>
> “Evaluate micro-tasks in this dynamic order, without routing control back
> through a dispatcher.”
>
> This cannot be expressed with intrinsics, LLVM passes, or target-specific
> code.
>
>
> 7. *Why this is portable*
>
> Different compilers/runtimes/architectures may choose different
> implementations:
>
> - pure software interpreter for sequencing
> - optimizing JIT
> - coroutine resumption graph
> - architecture-specific fast paths (optional)
> - or normal dispatch loops as fallback
>
> The semantics remain:
>
> *Fetch-only operations provide sequencing metadata, the fetch-only region
> stores it, and execution proceeds in the declared order.*
>
> This is a valid addition to the C++ abstract machine, not a hardware
> feature.
> *Why context fields matter*
>
> Context metadata enables:
>
> - *correct sequencing:* ensuring that only valid successor tasks are
> chosen
> - *safety checks:* preventing unintended jumps to unrelated micro-tasks
> - *structural integrity:* maintaining a well-defined evaluation graph
>
> *Security aspect*
>
> Because the sequencing structure is stored in a dedicated *fetch-only
> region*—conceptually similar to the way the stack and heap represent
> distinct memory roles—the context fields also allow:
>
> - *validation of allowed transitions*,
> - *prevention of unauthorized or accidental modification*, and
> - *protection against control-flow corruption.*
>
> In other words, the combination of:
>
> - *context identifiers*, and
> - *a dedicated fetch-only region (analogous to stack/heap regions)*
>
> provides a framework in which implementations can enforce both correctness
> and security properties for dynamic sequencing.
>
> This occurs entirely at the semantic level; no explicit hardware behavior
> is assumed.
> 8. *Why the proposal exists*
>
> Highly dynamic micro-task workloads (event-driven simulation is just one
> example) cannot currently express their intent in C++ without:
>
> - repeated dispatcher calls
> - unnecessary control-flow redirections
> - significant overhead for fine-grained scheduling
>
> This proposal explores whether C++ can support such workloads in a
> portable and declarative manner.
>
>
> 9. *Still early-stage*
>
> This is an R0 exploratory draft. I am refining terminology,
> abstract-machine semantics, and examples with the help of feedback from
> this discussion.
>
>
>
>
>
>
> *Best Regards, Kamalesh Lakkampally,*
>
> *Founder & CEO*
>
> www.chipnadi.com
> <https://urldefense.us/v3/__http:/www.chipnadi.com__;!!Fqb0NABsjhF0Kh8I!cpAdSa8276cm8w5XL1eZCHz6f4bI9Ovm1HN1NMpukyUIl8QDZgFKBhhDtXh_VvDwalPV6wEVUZGmFQ2tF0Oaj1lzIQ$>
>
>
>
>
>
>
>
> On Thu, Dec 4, 2025 at 11:49 AM Marc Edouard Gauthier via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
>
> Kamalesh,
>
>
>
> It’s not clear that you’re looking to change anything at all in the
> language. If you are, you haven’t said exactly what it is.
>
>
>
> It seems more that you have an unusual highly parallel hardware
> architecture, and that what you’re looking for is a very different compiler
> implementation, not a different or modified language.
>
> For example, in C++ or most any procedural language, you can write some
> sequence of independent steps:
>
>
>
> a = b + c;
>
> d = e * f;
>
> g = 3 + 5 / h;
>
>
>
> The compiler (typically the compiler backend) is totally free to reorder
> these, or dispatch these, in whatever way the underlying hardware
> architecture allows.
>
> If there are dependencies, such as, say, `k = a - 1;`, the compiler
> ensures operations are ordered to satisfy the dependency, in whatever way
> the hardware architecture allows.
>
>
>
> So it seems already possible to express “micro-tasks”, whatever these
> might be, as simple independent C++ statements.
>
>
>
> Assuming you have a novel computer hardware architecture, and you want
> compiler support for it, your time is very likely much better spent
> studying LLVM and how to port it to a new architecture, than trying to
> propose things to this group without understanding what you’re proposing.
>
> You may also find different languages or language extensions (many
> supported by LLVM) that help targeting highly parallel hardware such as
> GPUs, that perhaps better fit your hardware architecture.
>
>
>
> At least you might come out of that exercise with much better
> understanding of the relationship between your hardware and compilers.
>
>
>
> (My 2 cents.)
>
>
>
> Marc
>
>
>
>
>
> *From:* Std-Proposals <std-proposals-bounces_at_[hidden]> *On Behalf
> Of *Kamalesh Lakkampally via Std-Proposals
> *Sent:* Wednesday, December 3, 2025 21:46
> *To:* std-proposals_at_[hidden]
> *Cc:* Kamalesh Lakkampally <info_at_[hidden]>
> *Subject:* Re: [std-proposals] Core-Language Extension for Fetch-Only
> Instruction Semantics
>
>
>
> Hello Thiago,
>
> Thank you for the thoughtful question — it touches on the central issue of
> whether this idea belongs in the C++ Standard or should remain an
> architecture-specific extension.
> Let me clarify the motivation more precisely, because the original message
> did not convey the full context.
> *1. This proposal is **not** about prefetching or cache-control*
>
> The draft text unfortunately used terminology that sounded
> hardware-adjacent, but the intent is *not* to introduce anything
> analogous to:
>
> - prefetch intrinsics
> - non-temporal loads/stores
> - cache control hints
> - pipeline fetch controls
>
> Those are ISA-level optimizations and naturally belong in compiler
> extensions or architecture-specific intrinsics.
>
> The concept here is completely different and exists at the *semantic
> level*, not the CPU-microarchitecture level.
>
>
> *2. The actual concept: explicit sequencing of dynamically changing
> micro-tasks*
>
> Many event-driven systems (HDL simulators are just one example) share a
> common execution model:
>
> - Thousands of *micro-tasks*
> - The order of tasks is computed dynamically
> - The order changes every cycle or even more frequently
> - Tasks themselves are small and often side-effect-free
> - A dispatcher function repeatedly selects the next task to run
>
> In C++ today, this typically results in a control-flow pattern like:
>
> dispatcher → T1 → dispatcher → T2 → dispatcher → T3 → …
>
> even when the *intended evaluation sequence* is conceptually:
>
> T1 → T2 → T3 → T4 → …
>
> The key issue is *expressiveness*: C++ currently has no mechanism to
> express “evaluate these things in this dynamic order, without re-entering a
> dispatcher between each step”.
>
> Coroutines, tasks, executors, thread pools, and dispatch loops all still
> fundamentally operate through:
>
> - repeated function calls, or
> - repeated returns to a central controller, or
> - runtime-managed schedulers
>
> which means that as programs scale down to extremely fine-grained tasks, *
> sequencing overhead becomes dominant*, even on architectures where
> prefetching is not a concern.
>
>
> *3. Why this is not architecture-specific*
>
> The misunderstanding arises when “fetch-only operation” sounds like a CPU
> fetch-stage mechanism.
>
> The actual idea is:
>
> A mechanism in the abstract machine that allows a program to express a *sequence
> of evaluations* that does not pass through a central dispatcher after
> each step.
>
> This can be implemented portably in many ways:
>
> - Using a software-run interpreter that consumes the sequencing
> structure
> - Using an implementation-specific optimization when available
> - Mapping to architecture-specific controls on processors that support
> such mechanisms
> - Or completely lowering into normal calls on simpler hardware
>
> This is similar in spirit to:
>
> - *coroutines*: abstract machine semantics, many possible
> implementations
> - *atomics*: abstract semantics, architecture maps them to whatever
> operations it has
> - *SIMD types*: portable semantics, mapped to different instructions
> per architecture
> - *executors*: abstract relationships, many possible backends
>
> So the goal is not to standardize “fetch-stage hardware instructions”,
> but to explore whether C++ can expose a *portable semantic form of
> sequencing that compilers may optimize very differently depending on
> platform capabilities*.
>
>
> *4. Why prefetch intrinsics are insufficient*
>
> Prefetch intrinsics provide:
>
> - data locality hints
> - non-temporal load/store hints
> - per-architecture micro-optimizations
>
> They do *not* provide:
>
> - a semantic representation of evaluation order
> - a way to represent a dynamically computed schedule
> - a portable abstraction
> - a way to eliminate dispatcher re-entry in the program model
>
> Prefetching does not remove the repeated calls/returns between tasks.
> This proposal focuses on *expressing intent*, not on cache behavior.
> *5. Why this might belong in the Standard*
>
> Because the idea is:
>
> - semantic, not architectural
> - portable across CPU, GPU, and FPGA-style systems
> - potentially optimizable by compilers
> - relevant to domains beyond HDL tools
> - conceptually related to execution agents, coroutines, and sequencing
> in executors
> - about exposing user intent (dynamic ordering), not hardware control
>
> Many modern workloads — including simulation, actor frameworks, reactive
> graph engines, and fine-grained schedulers — could benefit from a portable
> way to express:
>
> “Here is the next unit of evaluation; no need to return to a dispatcher.”
>
> even if the implementation varies drastically per target platform.
>
>
> *6. Early-stage nature*
>
> This is still an early R0 exploratory draft.
> I fully expect that the idea will require:
>
> - reframing in abstract-machine terminology
> - better examples
> - clarification of how sequencing is expressed
> - exploration of implementability on real compilers
>
> I appreciate your question because it helps anchor the discussion in the
> right conceptual layer.
>
> Thank you again for engaging — your perspective is extremely valuable.
>
>
>
>
>
>
>
>
>
> On Wed, Dec 3, 2025 at 10:22 PM Thiago Macieira via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
>
> On Wednesday, 3 December 2025 03:38:57 Pacific Standard Time Kamalesh
> Lakkampally via Std-Proposals wrote:
> > The goal is to support workloads where the execution order of micro-tasks
> > changes dynamically and unpredictably every cycle, such as *event-driven
> > HDL/SystemVerilog simulation*.
> > In such environments, conventional C++ mechanisms (threads, coroutines,
> > futures, indirect calls, executors) incur significant pipeline
> redirection
> > penalties. Fetch-only instructions aim to address this problem in a
> > structured, language-visible way.
> >
> > I would greatly appreciate *feedback, criticism, and suggestions* from
> the
> > community.
> > I am also *open to collaboration.*
>
> Can you explain why this should be in the Standard? Why are the prefetch
> intrinsics available as compiler extensions for a lot of architectures not
> enough? My first reaction is that this type of design is going to be very
> architecture-specific by definition, so using architecture-specific
> extensions
> should not be an impediment.
>
> --
> Thiago Macieira - thiago (AT) macieira.info
> <https://urldefense.us/v3/__http:/macieira.info__;!!Fqb0NABsjhF0Kh8I!ds4TmlnUR3HOjbM1tqX9-z0t9wQ_nycW2FAesvQrFCjO_zmCC0zHqKeuUhLgM4zDQODE_PGWjrlB6uvm8sb9qpRHdb1jbfiyjPrE3A$>
> - thiago (AT) kde.org
> <https://urldefense.us/v3/__http:/kde.org__;!!Fqb0NABsjhF0Kh8I!ds4TmlnUR3HOjbM1tqX9-z0t9wQ_nycW2FAesvQrFCjO_zmCC0zHqKeuUhLgM4zDQODE_PGWjrlB6uvm8sb9qpRHdb1jbfhL3Q_pjg$>
> Principal Engineer - Intel Data Center - Platform & Sys. Eng.
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
> <https://urldefense.us/v3/__https:/lists.isocpp.org/mailman/listinfo.cgi/std-proposals__;!!Fqb0NABsjhF0Kh8I!ds4TmlnUR3HOjbM1tqX9-z0t9wQ_nycW2FAesvQrFCjO_zmCC0zHqKeuUhLgM4zDQODE_PGWjrlB6uvm8sb9qpRHdb1jbfipwWQT2w$>
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
> <https://urldefense.us/v3/__https:/lists.isocpp.org/mailman/listinfo.cgi/std-proposals__;!!Fqb0NABsjhF0Kh8I!cpAdSa8276cm8w5XL1eZCHz6f4bI9Ovm1HN1NMpukyUIl8QDZgFKBhhDtXh_VvDwalPV6wEVUZGmFQ2tF0MIJc_ytg$>
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
Received on 2025-12-05 04:40:46
