Date: Thu, 4 Dec 2025 13:08:43 +0100
Could you sketch (e.g. in pseudo-code) what the result would look like in assembler?
You promise to get rid of the overhead.
It can probably be done on the C++ side with nice new syntax.
But how should it actually work?
Example:
It is meant for a future special CPU.
This CPU has a register/counter pointing to a list of micro-functions, and a special instruction that jumps to the next entry in the list and increments the counter.
Current CPUs could do something similar.
Is that what you mean?
[There would be some performance questions to solve: instruction caching, list manipulation.]
Would the execution be really much faster than a manual solution today?
-----Original Message-----
From: Kamalesh Lakkampally via Std-Proposals <std-proposals_at_[hidden]>
Sent: Thu 04.12.2025 13:07
Subject: Re: [std-proposals] Core-Language Extension for Fetch-Only Instruction Semantics
To: marcinjaczewski86_at_[hidden];
CC: Kamalesh Lakkampally <founder_at_[hidden]>; std-proposals_at_[hidden];
Hello All,
Today, in standard C++, the control-flow structure of a function is always statically bounded.
Even if there are multiple possible branches, all possible successor functions are known at compile-time.
For example:
function1 → function2 → function3
                      ↘ function4
Here the compiler knows that function2 can transfer control only to:
* function3, or
* function4.
This is a fundamental assumption behind C++:
All possible control-flow destinations are statically known, even if the choice is dynamic.
But in my case, the successor is not statically bounded.
The model I’m working with looks like this:
function1 → function2 → (next function is chosen at runtime)
At runtime, not at compile time, the program may write the address of:
* function3, or
* function4, or
* functionN, or
* any function from a dynamically created set.
In other words:
The set of possible successors is not known statically.
It is constructed at runtime and may vary extremely frequently.
This is very different from C++’s normal branching model, where all successors are part of the program’s fixed control-flow graph.
In ordinary C++ you can write:
void (*next)() = pick_at_runtime();
next();
But semantically:
* this is still a call (not a sequencing directive),
* it requires participating in the full call/return convention,
* it forces the compiler to preserve ABI boundaries,
* it prevents more aggressive sequencing optimizations,
* it hides the structure of the scheduling inside a dispatcher loop,
* and it mixes the notion of “control flow” with “function invocation semantics.”
In my case:
* the tasks are extremely small (micro-tasks),
* the schedule changes constantly,
* the dispatcher overhead dominates,
* and there is value in separating sequencing from execution.
This motivates a model where the schedule itself is represented as:
runtime-constructed sequencing metadata,
not as a chain of function calls.
The key limitation we are addressing is:
C++ has no way to express “the next function to evaluate will be decided at runtime from an open-ended set,” without routing through a dispatcher and performing a normal call.
Every dispatcher loop today must:
* call a function via ABI rules,
* return,
* then load the next target,
* then call again,
* and repeat.
For extremely fine-grained tasks, this overhead dominates the actual work.
What I am exploring is whether the C++ abstract machine could support a separate semantic layer for representing:
the dynamic schedule itself
as a first-class, side-effect-free structure
that describes successor relationships chosen at runtime.
This allows implementations to:
* optimize the walking of that schedule,
* avoid repeated dispatcher transitions,
* or use specialized fast paths if available.
Whenever there is actual data flow between tasks, the ABI naturally governs how parameters and registers are passed. But in cases where the sequencing construct itself carries no program-visible data, implementations are free to realize it using whichever internal mechanism is most efficient. The key idea is that the sequencing construct remains semantically side-effect-free, while giving implementations room to optimize the execution strategy.
Best Regards,
Kamalesh Lakkampally,
Founder & CEO
www.chipnadi.com <http://www.chipnadi.com>
On Thu, Dec 4, 2025 at 4:36 PM Marcin Jaczewski <marcinjaczewski86_at_[hidden] <mailto:marcinjaczewski86_at_[hidden]> > wrote:
On Thu, 4 Dec 2025 at 06:23, Kamalesh Lakkampally via Std-Proposals
<std-proposals_at_[hidden] <mailto:std-proposals_at_[hidden]> > wrote:
>
> Hello everyone,
>
> All of the messages I sent were written by me personally. I did not use an LLM to generate the content. I typed everything myself, in plain English, doing my best to explain the concept in detail with headlines and with paragraph style.
>
> Thank you for the detailed questions and discussions. Let me clarify the concept more precisely in terms of the C++ abstract machine, because several responses show that my original message was unclear.
>
> 1. What are “fetch-only operations” conceptually?
>
> They are sequencing operations, not computations.
> A fetch-only operation does not:
>
> load from memory
>
> store to memory
>
> execute user code
>
> allocate resources
>
> produce side effects
>
I would disagree that it does not produce "side effects". It changes
the state of the CPU, and that is detectable.
The whole point of it is to have faster code, right?
Only when considering the abstract machine does the operation have no
effects, since speed is more of a concern for the compiler than for
the abstract machine.
And wouldn't it be preferable to ask the compiler to try to add
prefetches to speed the code up? This could be hinted at by a
`[[prefetch]]` attribute.
> Instead, it carries a piece of metadata describing what should execute next(i.e where control goes) , and the evaluation order is adjusted accordingly.
>
> 2. What problem does this solve?
>
> In event-driven execution models (e.g., hardware simulators or fine-grained task schedulers), programs often end up in patterns like:
>
> dispatcher();
> → T1();
> → dispatcher();
> → T2();
> → dispatcher();
> → T3();
> ...
>
> The intent is simply:
>
> T1 → T2 → T3 → …
>
>
> but the actual control flow repeatedly returns through the dispatcher, causing excessive sequencing overhead. --> I would be very interested in hearing what existing C++ techniques the committee considers suitable to express this kind of dynamic sequencing without repeatedly returning through a dispatcher.
>
>
> 3. Before / After example (as requested)
>
> Today:
>
> void scheduler() {
>
> while (!q.empty()) {
> auto t = q.pop_front();
> t(); // call and return repeatedly
> }
> }
>
>
> Abstract machine control-flow:
>
> scheduler → T1 → scheduler → T2 → scheduler → T3 → …
>
>
> With fetch-only sequencing operations:
>
> fetch_schedule(q); // purely sequencing metadata
>
> execute_scheduled(); // walks metadata in-order
>
> ...
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
Received on 2025-12-04 12:23:14
