std-proposals

Re: [std-proposals] Core-Language Extension for Fetch-Only Instruction Semantics

From: Peter Bindels <dascandy_at_[hidden]>
Date: Wed, 3 Dec 2025 14:26:44 +0100

Hi,

On Wed, Dec 3, 2025 at 12:58 PM Kamalesh Lakkampally via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> *Hello all,*
> My name is *Kamalesh Lakkampally*, founder of an EDA startup. I am
> submitting a draft proposal (*PnnnnR0*) for consideration by the Evolution
> Working Group (EWG).
>
> The goal is to support workloads where the execution order of micro-tasks
> changes dynamically and unpredictably every cycle, such as *event-driven
> HDL/SystemVerilog simulation*.
> In such environments, conventional C++ mechanisms (threads, coroutines,
> futures, indirect calls, executors) incur significant pipeline redirection
> penalties. Fetch-only instructions aim to address this problem in a
> structured, language-visible way.
>
> II. Introduction
>
> This paper proposes a core-language extension introducing fetch-only
> instructions—metadata-carrying constructs handled entirely within the
> instruction-fetch stage, using a dedicated path that never enters the
> normal execution pipeline.
>
> The design targets workloads with highly dynamic, micro-task–driven
> execution patterns, such as SystemVerilog simulation, where pipeline
> redirection penalties dominate performance.
> III. Scenario Description
>
> In event-driven engines such as HDL simulators, the execution order of
> micro-tasks changes dynamically based on events.
>
> Example Queue State 1:
> Arr[0] = T1
> Arr[1] = T2
> Arr[2] = T3
> Execution order: T1 → T2 → T3
>
> Queue State 2 (after events):
> Arr[0] = T3
> Arr[1] = T4
> Arr[2] = T1
> Arr[3] = T2
> Execution order: T3 → T4 → T1 → T2
>
> These reorderings may occur very frequently. Existing hardware and
> language mechanisms must execute branches, indirect jumps, or scheduler
> calls—each causing pipeline flushes, mispredictions, and stalls.
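For reference, here is a minimal sketch of the conventional C++ dispatch this scenario describes. It assumes each micro-task is a plain function and the event queue is a std::vector of function pointers. The task names t1 to t4 and the trace vector are illustrative and do not come from the proposal. Reordering the queue between cycles changes the indirect-call targets, which is the pattern the proposal says defeats branch prediction.

```cpp
#include <vector>

// Hypothetical micro-tasks standing in for T1..T4; each one only records
// its id so the execution order can be checked afterwards.
using Task = void (*)(std::vector<int>& trace);

void t1(std::vector<int>& trace) { trace.push_back(1); }
void t2(std::vector<int>& trace) { trace.push_back(2); }
void t3(std::vector<int>& trace) { trace.push_back(3); }
void t4(std::vector<int>& trace) { trace.push_back(4); }

// One simulation cycle: walk the queue in its current order and invoke
// each entry through a function pointer. Every call is an indirect call
// whose target changes whenever the event handling reorders the queue.
void run_cycle(const std::vector<Task>& queue, std::vector<int>& trace) {
    for (Task task : queue) {
        task(trace);  // indirect call through the task table
    }
}
```

Running one cycle over {t1, t2, t3} and another over {t3, t4, t1, t2} reproduces the two execution orders listed above.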
> IV. Limitations of Existing C++ Features
>
> Modern C++ provides threads, executors, coroutines, futures, task tables,
> and function-pointer invocation. All share a limitation: control-flow
> changes occur in the execution pipeline.
>
> Impacts:
> - Indirect calls cause branch mispredictions.
> - Dynamic reordering causes repeated pipeline flushes.
> - Coroutines incur state-machine execution overhead.
> - std::thread/std::async rely on OS scheduling, unsuitable for micro-task
> switching.
> - Executors/work-stealing operate in software and still rely on
> pipeline-based jumps.
>
> For workloads with thousands of per-cycle task transitions, these
> penalties accumulate into a dominant bottleneck.
>
I do not see a problem here that shows up in C++ code and that a language
change could solve. Can you give concrete examples instead?

> V. Proposed Solution: Fetch-Only Instructions
>
> We propose a C++ core-language abstraction mapping to architectural
> fetch-only instructions.
>
> Characteristics:
> - Processed entirely in the fetch stage.
> - Never decoded or executed.
> - Carry: instruction_address, thread_context, execution_context.
> - Redirect the fetch PC without pipeline involvement.
> - Avoid many branch misprediction and call/jump penalties.
>
> Provisional C++ syntax:
> fad q[i] = address_of(task);
> fcd q[i] = thread_context;
> fed q[i] = exec_context;
>
> The fetch subsystem uses these metadata entries to follow the dynamic
> queue order with minimal redirection cost.
>
> For example, updating q[] enables hardware to execute:
> T3 → T4 → T1 → T2
> without pipeline flushes.
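As a point of comparison, here is a minimal software model of what the three provisional statements would store. It assumes each queue slot is an ordinary struct holding an address and two 8-bit contexts (the width given in section VI.C). FetchEntry, set_entry, follow_queue and the task functions are hypothetical names. In this model, following the queue is still one ordinary indirect call per entry; the proposal's claim is that hardware would avoid exactly that.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a task; returns its id so the order is visible.
using TaskFn = int (*)();

int task1() { return 1; }
int task2() { return 2; }
int task3() { return 3; }
int task4() { return 4; }

// Hypothetical model of one fetch-only metadata entry.
struct FetchEntry {
    TaskFn instruction_address = nullptr;  // what `fad q[i] = address_of(task);` would store
    std::uint8_t thread_context = 0;       // what `fcd q[i] = thread_context;` would store
    std::uint8_t execution_context = 0;    // what `fed q[i] = exec_context;` would store
};

// Plain C++ equivalent of the three provisional statements for one slot.
void set_entry(FetchEntry& e, TaskFn task, std::uint8_t tctx, std::uint8_t ectx) {
    e.instruction_address = task;
    e.thread_context = tctx;
    e.execution_context = ectx;
}

// Follow the queue in its current order and encode the sequence of task
// ids as decimal digits (e.g. T3, T4, T1, T2 becomes 3412).
template <std::size_t N>
int follow_queue(const std::array<FetchEntry, N>& q) {
    int order = 0;
    for (const FetchEntry& e : q) {
        order = order * 10 + e.instruction_address();  // indirect call
    }
    return order;
}
```

Filling the queue with task3, task4, task1, task2 and calling follow_queue on it yields 3412, i.e. T3 → T4 → T1 → T2.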
> VI. Required Changes in C++ and OS
>
> A. C++ Core Language
> Introduce new semantic category of instructions (fetch-only) with defined
> behavior:
> - Visible to the abstract machine as fetch-control constructs.
> - No execution-stage participation.
> - No side effects beyond fetch redirection.
>
C++ doesn't have instructions. Adding a category of instructions does not
make sense since it doesn't have any to begin with.


> B. Abstract Machine Model
> Extend program order to include fetch-stage–only redirections.
> Clarify that these operations do not constitute observable side effects.
>
How would the abstract machine deal with CPU-implementation-level details?
It's the *abstract* machine, not tied to the details of how you might
implement something.


> C. Fetch-Only Memory Region (OS-Level Support)
> A new region in the virtual address space, analogous to:
> - Stack (for call frames)
> - Heap (for dynamic objects)
>
> Fetch-Only Region properties:
> - Holds fetch-only metadata (instruction addresses + 8-bit
> thread/execution context).
> - MMU-enforced write validation.
> - Stricter rules than normal memory.
> - Prevents unauthorized metadata forging.
>
How would that region exist? What is it for? This reads like wishful
thinking or, honestly, keyword vomit.


> D. MMU / Validation
> MMU enforces:
> - Whether context bits may be updated.
> - Whether address updates are permitted.
> - Masking/rejecting invalid writes.
>
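To make the validation rules in section D concrete, they can be sketched in software. This assumes one permission policy per entry: a flag for whether the address may change, plus an 8-bit mask of writable context bits. EntryPolicy, Entry and validated_write are hypothetical; the proposal does not say how an MMU would represent these permissions.

```cpp
#include <cstdint>

// Hypothetical per-entry permissions, mirroring the rules listed in VI.D.
struct EntryPolicy {
    bool address_writable;      // "whether address updates are permitted"
    std::uint8_t context_mask;  // bits set here are the context bits that may change
};

// Hypothetical metadata entry: an instruction address plus an 8-bit context.
struct Entry {
    std::uintptr_t address;
    std::uint8_t context;
};

// Rejects a disallowed address change and masks the context write so only
// permitted bits are updated. Returns false if the write was rejected.
bool validated_write(Entry& e, const EntryPolicy& p,
                     std::uintptr_t new_address, std::uint8_t new_context) {
    if (new_address != e.address && !p.address_writable) {
        return false;  // reject: address update not permitted
    }
    e.address = new_address;
    // Keep protected bits from the old value; take permitted bits from the new one.
    e.context = static_cast<std::uint8_t>((e.context & ~p.context_mask) |
                                          (new_context & p.context_mask));
    return true;
}
```

With address_writable false and a mask of 0x0F, an attempt to move the entry is rejected outright, while a context write of 0xFF only changes the low four bits.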
That's indeed part of what an MMU does ... but how does this relate to
anything?


> E. Optional Function Prologue Context Check
> Functions may embed an 8-bit comparison to ensure valid entry conditions.
>
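The optional prologue check in section E could look roughly like this in today's C++. It assumes the expected context is a per-function constant passed in by whoever dispatches the task. The value 0x2A and the names kExpectedContext and guarded_task are arbitrary illustrations, not part of the proposal.

```cpp
#include <cstdint>

// Hypothetical expected entry context for this one task.
constexpr std::uint8_t kExpectedContext = 0x2A;

// Models the optional 8-bit prologue comparison: if the context carried
// with the entry does not match, the task returns without doing any work.
bool guarded_task(std::uint8_t entry_context, int& counter) {
    if (entry_context != kExpectedContext) {  // the 8-bit comparison
        return false;
    }
    ++counter;  // the task's real work would go here
    return true;
}
```

Note that this is an ordinary compare-and-branch at function entry, so it runs in the execution pipeline like any other check.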
> These changes enable fetch-only instructions as a safe and efficient
> core-language construct.
>
To do what?!

Regards,
Peter

Received on 2025-12-03 13:26:57