std-proposals

Re: [std-proposals] Strategic Direction for AI in C++: Governance, and Ecosystem

From: Jason McKesson <jmckesson_at_[hidden]>
Date: Wed, 3 Jun 2026 10:44:03 -0400
On Tue, Jun 2, 2026 at 4:44 PM Adrian Johnston via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> Recently (2026-02-23) the ISO C++ Directions Group (DG) / WG21 published a document:
>
> Strategic Direction for AI in C++: Governance, and Ecosystem
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/p4023r0.pdf
>
> As one of its findings it identified a problem with "Garbage In, Garbage Out".
>
> The DG sees or recognizes a critical "Garbage In, Garbage Out" problem facing C++ developers using AI. Current models are trained on legacy C++ (C++98/03), vendor-specific dialects, and unsafe patterns found online.
>
>
> I'd say this is an understatement.
>
> What I am observing is that high-quality websites like https://en.cppreference.com/ are blocking AI search tools because they don't generate advertising revenue. And so my AI (Claude) routinely ends up searching for online posts made by people who are confused and asking for help and getting terse responses that may be incomplete at best.
>
> Next, if I ask Claude what data it was given about the C++ standard, it says it was trained on "commentary, documentation, and discussion during training — not verbatim text." It can identify final drafts like N4950 as being available, but for some reason it needs to be explicitly encouraged to consult that document.
>
> In general, the AI companies are being very careful to avoid being seen to use copyrighted data like the C++ standard.
>
> If we want AI-generated responses and AI-generated code to be as modern and correct as possible, I think it would make sense to release the copyright to the AI companies to use in training. And then insist they use that information as purveyors of programming tools.
>
> If it is well known that there is no barrier to training an AI correctly on the most recent C++ standard and that users should expect verbatim information and standards-aware code from their AI, then I would hope for some improvement on the current situation. It is very easy to add RLHF training data if the AI company is allowed to use the standard to create it.
>
> Oddly enough, Claude is capable of providing more modern code when requested. In general, I find AI has a serious issue where (for no reason) it assumes your software may be 10 years out of date, unless told otherwise.

This seems to be a very poorly researched suggestion. AI code
generators for C++ being trained on poor C++ techniques is a real
problem. But I don't see any research into how the suggested changes
would help anything. Is copyright what is stopping AI companies? Is
the ISO C++ committee not "insisting" that AI companies train their
code generators on better C++ idioms?

Basically, you seem to be just guessing at solutions. If you want to
solve this problem, you need to investigate the practices of AI
companies and actually find out what they need to solve it, and of
course whether the actions of the ISO C++ Committee can even
meaningfully influence this.

Until that research/investigation is done, this is just spitballing.

Received on 2026-06-03 14:44:15