Date: Tue, 02 Jun 2026 22:26:47 +0100
On 2 June 2026 21:44:00 BST, Adrian Johnston via Std-Proposals <std-proposals_at_[hidden]> wrote:
>Recently (2026-02-23) the ISO C++ Directions Group (DG) / WG21 published a
>document:
>
>Strategic Direction for AI in C++: Governance, and Ecosystem
>https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/p4023r0.pdf
>
>As one of its findings it identified a problem with "Garbage In, Garbage
>Out".
>
>*The DG sees or recognizes a critical "Garbage In, Garbage Out" problem
>facing C++ developers using AI. Current models are trained on legacy C++
>(C++98/03), vendor-specific dialects, and unsafe patterns found online.*
>
>
>I'd say this is an understatement.
>
>What I am observing is that high quality websites like
>https://en.cppreference.com/ are blocking AI search tools because they
>don't generate advertising revenue. And so my AI (Claude) routinely ends
>up searching for online posts made by people who are confused and asking for
>help and getting terse responses that may be incomplete at best.
>
>Next, if I ask Claude what data it was given about the C++ standard, it
>says it was trained on "commentary, documentation, and discussion during
>training — not verbatim text." It can identify final drafts like N4950 as
>being available, but for some reason it needs to be explicitly encouraged
>to consult that document.
>
>In general, the AI companies are being very careful to avoid being seen to
>use copyrighted data like the C++ standard.
Are you serious?
>If we want AI generated responses and AI generated code to be as modern and
>correct as possible, I think it would make sense to release the copyright
>to the AI companies to use in training. And then insist they use that
>information as purveyors of programming tools.
I see no chance of that ever happening, not that it would stop AI companies. At the very least, the standard draft is hosted on GitHub, and I expect that many LLMs already train on that.
I am somewhat sympathetic to the notion that overly strong copyright is a blocker of progress. But if society decides that copyright should be weakened, then it should be done for the benefit of everyone, not just for AI companies.
>If it is well known that there is no barrier to training an AI correctly on
>the most recent C++ standard and that users should expect verbatim
>information and standards-aware code from their AI, then I would hope for
>some improvement on the current situation. It is very easy to add RLHF
>training data if the AI company is allowed to use the standard to create it.
>
>Oddly enough, Claude is capable of providing more modern code when
>requested. In general, I find AI has a serious issue where (for no reason)
>it assumes your software may be 10 years out of date, unless told otherwise.
>
>Regards,
>Adrian Johnston
Received on 2026-06-02 21:26:53
