Re: [SG16] Draft Named Escape Sequences

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Wed, 3 Nov 2021 19:07:21 +0100
On 03/11/2021 03.07, Steve Downey via SG16 wrote:
> Updated paper with wording for UCN form of named unicode characters, with changes as suggested by Jens. This reflects the strongest consensus in EWG for exact matches.


The prose text of the paper says:

"The floating reference to ISO/IEC 10646 indicates a dependence on the version that is
current at the time of standardization. Thus, conformance with the C++ standard will
require conformance with the latest available publication of ISO/IEC 10646."

This means all implementations of C++ will become non-conforming the instant
a new version of ISO 10646 is published. I don't think we should permit a
foreign entity such as the ISO committee for ISO 10646 to render C++
implementations non-conforming on a whim.

Put differently, a compiler can no longer meaningfully state
"conforming to ISO 14882:2020" (because that's a moving target);
it always needs to say "conforming to ISO 14882:2020 and
ISO 10646:2017" or so.

I think we should do better by hardcoding the ISO 10646 version
used for the character names in each C++ release.
Yes, that requires a manual (or editorial?) update for each new
release of C++, but that's not unlike our reference to the C standard,
and helps maintain overall sanity.

> This doesn't mention the latest CVE, Trojan Source, which might have mitigations using named UCNs, where it would be much more obvious what shenanigans were being played with RTL modifications. I will be prepared to talk about that, briefly.

Hex-based UCNs are equally fine to avoid strange games with left-to-right modifiers.
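
To illustrate, here is a minimal sketch (mine, not from the paper) of a bidi control spelled as a hex UCN. The names rlo, label, and has_bidi_control are made up for the example, and the code-point ranges checked are the explicit embedding/override and isolate controls as I understand them from the Unicode bidi algorithm, so treat the list as an assumption rather than a complete mitigation.

```cpp
#include <string_view>

// U+202E is RIGHT-TO-LEFT OVERRIDE, one of the bidi controls used in
// Trojan Source style attacks. Spelled as a hex UCN, the source file
// contains only ASCII, so an editor has nothing to reorder on display.
constexpr char32_t rlo = U'\u202E';

// Hypothetical example string: the override sits between "user" and "nimda".
constexpr std::u32string_view label = U"user\u202Enimda";

// Returns true if s contains an explicit bidi embedding/override control
// (U+202A..U+202E) or an isolate control (U+2066..U+2069).
constexpr bool has_bidi_control(std::u32string_view s) {
    for (char32_t c : s) {
        if ((c >= 0x202A && c <= 0x202E) || (c >= 0x2066 && c <= 0x2069)) {
            return true;
        }
    }
    return false;
}

static_assert(rlo == 0x202E);
static_assert(has_bidi_control(label));
static_assert(!has_bidi_control(U"user admin"));
```

A named form would make the intent more readable to a reviewer, but the hex form already keeps the control character itself out of the source text.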


Received on 2021-11-03 13:07:29