Re: Undated reference to Unicode Standard and UAX #29

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Sat, 6 Jan 2024 21:57:06 +0100

On 06/01/2024 21.28, Steve Downey via SG16 wrote:
> Neither is it useful to say, picking some changes from the last few years, that C++{X} must not understand gender modifiers for emoji, must split Chinese glyphs incorrectly, and must fail to understand Korean where it overlaps with other Asian languages.

> Not being able to process text is, however, in my opinion, strictly worse than dealing with the fact that the formal specification mechanisms are not adequate for describing what we need.

That's not the point.

The point is whether a reference to (say) "C++23" is well-defined and stable over
time. That's a major feature of a dated revision of a standard: It's stable,
and its feature set doesn't change. Thus, people can rely on the feature set
(and its boundaries), and it's meaningful to talk about "my program works
with a C++23 implementation" or "my program needs C++26".

If the meaning of the utterance "C++23" changes over time, we're doing
our customers a major disservice.

When we made the undated reference to Unicode, I think we were more focused on
permitting additional code points to be handled by newer compilers, which is
a rather small and well-controllable uncertainty. (Don't use newer scripts/
code points if you want to stay compatible with all C++23 implementations.)
Discovering that some Unicode algorithms that we happen to rely on got
changed in rather impactful ways is not that.

Jens



> On Sat, Jan 6, 2024, 15:02 Jonathan Wakely via SG16 <sg16_at_[hidden]> wrote:
>
> On Sat, 6 Jan 2024, 19:03 Tom Honermann, <tom_at_[hidden]> wrote:
>
> On 1/5/24 11:26 AM, Jonathan Wakely via SG16 wrote:
>
> *Poll 4: [FR-010-133][FR-021-013]: SG16 recommends resolving these comments by restricting all references to the Unicode Standard to the version that corresponds to the referenced version of ISO/IEC 10646.*
> Attendees: 9 (1 abstention)
>
>   SF   F   N   A   SA
>    2   3   0   3    0
>
> No consensus.
> A: It doesn't benefit the community to reference a Unicode version that is outdated by the time the standard is published.
>
>
> Except that it provides a baseline of what's supported as valid syntax. The set of valid code points that can be named by a universal character name in C++23 is unclear. According to the standard, the day that a new version of the Unicode Standard is published, it should be possible to use any new code points in a C++23 program. Even if one implementation supports that, it creates a portability trap because other "C++23" implementations might not know those character names yet. (But maybe I'm missing some context here, and the relevant parts of the Unicode Standard that we depend on are stable between versions? That doesn't seem to be true for UAX #29.)
>
> Something like recommended practice to use "at least version 15.0.0" would give users a baseline they can rely on, and caution them that relying on anything newer might not be portable.
>
> Another way to look at this is just to say that there is no real portability or strict conformance in practice, and you can only ever rely on what your implementations happen to support. So this unstable normative reference isn't a problem. But that seems more fitting for a living standard like the HTML spec, rather than an ISO standard with fixed editions.

Received on 2024-01-06 20:57:12