ISOCPP sg16 List: Re: Undated reference to Unicode Standard and UAX #29

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Sat, 6 Jan 2024 21:29:25 +0000

On Sat, 6 Jan 2024 at 20:29, Steve Downey <sdowney_at_[hidden]> wrote:

> Neither is it useful to say, picking some changes from the last few
> years, that C++{X} must not understand gender modifiers for emoji, must
> split Chinese glyphs incorrectly, and fail to understand Korean where it
> overlaps with other Asian languages.
>

If we recommended "at least version 15.0.0" then nothing would say C++
"must not" use a later version. But it would guarantee a minimum set of
features. This was previously suggested by the SG16 chair at
https://github.com/cplusplus/draft/pull/5826#issuecomment-1306138028

The next comment quoting the ISO directives is interesting, as it
specifically calls out that the undated Unicode standard reference implies
"it will be possible to use all future changes of the referenced document
for the purposes of the [C++ standard]" and that "it is understood that the reference
will include all amendments to and revisions of the referenced document".
When I reviewed P2736R2 ("Referencing The Unicode Standard"), those
implications were not obvious to me; in particular, the abstract says "This
proposal has no impact on implementations." lol. lmao.

> Not being able to process text is, however, in my opinion, strictly worse
> than dealing with the fact that the formal specification mechanisms are not
> adequate for describing what we need.
>

When I push the patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642000.html,
libstdc++ will avoid breaking inside an extended grapheme cluster
consisting of two InCB=Consonant characters joined by an InCB=Linker
character. MSVC and libc++ will not avoid breaking those clusters (as far
as I can tell). It's unclear to me that this implementation divergence is
due to our specification mechanisms being inadequate. Our specification
seems clear(ish) that such clusters should not be broken. The problem is
that the specification requires non-trivial updates to shipping products,
and so introduces a dependency for compiler vendors on the Unicode standard
in a manner which was not obvious to me when we approved P2736R2. Obviously
vendors of web browsers and many other products need to consider Unicode
standard updates in their project plans, but it's new for C++
implementations to be coupled to it this way.

Nobody is saying that C++ should be unable to process text though.

>
> On Sat, Jan 6, 2024, 15:02 Jonathan Wakely via SG16 <sg16_at_[hidden]>
> wrote:
>
>>
>>
>> On Sat, 6 Jan 2024, 19:03 Tom Honermann, <tom_at_[hidden]> wrote:
>>
>>> On 1/5/24 11:26 AM, Jonathan Wakely via SG16 wrote:
>>>
>>> - *Poll 4: [FR-010-133][FR-021-013]: SG16 recommends resolving
>>> these comments by restricting all references to the Unicode Standard to the
>>> version that corresponds to the referenced version of ISO/IEC 10646.*
>>> Attendees: 9 (1 abstention)
>>>
>>>     SF  F  N  A  SA
>>>      2  3  0  3   0
>>>
>>> No consensus.
>>> A: It doesn't benefit the community to reference a Unicode
>>> version that is outdated by the time the standard is published.
>>>
>>>
>> Except that it provides a baseline of what's supported as valid syntax.
>> The set of valid code points that can be named by a universal character
>> name in C++23 is unclear. According to the standard, the day that a new
>> Unicode Standard is published, it should be possible to use any new code
>> points in a C++23 program. Even if one implementation supports that, it
>> creates a portability trap because other "C++23" implementations might not
>> know those character names yet. (But maybe I'm missing some context here,
>> and the relevant parts of the Unicode Standard that we depend on are stable
>> between versions? That doesn't seem to be true for UAX #29.)
>>
>> Something like a recommended practice of using "at least version 15.0.0"
>> would give users a baseline they can rely on, and caution them that relying
>> on anything newer might not be portable.
>>
>> Another way to look at this is just to say that there is no real
>> portability or strict conformance in practice, and you can only ever rely
>> on what your implementations happen to support. So this unstable normative
>> reference isn't a problem. But that seems more fitting for a living
>> standard like the HTML spec, rather than an ISO standard with fixed
>> editions.
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>

Received on 2024-01-06 21:29:40