Re: Undated reference to Unicode Standard and UAX #29

From: Robin Leroy <eggrobin_at_[hidden]>
Date: Sat, 6 Jan 2024 17:52:08 +0100
On Fri, 5 Jan 2024 at 17:58, Corentin via SG16 <sg16_at_[hidden]> wrote:

> The important part is that Unicode and its annexes represent an
> indivisible whole so it would not be conforming to, for example, update
> the properties tables but not the algorithms that use them.
>
That is a very important point.

There are occasional cases where it is OK to do that kind of
mixing-and-matching, as alluded to in the note in UAX31-C1
<https://www.unicode.org/reports/tr31/#C1> and in the example in UAX31-R1b
<https://www.unicode.org/reports/tr31/#R1b>, or the longer-term discussion
in Section 3.3.2 of UTS #55
<https://www.unicode.org/reports/tr55/#Evolution-Unicode-3>, but indeed in
general one should use the properties corresponding to the algorithm and
definitions; this is certainly the case for the segmentation algorithms.

As alluded to in https://www.unicode.org/versions/index.html#Citations, if
older versions are allowed, it is useful to require a minimum version (you
probably do not want someone to unearth a copy of UAX #29 from Unicode 4.0
and use that as part of a C++23 implementation).
This is for instance what Ada does, see AARM22 2.1(17)
<http://www.ada-auth.org/standards/22aarm/html/AA-2-1.html#p17>.
(Note that the relevant text of the Ada standard will change in a future
version to actually say “Unicode”, rather than “documents referenced[ by
ISO 10646]”; see
http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ai22s/ai22-0073-1.html?rev=1.4 by
your friendly neighbourhood Unicode liaison officer.)
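
For implementers, the two constraints discussed above (property data matching
the algorithm version, plus a version floor) could be checked at compile time.
What follows is only a rough sketch: the type, the function names, and the
version numbers, including the 15.0.0 floor, are hypothetical and are not
taken from libc++, ICU, or any standard wording.

```cpp
// Hypothetical version record for illustration only.
struct UnicodeVersion {
    int major;
    int minor;
    int update;
};

// True when both versions are identical in all three fields.
constexpr bool same_version(UnicodeVersion a, UnicodeVersion b) {
    return a.major == b.major && a.minor == b.minor && a.update == b.update;
}

// True when v is the same as, or newer than, floor (field-by-field order).
constexpr bool at_least(UnicodeVersion v, UnicodeVersion floor) {
    if (v.major != floor.major) return v.major > floor.major;
    if (v.minor != floor.minor) return v.minor > floor.minor;
    return v.update >= floor.update;
}

// Assumed values: the version the property tables were generated from,
// the version of UAX #29 the segmentation rules follow, and a made-up floor.
constexpr UnicodeVersion property_tables_version{15, 1, 0};
constexpr UnicodeVersion segmentation_rules_version{15, 1, 0};
constexpr UnicodeVersion minimum_version{15, 0, 0};

// Properties and algorithm must come from the same Unicode version.
static_assert(same_version(property_tables_version, segmentation_rules_version),
              "property tables and segmentation rules must match");

// Older versions are tolerated only down to the chosen minimum.
static_assert(at_least(segmentation_rules_version, minimum_version),
              "Unicode version is older than the required minimum");
```

With these assumed values both checks pass; a Unicode 4.0 copy of the rules
would fail the second one, and updating only the tables would fail the first.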

On Fri, 5 Jan 2024 at 17:47, Mark de Wever via SG16 <sg16_at_[hidden]>
wrote:

> Recently I wanted to update libc++ to Unicode 15.1.0 and noticed the
> same changes you did. I put this on hold since I need to investigate the
> required ABI tags. Otherwise I would have implemented these changes for
> libc++18.
>
Mark, please contact me if you have any questions or run into any trouble
while implementing that; besides being the liaison officer to SC 22 I am
heavily involved in the maintenance of the implementations of rule-based
segmentation in ICU and ICU4X, and I am the editor for UAX #14 (though not
UAX #29, that is Josh Hadley’s), so I might be able to help—and feedback
from implementers is always useful.

Best regards,

Robin Leroy

Received on 2024-01-06 16:52:28