Re: Undated reference to Unicode Standard and UAX #29

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Sun, 7 Jan 2024 12:12:21 +0000
On Sun, 7 Jan 2024 at 12:07, Robin Leroy <eggrobin_at_[hidden]> wrote:

>
> On the specific point mentioned in the title of this email thread:
>
> On Fri, Jan 5, 2024 at 5:54 PM Steve Downey <sdowney_at_[hidden]> wrote:
>
>> Did we actually specify something for C++23 that depends on or provides
>> the breaking algorithms?
>>
> On Fri, Jan 5, 2024 at 5:59 PM Corentin via SG16 <sg16_at_[hidden]> wrote:
>
>> std::format width estimation requires clustering
>>
>
> [format.string.std], paragraph 13
> <https://eel.is/c++draft/format.string.std#13> reads
>
>> For a sequence of characters in UTF-8, UTF-16, or UTF-32, an
>> implementation should use as its field width the sum of the field widths of
>> the first code point of each extended grapheme cluster. Extended grapheme
>> clusters are defined by UAX #29 of the Unicode Standard. […]
>
>
> That is *should*, not *shall*, so is there really a conformance
> requirement here, regardless of the version meant by the phrase “the
> Unicode Standard”?
> See also this discussion from the 2022-11-02 meeting of SG 16
> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2022.md#november-2nd-2022>
> :
>
> - Hubert asked why the reference for extended grapheme cluster is
> non-normative.
> - Jens replied that he thinks UAX #29
> <https://unicode.org/reports/tr29> is only referenced to satisfy
> normative encouragement for an implementation direction.
> - Charlie expressed agreement with Jens' recollection.
>
>
Yes, that's a good point. If I use the field width of the first code point
in <some cluster that bears a resemblance to an extended grapheme cluster
as described by Unicode> then that's still conforming.

So a best effort, or an older definition of extended grapheme cluster, is
better than nothing.
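
To make the "best effort" reading concrete, here is a minimal sketch of such a width estimate. It is not what any standard library does. The names (decode_utf8, extends_cluster, code_point_width, estimate_width) are hypothetical, the clustering rule is a deliberately crude stand-in for UAX #29 (it only knows combining diacritics, variation selectors, and ZWJ), and the width-2 ranges are a small subset picked for illustration.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Decode UTF-8 into code points. Assumes well-formed input; a stray
// byte is passed through as its own code point (illustrative only).
std::vector<char32_t> decode_utf8(std::string_view s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp = b;
        std::size_t extra = 0;
        if (b >= 0xF0)      { cp = b & 0x07; extra = 3; }
        else if (b >= 0xE0) { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xC0) { cp = b & 0x1F; extra = 1; }
        ++i;
        for (; extra > 0 && i < s.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Assumption: a crude approximation of "this code point extends the
// current cluster". Real UAX #29 rules cover far more cases.
bool extends_cluster(char32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F) ||  // combining diacritical marks
           (cp >= 0xFE00 && cp <= 0xFE0F) ||  // variation selectors
           cp == 0x200D;                      // zero width joiner
}

// Assumption: only a few wide ranges are listed here, for illustration.
int code_point_width(char32_t cp) {
    bool wide = (cp >= 0x1100 && cp <= 0x115F) ||
                (cp >= 0x3040 && cp <= 0xA4CF) ||
                (cp >= 0xAC00 && cp <= 0xD7A3) ||
                (cp >= 0x1F300 && cp <= 0x1F64F);
    return wide ? 2 : 1;
}

// Sum the widths of the first code point of each approximate cluster.
int estimate_width(std::string_view utf8) {
    int width = 0;
    bool first = true;
    bool after_zwj = false;
    for (char32_t cp : decode_utf8(utf8)) {
        // A cluster starts at the first code point, or at any code point
        // that neither extends the cluster nor follows a ZWJ.
        bool starts = first || (!extends_cluster(cp) && !after_zwj);
        if (starts) width += code_point_width(cp);
        after_zwj = (cp == 0x200D);
        first = false;
    }
    return width;
}
```

With this sketch, "e" followed by U+0301 counts as width 1, and U+1F468 ZWJ U+1F469 counts as width 2. Treating anything after a ZWJ as part of the same cluster is looser than UAX #29, which is exactly the kind of deviation that a *should* leaves room for.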

Received on 2024-01-07 12:12:37