Re: Referencing the Unicode Standard

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Tue, 6 Dec 2022 12:40:21 +0100
On 06/12/2022 11.07, Corentin wrote:
> Thanks all,
> I did update the paper to include the whole standard and get rid of the annex references (except for UAX31; I think Steve is working on something?)
> For __STDC_ISO_10646__, if we want to say something more meaningful, I'll need help with wording.
> But we need to consider that this thing is usually defined by C and the C standard library.
> And even if we got it to reference Unicode, how is that value useful to users?
> Does it mean \N and UAX31 know about Unicode version X? Does it mean the C library has some support in classification functions? fmt? algorithms?
> If we stick to the current version of the definition ("when stored in an object of type wchar_t, has the same value as the code point of that character"), then
> we only care about whether it is greater or lower than 2003XX

Agreed, the current definition "when stored in an object of type wchar_t"
is unrelated to recent Unicode evolution.
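
For readers following along, here is a minimal sketch of what user code can do with the macro under the current definition. The helper names (iso10646_version, wchar_holds_code_points, and so on) are hypothetical and come from neither the paper nor the standard; the only assumption taken from [cpp.predefined] is that the macro, when defined, has a value of the form yyyymmL.

```cpp
// Sketch only: the helper names below are hypothetical illustrations.

// Returns the value of __STDC_ISO_10646__ (form yyyymmL) if the
// implementation defines it, and 0 otherwise.
constexpr long iso10646_version() {
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0;
#endif
}

// Under the current wording, the one thing the macro promises is that
// wchar_t values match the code points of the characters they store.
constexpr bool wchar_holds_code_points() {
    return iso10646_version() != 0;
}

// Split the yyyymm value into its year and month parts.
constexpr long iso10646_year()  { return iso10646_version() / 100; }
constexpr long iso10646_month() { return iso10646_version() % 100; }
```

A caller can branch on wchar_holds_code_points() or compare iso10646_version() against a threshold date, but nothing here says which Unicode version \N, UAX31, or the library classification functions support.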

> Jens:
>> It would be helpful to retain a reference to the place where UTF-8 etc. is defined.
> We use UTF-8 as well as other terminology without reference or pre-introduction throughout the standard.
> Do you think we should keep this one specific reference in [depr.locale.stdcvt.req]?

Hm... Ok.

Should we add entries in [intro.defs] along the lines of

-> the UTF-8 encoding form as specified in The Unicode Standard, section X


Same for UTF-16 and UTF-32.
I feel we should make an effort to relate these to the underlying standard.

If we set a precedent that we "inherit" definitions from normative references
without explicit mention, we might end up in a position where we "inherit"
all kinds of undesirable junk from Unicode and other standards we reference.

I don't feel strongly here, but wanted to have that option discussed.


> On Sun, Dec 4, 2022 at 5:46 PM Mark Zeren <mzeren_at_[hidden]> wrote:
>
> [cpp.predefined]
> "The specified year and month are implementation-defined."
> -> "The value is implementation-defined."
> I would prefer if we could repurpose that macro to refer to the supported
> Unicode version instead (by its release year/month).
>
> [mzeren] Could we say that any date past a well-known, yet to be determined, date refers instead to Unicode?

Received on 2022-12-06 11:40:27