Re: SG16 meeting summary for September 28th, 2022

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Sun, 2 Oct 2022 23:45:08 +0200
On the second poll, I'll copy the message I sent before the meeting:

There are further issues here.
The width of a grapheme is independent of the encoding.
We are just not forcing implementations to decode. Is that what we want?
I don't think it is useful.
Most encodings cannot represent any of the wide code points, and the width
of code points in Shift-JIS can be derived without doing a full decoding.
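
To illustrate the Shift-JIS point, here is a minimal sketch that estimates
width from lead bytes alone, without mapping anything to Unicode. The helper
names are hypothetical. It assumes well-formed input and assumes that every
two-byte character is wide and every single byte is narrow, which is only an
approximation of what a full East Asian Width lookup would give.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if `b` starts a two-byte Shift-JIS character.
// Lead bytes are 0x81-0x9F and 0xE0-0xFC; 0xA1-0xDF are single-byte
// half-width katakana.
constexpr bool sjis_is_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical helper: estimated column width of well-formed Shift-JIS text.
// Assumption: two-byte characters take 2 columns, single bytes take 1.
constexpr std::size_t sjis_estimated_width(std::string_view bytes) {
    std::size_t width = 0;
    for (std::size_t i = 0; i < bytes.size();) {
        const auto b = static_cast<unsigned char>(bytes[i]);
        if (sjis_is_lead_byte(b) && i + 1 < bytes.size()) {
            width += 2;  // two-byte character: skip the trail byte as well
            i += 2;
        } else {
            width += 1;  // ASCII-range byte, half-width katakana, or stray byte
            i += 1;
        }
    }
    return width;
}

// Usage: 0x82 0xA0 is HIRAGANA LETTER A, 0xB1 is half-width katakana A.
static_assert(sjis_estimated_width("abc") == 3);
static_assert(sjis_estimated_width("a\x82\xA0\xB1") == 4);
```

Only the byte classification is needed here; no conversion table is consulted.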
Suggested resolution:
For a string decoded to a sequence of Unicode code points, its width is the
sum of estimated widths of the first code points in its extended grapheme
clusters.
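
As a rough sketch of what that wording computes: once a string has been
decoded and split into extended grapheme clusters, only the first code point
of each cluster contributes to the width. The function names below are
hypothetical, the segmentation step is assumed to have been done elsewhere,
and the table of wide ranges is an illustrative subset, not a normative list.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: estimated width of one code point.
// The ranges are an illustrative subset of wide code points.
constexpr std::size_t code_point_width(char32_t c) {
    struct Range { char32_t first, last; };
    constexpr Range wide[] = {
        {0x1100, 0x115F},   {0x2E80, 0x303E},   {0x3040, 0xA4CF},
        {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},   {0xFF00, 0xFF60},
        {0x1F300, 0x1F64F}, {0x20000, 0x2FFFD},
    };
    for (const Range& r : wide) {
        if (c >= r.first && c <= r.last) return 2;
    }
    return 1;
}

// Hypothetical helper: `clusters` is any range of std::u32string_view, one
// element per extended grapheme cluster (segmentation is not shown here).
template <class Clusters>
constexpr std::size_t estimated_width(const Clusters& clusters) {
    std::size_t width = 0;
    for (std::u32string_view cluster : clusters) {
        if (!cluster.empty()) {
            width += code_point_width(cluster.front());  // first code point only
        }
    }
    return width;
}

// Usage: "e" + combining acute, HIRAGANA LETTER A, and a ZWJ emoji sequence.
constexpr std::u32string_view example_clusters[] = {
    U"e\u0301", U"\u3042", U"\U0001F468\u200D\U0001F469"};
static_assert(estimated_width(example_clusters) == 5);
```

Nothing in this computation depends on the encoding the string was decoded
from, which is the point of the suggested wording.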
If the intent is for implementers to throw their hands in the air when the
encoding is not "a Unicode encoding", then surely we want to support
UTF-8/16/32 and that's it. UTF-EBCDIC isn't more important or special than
Shift-JIS, and there is no reason for one encoding to have privileged
handling over the other.
More generally, any encoding that can round-trip through Unicode should
qualify as a Unicode encoding, but I don't think we have a definition of that.
Unicode defines "Unicode Encoding Form" as:
> A character encoding form that assigns each Unicode scalar value to a
> unique code unit sequence
I.e., I don't think the poll solves anything; it just uses different
terminology to describe the same thing (nothing in ISO 10646 leads me to
believe that "UCS encoding scheme" can only designate the UCS encoding forms
specified in ISO 10646, in addition to being obscure terminology).
On GB18030: it's a different character set, with its own set of encodings.
On Sun, Oct 2, 2022 at 10:51 PM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:
> The summary for the SG16 meeting held September 28th, 2022 is now
> available.  For those that attended, please review and suggest corrections.
>    - https://github.com/sg16-unicode/sg16-meetings/#september-28th-2022
> Two polls were taken during this meeting.
> The first was for LWG #3767 (codecvt<charN_t, char8_t, mbstate_t>
> incorrectly added to locale <https://cplusplus.github.io/LWG/issue3767>)
> to establish consensus on whether the codecvt facets mentioned in the
> issue are intended to be locale sensitive. The established position has
> been conveyed to LWG via GitHub issue 1310
> <https://github.com/cplusplus/papers/issues/1310>.
> The second was for LWG #3412 (§[format.string.std] references to "Unicode
> encoding" unclear <https://cplusplus.github.io/LWG/issue3412>) to
> establish consensus on a direction for a proposed resolution.
> Tom.
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2022-10-02 21:45:22