[SG16] Agenda for the 2021-10-06 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Fri, 1 Oct 2021 13:40:15 -0400
*Please note that there has been a schedule change.* The previously
scheduled telecon for 2021-10-13 has been moved earlier to 2021-10-06.
This change was made to accommodate schedule restrictions for the author
of the two papers on the agenda below. The shared calendar has been
updated (which triggered the sending of new meeting invitations).

SG16 will hold a telecon on Wednesday, October *6th* (not the 13th) at
19:30 UTC.

The agenda is:

  * D2460R0: UTF-16 is standard practice
  * D1885R8: Naming Text Encodings to Demystify Them
      o Discuss and poll issues recently raised on the LEWG and SG16
        mailing lists.

D2460 is first on the agenda because establishing consensus on it will
reduce complications for P1885. We'll plan to spend 30 minutes on D2460
and the remainder of our time on P1885.

D2460R0 seeks to address SG16 issue 9
<https://github.com/sg16-unicode/sg16/issues/9> (Requiring wchar_t to
represent all members of the execution wide character set does not match
existing practice). Please read through the comments in that issue.
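As a rough illustration of the mismatch that issue describes, the sketch below encodes a code point as UTF-16. The helper name to_utf16 is hypothetical and comes from neither D2460 nor the standard library. It only shows that a code point outside the BMP needs two 16-bit code units, so a single 16-bit wchar_t cannot hold it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only: encode one Unicode scalar
// value as a sequence of UTF-16 code units.
std::vector<std::uint16_t> to_utf16(std::uint32_t cp) {
    // Precondition: cp is a Unicode scalar value (not a surrogate, in range).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        // BMP code points fit in a single 16-bit code unit.
        return {static_cast<std::uint16_t>(cp)};
    }
    // Supplementary code points are split into a surrogate pair.
    cp -= 0x10000;
    return {
        static_cast<std::uint16_t>(0xD800 + (cp >> 10)),    // high surrogate
        static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)),  // low surrogate
    };
}
```

For example, to_utf16(0x1F600) yields the two code units 0xD83D and 0xDE00, while to_utf16(0x41) yields one.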

P1885 is back on the agenda to discuss issues raised on the LEWG and
SG16 mailing lists. The relevant email threads are listed below; there
have been many.

  * SG16: Feedback re: P1885R5: Naming Text Encodings
      o Naming issues (to be deferred to LEWG):
          + "mib" vs "mib_enum" vs something else.
          + Preservation of the "cs" prefix
  * SG16: P1885: Naming text encodings: Curation and provenance of
    aliases <https://lists.isocpp.org/sg16/2021/09/2564.php>
      o Implementation lenience with regard to registered aliases.
      o Ambiguities between encoding "standards".
  * SG16: P1885: Naming text encodings: Encodings in the environment
    versus registered character sets
      o Latitude for implementations to consider slightly divergent
        encodings a match for an IANA registered character set.
      o Latitude for use of encodings such as UTF-8 with wchar_t elements.
      o Whether the IANA registry constitutes a sufficient source of
        identified encodings.
  * SG16: P1885: Naming text encodings: problem+solution re: charsets,
    octets, and wide encodings
      o Encoding schemes vs encoding forms and how to map the IANA
        registry to encodings in C++.
      o Whether the IANA registry is fit for all the purposes for which
        it is being employed.
  * SG16: P1885 polling <https://lists.isocpp.org/sg16/2021/09/2633.php>
      o Relevance of IANA specified encodings to wide literal encoding.
      o Tagging of big endian vs little endian.
  * LEWG: P1885: Text encoding aliases() wording suggestion
      o Wording recommendations courtesy of Tomasz.
  * LEWG: P1885: Naming text encodings: R7 wording feedback
      o Requirements on encoding names.
  * LEWG: New P1885 revision, LEWG feedback applied
      o Discussion largely captured in the threads linked above.
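To make the encoding form versus encoding scheme distinction in the threads above concrete, here is a minimal sketch. It assumes 8-bit bytes and that std::uint8_t is available. The names byte_order and serialize_utf16 are hypothetical and are not part of P1885. The same UTF-16 code units (the form) serialize to different octet sequences depending on the byte order (the scheme).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical enumeration for this sketch only.
enum class byte_order { little, big };

// Serialize UTF-16 code units (encoding form) into octets (encoding scheme).
std::vector<std::uint8_t> serialize_utf16(const std::vector<std::uint16_t>& units,
                                          byte_order order) {
    std::vector<std::uint8_t> out;
    out.reserve(units.size() * 2);
    for (std::uint16_t u : units) {
        const auto lo = static_cast<std::uint8_t>(u & 0xFF);
        const auto hi = static_cast<std::uint8_t>(u >> 8);
        if (order == byte_order::little) {
            out.push_back(lo);  // UTF-16LE: least significant octet first
            out.push_back(hi);
        } else {
            out.push_back(hi);  // UTF-16BE: most significant octet first
            out.push_back(lo);
        }
    }
    // Every code unit contributes exactly two octets.
    assert(out.size() == units.size() * 2);
    return out;
}
```

With this sketch, the code unit 0x0041 becomes the octets 41 00 under byte_order::little and 00 41 under byte_order::big.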

The above threads probe fundamental concerns about the IANA registry and
the goals that P1885 strives to fulfill. It probably isn't realistic to
expect to resolve them all in a single telecon. Given the amount of
discussion that has taken place and the different perspectives offered,
I'm no longer confident that we have a shared deep understanding of the
design and intent. Specific points I want to cover include the following.

  * Is the IANA registry sufficient and appropriate for the
    identification of both the ordinary and wide literal encodings?
  * How is the IANA registry intended to be applied? Which IANA encoding
    would be considered a match for each of the following cases?
      o Wide literal encoding is UTF-16, sizeof(wchar_t) is 2, CHAR_BIT
        is >= 8, little endian architecture.
      o Wide literal encoding is UTF-16, sizeof(wchar_t) is 1, CHAR_BIT
        is >= 16, architecture endianness is irrelevant since code units
        are a single byte.
      o Wide literal encoding is UTF-16LE, sizeof(wchar_t) is 1,
        CHAR_BIT is >= 8, architecture endianness is irrelevant since
        code units are a single byte.
  * How are conflicts between the IANA registered encoding names and
    other names recognized by implementations to be resolved?
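For anyone who wants to check which of the cases above is closest to a given implementation, the following sketch collects the properties those cases vary. The struct wide_params and the functions is_little_endian and query_wide_params are hypothetical names for this illustration. The sketch does not decide which IANA name matches; that is the open question.

```cpp
#include <cassert>
#include <climits>
#include <cstring>

// Hypothetical summary of the implementation properties the cases vary.
struct wide_params {
    unsigned char_bit;     // CHAR_BIT
    unsigned wchar_size;   // sizeof(wchar_t), in bytes
    unsigned wchar_bits;   // CHAR_BIT * sizeof(wchar_t)
    bool little_endian;    // only meaningful when an object spans several bytes
};

// Inspect the first byte of a multi-byte integer. If unsigned int occupies
// a single byte, the result is true but carries no information.
inline bool is_little_endian() {
    const unsigned int probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

inline wide_params query_wide_params() {
    wide_params p{};
    p.char_bit = static_cast<unsigned>(CHAR_BIT);
    p.wchar_size = static_cast<unsigned>(sizeof(wchar_t));
    p.wchar_bits = p.char_bit * p.wchar_size;
    p.little_endian = is_little_endian();
    assert(p.wchar_bits >= p.char_bit);  // sizeof(wchar_t) is at least 1
    return p;
}
```

In the second case listed above, wchar_size would be 1 and wchar_bits at least 16, so byte order plays no role in how a single UTF-16 code unit is stored.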

Please feel free to suggest other topics.


Received on 2021-10-01 12:40:23