Re: Agenda for the 2022-10-19 SG16 telecon​

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 19 Oct 2022 09:22:16 +0200
We had a French NB meeting yesterday (without cheese or wine,
I'm trying to get us to share a draft but in the meantime, here are some of
the SG-16 related NB comments coming our way (there are more!):




The usefulness of that clause is debatable.

Please state in [lex.name] the set of rules/profiles followed by C++
identifiers, rather than listing all of the rules that do not apply.

Only state "C++ conforms to UAX31 by meeting the requirements R1 “Default
Identifiers” and R4 “Equivalent Normalized Identifiers”" and remove [uaxid].




The intent of "escaped strings" is unclear and unsuited to many spoken

As it stands, this feature escapes:


   Invalid code unit sequences

   Non printable characters

   All combining marks and other grapheme extenders

   Whitespaces using different techniques

   The characters \ and "

It is not clear that all these behaviours should be regrouped in the same

In particular, escaping all combining codepoints is not suitable for many
languages, especially Korean.

Please consider either:


   Not escaping combining codepoints which are part of a grapheme cluster.

   Removing this feature and replacing it with a set of facilities, each
   performing one of these operations, in a future C++ standard.
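
To make the first option concrete, here is a minimal sketch of the two
policies. It is not the std::format implementation: escape_sketch,
combining_policy and is_combining_sketch are hypothetical names, the
Combining Diacritical Marks block (U+0300 to U+036F) stands in for the real
Grapheme_Extend=Yes property, and invalid code units and non-ASCII
whitespace are left out entirely.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for Grapheme_Extend=Yes: only U+0300..U+036F.
constexpr bool is_combining_sketch(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Append c as \u{hex}, using the shortest lowercase hex representation.
inline void append_u_escape(std::u32string& out, char32_t c) {
    static constexpr char32_t digits[] = U"0123456789abcdef";
    std::u32string hex;
    do {
        hex.insert(hex.begin(), digits[c % 16]);
        c /= 16;
    } while (c != 0);
    out += U"\\u{";
    out += hex;
    out += U'}';
}

enum class combining_policy {
    escape_all,      // escape every combining code point
    keep_after_base  // keep a combining code point that follows unescaped text
};

inline std::u32string escape_sketch(std::u32string_view in,
                                    combining_policy policy) {
    std::u32string out;
    bool prev_unescaped = false;  // was the previous code point copied as-is?
    for (char32_t c : in) {
        const bool keep_mark =
            policy == combining_policy::keep_after_base && prev_unescaped;
        if (c == U'\\' || c == U'"') {
            out += U'\\';
            out += c;
            prev_unescaped = false;
        } else if (c == U'\t') {
            out += U"\\t";
            prev_unescaped = false;
        } else if (c == U'\n') {
            out += U"\\n";
            prev_unescaped = false;
        } else if (c == U'\r') {
            out += U"\\r";
            prev_unescaped = false;
        } else if (c < 0x20 || c == 0x7F) {
            append_u_escape(out, c);  // ASCII control characters
            prev_unescaped = false;
        } else if (is_combining_sketch(c) && !keep_mark) {
            append_u_escape(out, c);
            prev_unescaped = false;
        } else {
            out += c;
            prev_unescaped = true;
        }
    }
    return out;
}
```

With escape_all, a decomposed "e" followed by U+0301 comes out as the letter
plus an escape sequence, so the accented letter is no longer readable as
text. With keep_after_base the pair is left intact, and a combining mark is
only escaped when it has no unescaped character before it.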

Should we have time:





The range of codepoints to be considered of width 2 is not compatible with
recent and future Unicode versions.

Please adopt LWG3780 for C++23, backported to C++20.
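
For readers less familiar with the mechanism, width estimation by a fixed
table of ranges can be sketched as below. This is an illustration under
assumptions, not the standard's wording: the names are hypothetical and the
table is a small subset of ranges that are East Asian Wide, not the list in
the working draft. Any table hard-coded this way is tied to the Unicode
version it was taken from.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

struct cp_range {
    char32_t first;
    char32_t last;
};

// Illustrative subset only; NOT the list from the C++ working draft.
constexpr std::array<cp_range, 3> wide_ranges_sketch{{
    {0x1100, 0x115F},  // Hangul Jamo leading consonants
    {0x4E00, 0x9FFF},  // CJK Unified Ideographs
    {0xAC00, 0xD7A3},  // Hangul Syllables
}};

// Estimated display width of one code point: 2 if it falls in the table,
// 1 otherwise.
constexpr int code_point_width_sketch(char32_t c) {
    for (const cp_range& r : wide_ranges_sketch) {
        if (c >= r.first && c <= r.last) {
            return 2;
        }
    }
    return 1;
}

// Estimated width of a string: the sum over its code points. A real
// implementation would work per grapheme cluster; this sketch does not.
constexpr std::size_t string_width_sketch(std::u32string_view s) {
    std::size_t width = 0;
    for (char32_t c : s) {
        width += static_cast<std::size_t>(code_point_width_sketch(c));
    }
    return width;
}
```

A code point that a newer Unicode version classifies as wide but that is
missing from the table is silently counted as width 1, which is the kind of
drift the comment is concerned with.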

On Tue, Oct 18, 2022 at 10:31 PM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:

> This is your friendly reminder that this meeting is taking place tomorrow.
> The German NB has released a draft of their NB comments. None of those
> require review by SG16. As far as I know, no other NB (other than INCITS
> for the US) has released draft comments yet.
> Any polling conducted tomorrow will be for changes relative to the status
> quo (the balloted document for which N4917 <https://wg21.link/n4917> is
> close enough); not relative to other NB comments (which some of us may or
> may not have knowledge of).
> Tom.
> On 10/14/22 12:20 AM, Tom Honermann via SG16 wrote:
> SG16 will hold a telecon on Wednesday, October 19th, at 19:30 UTC (timezone
> conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20221019T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>
> ).
> The agenda is:
> - NB comment processing.
> The details below are a copy of what was originally included with the
> agenda for the October 12th meeting.
> INCITS has made US NB comments available to its members. I reviewed the
> list and identified the following as ones that I believe SG16 should
> establish a position on. There are other comments that are related to
> papers SG16 has previously discussed, but in those cases, I believe the
> concerns raised do not require SG16 input.
> Due to duplicated comments in the list of US comments, it is possible that
> the comment identifiers below will change.
> US-2: [defns.multibyte] <http://eel.is/c++draft/defns.multibyte>
> The notion of an "execution character set" is no longer given prominence
> in the Draft standard, aside from some notes about its relationship to the
> concept as defined by C, and clarifying that certain character encodings
> are unrelated to this character set. This makes it a questionable choice
> for use in the definition of "multibyte character".
> *Proposed change:*
> Change the definition of "multibyte character" to use a character encoding
> with a more definite specification given by the Standard.
> US-38: [format.string.escaped]
> <https://eel.is/c++draft/format.string.escaped>
> The subject subclause describes how characters or strings are "escaped" to
> be formatted more suitably "for debugging or for logging".
> The actual suitability for debugging or for logging depends on the needs
> of the application, and there is a conflict between formatting for human
> readability of the textual content and formatting for clarity and fidelity
> of encoding nuances. Indeed, for the latter, there can still be (for
> stateful encodings) a conflict between formatting for human visual
> inspection versus formatting for machine consumption of the output sequence
> as a C++ string/character literal.
> The current design introduces extensions to the API and to the format
> string syntax that assume that there is one specific default that should be
> chosen "for debugging or for logging". The reasoning behind the chosen
> default and the extensibility of the current design do not appear to be
> sufficiently explored.
> Note 1:
> An example, for Unicode encodings, of a choice between prioritizing
> human readability of the textual content and visual clarity of
> encoding nuances is in the treatment of characters having Unicode property
> Grapheme_Extend=Yes. The current design favors visual clarity of encoding
> nuances by outputting such characters as escape sequences.
> Note 2:
> For stateful encodings, the lack of return to the initial shift state at
> the end of the sequence cannot be represented using a C++ string/character
> literal unless a prior shift sequence from the initial shift state is
> rendered via escape sequence(s). It is not clear that scanning forward is
> generally always an option (nor is it clear that doing so is desirable).
> *Proposed change:*
> Narrow the purported scope and affirm the design choices of the default
> behavior:
> Modify "logging" to "technical logging" and spell out the priorities in
> order in the description (this has the benefit of clearly communicating
> intention and providing guidance for implementation choices).
> 1. The output is intended to be a C++ string/character literal that
> reproduces the encoded sequence. (This seems to be taken for granted and
> not made explicit in the current draft.)
> 2. Prefer visually distinguishing between different methods of
> encoding "equivalent" textual content.
> Make any adjustments necessary to the API or the format string syntax
> associated with "escaped" strings to allow for future additions for
> alternative escaping.
> US-64: [uaxid.pattern] <https://eel.is/c++draft/uaxid.pattern>
> The Unicode org has clarified that the pattern whitespace and pattern
> syntax rules apply to the lexing and parsing of computer languages.
> *Proposed change:*
> Replace with "UAX#31 describes how formal languages such as computer
> languages should describe and implement their use of whitespace and
> syntactically significant characters during the processes of lexing and
> parsing. C++ does not claim conformance with this requirement."
> Tom.
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2022-10-19 07:22:31