ISOCPP sg16 List: Re: Agenda for the 2022-10-12 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 11 Oct 2022 14:10:15 -0400

This is your friendly reminder that this meeting is taking place tomorrow.

Tom.

On 10/6/22 5:59 PM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a telecon on Wednesday, October 12th, at 19:30 UTC
> (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20221012T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>
> The agenda is:
>
> * A presentation by Michael Kuperstein regarding i18n and l10n and
> existing practice in the industry.
> * NB comment processing.
>
> INCITS has made US NB comments available to its members. I reviewed
> the list and identified the following as ones that I believe SG16
> should establish a position on. There are other comments that are
> related to papers SG16 has previously discussed, but in those cases, I
> believe the concerns raised do not require SG16 input.
>
> Due to duplicated comments in the list of US comments, it is possible
> that the comment identifiers below will change.
>
>
> US-2: [defns.multibyte] <http://eel.is/c++draft/defns.multibyte>
>
> The notion of an "execution character set" is no longer given
> prominence in the Draft standard, aside from some notes about its
> relationship to the concept as defined by C, and clarifying that
> certain character encodings are unrelated to this character set. This
> makes it a questionable choice for use in the definition of "multibyte
> character".
>
> *Proposed change:*
>
> Change the definition of "multibyte character" to use a character
> encoding with a more definite specification given by the Standard.
>
>
> US-38: [format.string.escaped]
> <https://eel.is/c++draft/format.string.escaped>
>
> The subject subclause describes how characters or strings are
> "escaped" to be formatted more suitably "for debugging or for logging".
>
> The actual suitability for debugging or for logging depends on the
> needs of the application, and there is a conflict between formatting
> for human readability of the textual content and formatting for
> clarity and fidelity of encoding nuances. Indeed, for the latter,
> there can still be (for stateful encodings) a conflict between
> formatting for human visual inspection versus formatting for machine
> consumption of the output sequence as a C++ string/character literal.
>
> The current design introduces extensions to the API and to the format
> string syntax that assume that there is one specific default that
> should be chosen "for debugging or for logging". The reasoning behind
> the chosen default and the extensibility of the current design do
> not appear to be sufficiently explored.
>
> Note 1:
> An example, for Unicode encodings, of a choice between prioritizing
> human readability of the textual content and prioritizing visual
> clarity of encoding nuances is the treatment of characters having
> Unicode property Grapheme_Extend=Yes. The current design favors visual
> clarity of encoding nuances by outputting such characters as escape
> sequences.
>
> Note 2:
> For stateful encodings, the lack of return to the initial shift state
> at the end of the sequence cannot be represented using a C++
> string/character literal unless a prior shift sequence from the
> initial shift state is rendered via escape sequence(s). It is not
> clear that scanning forward is always an option (nor is it clear
> that doing so is desirable).
>
> *Proposed change:*
>
> Narrow the purported scope and affirm the design choices of the
> default behavior:
> Modify "logging" to "technical logging" and spell out the priorities
> in order in the description (this has the benefit of clearly
> communicating intention and providing guidance for implementation
> choices).
>
> 1. The output is intended to be a C++ string/character literal that
> reproduces the encoded sequence. (This seems to be taken for
> granted and not made explicit in the current draft.)
> 2. Prefer visually distinguishing between different methods of
> encoding "equivalent" textual content.
>
> Make any adjustments necessary to the API or the format string syntax
> associated with "escaped" strings to allow for future additions for
> alternative escaping.
>
>
> US-64: [uaxid.pattern] <https://eel.is/c++draft/uaxid.pattern>
>
> The Unicode org has clarified that the pattern whitespace and pattern
> syntax rules apply to the lexing and parsing of computer languages.
>
> *Proposed change:*
>
> Replace with "UAX#31 describes how formal languages such as computer
> languages should describe and implement their use of whitespace and
> syntactically significant characters during the processes of lexing
> and parsing. C++ does not claim conformance with this requirement."
>
> Tom.
>
>

Received on 2022-10-11 18:10:18