ISOCPP sg16 List: Re: Agenda for the 2022-10-12 SG16 telecon

From: Steve Downey <sdowney_at_[hidden]>
Date: Fri, 7 Oct 2022 19:30:53 -0400

If there are other comments about Annex E we'll have to do work to merge
them in any case. Unless it's to drop Annex E entirely, which I would be
against.

On Fri, Oct 7, 2022 at 7:29 PM Steve Downey <sdowney_at_[hidden]> wrote:

> US-64 was supposed to have a matching paper, but it looks like it got
> folded inline. In any case https://isocpp.org/files/papers/P2653R0.pdf
>
> UAX #31 describes how languages that use or interpret patterns of
> characters, such as regular expressions, number formats, or computer
> languages, may describe that syntax with Unicode properties.
> C++ does not do this as part of the language, deferring to library
> components for such usage of patterns. This requirement does not apply to
> C++.
> C++ does not use the methods or properties in UAX #31 to do this. It
> instead uses the whitespace and syntax characters defined in [lex] to
> describe the white space characters and the basic source characters used to
> define syntactic elements other than identifiers.
>
> Wordsmithing more than welcome.
>
> On Thu, Oct 6, 2022 at 6:19 PM Corentin Jabot via SG16 <
> sg16_at_[hidden]> wrote:
>
>> I would prefer to wait for all NB comments to be filed. In
>> particular, US-38 and US-64 may end up conflicting or overlapping with
>> other NB comments.
>>
>> Thanks,
>> Corentin
>>
>> On Thu, Oct 6, 2022 at 11:59 PM Tom Honermann via SG16 <
>> sg16_at_[hidden]> wrote:
>>
>>> SG16 will hold a telecon on Wednesday, October 12th, at 19:30 UTC (timezone
>>> conversion
>>> <https://www.timeanddate.com/worldclock/converter.html?iso=20221012T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>
>>> ).
>>>
>>> The agenda is:
>>>
>>> - A presentation by Michael Kuperstein regarding i18n and l10n and
>>> existing practice in the industry.
>>> - NB comment processing.
>>>
>>> INCITS has made US NB comments available to its members. I reviewed the
>>> list and identified the following as ones that I believe SG16 should
>>> establish a position on. There are other comments that are related to
>>> papers SG16 has previously discussed, but in those cases, I believe the
>>> concerns raised do not require SG16 input.
>>>
>>> Due to duplicated comments in the list of US comments, it is possible
>>> that the comment identifiers below will change.
>>>
>>> US-2: [defns.multibyte] <http://eel.is/c++draft/defns.multibyte>
>>>
>>> The notion of an "execution character set" is no longer given prominence
>>> in the Draft standard, aside from some notes about its relationship to the
>>> concept as defined by C, and clarifying that certain character encodings
>>> are unrelated to this character set. This makes it a questionable choice
>>> for use in the definition of "multibyte character".
>>>
>>> *Proposed change:*
>>>
>>> Change the definition of "multibyte character" to use a character
>>> encoding with a more definite specification given by the Standard.
>>>
>>> US-38: [format.string.escaped]
>>> <https://eel.is/c++draft/format.string.escaped>
>>>
>>> The subject subclause describes how characters or strings are "escaped"
>>> to be formatted more suitably "for debugging or for logging".
>>>
>>> The actual suitability for debugging or for logging depends on the needs
>>> of the application, and there is a conflict between formatting for human
>>> readability of the textual content and formatting for clarity and fidelity
>>> of encoding nuances. Indeed, for the latter, there can still be (for
>>> stateful encodings) a conflict between formatting for human visual
>>> inspection versus formatting for machine consumption of the output sequence
>>> as a C++ string/character literal.
>>>
>>> The current design introduces extensions to the API and to the format
>>> string syntax that assume that there is one specific default that should be
>>> chosen "for debugging or for logging". The reasoning behind the chosen
>>> default and the extensibility of the current design do not appear to be
>>> sufficiently explored.
>>>
>>> Note 1:
>>> An example, for Unicode encodings, of a choice between prioritizing
>>> human readability of the textual content and prioritizing visual clarity
>>> of encoding nuances is in the treatment of characters having Unicode
>>> property Grapheme_Extend=Yes. The current design favors visual clarity of
>>> encoding nuances by outputting such characters as escape sequences.
>>>
>>> Note 2:
>>> For stateful encodings, the lack of return to the initial shift state at
>>> the end of the sequence cannot be represented using a C++ string/character
>>> literal unless a prior shift sequence from the initial shift state is
>>> rendered via escape sequence(s). It is not clear that scanning forward is
>>> always an option (nor is it clear that doing so is desirable).
>>>
>>> *Proposed change:*
>>>
>>> Narrow the purported scope and affirm the design choices of the default
>>> behavior:
>>> Modify "logging" to "technical logging" and spell out the priorities in
>>> order in the description (this has the benefit of clearly communicating
>>> intention and providing guidance for implementation choices).
>>>
>>> 1. The output is intended to be a C++ string/character literal that
>>> reproduces the encoded sequence. (This seems to be taken for granted and
>>> not made explicit in the current draft.)
>>> 2. Prefer visually distinguishing between different methods of
>>> encoding "equivalent" textual content.
>>>
>>> Make any adjustments necessary to the API or the format string syntax
>>> associated with "escaped" strings to allow for future additions for
>>> alternative escaping.
>>>
>>> US-64: [uaxid.pattern] <https://eel.is/c++draft/uaxid.pattern>
>>>
>>> The Unicode org has clarified that the pattern whitespace and pattern
>>> syntax rules apply to the lexing and parsing of computer languages.
>>>
>>> *Proposed change:*
>>>
>>> Replace with "UAX#31 describes how formal languages such as computer
>>> languages should describe and implement their use of whitespace and
>>> syntactically significant characters during the processes of lexing and
>>> parsing. C++ does not claim conformance with this requirement."
>>>
>>> Tom.
>>> --
>>> SG16 mailing list
>>> SG16_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>
>>
>

Received on 2022-10-07 23:31:06