ISOCPP sg16 List: Re: Agenda for the 2022-10-12 SG16 telecon

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Sat, 8 Oct 2022 15:16:33 +0200

On Sat, Oct 8, 2022 at 2:38 PM Tom Honermann <tom_at_[hidden]> wrote:

> On 10/8/22 5:38 AM, Corentin Jabot wrote:
>
> At the risk of spoiling
>
> E.2.1 and E.5 (https://eel.is/c++draft/uaxid) are a repeat of [lex.name].
> Everything else in [uaxid] is a list of things C++ does not conform to,
> mostly because none of these things would apply one way or another.
>
> A note (or other form of mention) in [lex.name] stating "C++ conforms to
> UAX31 by meeting the requirements R1 'Default Identifiers' and R4
> 'Equivalent Normalized Identifiers'. Other requirements do not apply."
> would carry the same meaning.
>
> Besides the maintenance burden of an annex that is not information dense,
> and the fact that the conformance information is more useful to have in
> [lex.name], having an annex per UAX/TR does not seem practical or desirable
> as C++ adds Unicode support (for example, Unicode regex).
>
> Note that I'm not opposed to mentioning the conformance information, quite
> the opposite, it's very useful information.
> I'm just saying an annex which is by and large a list of requirements not met
> is less useful than having the information available where it's relevant,
> in a more concise form.
>
> This seems like a tangential concern that we can address if and when an NB
> comment raising these concerns appears.
>
> Personally, I find it useful to have prose that explains why a requirement
> does not apply.
>
Unfortunately, we probably won't know all the NB comments in the next 2
weeks (for example, France's process runs until October 21).
Sure, if there is an NB comment to improve [uaxid] and then one to remove
it, we could treat them in order.
But assuming both NB comments are made and deemed reasonable, there will be
a bunch of work invested to improve something and then remove it.
Not the end of the world, but also not optimal.

> Tom.
>
>
>
>
> On Sat, Oct 8, 2022 at 1:31 AM Steve Downey <sdowney_at_[hidden]> wrote:
>
>> If there are other comments about Annex E we'll have to do work to merge
>> them in any case. Unless it's to drop Annex E entirely, which I would be
>> against.
>>
>> On Fri, Oct 7, 2022 at 7:29 PM Steve Downey <sdowney_at_[hidden]> wrote:
>>
>>> US-64 was supposed to have a matching paper, but it looks like it got
>>> folded inline. In any case https://isocpp.org/files/papers/P2653R0.pdf
>>>
>>> UAX #31 describes how languages that use or interpret patterns of
>>> characters such as regular expressions or number formatscomputer
>>> languages, may describe that syntax with Unicode properties.
>>> C++ does not do this as part of the language, deferring to library
>>> components for such usage of patterns. This requirement does not apply to
>>> C++.
>>> C++ does not use the methods or properties in UAX #31 to do this. It
>>> instead uses the whitespace and syntax characters defined in [lex] to
>>> describe the white space characters and the basic source characters used to
>>> define syntactic elements other than identifiers.
>>>
>>> Wordsmithing more than welcome.
>>>
>>> On Thu, Oct 6, 2022 at 6:19 PM Corentin Jabot via SG16 <
>>> sg16_at_[hidden]> wrote:
>>>
>>>> I would prefer to wait for all NB comments to be filed. In
>>>> particular, US-38 and US-64 may end up conflicting or overlapping with
>>>> other NB comments.
>>>>
>>>> Thanks,
>>>> Corentin
>>>>
>>>> On Thu, Oct 6, 2022 at 11:59 PM Tom Honermann via SG16 <
>>>> sg16_at_[hidden]> wrote:
>>>>
>>>>> SG16 will hold a telecon on Wednesday, October 12th, at 19:30 UTC (timezone
>>>>> conversion
>>>>> <https://www.timeanddate.com/worldclock/converter.html?iso=20221012T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>
>>>>> ).
>>>>>
>>>>> The agenda is:
>>>>>
>>>>> - A presentation by Michael Kuperstein regarding i18n and l10n and
>>>>> existing practice in the industry.
>>>>> - NB comment processing.
>>>>>
>>>>> INCITS has made US NB comments available to its members. I reviewed
>>>>> the list and identified the following as ones that I believe SG16 should
>>>>> establish a position on. There are other comments that are related to
>>>>> papers SG16 has previously discussed, but in those cases, I believe the
>>>>> concerns raised do not require SG16 input.
>>>>>
>>>>> Due to duplicated comments in the list of US comments, it is possible
>>>>> that the comment identifiers below will change.
>>>>>
>>>>> US-2: [defns.multibyte] <http://eel.is/c++draft/defns.multibyte>
>>>>>
>>>>> The notion of an "execution character set" is no longer given
>>>>> prominence in the Draft standard, aside from some notes about its
>>>>> relationship to the concept as defined by C, and clarifying that certain
>>>>> character encodings are unrelated to this character set. This makes it a
>>>>> questionable choice for use in the definition of "multibyte character".
>>>>>
>>>>> *Proposed change:*
>>>>>
>>>>> Change the definition of "multibyte character" to use a character
>>>>> encoding with a more definite specification given by the Standard.
>>>>>
>>>>> US-38: [format.string.escaped]
>>>>> <https://eel.is/c++draft/format.string.escaped>
>>>>>
>>>>> The subject subclause describes how characters or strings are
>>>>> "escaped" to be formatted more suitably "for debugging or for logging".
>>>>>
>>>>> The actual suitability for debugging or for logging depends on the
>>>>> needs of the application, and there is a conflict between formatting for
>>>>> human readability of the textual content and formatting for clarity and
>>>>> fidelity of encoding nuances. Indeed, for the latter, there can still be
>>>>> (for stateful encodings) a conflict between formatting for human visual
>>>>> inspection versus formatting for machine consumption of the output sequence
>>>>> as a C++ string/character literal.
>>>>>
>>>>> The current design introduces extensions to the API and to the format
>>>>> string syntax that assume that there is one specific default that should be
>>>>> chosen "for debugging or for logging". The reasoning behind the chosen
>>>>> default and the extensibility of the current design does not appear to be
>>>>> sufficiently explored.
>>>>>
>>>>> Note 1:
>>>>> An example, for Unicode encodings, of a choice between prioritizing
>>>>> human readability of the textual content and visual clarity of
>>>>> encoding nuances is in the treatment of characters having Unicode property
>>>>> Grapheme_Extend=Yes. The current design favors visual clarity of encoding
>>>>> nuances by outputting such characters as escape sequences.
>>>>>
>>>>> Note 2:
>>>>> For stateful encodings, the lack of return to the initial shift state
>>>>> at the end of the sequence cannot be represented using a C++
>>>>> string/character literal unless a prior shift sequence from the initial
>>>>> shift state is rendered via escape sequence(s). It is not clear that
>>>>> scanning forward is generally always an option (nor is it clear that doing
>>>>> so is desirable).
>>>>>
>>>>> *Proposed change:*
>>>>>
>>>>> Narrow the purported scope and affirm the design choices of the
>>>>> default behavior:
>>>>> Modify "logging" to "technical logging" and spell out the priorities
>>>>> in order in the description (this has the benefit of clearly communicating
>>>>> intention and providing guidance for implementation choices).
>>>>>
>>>>> 1. The output is intended to be a C++ string/character literal
>>>>> that reproduces the encoded sequence. (This seems to be taken for granted
>>>>> and not made explicit in the current draft.)
>>>>> 2. Prefer visually distinguishing between different methods of
>>>>> encoding "equivalent" textual content.
>>>>>
>>>>> Make any adjustments necessary to the API or the format string syntax
>>>>> associated with "escaped" strings to allow for future additions for
>>>>> alternative escaping.
>>>>>
>>>>> US-64: [uaxid.pattern] <https://eel.is/c++draft/uaxid.pattern>
>>>>>
>>>>> The Unicode org has clarified that the pattern whitespace and pattern
>>>>> syntax rules apply to the lexing and parsing of computer languages.
>>>>>
>>>>> *Proposed change:*
>>>>>
>>>>> Replace with "UAX#31 describes how formal languages such as computer
>>>>> languages should describe and implement their use of whitespace and
>>>>> syntactically significant characters during the processes of lexing and
>>>>> parsing. C++ does not claim conformance with this requirement."
>>>>>
>>>>> Tom.
>>>>> --
>>>>> SG16 mailing list
>>>>> SG16_at_[hidden]
>>>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>>>
>>>>
>>>
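
To make Note 1 of US-38 (quoted above) concrete, here is a rough sketch of the
trade-off it describes: a combining mark can either be passed through, so the
text reads naturally, or rendered as an escape sequence, so the encoding is
visible. This is not the library's implementation. The names
is_grapheme_extend_sketch and escape_debug_sketch are hypothetical, the
Grapheme_Extend check is an assumption that covers only U+0300..U+036F, and
control characters are simplified to the \u{...} form.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// ASSUMPTION: a tiny stand-in for the Unicode property Grapheme_Extend=Yes.
// Only the Combining Diacritical Marks block (U+0300..U+036F) is covered;
// the real property spans many more ranges.
bool is_grapheme_extend_sketch(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Appends the UTF-8 encoding of one Unicode scalar value to `out`.
void append_utf8(std::string& out, char32_t c) {
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Hypothetical helper: returns a quoted, escaped UTF-8 rendering of `text`.
// Quotes and backslashes get a backslash; control characters and characters
// matched by the Grapheme_Extend stand-in become \u{hex}; everything else
// is passed through unchanged.
std::string escape_debug_sketch(std::u32string_view text) {
    std::string out = "\"";
    for (char32_t c : text) {
        if (c == U'"' || c == U'\\') {
            out += '\\';
            out += static_cast<char>(c);
        } else if (c < 0x20 || c == 0x7F || is_grapheme_extend_sketch(c)) {
            char buf[16];
            std::snprintf(buf, sizeof buf, "\\u{%x}", static_cast<unsigned>(c));
            out += buf;
        } else {
            append_utf8(out, c);
        }
    }
    out += '"';
    return out;
}
```

With this sketch, escape_debug_sketch(U"e\u0301") yields "e\u{301}" (the
combining acute accent is made visible), while the precomposed U"\u00e9" is
passed through as a single readable character. The two inputs are canonically
equivalent text, which is the "different methods of encoding equivalent
textual content" that the proposed change asks to distinguish.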

Received on 2022-10-08 13:16:46