ISOCPP sg16 List: Re: Agenda for the 2022-10-12 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 10 Oct 2022 00:06:47 -0400

On 10/8/22 9:16 AM, Corentin Jabot via SG16 wrote:
>
>
> On Sat, Oct 8, 2022 at 2:38 PM Tom Honermann <tom_at_[hidden]> wrote:
>
> On 10/8/22 5:38 AM, Corentin Jabot wrote:
>> At the risk of spoiling
>>
>> E.2.1 and E.5 (https://eel.is/c++draft/uaxid) are a repeat of
>> [lex.name].
>> Everything else in [uaxid] is a list of things C++ does not
>> conform to, mostly because none of these things would apply one
>> way or another.
>>
>> A note (or other form of mention) in [lex.name]
>> stating "C++ conforms to UAX31 by meeting the requirements R1
>> 'Default Identifiers' and R4 'Equivalent Normalized Identifiers'.
>> Other requirements do not apply." would carry the same meaning.
>>
>> Besides the maintenance burden of an annex that is not information
>> dense, and the fact that the conformance information is more
>> useful to have in [lex.name], as C++ adds
>> Unicode support (for example Unicode regex), having an annex per
>> UAX/TR does not seem practical or desirable.
>>
>> Note that I'm not opposed to mentioning the conformance
>> information, quite the opposite, it's very useful information.
>> I'm just saying an annex which is by and large a list of requirements
>> not met is less useful than having the information available
>> where it's relevant, in a more concise form.
>
> This seems like a tangential concern that we can address if and
> when an NB comment raising these concerns appears.
>
> Personally, I find it useful to have prose that explains why a
> requirement does not apply.
>
> Unfortunately, we probably won't know all the NB comments in the next
> 2 weeks (for example, France's process runs until October 21).
> Sure, if there is an NB comment to improve [uaxid] and then one to
> remove it, we could treat them in order.
> But assuming both NB comments are made and deemed reasonable, there
> will be a bunch of work invested to improve something and then remove it.
> Not the end of the world, but also not optimal.

While that is a possibility, I would still prefer to get a head start
processing NB comments.

Tom.

> Tom.
>
>>
>>
>>
>> On Sat, Oct 8, 2022 at 1:31 AM Steve Downey <sdowney_at_[hidden]>
>> wrote:
>>
>> If there are other comments about Annex E we'll have to do
>> work to merge them in any case. Unless it's to drop Annex E
>> entirely, which I would be against.
>>
>> On Fri, Oct 7, 2022 at 7:29 PM Steve Downey
>> <sdowney_at_[hidden]> wrote:
>>
>> US-64 was supposed to have a matching paper, but it looks
>> like it got folded inline. In any case
>> https://isocpp.org/files/papers/P2653R0.pdf
>>
>> UAX #31 describes how languages that use or interpret
>> patterns of characters such as regular expressions or
>> number formatscomputer languages, may describe that
>> syntax with Unicode properties.
>> C++ does not do this as part of the language, deferring
>> to library components for such usage of patterns. This
>> requirement does not apply to C++.
>> C++ does not use the methods or properties in UAX #31 to
>> do this. It instead uses the whitespace and syntax
>> characters defined in [lex] to describe the white space
>> characters and the basic source characters used to define
>> syntactic elements other than identifiers.
>>
>> Wordsmithing more than welcome.
>>
>> On Thu, Oct 6, 2022 at 6:19 PM Corentin Jabot via SG16
>> <sg16_at_[hidden]> wrote:
>>
>> I would prefer to wait for all NB comments to be
>> filed. In particular, US-38 and US-64 may end up
>> conflicting or overlapping with other NB comments.
>>
>> Thanks,
>> Corentin
>>
>> On Thu, Oct 6, 2022 at 11:59 PM Tom Honermann via
>> SG16 <sg16_at_[hidden]> wrote:
>>
>> SG16 will hold a telecon on Wednesday, October
>> 12th, at 19:30 UTC (timezone conversion
>> <https://www.timeanddate.com/worldclock/converter.html?iso=20221012T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>>
>> The agenda is:
>>
>> * A presentation by Michael Kuperstein
>> regarding i18n and l10n and existing practice
>> in the industry.
>> * NB comment processing.
>>
>> INCITS has made US NB comments available to its
>> members. I reviewed the list and identified the
>> following as ones that I believe SG16 should
>> establish a position on. There are other comments
>> that are related to papers SG16 has previously
>> discussed, but in those cases, I believe the
>> concerns raised do not require SG16 input.
>>
>> Due to duplicated comments in the list of US
>> comments, it is possible that the comment
>> identifiers below will change.
>>
>>
>> US-2: [defns.multibyte]
>> <http://eel.is/c++draft/defns.multibyte>
>>
>> The notion of an "execution character set" is no
>> longer given prominence in the Draft standard,
>> aside from some notes about its relationship to
>> the concept as defined by C, and clarifying that
>> certain character encodings are unrelated to this
>> character set. This makes it a questionable
>> choice for use in the definition of "multibyte
>> character".
>>
>> *Proposed change:*
>>
>> Change the definition of "multibyte character" to
>> use a character encoding with a more definite
>> specification given by the Standard.
>>
>>
>> US-38: [format.string.escaped]
>> <https://eel.is/c++draft/format.string.escaped>
>>
>> The subject subclause describes how characters or
>> strings are "escaped" to be formatted more
>> suitably "for debugging or for logging".
>>
>> The actual suitability for debugging or for
>> logging depends on the needs of the application,
>> and there is a conflict between formatting for
>> human readability of the textual content and
>> formatting for clarity and fidelity of encoding
>> nuances. Indeed, for the latter, there can still
>> be (for stateful encodings) a conflict between
>> formatting for human visual inspection versus
>> formatting for machine consumption of the output
>> sequence as a C++ string/character literal.
>>
>> The current design introduces extensions to the
>> API and to the format string syntax that assume
>> that there is one specific default that should be
>> chosen "for debugging or for logging". The
>> reasoning behind the chosen default and the
>> extensibility of the current design do not
>> appear to be sufficiently explored.
>>
>> Note 1:
>> An example, for Unicode encodings, of a choice
>> between prioritizing human readability of
>> the textual content and visual clarity of
>> encoding nuances is in the treatment of
>> characters having Unicode property
>> Grapheme_Extend=Yes. The current design favors
>> visual clarity of encoding nuances by outputting
>> such characters as escape sequences.
>>
>> Note 2:
>> For stateful encodings, the lack of return to the
>> initial shift state at the end of the sequence
>> cannot be represented using a C++
>> string/character literal unless a prior shift
>> sequence from the initial shift state is rendered
>> via escape sequence(s). It is not clear that
>> scanning forward is generally always an option
>> (nor is it clear that doing so is desirable).
>>
>> *Proposed change:*
>>
>> Narrow the purported scope and affirm the design
>> choices of the default behavior:
>> Modify "logging" to "technical logging" and spell
>> out the priorities in order in the description
>> (this has the benefit of clearly communicating
>> intention and providing guidance for
>> implementation choices).
>>
>> 1. The output is intended to be a C++
>> string/character literal that reproduces the
>> encoded sequence. (This seems to be taken for
>> granted and not made explicit in the current
>> draft.)
>> 2. Prefer visually distinguishing between
>> different methods of encoding "equivalent"
>> textual content.
>>
>> Make any adjustments necessary to the API or the
>> format string syntax associated with "escaped"
>> strings to allow for future additions for
>> alternative escaping.
>>
>>
>> US-64: [uaxid.pattern]
>> <https://eel.is/c++draft/uaxid.pattern>
>>
>> The Unicode org has clarified that the pattern
>> whitespace and pattern syntax rules apply to the
>> lexing and parsing of computer languages.
>>
>> *Proposed change:*
>>
>> Replace with "UAX#31 describes how formal
>> languages such as computer languages should
>> describe and implement their use of whitespace
>> and syntactically significant characters during
>> the processes of lexing and parsing. C++ does not
>> claim conformance with this requirement."
>>
>> Tom.
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>>
>

Received on 2022-10-10 04:06:50