ISOCPP sg16 List: Re: Agenda for the 2022-10-12 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Sat, 8 Oct 2022 08:38:34 -0400

On 10/8/22 5:38 AM, Corentin Jabot wrote:
> At the risk of spoiling
>
> E.2.1 and E.5 (https://eel.is/c++draft/uaxid) repeat [lex.name].
> Everything else in [uaxid] is a list of things C++ does not conform
> to, mostly because none of these things would apply one way or another.
>
> A note (or other form of mention) in [lex.name] stating "C++ conforms
> to UAX31 by meeting the requirements R1 'Default Identifiers' and R4
> 'Equivalent Normalized Identifiers'. Other requirements do not apply."
> would carry the same meaning.
>
> Besides the maintenance burden of an annex that is not information
> dense, and the fact that the conformance information is more useful to
> have in [lex.name], having an annex per UAX/TR does not seem practical
> or desirable as C++ adds more Unicode support (for example, Unicode
> regex).
>
> Note that I'm not opposed to mentioning the conformance information,
> quite the opposite, it's very useful information.
> I'm just saying an annex which is by and large a list of requirements
> not met is less useful than having the information available where
> it's relevant, in a more concise form.

This seems like a tangential concern that we can address if and when an
NB comment raising it appears.

Personally, I find it useful to have prose that explains why a
requirement does not apply.

Tom.

>
>
>
> On Sat, Oct 8, 2022 at 1:31 AM Steve Downey <sdowney_at_[hidden]> wrote:
>
> If there are other comments about Annex E we'll have to do work to
> merge them in any case. Unless it's to drop Annex E entirely,
> which I would be against.
>
> On Fri, Oct 7, 2022 at 7:29 PM Steve Downey <sdowney_at_[hidden]> wrote:
>
> US-64 was supposed to have a matching paper, but it looks like
> it got folded inline. In any case
> https://isocpp.org/files/papers/P2653R0.pdf
>
> UAX #31 describes how languages that use or interpret
> patterns of characters such as regular expressions or number
> formats computer languages, may describe that syntax with
> Unicode properties.
> C++ does not do this as part of the language, deferring to
> library components for such usage of patterns. This
> requirement does not apply to C++.
> C++ does not use the methods or properties in UAX #31 to do
> this. It instead uses the whitespace and syntax characters
> defined in [lex] to describe the white space characters and
> the basic source characters used to define syntactic elements
> other than identifiers.
>
> Wordsmithing more than welcome.
>
> On Thu, Oct 6, 2022 at 6:19 PM Corentin Jabot via SG16
> <sg16_at_[hidden]> wrote:
>
> I would prefer to wait for all NB comments to be filed.
> In particular, US-38 and US-64 may end up conflicting
> or overlapping with other NB comments.
>
> Thanks,
> Corentin
>
> On Thu, Oct 6, 2022 at 11:59 PM Tom Honermann via SG16
> <sg16_at_[hidden]> wrote:
>
> SG16 will hold a telecon on Wednesday, October 12th,
> at 19:30 UTC (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20221012T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>
> The agenda is:
>
> * A presentation by Michael Kuperstein regarding
> i18n and l10n and existing practice in the industry.
> * NB comment processing.
>
> INCITS has made US NB comments available to its
> members. I reviewed the list and identified the
> following as ones that I believe SG16 should establish
> a position on. There are other comments that are
> related to papers SG16 has previously discussed, but
> in those cases, I believe the concerns raised do not
> require SG16 input.
>
> Due to duplicated comments in the list of US comments,
> it is possible that the comment identifiers below will
> change.
>
>
> US-2: [defns.multibyte]
> <http://eel.is/c++draft/defns.multibyte>
>
> The notion of an "execution character set" is no
> longer given prominence in the Draft standard, aside
> from some notes about its relationship to the concept
> as defined by C, and clarifying that certain character
> encodings are unrelated to this character set. This
> makes it a questionable choice for use in the
> definition of "multibyte character".
>
> *Proposed change:*
>
> Change the definition of "multibyte character" to use
> a character encoding with a more definite
> specification given by the Standard.
>
>
> US-38: [format.string.escaped]
> <https://eel.is/c++draft/format.string.escaped>
>
> The subject subclause describes how characters or
> strings are "escaped" to be formatted more suitably
> "for debugging or for logging".
>
> The actual suitability for debugging or for logging
> depends on the needs of the application, and there is
> a conflict between formatting for human readability of
> the textual content and formatting for clarity and
> fidelity of encoding nuances. Indeed, for the latter,
> there can still be (for stateful encodings) a conflict
> between formatting for human visual inspection versus
> formatting for machine consumption of the output
> sequence as a C++ string/character literal.
>
> The current design introduces extensions to the API
> and to the format string syntax that assume that there
> is one specific default that should be chosen "for
> debugging or for logging". The reasoning behind the
> chosen default and the extensibility of the current
> design do not appear to be sufficiently explored.
>
> Note 1:
> An example, for Unicode encodings, of a choice between
> prioritizing human readability of the textual content
> and prioritizing visual clarity of encoding nuances is
> in the treatment of characters having Unicode property
> Grapheme_Extend=Yes. The current design favors visual
> clarity of encoding nuances by outputting such
> characters as escape sequences.
>
> Note 2:
> For stateful encodings, the lack of return to the
> initial shift state at the end of the sequence cannot
> be represented using a C++ string/character literal
> unless a prior shift sequence from the initial shift
> state is rendered via escape sequence(s). It is not
> clear that scanning forward is always an option (nor
> is it clear that doing so is desirable).
>
> *Proposed change:*
>
> Narrow the purported scope and affirm the design
> choices of the default behavior:
> Modify "logging" to "technical logging" and spell out
> the priorities in order in the description (this has
> the benefit of clearly communicating intention and
> providing guidance for implementation choices).
>
> 1. The output is intended to be a C++
> string/character literal that reproduces the
> encoded sequence. (This seems to be taken for
> granted and not made explicit in the current draft.)
> 2. Prefer visually distinguishing between different
> methods of encoding "equivalent" textual content.
>
> Make any adjustments necessary to the API or the
> format string syntax associated with "escaped" strings
> to allow for future additions for alternative escaping.
>
>
> US-64: [uaxid.pattern]
> <https://eel.is/c++draft/uaxid.pattern>
>
> The Unicode org has clarified that the pattern
> whitespace and pattern syntax rules apply to the
> lexing and parsing of computer languages.
>
> *Proposed change:*
>
> Replace with "UAX#31 describes how formal languages
> such as computer languages should describe and
> implement their use of whitespace and syntactically
> significant characters during the processes of lexing
> and parsing. C++ does not claim conformance with this
> requirement."
>
> Tom.
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2022-10-08 12:38:37