ISOCPP sg16 List: Re: Agenda for the 2022-04-27 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 26 Apr 2022 11:39:06 -0400

On 4/23/22 6:58 PM, Steve Downey wrote:
> My notes indicate that it's narrative text about the implications and
> consequences for putting the three characters into the set. I can have
> that. I don't think we found wording changes.

Yes, that sounds correct.

I'm working to get the minutes from the meeting published today (my
apologies for being so delinquent again). A draft of the P2558R0
discussion follows.

Steve, assuming it is realistic for you to have an updated paper for the
meeting tomorrow, could you 1) cross check your updates with the
requests below, and 2) reply to this thread with an updated
draft/revision once you have one available.

  *

    P2558R0: Add @, $, and ` to the basic character set
    <https://wg21.link/p2558>

      o PBrett asked whether this proposal is an SG16 concern.
      o Tom replied that we are the encoding experts and best equipped
        within the committee to evaluate whether these changes will have
        consequences for source character sets used in practice; our
        affirmation of the proposal will hopefully ease progress through
        other groups.
      o Charlie asked for confirmation that SG22 will review the paper
        as well.
      o Steve replied affirmatively and stated WG14 has already adopted
        the proposal for C.
      o Someone (apologies, the editor neglected to record who) noted
        the existence of P2342: For a Few Punctuators More and that it
        argues that these characters could be used as new operators.
      o Tom responded that P2342 is an example of a paper that is not an
        SG16 concern; once the characters are available in the source
        character set, how they are used is an EWG concern.
      o Hubert observed that this introduces a requirement that these
        characters be encoded as a single code unit.
      o Jens acknowledged and noted that this is required by
        [lex.charset]p6 <http://eel.is/c++draft/lex.charset#6> for all
        characters in the basic literal character set.
      o Jens requested that the paper prose discuss this requirement.
      o Steve agreed to add such prose.
      o Jens stated that he does not know if this requirement would be
        problematic for any existing character sets.
      o Steve replied that there are infrequently used EBCDIC code pages
        that lack them.
      o Jens asked if those code pages also lack other characters.
      o Steve responded that they probably do.
      o Charlie asked if those problematic EBCDIC code pages have unused
        code points that could be used for these characters.
      o Tom suggested that digraphs could be introduced to support those
        code pages if needed.
      o Jens replied that digraphs can't be used inside character or
        string literals.
      o Steve suggested this concern can be addressed if these
        characters start getting used within the standard.
      o Jens reported that the addition of these characters to the basic
        character set makes them ineligible to be specified via a
        /universal-character-name/ (UCN) outside of a character or
        string literal due to [lex.charset]p3
        <http://eel.is/c++draft/lex.charset#3>.
      o PBrett suggested that restriction could be lifted.
      o Charlie stated that existing uses of '$' are probably limited to
        identifiers and symbol renaming via attributes, pragma
        directives, or |asm| labels.
      o Hubert reported they may appear in preprocessor stringization.
      o PBrett reported they may appear in header names as well.
      o Jens noticed that use of a UCN to |#include| a source file named
        with one of these characters would become ill-formed.
      o PBrett reiterated support for lifting restrictions on use of
        UCNs to name characters from the basic character set with the
        rationale that we shouldn't ban things that are only bad ideas.
      o Charlie stated that, if such incompatibilities were to cause
        problems in practice, then lifting that restriction would be the
        obvious solution.
      o Steve suggested such concerns can be addressed after the paper
        is approved; we know programmers want to use these characters.
      o Tom asked Steve if he knows whether the UCN concern was
        discussed in WG14.
      o Steve replied that it was not.
      o Hubert asked when UCNs are translated in identifiers and other
        preprocessor tokens.
      o Jens replied, during translation phase 3 except in quoted
        contexts; so translation occurs as a /preprocessing-token/ is
        formed.
      o PBrett observed that the new UCN restrictions would be limited
        to /h-char/ and /q-char/ sequences since these characters aren't
        otherwise usable outside quoted contexts.
      o Jens replied that they may also appear in a
        /preprocessing-token/ used in stringization.
      o Hubert stated that a UCN specifying one of these characters
        would become ill-formed during stringization.
      o Hubert added that, likewise, use of a UCN to specify '$' in an
        identifier would become ill-formed.
      o Tom asked whether that matters since use of '$' in an identifier
        is already an extension.
      o Jens noted that these characters currently match the "each
        non-whitespace character that cannot be one of the above" case
        of /preprocessing-token/
        <http://eel.is/c++draft/lex.pptoken#nt:preprocessing-token>.
      o Jens stated that this makes such intended use in identifiers
        ill-formed since, after this change, such a character would
        appear as a lone /preprocessing-token/.
      o [ Editor's note: This behavior doesn't seem related to the
        proposed change since, previously, a UCN naming one of these
        characters would also appear as a lone /preprocessing-token/. The
        editor is concerned that this portion of the discussion was not
        captured accurately. ]
      o Jens requested that this discussion be included in the paper.
      o PBrett asked whether /h-char/ and /q-char/ sequences remain a
        backward compatibility concern.
      o Jens replied that they do, but that UCNs in such sequences
        already have implementation-defined behavior; what UCNs mean in
        /h-char/ and /q-char/ sequences isn't really defined.
      o [ Editor's note: Per [lex.phases]p1.3
        <http://eel.is/c++draft/lex.phases#1.3>, UCNs are not recognized
        and replaced in /h-char-sequence/ and /q-char-sequence/
        sequences and, per [lex.header]p1
        <http://eel.is/c++draft/lex.header#1>, these sequences are
        mapped in an implementation-defined manner. ]
      o Tom suggested that it might be worth updating the paper to
        describe how existing implementations behave with respect to
        UCNs in preprocessor stringization and |#include| directives.
      o Jens asked if there are other ways in which these characters
        might plausibly be used today.
      o Tom replied that Objective-C uses '@'.
      o Jens raised the possibility of concerns regarding imposing
        single code unit encoding for these characters on the execution
        character set.
      o Tom asked Hubert if he was aware of any EBCDIC related concerns.
      o Hubert replied that his colleagues in WG14 did not express such
        concerns and that he is confident that they would have if they
        had any.
      o Hubert noted that locales that don't support these characters
        would no longer be strictly conforming but could still be
        supported as extensions.
      o Tom summarized the discussion; there are requests to Steve to
        update the prose for several of the items discussed, but there
        do not appear to be any objections to the paper direction.
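
For readers following along, the literal-related points above (single
code unit encoding, UCNs inside string literals, and preprocessor
stringization) can be sketched in C++ as below. This is an illustrative
sketch only, not text from P2558R0 or from the meeting. The helper names
code_units and STRINGIZE are hypothetical, and the checks assume an
implementation whose ordinary literal encoding is ASCII-compatible (for
example UTF-8) and that accepts these characters in source files.

```cpp
#include <cstddef>

// Hypothetical helper (not from P2558R0): the number of code units in an
// ordinary string literal, excluding the terminating null character.
template <std::size_t N>
constexpr std::size_t code_units(const char (&)[N]) {
    return N - 1;
}

// Hypothetical helper macro: turns its argument into a string literal via
// the preprocessor's # (stringization) operator.
#define STRINGIZE(x) #x

// Assumption: ASCII-compatible ordinary literal encoding. Under that
// assumption each of the three characters occupies one code unit.
static_assert(code_units("@") == 1, "'@' should be a single code unit");
static_assert(code_units("$") == 1, "'$' should be a single code unit");
static_assert(code_units("`") == 1, "'`' should be a single code unit");

// UCNs naming these characters remain usable inside string literals;
// the restriction discussed above applies outside of literals.
static_assert(code_units("\u0040\u0024\u0060") == 3,
              "UCN spellings should yield three code units");
static_assert("\u0040"[0] == '@', "\\u0040 should name '@'");
static_assert("\u0024"[0] == '$', "\\u0024 should name '$'");
static_assert("\u0060"[0] == '`', "\\u0060 should name '`'");

// Outside of literals, each character can appear as a preprocessing
// token that is consumed by stringization before later phases see it.
static_assert(code_units(STRINGIZE(@)) == 1, "stringized '@'");
static_assert(STRINGIZE(@)[0] == '@', "stringized '@' spelling");
static_assert(STRINGIZE(`)[0] == '`', "stringized '`' spelling");
```

On an implementation whose ordinary literal encoding lacks one of these
characters (such as the infrequently used EBCDIC code pages mentioned
above), the first group of assertions would be expected to fail or the
literals to be rejected outright.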

Tom.

>
> On Sat, Apr 23, 2022, 09:04 Tom Honermann via SG16
> <sg16_at_[hidden]> wrote:
>
> SG16 will hold a telecon on Wednesday, April 27th, at 19:30 UTC
> (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20220427T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>
> The agenda is:
>
> * P2286R7: Formatting Ranges
> <https://wiki.edg.com/pub/Wg21telecons2022/LibraryWorkingGroup/p2286r7.html>
> o Review recent updates and confirm direction.
> o (The link is to a draft revision attached to the LWG wiki
> page)
> * P2558R0: Add @, $, and ` to the basic character set
> <https://wg21.link/p2558>
> o Continue review pending the availability of an updated
> revision.
> * D2572R0: std::format() fill character allowances
> <https://rawgit.com/tahonermann/std-proposals/master/d2572r0.html>
> o Continue review pending the availability of an updated
> revision.
>
> LWG has tentatively approved
> <https://github.com/cplusplus/papers/issues/977#issuecomment-1107071392>
> P2286R7 as linked above pending SG16 confirmation of design intent
> and wording for non-Unicode encodings. Please review the SG16
> 2022-01-26 meeting summary
> <https://github.com/sg16-unicode/sg16-meetings#january-26th-2022>
> for previously discussed concerns. A few wording details are still
> being discussed outside of the mailing lists, so the paper
> revision linked above may still receive some updates prior to our
> meeting.
>
> Further review of P2558R0 and D2572R0 is scheduled pending
> updates from the authors to address feedback from the SG16
> 2022-04-13 telecon
> <https://github.com/sg16-unicode/sg16-meetings#april-13th-2022>
> (which I have yet to publish).
>
> Tom.
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2022-04-26 15:39:10