Date: Tue, 26 Apr 2022 11:39:06 -0400
On 4/23/22 6:58 PM, Steve Downey wrote:
> My notes indicate that it's narrative text about the implications and
> consequences for putting the three characters into the set. I can have
> that. I don't think we found wording changes.
Yes, that sounds correct.
I'm working to get the minutes from the meeting published today (my
apologies for being so delinquent again). A draft of the P2558R0
discussion follows.
Steve, assuming it is realistic for you to have an updated paper for the
meeting tomorrow, could you 1) cross check your updates with the
requests below, and 2) reply to this thread with an updated
draft/revision once you have one available.
*
P2558R0: Add @, $, and ` to the basic character set
<https://wg21.link/p2558>
o PBrett asked whether this proposal is an SG16 concern.
o Tom replied that we are the encoding experts and best equipped
within the committee to evaluate whether these changes will have
consequences for source character sets used in practice; our
affirmation of the proposal will hopefully ease progress through
other groups.
o Charlie asked for confirmation that SG22 will review the paper
as well.
o Steve replied affirmatively and stated WG14 has already adopted
the proposal for C.
o Someone (apologies, the editor neglected to record who) noted
the existence of P2342: For a Few Punctuators More and that it
argues that these characters could be used as new operators.
o Tom responded that P2342 is an example of a paper that is not an
SG16 concern; once the characters are available in the source
character set, how they are used is an EWG concern.
o Hubert observed that this introduces a requirement that these
characters be encoded as a single code unit.
o Jens acknowledged and noted that this is required by
[lex.charset]p6 <http://eel.is/c++draft/lex.charset#6> for all
characters in the basic literal character set.
o Jens requested that the paper prose discuss this requirement.
o Steve agreed to add such prose.
o Jens stated that he does not know if this requirement would be
problematic for any existing character sets.
o Steve replied that there are infrequently used EBCDIC code pages
that lack them.
o Jens asked if those code pages also lack other characters.
o Steve responded that they probably do.
o Charlie asked if those problematic EBCDIC code pages have unused
code points that could be used for these characters.
o Tom suggested that digraphs could be introduced to support those
code pages if needed.
o Jens replied that digraphs can't be used inside character or
string literals.
o Steve suggested this concern can be addressed if these
characters start getting used within the standard.
o Jens reported that the addition of these characters to the basic
character set makes them ineligible to be specified via a
/universal-character-name/ (UCN) outside of a character or
string literal due to [lex.charset]p3
<http://eel.is/c++draft/lex.charset#3>.
o PBrett suggested that restriction could be lifted.
o Charlie stated that existing uses of '$' are probably limited to
identifiers and symbol renaming via attributes, pragma
directives, or |asm| labels.
o Hubert reported they may appear in preprocessor stringization.
o PBrett reported they may appear in header names as well.
o Jens noticed that use of a UCN to |#include| a source file named
with one of these characters would become ill-formed.
o PBrett reiterated support for lifting restrictions on use of
UCNs to name characters from the basic character set with the
rationale that we shouldn't ban things that are only bad ideas.
o Charlie stated that, if such incompatibilities were to cause
problems in practice, then lifting that restriction would be the
obvious solution.
o Steve suggested such concerns can be addressed after the paper
is approved; we know programmers want to use these characters.
o Tom asked Steve if he knows whether the UCN concern was
discussed in WG14.
o Steve replied that it was not.
o Hubert asked when UCNs are translated in identifiers and other
preprocessor tokens.
o Jens replied, during translation phase 3 except in quoted
contexts; so translation occurs as a /preprocessing-token/ is
formed.
o PBrett observed that the new UCN restrictions would be limited
to /h-char/ and /q-char/ sequences since these characters aren't
otherwise usable outside quoted contexts.
o Jens replied that they may also appear in a
/preprocessing-token/ used in stringization.
o Hubert stated that a UCN specifying one of these characters
would become ill-formed during stringization.
o Hubert added that, likewise, use of a UCN to specify '$' in an
identifier would become ill-formed.
o Tom asked whether that matters since use of '$' in an identifier
is already an extension.
o Jens noted that these characters currently match the "each
non-whitespace character that cannot be one of the above" case
of /preprocessing-token/
<http://eel.is/c++draft/lex.pptoken#nt:preprocessing-token>.
o Jens stated that this makes such intended use in identifiers
ill-formed since, after this change, such a character would
appear as a lone /preprocessing-token/.
o [ Editor's note: This behavior doesn't seem related to the
proposed change since, previously, a UCN naming one of these
characters would also appear as lone /preprocessing-token/. The
editor is concerned that this portion of the discussion was not
captured accurately. ]
o Jens requested that this discussion be included in the paper.
o PBrett asked whether /h-char/ and /q-char/ sequences remain a
backward compatibility concern.
o Jens replied that they do, but that UCNs in such sequences
already have implementation-defined behavior; what UCNs mean in
/h-char/ and /q-char/ sequences isn't really defined.
o [ Editor's note: Per [lex.phases]p1.3
<http://eel.is/c++draft/lex.phases#1.3>, UCNs are not recognized
and replaced in /h-char-sequence/ and /q-char-sequence/
sequences and, per [lex.header]p1
<http://eel.is/c++draft/lex.header#1>, these sequences are
mapped in an implementation-defined manner. ]
o Tom suggested that it might be worth updating the paper to
describe how existing implementations behave with respect to
UCNs in preprocessor stringization and |#include| directives.
o Jens asked if there are other ways in which these characters
might plausibly be used today.
o Tom replied that Objective-C uses '@'.
o Jens raised the possibility of concerns regarding imposing
single code unit encoding for these characters on the execution
character set.
o Tom asked Hubert if he was aware of any EBCDIC related concerns.
o Hubert replied that his colleagues in WG14 did not express such
concerns and that he is confident that they would have if they
had any.
o Hubert noted that locales that don't support these characters
would no longer be strictly conforming but could still be
supported as extensions.
o Tom summarized the discussion; there are requests to Steve to
update the prose for several of the items discussed, but
there do not appear to be any objections to the paper direction.
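
[ Editor's note: The single code unit requirement and the stringization
case discussed above can be checked on a given implementation with a
sketch like the following. It is not taken from the paper. The helper
names code_units and same_text and the macro SG16_STR are hypothetical,
and the sketch assumes an implementation whose ordinary literal encoding
already contains all three characters (for example, ASCII or UTF-8) and
which accepts '@' and '$' as lone preprocessing tokens or, for '$', as an
identifier extension. ]

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in an ordinary string
// literal, excluding the terminating null character.
template <std::size_t N>
constexpr std::size_t code_units(const char (&)[N]) {
    return N - 1;
}

// Hypothetical helper: compile-time comparison of two string literals,
// including their terminating null characters.
template <std::size_t N, std::size_t M>
constexpr bool same_text(const char (&a)[N], const char (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i) {
        if (a[i] != b[i]) return false;
    }
    return true;
}

// Hypothetical macro: stringizes its argument as spelled.
#define SG16_STR(x) #x

// Each of the three characters is expected to occupy exactly one code
// unit in an ordinary string literal (assumption: ASCII-compatible
// ordinary literal encoding).
static_assert(code_units("@") == 1, "'@' is not a single code unit");
static_assert(code_units("$") == 1, "'$' is not a single code unit");
static_assert(code_units("`") == 1, "'`' is not a single code unit");

// Stringizing a directly written '@' or '$' is expected to yield the
// same text as the corresponding string literal.
static_assert(same_text(SG16_STR(@), "@"), "unexpected stringization of '@'");
static_assert(same_text(SG16_STR($), "$"), "unexpected stringization of '$'");
```

If any of these assertions fails, the translation unit fails to compile,
which would point at an encoding for which the requirement discussed
above is a concern. The sketch does not exercise the UCN spellings, since
their handling was the open question in the discussion.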
Tom.
>
> On Sat, Apr 23, 2022, 09:04 Tom Honermann via SG16
> <sg16_at_[hidden]> wrote:
>
> SG16 will hold a telecon on Wednesday, April 27th, at 19:30 UTC
> (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20220427T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>
> The agenda is:
>
> * P2286R7: Formatting Ranges
> <https://wiki.edg.com/pub/Wg21telecons2022/LibraryWorkingGroup/p2286r7.html>
> o Review recent updates and confirm direction.
> o (The link is to a draft revision attached to the LWG wiki
> page)
> * P2558R0: Add @, $, and ` to the basic character set
> <https://wg21.link/p2558>
> o Continue review pending the availability of an updated
> revision.
> * D2572R0: std::format() fill character allowances
> <https://rawgit.com/tahonermann/std-proposals/master/d2572r0.html>
> o Continue review pending the availability of an updated
> revision.
>
> LWG has tentatively approved
> <https://github.com/cplusplus/papers/issues/977#issuecomment-1107071392>
> P2286R7 as linked above pending SG16 confirmation of design intent
> and wording for non-Unicode encodings. Please review the SG16
> 2022-01-26 meeting summary
> <https://github.com/sg16-unicode/sg16-meetings#january-26th-2022>
> for previously discussed concerns. A few wording details are still
> being discussed outside of the mailing lists, so the paper
> revision linked above may still receive some updates prior to our
> meeting.
>
> Further review of P2558R0 and D2572R0 is scheduled pending
> updates from the authors to address feedback from the SG16
> 2022-04-13 telecon
> <https://github.com/sg16-unicode/sg16-meetings#april-13th-2022>
> (which I have yet to publish).
>
> Tom.
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2022-04-26 15:39:10