Date: Tue, 14 Oct 2025 15:23:07 -0400
On 10/14/25 2:45 PM, Alisdair Meredith wrote:
>
>
>> On Oct 14, 2025, at 1:19 PM, Tom Honermann <tom_at_[hidden]> wrote:
>>
>> On 10/14/25 9:37 AM, Alisdair Meredith wrote:
>>> Catching up with the posted resolutions…
>>>
>>> As submitter of US 8-021, the intent was to disambiguate the text as
>>> it is not clear to me
>>> whether clause 5 or 15 dominates. I am happy with any resolution
>>> that picks a direction,
>>> I just want to be clear in the text. It may be that adding the
>>> CA-022 example is sufficient.
>>
>> The example from CA-022 is helpful, but also quite subtle.
>> #<U+3000>else is a /conditionally-supported-directive
>> <https://eel.is/c++draft/cpp.pre#nt:conditionally-supported-directive>/ which,
>> with Clang's extension to treat U+3000 as whitespace, should be (but
>> isn't) treated as a #else directive (and therefore not part of the
>> skipped group). GCC and MSVC diagnose the U+3000 character in the
>> following example (with <U+3000> replaced by the actual character),
>> but all of GCC, MSVC, Clang, and EDG consider lines 2-4 as part of
>> the same skipped group. https://godbolt.org/z/7rErvncMz.
>>
>> #if 0
>> blah
>> #<U+3000>else
>> blah
>> #endif
>>
>> That example doesn't help elucidate what happens with "other"
>> pp-tokens that appear in /text-line
>> <https://eel.is/c++draft/cpp.pre#nt:text-line>/s.
>>
>> If the intent is to clarify what the standard currently specifies, I
>> think it would be useful to include some additional examples. Perhaps
>> something like this example that involves a non-identifier character
>> in /text-line/. GCC rejects both occurrences of ⊥, Clang rejects the
>> one in the skipped group, EDG and MSVC accept both occurrences.
>> https://godbolt.org/z/5j3z88dcz.
>>
>> #define M(x)
>> #if 0
>> ⊥
>> #else
>> M(⊥)
>> #endif
>>
>
> Yes, the intent of the comment is “make the ill-formed vs well-formed
> interpretation clear”,
> and the proposed resolution was simply a starting point — I come from
> an era where you
> must provide a resolution for every comment that you file, expecting
> the committee to do
> the work if the resolution does not adequately or correctly address
> the stated issue.
I have much appreciation for that era :)
Perhaps a good path forward is to request a CWG issue that describes the
status quo for these pp-tokens with acknowledgement to [lex.pptoken]p1
<https://eel.is/c++draft/lex.pptoken#1> and [cpp.pre]p5
<https://eel.is/c++draft/cpp.pre#5> and how the current restrictions
apply differently to the various productions of /group-part
<https://eel.is/c++draft/cpp.pre#nt:group-part>/. The examples we've
been collecting can then demonstrate the existing implementation
divergence and conformance issues.
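
For contrast with the "other" character cases, here is a minimal sketch of
a case that is not in dispute: a pp-number that is not a valid literal. It
is only converted to a token in translation phase 7, so it should be
accepted in a skipped group, as a discarded macro argument, and when
stringized. The names STR, DISCARD, and spelled are hypothetical helpers
for illustration; they do not come from the NB comments or the godbolt
links above.

```cpp
#include <cassert>
#include <cstring>

#define STR(x) #x      // hypothetical helper: stringize the argument
#define DISCARD(x)     // hypothetical helper: drop the argument entirely

#if 0
12.34.56abc            // a pp-number, but not a valid literal; skipped
#else
DISCARD(12.34.56abc)   // consumed by the macro, never becomes a token
#endif

// Returns the spelling of the pp-number as produced by the # operator.
inline const char* spelled() { return STR(12.34.56abc); }
```

Replacing 12.34.56abc with a character such as ⊥ turns this into the
disputed case from the earlier example.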
Tom.
>>>
>>> Similarly with US 63-115, my concern is that there was library
>>> wording that *did* affect
>>> the core language for literal encodings. Corentin has explained to
>>> me that
>>> "of one of the execution character sets" binds to just the locale
>>> specific encoding and
>>> not to both encodings (literal *and* locale-specific) so I will file
>>> an editorial PR to make
>>> that clearer.
>>>
>>> Once that is clarified, the locale-specific encoding turns out to be
>>> purely a library term,
>>> so my intent is now to move that part of the specification up into
>>> the library introduction,
>>> moving text the other way, so that there is no confusion about
>>> binding Core language
>>> to library specification. That part might be a C++29 effort to do
>>> right though, as an
>>> issue.
>>
>> I think that makes sense. The "or a locale-specific encoding of one
>> of the execution character sets" phrasing in [lex.charset]p5
>> <https://eel.is/c++draft/lex.charset#5> does seem odd given that
>> [character.seq.general]p(1.2)
>> <https://eel.is/c++draft/character.seq.general#1.2> already defines
>> the execution character set terms as locale specific (which is a
>> defined term in [character.seq.general]p(1.1)
>> <https://eel.is/c++draft/character.seq.general#1.1>). Perhaps "_Each_
>> literal encoding and _each_ locale-specific encoding of _each_ of the
>> execution character sets ..." would be an improvement.
>>
>
> Thanks — that reads better than what I was going with which was simply
> to reverse the
> order of the terms, so that no accidental misread was possible. I may
> still do that in
> addition to adding `each` for belt-and-braces though — I will see how
> it looks on the page.
>>
>> Tom.
>>
>>>
>>>
>>> tldr; sorry I missed the meeting rejecting my comments, but I
>>> (mostly) agree with those
>>> resolutions :)
>>>
>>> AlisdairM
>>>
>>>> On Oct 13, 2025, at 11:58 PM, Tom Honermann via SG16
>>>> <sg16_at_[hidden]> wrote:
>>>>
>>>> My rough notes for this meeting are available on the WG21 wiki here
>>>> <https://wiki.edg.com/bin/view/Wg21telecons2025/SG16Teleconference2025-10-08>.
>>>> The GH issues for US 8-021
>>>> <https://github.com/cplusplus/nbballot/issues/595>, CA-022
>>>> <https://github.com/cplusplus/nbballot/issues/596>, and US 63-115
>>>> <https://github.com/cplusplus/nbballot/issues/693> have been updated.
>>>>
>>>> We'll conclude discussion for US 189-304
>>>> <https://github.com/cplusplus/nbballot/issues/879> and US 5-018
>>>> <https://github.com/cplusplus/nbballot/issues/592> during the
>>>> 2025-10-22 SG16 meeting; that will be our last opportunity for
>>>> review prior to the WG21 meeting in Kona.
>>>>
>>>> Tom.
>>>>
>>>> On 10/8/25 1:12 AM, Tom Honermann via SG16 wrote:
>>>>>
>>>>> SG16 will hold a meeting *today*, Wednesday, October 8th, at 19:30
>>>>> UTC (timezone conversion
>>>>> <https://www.timeanddate.com/worldclock/converter.html?iso=20251008T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>).
>>>>>
>>>>> If you need a .ics file to import into your calendar, you can
>>>>> download it here
>>>>> <https://documents.isocpp.org/remote.php/dav/public-calendars/R7imgS2LJD9xfeWN/F348AE24-5E93-4596-98A9-4AC123F1BCC0.ics?export>.
>>>>>
>>>>> The C++26 CD has been published! You've all read every last page
>>>>> in detail and submitted comments through your NB for the slight
>>>>> imperfections you think you have found. Those comments have been
>>>>> compiled, collated, and crunched to produce the classy compendium
>>>>> that is N5028 <https://wg21.link/n5028>. It is now time to see if
>>>>> your fellow WG21 members agree with your findings!
>>>>>
>>>>> * US 8-021 <https://github.com/cplusplus/nbballot/issues/595>:
>>>>> 5.5p1 [lex.pptoken] Allow skipping non-preprocessing tokens
>>>>> via #if
>>>>> * CA-022 <https://github.com/cplusplus/nbballot/issues/596>:
>>>>> 5.5p1 [lex.pptoken] Add example for ill-formed
>>>>> non-preprocessing tokens
>>>>> * US 63-115 <https://github.com/cplusplus/nbballot/issues/693>:
>>>>> 16.3.3.3.4.1p01.2 [character.seq.general] Move
>>>>> [character.seq.general] to the core wording
>>>>> * US 189-304 <https://github.com/cplusplus/nbballot/issues/879>:
>>>>> 31.12.6.1, 31.12.6.5.6, 31.12.6.5.7, D.22.2 rename
>>>>> filesystem::path methods
>>>>> * US 5-018 <https://github.com/cplusplus/nbballot/issues/592>: 5
>>>>> [lex] Define "whitespace character"
>>>>>
>>>>> If you are aware of other NB comments that SG16 should review,
>>>>> please let me know. The above are all the ones I found that looked
>>>>> relevant.
>>>>>
>>>>> *US 8-021* and *CA-022* both pose the question of whether a
>>>>> preprocessing token consisting of an "other" character (e.g., a
>>>>> non-whitespace character that is not used for lexical purposes
>>>>> outside of literals; [lex.pptoken]
>>>>> <https://eel.is/c++draft/lex.pptoken>) renders the program
>>>>> ill-formed if it appears within the group of a conditional
>>>>> inclusion directive whose control condition evaluates to false
>>>>> ([cpp.cond] <https://eel.is/c++draft/cpp.cond>). Implementations
>>>>> tend to think so (https://godbolt.org/z/rahj3jvTo), presumably
>>>>> because they like [cpp.pre]p5
>>>>> <https://eel.is/c++draft/cpp.pre#5>, though GCC appears to be
>>>>> slightly concussed by the example. CA-022 in particular requests
>>>>> the addition of an example or additional elaboration if the intent
>>>>> is that the following is ill-formed.
>>>>>
>>>>> #if 0
>>>>> ` // Ok?
>>>>> " // Ok?
>>>>> # \N{IDEOGRAPHIC SPACE} else // Ok?
>>>>> #endif
>>>>>
>>>>> *US 63-115* is a request to move the definitions of the execution
>>>>> character set and execution wide-character set from
>>>>> [character.seq.general]p(1.2)
>>>>> <https://eel.is/c++draft/character.seq.general#1.2> in the
>>>>> standard library wording to [lex.charset]
>>>>> <https://eel.is/c++draft/lex.charset> in the core language
>>>>> wording. That is where they were prior to acceptance of P2314R4
>>>>> (Character sets and encodings) <https://wg21.link/p2314r4> in
>>>>> C++23. The 2021-02-24
>>>>> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2021.md#february-24th-2021>
>>>>> and 2021-03-10
>>>>> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2021.md#march-10th-2021>
>>>>> SG16 meeting summaries contain the discussions of that paper. The
>>>>> discussion indicates that these terms were retained for
>>>>> compatibility with the C standard.
>>>>>
>>>>> *US 189-304* asserts that the system_encoded_string() member
>>>>> function ([fs.path.native.obs]
>>>>> <https://eel.is/c++draft/fs.path.native.obs>) and its generic
>>>>> sibling added to std::filesystem::path by P2319R5 (Prevent path
>>>>> presentation problems) <https://wg21.link/p2319r5> are misnamed
>>>>> because 1) the C++ standard does not have a definition for "system
>>>>> encoding", and 2) colloquial use of that term would lead readers
>>>>> to an unintended encoding; raise your hand if you are familiar
>>>>> with the AreFileApisANSI()
>>>>> <https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-arefileapisansi>
>>>>> Win32 function and its relevance to std::filesystem::path (see the
>>>>> mysterious note in [fs.path.type.cvt]p(2.1)
>>>>> <https://eel.is/c++draft/fs.path.type.cvt#2.1>). The proposed
>>>>> change includes a few options to choose from, but we don't have to
>>>>> limit ourselves to the options suggested by the /wise/ and
>>>>> /learned/ author of that comment.
>>>>>
>>>>> *US 5-018* argues for adopting P3657R0 (A Grammar for Whitespace
>>>>> Characters) <https://wg21.link/p3657r0> and its proposed changes
>>>>> for the specification of whitespace characters in lexical
>>>>> analysis. I believe Alisdair still intends to produce a new
>>>>> revision of this paper, but has been busy with other higher
>>>>> priority papers. Unless he shows up excited to share a shiny new
>>>>> revision, we'll postpone discussion of this comment and the
>>>>> associated paper until next time.
>>>>>
>>>>> Tom.
>>>>>
>>>>>
>>>> --
>>>> SG16 mailing list
>>>> SG16_at_[hidden]
>>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>> Link to this post: http://lists.isocpp.org/sg16/2025/10/4626.php
>>>
>
Received on 2025-10-14 19:23:12
