Date: Wed, 9 Jul 2025 23:40:20 +0200
On 09.07.25 22:21, Jens Gustedt wrote:
> Richard,
>
> on Wed, 9 Jul 2025 12:49:26 -0700 you (Richard Smith
> <richardsmith_at_[hidden]>) wrote:
>
>> In addition to # and ##, in the basic source character set there's
>> also ` @ $ \ which satisfy the "single character that's not the start
>> of another preprocessing-token" rule. EDG rejects all of those, GCC
>> and MSVC reject all but $, clang accepts all.
>
> If I understand the C standard correctly these are also all code
> points that are not space and that cannot participate in valid
> tokens. These are accepted in C up to phase 6. As long as they don't
> reach phase 7, for example because they are eaten by a macro parameter
> that is then ignored, all is fine. AFAIU in C this behavior is
> intentional and there is no UB, only syntax errors in phase 7. This
> allows implementations to extend the syntax and integrate for example
> new punctuators in their extensions.
I think C++ matches that expectation: All of the above examples are
valid preprocessing-tokens that are not tokens (and thus are syntax
errors if (and only if) they reach phase 7).
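
A minimal sketch of this point, assuming a compiler that accepts these characters as preprocessing-tokens inside macro arguments (the thread notes that implementations differ). The macros DROP and STR and the two helper functions are hypothetical names chosen for illustration; they do not come from the thread or from either standard.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper macros, for illustration only.
#define DROP(x)      /* expands to nothing: the argument is discarded in phase 4 */
#define STR(x) #x    /* stringizes the spelling of the argument */

// The stray characters are lexed as preprocessing-tokens in phase 3 and
// disappear during macro expansion, so they never reach phase 7.
DROP(@)
DROP(`)

// Stringizing turns the stray preprocessing-token into a string-literal,
// which is an ordinary token by the time phase 7 sees it.
inline const char* at_sign_spelling()  { return STR(@); }
inline const char* backtick_spelling() { return STR(`); }

// By contrast, letting the character survive to phase 7 is a syntax error:
// int x = @;
```

Calling at_sign_spelling() is expected to yield the one-character string "@", and backtick_spelling() the corresponding string for the backtick; uncommenting the last line should make the translation unit ill-formed.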
> I know that C++ is more strict here, but I never really understood the
> purpose of that.
I don't think there is a difference for ` @ $, but there is one
between C and C++ for # and ##: # (or ##) is a valid token in
phase 7 for C, but not for C++. See ISO C 6.4p3 (which carries
all "punctuators" to phase 7) vs. ISO C++ [lex.token], which excludes
the preprocessing-operator defined in [lex.operators].

Arguably, it makes sense to keep # and ## away from phase 7,
since those are purely relevant for the preprocessor.
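
As a hedged illustration of # living purely in the preprocessor: the sketch below produces a # preprocessing-token through an object-like macro and consumes it again before phase 7. HASH, STR, XSTR and hash_spelling are hypothetical names, not taken from the thread.

```cpp
#include <cassert>
#include <cstring>

// In an object-like macro, '#' is just a preprocessing-token in the
// replacement list; it is not the stringizing operator there.
#define HASH #
#define STR(x) #x
#define XSTR(x) STR(x)   /* extra level so the argument is macro-expanded first */

// HASH expands to '#', which STR then stringizes. Phase 7 only ever
// sees a string-literal, never the '#' itself.
inline const char* hash_spelling() { return XSTR(HASH); }

// Letting the '#' escape the preprocessor is rejected in both languages,
// although for different formal reasons (no grammar production in C,
// no conversion to a token in C++):
// int x = HASH;
```

Under these assumptions, hash_spelling() should return the one-character string "#".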

Is the C behavior here an accident of insufficient differentiation
in the "punctuators" grammar non-terminal, or is the appearance
of # and ## in phase 7 for C intentional?

I don't think C++ should bend over backwards to carry arcane
spellings forward to phase 7 just for the questionable benefit of
attributes. If you need some odd spelling there, use a string-literal.

Jens
Received on 2025-07-09 21:40:29