Date: Sun, 16 Apr 2023 18:42:25 +0200
Hey folks.
Both C and C++ recently added $, @, and ` to their basic character set.
But they did it slightly differently:
* In C++, they were added to the basic character set, and as such they
cannot appear as UCNs outside of string and character literals.
* In C, UCNs are handled the same way inside and outside of literals, and
because \u0024, for example, appears in source code inside strings, $, @
and ` were not added to the basic character set. Instead, the standard says:
> The basic character set, @ (U+0040 Commercial At), $ (U+0024 Dollar
> Sign), and ` (U+0060 Grave Accent, "Backtick") shall be present and each
> character shall be encoded as a single byte
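For illustration, a minimal C sketch of that C behavior, assuming a C11-or-later compiler and an ASCII-compatible execution character set such as UTF-8: inside a string literal, \u0024, \u0040 and \u0060 spell exactly the same single-byte characters as $, @ and `. The array names and the helper ucn_matches_plain are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <string.h>

/* C accepts \u0024, \u0040 and \u0060 even though they are below U+00A0,
   and treats them the same way inside and outside literals. Here they are
   used inside string literals, where both C and C++ allow them. */
const char dollar_ucn[]   = "\u0024";
const char at_ucn[]       = "\u0040";
const char backtick_ucn[] = "\u0060";

/* Each character is a single byte, so each array holds one byte plus the
   terminating NUL (assumes an ASCII-compatible execution charset). */
static_assert(sizeof dollar_ucn == 2, "$ should be encoded as a single byte");

/* Hypothetical helper: returns 1 if the UCN spelling and the plain
   spelling produce identical bytes, 0 otherwise. */
int ucn_matches_plain(const char *ucn, const char *plain)
{
    return strcmp(ucn, plain) == 0;
}
```

So ucn_matches_plain(dollar_ucn, "$") is expected to return 1, and likewise for @ and `.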
Now, the reason C++ added them to the basic character set was arguably to
open the door for these characters to be used as grammar elements one day,
as they are in Objective-C, for example.
If they are ever used as punctuators in C, it may be awkward for
implementers to support punctuators that are spelled as UCNs, e.g.:
NSLog(\u0040"What?");
Currently, they cannot appear outside of string literals... except that
several implementations support $ in identifiers.
With that (unfortunate) compiler extension, int \u0024 = 0; would end up
being valid C but invalid C++, which is not a great situation.
Did C consider aligning itself with C++ here, i.e. to:
- allow any UCNs in string literals
- not allow \u0024, \u0060 and \u0040 outside of string literals?
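A rough sketch of how the two rules differ when a lexer checks the code point of a UCN. The helpers c_ucn_allowed and proposed_ucn_allowed are hypothetical (not taken from any compiler); they model only the code-point range checks, read "allow any UCNs in string literals" as "any Unicode scalar value", and ignore the XID_Start/XID_Continue restrictions that further apply to UCNs in identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Surrogates (U+D800..U+DFFF) and values above U+10FFFF are never valid. */
static bool is_scalar_value(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Current C rule: the same check applies inside and outside literals, and
   $, @ and ` are the only code points below U+00A0 that are accepted. */
bool c_ucn_allowed(uint32_t cp, bool in_literal)
{
    (void)in_literal; /* C does not distinguish the two contexts */
    if (!is_scalar_value(cp))
        return false;
    if (cp < 0xA0)
        return cp == 0x24 || cp == 0x40 || cp == 0x60;
    return true;
}

/* Proposed C++-aligned rule (assumed reading): any scalar value inside a
   literal, but nothing below U+00A0 (so no \u0024, \u0040 or \u0060)
   outside of one. */
bool proposed_ucn_allowed(uint32_t cp, bool in_literal)
{
    if (!is_scalar_value(cp))
        return false;
    if (in_literal)
        return true;
    return cp >= 0xA0;
}
```

Under this sketch, c_ucn_allowed(0x24, false) is true (so an implementation with the $ extension could accept int \u0024 = 0;), while proposed_ucn_allowed(0x24, false) is false and proposed_ucn_allowed(0x24, true) stays true, so "\u0024" in a string literal keeps working.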
I think that's the best way to avoid this weird incompatibility and to avoid
painting ourselves into a corner if we ever want to do something interesting
with @ or $.
Sorry for not spotting that earlier.
Thanks,
Corentin
Received on 2023-04-16 16:42:38