[wg14/wg21 liaison] Inconsistent handling of universal character names.

From: Corentin <corentin.jabot_at_[hidden]>
Date: Sun, 16 Apr 2023 18:42:25 +0200
Hey folks.

Both C and C++ recently started requiring $, @, and ` to be available,
but they did it slightly differently.

* In C++, they were added to the basic character set, and as such they
  cannot appear as UCNs outside of string and character literals.

* In C, UCNs are handled the same way inside and outside of literals.
  Because \u0024, for example, appears in source code inside of strings,
  $, @ and ` were not added to the basic character set. Instead, the
  wording says:
> The basic character set, @ (U+0040 Commercial At), $ (U+0024 Dollar
> Sign), and ` (U+0060 Grave Accent, "Backtick") shall be present and each
> character shall be encoded as a single byte

Now the reason C++ added them to the basic character set was arguably to
open the door for these characters to be used as grammar elements one day,
as they are in Objective-C for example.
If they ever are used as punctuators in C, it may be awkward for
implementers to support punctuators that are expressed in the form of UCNs.


Currently, they cannot appear outside of string literals... except that
several implementations support $ in identifiers.

With that unfortunate compiler extension, int \u0024 = 0; would end up
being valid C and invalid C++.

Which is not a great situation.
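
To make the split concrete, here is a minimal C++ sketch. It assumes C++17 and an ASCII-compatible ordinary literal encoding such as UTF-8; the names ucn_spelling, direct_spelling and ucn_matches_direct are made up for illustration. Inside a literal the UCN spelling and the direct spelling should be interchangeable; the identifier case is left as a comment because it is the one the two languages disagree on.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical names, for illustration only.
// Inside a string literal, the UCN spelling is accepted.
constexpr std::string_view ucn_spelling = "\u0024\u0040\u0060";
constexpr std::string_view direct_spelling = "$@`";

// True when both spellings yield the same characters in the
// ordinary literal encoding (assumed ASCII-compatible here).
constexpr bool ucn_matches_direct() {
    return ucn_spelling == direct_spelling;
}

// Outside a literal the languages diverge. The declaration below is
// kept as a comment: per the discussion above it is invalid C++, while
// a C compiler that allows $ in identifiers would accept it.
//
//     int \u0024 = 0;
```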

Did C consider aligning itself with C++ here, i.e.:

   - allow any UCNs in string literals
   - not allow \u0024, \u0060 and \u0040 outside of string literals?

I think that's the best way to avoid this weird incompatibility and avoid
pushing us into a corner if we ever want to do something interesting with @
or $.

Sorry for not spotting that earlier.


Received on 2023-04-16 16:42:38