[wg14/wg21 liaison] Inconsistent handling of universal character names.

From: Corentin <corentin.jabot_at_[hidden]>
Date: Sun, 16 Apr 2023 18:42:25 +0200
Hey folks.

Both C and C++ recently added $, @, and ` to the set of characters that
every implementation must support, but they did it slightly differently.

* In C++, they were added to the basic character set, and as such they
cannot appear as UCNs outside of string and character literals.

* In C, UCNs are handled the same way inside and outside of literals.
Because \u0024, for example, already appears in source code inside strings,
$, @ and ` were not added to the basic character set. Instead, the standard
says:
> The basic character set, @ (U+0040 Commercial At), $ (U+0024 Dollar
Sign), and ` (U+0060 Grave Accent, "Backtick") shall be present and each
character shall be encoded as a single byte
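
To illustrate the part both languages agree on (UCNs for these three
characters inside literals), here is a minimal C++ sketch. The helper name
ucn_spelling_matches is made up for this example, and the sketch assumes an
ASCII-compatible literal encoding such as UTF-8, where each of the three
characters fits in a single char.

```cpp
#include <cstring>

// Inside character literals, a UCN and the directly typed character
// denote the same value (assuming an ASCII-compatible literal encoding).
static_assert('\u0024' == '$', "U+0024 is DOLLAR SIGN");
static_assert('\u0040' == '@', "U+0040 is COMMERCIAL AT");
static_assert('\u0060' == '`', "U+0060 is GRAVE ACCENT");

// Hypothetical helper (not from either standard): compares a string
// literal spelled with UCNs against the same string typed directly.
bool ucn_spelling_matches() {
    return std::strcmp("\u0024\u0040\u0060", "$@`") == 0;
}
```

The disagreement discussed in this mail only concerns the same UCNs outside
of literals, which this sketch deliberately does not exercise.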

Now, the reason C++ added them to the basic character set was arguably to
open the door for these characters to be used as grammar elements one day,
as they are in Objective-C for example.
If they ever are used as punctuators in C, it may be awkward for
implementers to support punctuators that are expressed in the form of UCNs.

NSLog(\u0040"What?");


Currently, they cannot appear outside of string literals... except that
several implementations support $ in identifiers.

int \u0024 = 0; would end up being valid C and invalid C++ (with that
unfortunate compiler extension).

Which is not a great situation.

Did C consider aligning itself with C++ here, i.e.:

   - Allow any UCN in string literals
   - Not allow \u0024, \u0060 and \u0040 outside of string literals?

I think that's the best way to avoid this weird incompatibility and avoid
pushing us into a corner if we ever want to do something interesting with @
or $.

Sorry for not spotting that earlier.

Thanks,
Corentin

Received on 2023-04-16 16:42:38