liaison: [wg14/wg21 liaison] adding punctuator tokens

From: Jens Gustedt <jens.gustedt_at_[hidden]>
Date: Thu, 15 Apr 2021 10:43:51 +0200

Hi everybody,
unfortunately, in all the discussions that I have tried to initiate on
this subject over the years (even in recent ones with some of you), I
never seem to have received technical feedback on why adding new
punctuators to C (or C++) would not be possible.

What I did get were "we don't like that", "I never would use that" or
"over my dead body" comments, but (at least that is my impression)
nobody ever pointed to a real technical difficulty.

So what is this all about? This is about moving C and C++ into the
21st century. We have roughly 32 years of standardized C now. Imagine
if, 32 years from now, people were still unable to use normal
technical characters in their preferred programming language.

Also, this is not about forcing implementations that do not have these
characters in their source or execution character set to integrate
them or to change anything. Rather, it is about having implementations
that already have these characters in their extended source character
set (in particular those that use UTF-8, UTF-16 or UTF-32) accept
them as punctuator tokens.

The "only" thing that implementations that have such a character would
have to add is to produce the correct punctuator token in translation
phase 3. So, for example, if the extended source character set has the
character `∷` (code point U+2237), this should result in a token that
is the same as or equivalent to `::` (two adjacent colons). Whatever
they then do for the rare case of stringification of these tokens I
wouldn't care much about, as long as they are consistent with
themselves.
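
To make this concrete, here is a minimal sketch of such a phase 3
lookup, assuming a UTF-8 encoded source file. The names `punct_alias`,
`punct_aliases` and `match_punct_alias` are hypothetical and not taken
from any existing implementation; the table holds only the one example
from above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias entry: the UTF-8 bytes of a new punctuator and
   the existing spelling whose token it should produce. */
struct punct_alias {
    const char *utf8;   /* bytes as they appear in the source file */
    const char *ascii;  /* equivalent punctuator already in the language */
};

static const struct punct_alias punct_aliases[] = {
    { "\xE2\x88\xB7", "::" },  /* U+2237 encoded in UTF-8 */
};

/* If `src` starts with a known alias, store the number of bytes it
   occupies in *len and return the equivalent spelling; otherwise
   return NULL and leave *len untouched. */
static const char *match_punct_alias(const char *src, size_t *len)
{
    size_t count = sizeof punct_aliases / sizeof punct_aliases[0];
    for (size_t i = 0; i < count; ++i) {
        size_t n = strlen(punct_aliases[i].utf8);
        if (strncmp(src, punct_aliases[i].utf8, n) == 0) {
            *len = n;
            return punct_aliases[i].ascii;
        }
    }
    return NULL;
}
```

A tokenizer would consult such a table at the point where it already
recognizes punctuators, and then continue exactly as if it had read
the existing spelling.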

There is no conflict with existing C implementations or C source code
out there that I know of. (And my guess is that for C++ this should be
the same.)

    - All implementations have to accept these characters already in
      their extended source and execution character set, because in
      string or character literals they can be entered with the
      `\u2237` or `\U00002237` notation. So at the worst,
      stringification of such a token will result in some weird `¿`
      character in a wide string, no big deal.

    - Codepoints that are punctuators are not allowed in identifiers,
      so currently these characters can only be used in (wide) string
      or character literals.
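
As an illustration of the first point, here is a small sketch assuming
a C11 (or later) compiler. The array name is only for illustration.
The source contains nothing but basic characters, yet the
implementation has to handle U+2237 in the literal.

```c
#include <stddef.h>

/* U+2237 written as a universal character name inside a UTF-8 string
   literal; no non-ASCII byte appears in the source file itself. */
static const char proportion_utf8[] = u8"\u2237";

/* Number of bytes in the encoded character, excluding the terminator. */
static const size_t proportion_utf8_len = sizeof proportion_utf8 - 1;
```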

A positive side effect for implementations that have these characters
and for the users that use them would be that parsing of C and C++
would become easier.

      - For C (and probably also C++) adding `⟦` would avoid the
        ambiguity that we are about to introduce between array bounds,
        attributes and lambdas.

      - For C++, having `‹` and `›` could resolve the ambiguity of `<`
        and `>` as relational operators and as template delimiters, or
        the even weirder clash between two closing template delimiters
        `››` and a `>>` token.

So I am very eager to hear your opinion about *technical* difficulties
with this, but I'd also very much appreciate it if we did not expand
this into a general culture war about personal preferences, languages
other than English in sources, or the keys that you have on your
keyboard.

Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

Received on 2021-04-15 03:43:57