[SG16] Characters literals in preprocessor conditionals

From: Corentin <corentin.jabot_at_[hidden]>
Date: Thu, 11 Jun 2020 17:40:36 +0200
Hey.

I did some research on the use of #if <boolean expression involving
character literals> in the open source libraries in vcpkg (about 90
million lines of C and C++ code in aggregate).

There are 3 use cases:

   - Detecting ASCII vs EBCDIC at compile time, using different methods
   including #if 'A' == '\301', though I am not sure how that one works (a
   more common attempt is #if 'A' == 65)
   - Comparing with a #define, the most frequent pattern being to compare
   with a path separator, as a means of compile time configuration.
   - Trying to detect other encodings
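
The first two patterns can be sketched roughly as follows. This is an
illustrative reconstruction, not code taken from any of the surveyed
libraries: PATH_SEPARATOR, kExecCharset and kUsesBackslash are hypothetical
names. For what it is worth, '\301' is octal for 0xC1, which is the code of
'A' in EBCDIC code pages, so that comparison is presumably meant to be true
only on EBCDIC targets.

```cpp
// Hypothetical configuration macro; a real library would typically get it
// from its build system or from a platform header.
#ifndef PATH_SEPARATOR
#define PATH_SEPARATOR '/'
#endif

// Use case 1: guess the execution character set from the value of 'A'.
#if 'A' == 65
constexpr const char* kExecCharset = "ASCII-compatible";
#elif 'A' == '\301'   // octal 301 == 0xC1, the code of 'A' in EBCDIC
constexpr const char* kExecCharset = "EBCDIC";
#else
constexpr const char* kExecCharset = "unknown";
#endif

// Use case 2: compare a configuration macro against a character literal.
#if PATH_SEPARATOR == '\\'
constexpr bool kUsesBackslash = true;
#else
constexpr bool kUsesBackslash = false;
#endif
```

Both checks only do what the library author intends if the preprocessor
gives the literals the same values they have in ordinary expressions.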

In total this feature is used fewer than 80 times (with lots of
duplication). My method is crude, so my numbers are not precise, but that's
the ballpark.

I found one use of #if (L'\0' - 1 < 0), which is fun, and one of #if
WCHAR_PATH_SEPARATOR != L'/'; there is no other use of encoding prefixes.
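
The L'\0' - 1 < 0 test is most likely a check for whether wchar_t is
signed. A minimal sketch, assuming the preprocessor gives a wide literal the
signedness of wchar_t (which GCC and Clang appear to do, but which the
standard does not guarantee); the constant names are hypothetical:

```cpp
#include <type_traits>

// In #if, signed and unsigned operands behave like intmax_t and uintmax_t.
// If the wide literal is treated as unsigned, L'\0' - 1 wraps around to a
// large positive value and the comparison is false.
#if (L'\0' - 1 < 0)
constexpr bool kPpWcharSigned = true;
#else
constexpr bool kPpWcharSigned = false;
#endif

// What the type system itself reports about wchar_t.
constexpr bool kTypeWcharSigned = std::is_signed<wchar_t>::value;
```

On an implementation that satisfies the assumption above, the two constants
agree.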

There is a very strong expectation that these literals are in the execution
encoding.
My tests with various compilers available on godbolt confirm that
implementations do convert these character literals as they would in
translation phase 5.

All but one of these usages seem to be in C libraries and C headers.

The wording states:

This includes interpreting character-literal
<http://eel.is/c++draft/lex.ccon#nt:character-literal>s, which may involve
converting escape sequences into execution character set members.
<http://eel.is/c++draft/cpp#cond-12.sentence-4>

Whether the numeric value for these character-literal
<http://eel.is/c++draft/lex.ccon#nt:character-literal>s matches the value
obtained when an identical character-literal
<http://eel.is/c++draft/lex.ccon#nt:character-literal> occurs in an
expression (other than within a #if or #elif directive) is
implementation-defined. <http://eel.is/c++draft/cpp#cond-12.sentence-5>
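
One way to check this implementation-defined behavior on a given compiler is
to capture what the preprocessor sees and compare it with the same
comparison written as an ordinary constant expression. This is only a
sketch, all constant names are hypothetical, and the '\377' case assumes the
preprocessor follows the signedness of plain char:

```cpp
// Results as seen by the preprocessor, captured as constants.
#if 'A' == 65
constexpr bool kPpAIs65 = true;
#else
constexpr bool kPpAIs65 = false;
#endif

#if '\377' < 0
constexpr bool kPpCharNegative = true;
#else
constexpr bool kPpCharNegative = false;
#endif

// The same comparisons evaluated as ordinary constant expressions.
constexpr bool kExprAIs65 = ('A' == 65);
constexpr bool kExprCharNegative = ('\377' < 0);

// Refuse to compile if the two interpretations disagree, instead of
// silently selecting the wrong branch.
static_assert(kPpAIs65 == kExprAIs65,
              "character literal value differs between #if and expressions");
static_assert(kPpCharNegative == kExprCharNegative,
              "char signedness differs between #if and expressions");
```

If either static_assert fires, any #if-based encoding detection in the same
translation unit cannot be trusted on that implementation.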


I am not sure that deprecating this feature is necessary, even if the C++
use case is better covered by https://wg21.link/p1885r2.
But it does seem valuable to modify the wording to specify that these
literals are interpreted as they would be in C expressions, as these are
used by libraries intended to be portable. If the detection is wrong, the
code nevertheless compiles and has the wrong runtime behavior.

Corentin.

Received on 2020-06-11 10:43:56