Hey.
I did some research on the use of #if <boolean expression involving character literals> in the open source libraries in vcpkg (about 90 million lines of C and C++ code in aggregate).
There are 3 use cases:
- Detecting ASCII vs EBCDIC at compile time, using different methods including #if 'A' == '\301', which I am not sure how it works (a more common attempt is #if 'A' == 65).
- Comparing with a #define, the most frequent pattern being a comparison with a path separator, as a means of compile-time configuration.
- Trying to detect other encodings.
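To make the first two patterns concrete, here is a minimal sketch in C. It is not taken from any of the libraries surveyed: PATH_SEPARATOR, CHARSET_NAME, HAS_DRIVE_LETTERS and the two functions are hypothetical names chosen for illustration. The value 193 is 0xC1 (octal 301), which is the code for 'A' in EBCDIC.

```c
/* Hypothetical configuration macro, normally provided by the build system. */
#ifndef PATH_SEPARATOR
#define PATH_SEPARATOR '/'
#endif

/* Use case 1: guess the character set from the value of 'A'. */
#if 'A' == 65
#define CHARSET_NAME "ASCII-compatible"
#elif 'A' == 193 /* 0xC1, i.e. octal 301, the EBCDIC code for 'A' */
#define CHARSET_NAME "EBCDIC"
#else
#define CHARSET_NAME "unknown"
#endif

/* Use case 2: compare a configuration macro with a character literal. */
#if PATH_SEPARATOR == '\\'
#define HAS_DRIVE_LETTERS 1
#else
#define HAS_DRIVE_LETTERS 0
#endif

/* Small accessors so the result of the #if tests can be inspected at run time. */
const char *charset_name(void) { return CHARSET_NAME; }
int has_drive_letters(void) { return HAS_DRIVE_LETTERS; }
```

Both tests are only meaningful if the preprocessor gives 'A' and '\\' the same values the compiler gives them in ordinary expressions.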
In total this feature is used fewer than 80 times (with lots of duplication).
My method is crude, so my numbers are not precise (but that's the ballpark).
I found one use of #if (L'\0' - 1 < 0), which is fun, and one of #if WCHAR_PATH_SEPARATOR != L'/'; I found no other use of prefixes.
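My reading of the L'\0' - 1 < 0 test is that it asks whether the preprocessor treats wide character literals as signed. The sketch below records that answer and pairs it with the same question asked of the real wchar_t type, so the two can be compared. The macro and function names are hypothetical, and nothing in the wording quoted further down guarantees that the two answers agree.

```c
#include <stddef.h> /* wchar_t */

/* In #if, L'\0' - 1 is negative only if the preprocessor treats
   wide character literals as a signed quantity. */
#if (L'\0' - 1 < 0)
#define PP_WCHAR_IS_SIGNED 1
#else
#define PP_WCHAR_IS_SIGNED 0
#endif

/* What the preprocessor concluded. */
int pp_wchar_is_signed(void) { return PP_WCHAR_IS_SIGNED; }

/* The same question asked of the actual type in an ordinary expression.
   The cast avoids integer promotion hiding the signedness of a narrow wchar_t. */
int wchar_is_signed(void) { return (wchar_t)-1 < 0; }
```

I would expect the two functions to return the same value on common implementations, but that is an expectation, not something the standard promises.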
There is a very strong expectation that these literals are in the execution encoding.
My tests with various compilers available on godbolt confirm that implementations do convert these character literals as they would in phase 5.
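A check along these lines can be written as follows. This is a sketch rather than the exact tests I ran, it assumes a C11 compiler for _Static_assert, and the macro and function names are made up for the example. The idea is to record the preprocessor's answer in a macro and then compare it with the same comparison done in an ordinary constant expression.

```c
/* Record how the preprocessor evaluates two character literals. */
#if 'A' == 65
#define PP_A_IS_65 1
#else
#define PP_A_IS_65 0
#endif

#if '\377' < 0
#define PP_HIGH_CHAR_IS_NEGATIVE 1
#else
#define PP_HIGH_CHAR_IS_NEGATIVE 0
#endif

/* Compilation fails if the compiler proper disagrees with the preprocessor. */
_Static_assert(PP_A_IS_65 == ('A' == 65),
               "preprocessor and compiler disagree on the value of 'A'");
_Static_assert(PP_HIGH_CHAR_IS_NEGATIVE == ('\377' < 0),
               "preprocessor and compiler disagree on the sign of '\\377'");

/* Run-time view of the same comparison, for use in a test harness. */
int pp_matches_expressions(void)
{
    return PP_A_IS_65 == ('A' == 65)
        && PP_HIGH_CHAR_IS_NEGATIVE == ('\377' < 0);
}
```

If this translation unit compiles, the implementation agrees with itself for these two literals; it says nothing about other literals or other compiler options.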
All but one of these usages seem to be in C libraries, and C headers.
The wording states:
"This includes interpreting character-literals, which may involve converting escape sequences into execution character set members. Whether the numeric value for these character-literals matches the value obtained when an identical character-literal occurs in an expression (other than within a #if or #elif directive) is implementation-defined."
I am not sure that deprecating this feature is necessary, even if the C++ use case is better covered by https://wg21.link/p1885r2.
But it does seem valuable to modify the wording to specify that these literals are interpreted as they would be in C expressions, as they are used by libraries intended to be portable. If the detection is wrong, the code nevertheless compiles and has the wrong runtime behavior.
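As an illustration of that failure mode, consider a hypothetical helper (not taken from any surveyed library) that picks its implementation from the preprocessor's view of 'A'. Both branches compile everywhere, so a preprocessor that disagreed with the compiler would select the wrong one without any diagnostic.

```c
#if 'A' == 65
/* ASCII: the uppercase letters form one contiguous range. */
int is_upper_letter(int c)
{
    return c >= 'A' && c <= 'Z';
}
#else
/* EBCDIC-style layout: the uppercase letters sit in three separate runs,
   so a single 'A'..'Z' range check would also accept non-letters. */
int is_upper_letter(int c)
{
    return (c >= 'A' && c <= 'I')
        || (c >= 'J' && c <= 'R')
        || (c >= 'S' && c <= 'Z');
}
#endif
```

On an EBCDIC target where the #if test wrongly took the first branch, is_upper_letter would accept the code points that lie between the runs of letters, and nothing at compile time would flag it.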
Corentin.