SG16

Subject: Re: Characters literals in preprocessor conditionals
From: Tom Honermann (tom_at_[hidden])
Date: 2020-06-11 10:53:56


Copying the WG21/WG14 liaison list.  For context, refer to C++
[cpp.cond]p12 <http://eel.is/c++draft/cpp.cond#12> and the value of
character literals in conditional preprocessing directives.  Quote:

> Whether the numeric value for these character-literals matches the
> value obtained when an identical character-literal occurs in an
> expression (other than within a #if or #elif directive) is
> implementation-defined.
> [ Note: Thus, the constant expression in the following #if directive
> and if statement ([stmt.if]) is not guaranteed to evaluate to the same
> value in these two contexts:
>
> #if 'z' - 'a' == 25
> if ('z' - 'a' == 25)
>
> — end note
>  ]
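
To make the note concrete, here is a minimal sketch that records what the preprocessor computed for 'z' - 'a' == 25 and compares it with the same expression evaluated by the compiler proper. The macro and function names are hypothetical, chosen only for this illustration. On the mainstream compilers I am aware of the two values agree, but [cpp.cond]p12 does not require that.

```cpp
// Hypothetical names throughout; an illustration, not text from the standard.

// Record the result of the conditional as the preprocessor evaluates it.
#if 'z' - 'a' == 25
#  define DEMO_PP_LETTERS_CONTIGUOUS 1
#else
#  define DEMO_PP_LETTERS_CONTIGUOUS 0
#endif

// Value the preprocessor computed in the #if above.
constexpr bool pp_says_contiguous = (DEMO_PP_LETTERS_CONTIGUOUS == 1);

// Value of the identical expression when evaluated as an ordinary
// constant expression, outside any #if or #elif directive.
constexpr bool expr_says_contiguous = ('z' - 'a' == 25);

// True when both contexts agree. The standard leaves this
// implementation-defined, so it may be false on some implementation.
constexpr bool literals_agree() {
    return pp_says_contiguous == expr_says_contiguous;
}
```

Calling literals_agree() from a test or a static_assert reports whether a given implementation treats the two contexts alike.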

On 6/11/20 11:40 AM, Corentin via SG16 wrote:
> Hey.
>
> I did some research on the use of #if <boolean expression involving
> character literals> in the open source libraries in vcpkg (aggregating
> 90 million lines of C and C++ code).
Nice!
>
> There are 3 use cases:
>
> * Detect ASCII vs EBCDIC at compile time, using different methods
> including #if 'A' == '\301', though I am not sure how that one works
> (a more common attempt is #if 'A' == 65)
>
\301 == 0xC1 == 'A' in EBCDIC code pages.
>
> * Comparing with a #define, most frequent pattern being to
> compare with a path separator, as a means of compile time
> configuration.
> * Trying to detect other encodings
>
> In total this feature is used fewer than 80 times (lots of duplication).
> My method is crude, so my numbers are not precise (but that's the ballpark).
>
> I found one use of #if (L'\0' - 1 < 0), which is fun, and one of #if
> WCHAR_PATH_SEPARATOR != L'/'; there is no other use of prefixes.
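
My reading of #if (L'\0' - 1 < 0), stated as an assumption rather than as the intent of whoever wrote it, is that it probes whether wchar_t is signed. Inside #if, operands are evaluated as intmax_t or uintmax_t, so if the wide literal is treated as unsigned, 0 - 1 wraps to a large positive value and the test fails. The sketch below uses hypothetical names and compares that probe with std::is_signed<wchar_t>.

```cpp
#include <type_traits>

// Hypothetical names; this assumes the preprocessor honors the signedness
// of wchar_t when it evaluates a wide character literal.
#if (L'\0' - 1 < 0)
#  define DEMO_PP_WCHAR_SIGNED 1
#else
#  define DEMO_PP_WCHAR_SIGNED 0
#endif

// What the preprocessor probe concluded.
constexpr bool pp_wchar_signed = (DEMO_PP_WCHAR_SIGNED == 1);

// What the type system reports for wchar_t.
constexpr bool trait_wchar_signed = std::is_signed<wchar_t>::value;

// True when the probe and the trait agree on this implementation.
constexpr bool wchar_signedness_agrees() {
    return pp_wchar_signed == trait_wchar_signed;
}
```

Note that the same expression outside #if may answer differently: where wchar_t is a 16-bit unsigned type, L'\0' is promoted to int, so L'\0' - 1 < 0 would be true there even though the type is unsigned.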
>
> There is a very strong expectation that these literals are in the
> execution encoding.
Good!
> My tests with various compilers available on godbolt confirm that
> implementations do convert these character literals as they would in
> phase 5.
>
> All but one of these usages seem to be in C libraries, and C headers.
>
> The wording states:
>
> This includes interpreting character-literals
> <http://eel.is/c++draft/lex.ccon#nt:character-literal>, which may
> involve converting escape sequences into execution character set
> members. <http://eel.is/c++draft/cpp#cond-12.sentence-4>
> Whether the numeric value for these character-literals matches the
> value obtained when an identical character-literal occurs in an
> expression (other than within a #if or #elif directive) is
> implementation-defined. <http://eel.is/c++draft/cpp#cond-12.sentence-5>
>
>
> I am not sure that deprecating this feature is necessary, even if the
> C++ use case is better covered by https://wg21.link/p1885r2.
Agreed.
> But, it does seem valuable to modify the wording to specify that these
> literals are interpreted as they would be in C expressions, as these
> are used by libraries intended to be portable. If the detection is
> wrong the code compiles nevertheless and has the wrong runtime behavior.
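
Until the wording changes, a C++ library can turn that silent failure into a compile-time error. The sketch below is one possible approach, with hypothetical names: it repeats the preprocessor test as an ordinary constant expression and uses static_assert to reject any implementation where the two disagree.

```cpp
#include <cctype>

// Hypothetical names; one way to guard a preprocessor-based detection.
#if 'A' == 65
#  define DEMO_ASCII_LETTERS 1
#else
#  define DEMO_ASCII_LETTERS 0
#endif

// Fail the build if the preprocessor and the compiler proper disagree
// about the value of 'A', instead of miscompiling quietly.
static_assert(('A' == 65) == (DEMO_ASCII_LETTERS == 1),
              "preprocessor and expression disagree on the value of 'A'");

// Code whose runtime behavior depends on the detection being right.
inline char demo_to_upper(char c) {
#if DEMO_ASCII_LETTERS
    // ASCII fast path: lowercase letters are contiguous and sit a fixed
    // distance above the uppercase ones.
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - ('a' - 'A')) : c;
#else
    // Portable fallback for other execution character sets.
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
#endif
}
```

This option is not available to C headers before C11's _Static_assert, which is part of why fixing the wording seems preferable.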

Agreed.  It would be interesting to know if WG14 members have some
historical perspective on this being implementation-defined. Perhaps
Aaron can assist in getting an answer to that question. Copying the
liaison list...

Tom.

>
> Corentin.
>


