ISOCPP sg16 List: Re: User-defined compile time messages; P2741R0 and P2758R0

From: Tom Honermann <tom_at_[hidden]>
Date: Fri, 3 Feb 2023 00:26:58 -0500

On 2/3/23 12:16 AM, Tom Honermann via SG16 wrote:
>
> SG16 currently has two papers in its queue that seek to allow code
> evaluated in constant expression context to produce text to be
> displayed to a (compiler) user at compile time.
>
> * P2741R0 <https://wg21.link/p2741r0>: user-generated static_assert
> messages
> * P2758R0 <https://wg21.link/p2758r0>: Emitting messages at compile time
>
> JF is planning for EWG to review these in Issaquah. Since SG16 will
> not be meeting in Issaquah, our options are limited for providing
> recommendations prior to that initial EWG review. JF has asked that we
> conduct a review via mailing list to collect input prior to EWG's
> review. SG16 will still review post-Issaquah as we usually would in
> order to provide a recommendation for EWG to consider in a future
> telecon or in Varna.
>
> Please respond to this message with your thoughts on these papers
> before Monday, February 6th, if possible; I know that is short notice.
> I will plan to prepare a summary for EWG.
>
> These papers overlap considerably. The authors have already been
> encouraged to collaborate on a joint proposal.
>
> Tom.
>
The following discusses encoding-related concerns that I would like to
see further explored in these papers.

The standard does not currently specify any features that enable text
produced during translation phase 7 to be presented to a user at compile
time. Related features include static_assert(..., "text"), #error
token(s), #warning token(s), [[deprecated("text")]], and
[[nodiscard("text")]]; see P2246 (Character encoding of diagnostic text)
<https://wg21.link/p2246>, a prior paper that clarified the
requirements on an implementation for presenting text at compile time.
All of these features operate on either preprocessor tokens or string
literals, both of which have a well-defined encoding at compile time:
the source file encoding.
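
As a rough illustration of the related features listed above, the
following sketch shows each one with its text fixed in the source. The
names parse_v1 and parse_v2 are made up for the example, and C++20 is
assumed for the message form of [[nodiscard]].

```cpp
// Existing facilities whose diagnostic text is fixed in the source.
// parse_v1 and parse_v2 are hypothetical names used for illustration.

// Text is a string literal; shown if a call to parse_v1 is compiled.
[[deprecated("use parse_v2() instead")]]
constexpr int parse_v1(int x) { return x; }

// Text is a string literal (the message form requires C++20).
[[nodiscard("the parsed value must be used")]]
constexpr int parse_v2(int x) { return x + 1; }

// Text is a string literal; shown only if the condition is false.
static_assert(sizeof(int) >= 2, "int must be at least 16 bits");

// Text is a sequence of preprocessor tokens.
#ifndef __cplusplus
#error a C++ compiler is required
#endif
```

In none of these cases can the text be computed during constant
evaluation; it has to appear literally in the source.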

The standard specifies an (implementation-defined) encoding for ordinary
character and string literals evaluated during translation phase 7; the
/ordinary literal encoding/. In general, though, no requirements are
placed on text in char-based storage at run time other than as
pre-conditions and post-conditions for a few standard library functions
that consume and/or produce text in the encoding of the execution
character set (or UTF-8 for std::filesystem::u8path()). The above papers
will require text produced during constant evaluation that is passed to
the proposed interfaces to be in a well-defined encoding known at
compile time so that the compiler can convert it as necessary to the
encoding used for compile time messages. This rules out the encoding of
the execution character set since it is locale dependent and therefore
unknown at compile time.

P2741R0 assumes use of the /ordinary literal encoding/. This is a
reasonable choice, but one that should be made intentionally and
specified in the wording. It is not the only option.

P2758R0 presents some compelling examples of actual diagnostics that
demonstrate the opportunity for improvement. Barry's proposed solution
is to enable library authors to emit custom diagnostics for programmer
errors detectable at compile time. That's great! But how well does that
work in a scenario where, for example, a Chinese-speaking programmer
using a compiler with localized diagnostic messages is cross-compiling a
project for an EBCDIC-based target where the ordinary literal encoding
is set to IBM-1047? If we limit these new interfaces to just the
ordinary literal encoding, we'll prevent the possibility of emitting
diagnostics suited to the user's language and locale preferences.

There are at least a couple of (non-exclusive) options we can consider:

1. We can specify these interfaces to work only with well-known
    encodings via char8_t, char16_t, and/or char32_t. Doing so would
    avoid the character encoding limitations of legacy encodings used as
    the ordinary literal encoding.
2. We can expose preferred language and locale settings so that custom
    diagnostic messages can be produced that honor them.

Tom.

Received on 2023-02-03 05:27:01