On 2/3/23 12:16 AM, Tom Honermann via SG16 wrote:

SG16 currently has two papers in its queue that seek to allow code evaluated in constant expression context to produce text to be displayed to a (compiler) user at compile time.

P2741R0: user-generated static_assert messages

P2758R0: Emitting messages at compile time

JF is planning for EWG to review these in Issaquah. Since SG16 will not be meeting in Issaquah, our options are limited for providing recommendations prior to that initial EWG review. JF has asked that we conduct a review via mailing list to collect input prior to EWG's review. SG16 will still review post-Issaquah as we usually would in order to provide a recommendation for EWG to consider in a future telecon or in Varna.

Please respond to this message with your thoughts on these papers before Monday, February 6th, if possible; I know that is short notice. I plan to prepare a summary for EWG.

These papers overlap considerably. The authors have already been encouraged to collaborate on a joint proposal.

Tom.

The following discusses encoding-related concerns that I would like to see explored further in these papers.

The standard does not currently specify any features that enable text produced during translation phase 7 to be presented to a user at compile time. Related features include static_assert(..., "text"), #error token(s), #warning token(s), [[deprecated("text")]], and [[nodiscard("text")]]; see P2246 (Character encoding of diagnostic text), which previously clarified the requirements on an implementation for presenting text at compile time. All of these features operate on either preprocessor tokens or string literals, both of which have a well-defined encoding at compile time: the source file encoding.
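For reference, here is a short sketch of the existing features listed above, assuming C++20 for the nodiscard message. The function names and the BUILD_UNSUPPORTED_CONFIGURATION macro are hypothetical. In every case the diagnostic text is a string literal or a sequence of preprocessor tokens, so the compiler knows its encoding:

```cpp
#include <cstddef>

// static_assert: the message is a string literal, so the compiler knows its
// encoding when it reports the failure.
template <typename T>
constexpr std::size_t checked_size() {
    static_assert(sizeof(T) <= 64, "type is too large for this buffer");
    return sizeof(T);
}

// [[deprecated("text")]]: reported when legacy_size() is used.
[[deprecated("use checked_size<T>() instead")]]
constexpr std::size_t legacy_size(std::size_t n) { return n; }

// [[nodiscard("text")]] (C++20): reported when the result is discarded.
[[nodiscard("the computed size must be used")]]
constexpr std::size_t required_size(std::size_t n) { return n * 2; }

// #error: the message is made of preprocessor tokens. The macro name is
// hypothetical and is not defined here, so no error is emitted.
#ifdef BUILD_UNSUPPORTED_CONFIGURATION
#error "this configuration is not supported"
#endif
```

None of these features lets the text be computed; the proposals under review would add that capability.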

The standard specifies an (implementation-defined) encoding for ordinary character and string literals evaluated during translation phase 7: the ordinary literal encoding. In general, though, no requirements are placed on text in char-based storage at run time other than as preconditions and postconditions for a few standard library functions that consume and/or produce text in the encoding of the execution character set (or UTF-8 for std::filesystem::u8path()). Both papers will require that text produced during constant evaluation and passed to the proposed interfaces be in a well-defined encoding known at compile time, so that the compiler can convert it as necessary to the encoding used for compile-time messages. This rules out the encoding of the execution character set, since it is locale-dependent and therefore unknown at compile time.
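To illustrate the distinction, the sketch below (function and variable names are hypothetical) inspects the ordinary literal encoding during constant evaluation using a common heuristic: it checks whether U+00B5 MICRO SIGN was encoded as the UTF-8 sequence C2 B5. The locale, by contrast, can only be queried at run time. Under C++23 rules the literal is ill-formed if the ordinary literal encoding cannot represent U+00B5, so this is a sketch rather than a portable detector:

```cpp
#include <clocale>

// Compile time: the ordinary literal encoding is fixed during translation, so a
// constant expression can inspect how a literal was encoded. This heuristic
// checks whether U+00B5 MICRO SIGN became the UTF-8 sequence C2 B5.
constexpr bool ordinary_literal_encoding_is_utf8() {
    constexpr unsigned char micro[] = "\u00B5";
    return sizeof(micro) == 3 && micro[0] == 0xC2 && micro[1] == 0xB5;
}

// Run time only: the locale, and with it the execution character set, is
// selected when the program runs (for example via std::setlocale(LC_ALL, "")),
// so this cannot be a constant expression.
inline const char* current_ctype_locale() {
    return std::setlocale(LC_CTYPE, nullptr);
}

// Usable wherever a constant expression is required.
constexpr bool literal_encoding_is_utf8 = ordinary_literal_encoding_is_utf8();
```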

P2741R0 assumes use of the ordinary literal encoding. This is a reasonable choice, but one that should be made deliberately and specified in the wording. It is also not the only option.
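A minimal sketch, assuming a C++20 compiler, of text produced during constant evaluation in the ordinary literal encoding. The ct_message type, too_large(), and small_buffer are hypothetical. Literal fragments are copied from ordinary string literals, and digits are computed from '0', which relies on the guarantee that the decimal digits are contiguous in the ordinary literal encoding. The final part is guarded by the feature-test value 202306L that compilers implementing user-generated static_assert messages advertise; it illustrates the direction of P2741 rather than the exact wording of P2741R0:

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-capacity message buffer filled during constant evaluation.
// Every byte written here is in the ordinary literal encoding.
struct ct_message {
    std::array<char, 64> buf{};
    std::size_t len = 0;

    // Copies an ordinary string literal (truncating at capacity).
    constexpr void append(const char* s) {
        while (*s != '\0' && len < buf.size()) buf[len++] = *s++;
    }
    // Appends the decimal representation of n.
    constexpr void append(unsigned n) {
        char digits[10]{};
        std::size_t count = 0;
        do {
            digits[count++] = static_cast<char>('0' + n % 10);
            n /= 10;
        } while (n != 0);
        while (count > 0 && len < buf.size()) buf[len++] = digits[--count];
    }
    constexpr const char* data() const { return buf.data(); }
    constexpr std::size_t size() const { return len; }
};

// Builds "size <actual> exceeds limit <limit>" at compile time.
constexpr ct_message too_large(unsigned actual, unsigned limit) {
    ct_message m;
    m.append("size ");
    m.append(actual);
    m.append(" exceeds limit ");
    m.append(limit);
    return m;
}

#if defined(__cpp_static_assert) && __cpp_static_assert >= 202306L
// With user-generated static_assert messages, a constant expression with
// data() and size() members can serve as the message.
template <unsigned N>
struct small_buffer {
    static_assert(N <= 16, too_large(N, 16));
};
#endif
```

Because the compiler knows the ordinary literal encoding, it can transcode these bytes for display. What it cannot do is represent characters that the encoding lacks.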

P2758R0 presents some compelling examples of actual diagnostics that demonstrate opportunities for improvement. Barry's proposed solution is to enable library authors to emit custom diagnostics for programmer errors detectable at compile time. That's great! But how well does that work when, for example, a Chinese-speaking programmer using a compiler with localized diagnostic messages is cross-compiling a project for an EBCDIC-based target whose ordinary literal encoding is IBM-1047? IBM-1047 cannot represent Chinese characters, so if we limit these new interfaces to the ordinary literal encoding, we preclude diagnostics suited to the user's language and locale preferences.

There are at least a couple of (non-exclusive) options we can consider:

1. We can specify these interfaces to work only with the well-known UTF encodings associated with char8_t, char16_t, and/or char32_t. Doing so would avoid the character repertoire limitations of legacy encodings used as the ordinary literal encoding.
2. We can expose preferred language and locale settings so that custom diagnostic messages can be produced that honor them.

Tom.