sg16: Re: [SG16] String literals and diagnostics

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Mon, 25 Jan 2021 22:34:59 +0100

On 25/01/2021 14.01, Corentin via SG16 wrote:
> So the question really is: is there an intermediate step wherein the string is converted to the execution encoding in phase 5?
> There is currently nothing in the standard that says that does not happen; all string literals presumably go through phase 5.

Yes. So any change in this area probably needs EWG input.

> And so, the status quo leads to implementation divergence, which means a fix is needed: GCC does the useful thing while MSVC/ICC do the standard-conforming thing https://godbolt.org/z/MEsbY5

> We also need to consider possible evolutions of the language, notably
> * diagnostic or compiler output constructed from constant expressions at compile time wg21.link/p0596r1
> * reflection on attributes https://wg21.link/p1887r1
> * attributes taking constant-expression parameters, although I don't know whether that has been proposed
>
> so, we can imagine something like
>
> static_assert(false, std::format(...)); which would be neat indeed.
> At this point, we would be very much past phase 5 and it becomes critical to have a good model indeed.

The model would be to perform the transcoding to the execution character set
when a (runtime) object for the string (literal) is created by the compiler.
This is the same moment when we turn the memory of a compile-time std::vector<T>
into a runtime data structure.
It might be hard to differentiate a compile-time string in a std::string
from some compile-time bytes that happen to exist in a std::vector<char>,
though. In order to make this right, we probably need some machinery
to say "here comes a string".
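A minimal sketch of that model, under assumptions that are not from the standard or any compiler: literals are held internally as UTF-8 (an implementation detail the standard does not specify), the execution character set is ISO-8859-1, and the names LiteralUse, utf8_to_latin1 and materialize are hypothetical. The only point it illustrates is that transcoding is deferred until the compiler knows a runtime object is being created.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical: how a literal is ultimately used, known only after
// semantic analysis (phase 7).
enum class LiteralUse { Diagnostic, RuntimeObject };

// Hypothetical execution character set for this sketch: ISO-8859-1.
// Decodes UTF-8 and maps each code point to one byte; returns nullopt
// for malformed input or code points above U+00FF.
std::optional<std::string> utf8_to_latin1(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {  // ASCII maps to itself
            out += static_cast<char>(c);
            ++i;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < in.size()) {  // 2-byte sequence
            unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
            if ((c2 & 0xC0) != 0x80) return std::nullopt;  // bad continuation byte
            std::uint32_t cp = ((c & 0x1Fu) << 6) | (c2 & 0x3Fu);
            if (cp < 0x80 || cp > 0xFF) return std::nullopt;  // overlong or not Latin-1
            out += static_cast<char>(cp);
            i += 2;
        } else {
            // 3- and 4-byte sequences encode U+0800 and above, never Latin-1;
            // anything else is malformed.
            return std::nullopt;
        }
    }
    return out;
}

// Transcode only when a runtime object is materialized; diagnostic uses
// keep the compiler-internal form untouched.
std::optional<std::string> materialize(const std::string& internal_utf8,
                                       LiteralUse use) {
    if (use == LiteralUse::Diagnostic) return internal_utf8;
    return utf8_to_latin1(internal_utf8);
}
```

With this sketch, materialize("caf\xC3\xA9", LiteralUse::RuntimeObject) yields the four bytes "caf\xE9", the Diagnostic use returns the original five bytes, and a literal containing U+20AC (the euro sign) cannot be materialized as a runtime object at all; a real implementation would need some policy for that case.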

> * Redefine deprecated, nodiscard, static_assert, etc. to take a new grammar production, say "diagnostic-string-literal", which would follow all the rules of string literals (concatenation, escape sequences and so forth) but would NOT be converted to the execution encoding at any point. Note that this does not introduce a new encoding; things stay UTF-8.
> * In the future, static_assert and attributes could accept other forms that take constant expressions of type u8string_view (or so I hope, see wg21.link/p1953r0). Because all of these constructs require compiler support anyway, there is no parsing ambiguity.
> * In this model, reflecting on [[deprecated("foo")]] would give a UTF-8 string back, because we decided to make these strings magic for convenience and backward compatibility

Sounds about right, minus the "UTF-8" parts, which are compiler-internal details
not specified by the standard.
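For reference, these are the places where string literals already serve purely as diagnostic text in C++20 (the message forms of deprecated, nodiscard and static_assert), i.e. the contexts a "diagnostic-string-literal" production would cover. Under the current wording each of these literals still goes through phase 5; a non-ASCII character such as \u00e9 in a message is where conversion to a non-UTF-8 execution encoding could become visible. The names old_api, new_api and Wrapper are hypothetical.

```cpp
#include <type_traits>

// Concatenation already applies to diagnostic-only literals.
[[deprecated("use " "new_api()" " instead")]]
int old_api() { return 1; }

// Message form of nodiscard (C++20).
[[nodiscard("the returned status must be checked")]]
int new_api() { return 0; }

template <typename T>
struct Wrapper {
    // The message is never stored in the program image, yet the standard
    // still subjects it to phase 5 like any other string literal.
    static_assert(std::is_integral_v<T>, "Wrapper<T> requires an integral T \u00e9");
    T value;
};
```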

> I'm planning to put all of that in a paper but I would like to hear your thoughts before doing so.

Since you need phase 7 context to know when to transcode and when not to, some of the phase 5 and 6
machinery probably needs to move to phase 7.

Jens

Received on 2021-01-25 15:35:11