sg16: Re: [SG16] String literals and diagnostics

From: Corentin <corentin.jabot_at_[hidden]>
Date: Mon, 25 Jan 2021 23:13:52 +0100

On Mon, Jan 25, 2021 at 10:35 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 25/01/2021 14.01, Corentin via SG16 wrote:
> > So the question really is: is there an intermediate step wherein the
> string is converted to the execution encoding in phase 5?
> > There is currently nothing in the standard that says that does not
> happen; all string-literals presumably go through phase 5.
>
> Yes. So any change in this area probably needs EWG input.
>
> > And so, the status quo leads to implementation divergence such that a
> fix is needed: GCC does the useful thing while MSVC/ICC do the
> standard-conforming thing: https://godbolt.org/z/MEsbY5
>
> > We also need to consider possible evolutions of the language, notably
> > * diagnostic or compiler output constructed from constant expressions at
> compile time wg21.link/p0596r1
> > * reflection on attributes https://wg21.link/p1887r1
> > * attributes taking constant-expression parameters, although I don't
> know if that has been proposed
> >
> > so, we can imagine something like
> >
> > static_assert(false, std::format(...)); which would be neat indeed.
> > At this point, we would be very much past phase 5 and it becomes
> critical to have a good model indeed.
>
> The model would be to perform the transcoding to execution character set
> when a (runtime) object for the string (literal) is created by the
> compiler.
> This is the same moment when we turn the memory of a compile-time
> std::vector<T>
> into a runtime data structure.
> It might be hard to differentiate a compile-time string in a std::string
> from some compile-time bytes that happen to exist in a std::vector<char>,
> though. In order to make this right, we probably need some machinery
> to say "here comes a string".
>

I think this would be a terrible idea because it's observable, i.e. this
function would return wildly different results depending on the execution
encoding:

constexpr int count_codepoints(std::string_view);
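
A minimal sketch of why, assuming a hypothetical implementation that
expects UTF-8 input. The byte sequences below are written out by hand to
stand in for what a compiler might store under two different execution
encodings (UTF-8 and Windows-1252); the variable names are made up for
illustration:

```cpp
#include <string_view>

// Hypothetical implementation: counts UTF-8 code points by counting every
// byte that is not a continuation byte (continuation bytes look like
// 0b10xxxxxx).
constexpr int count_codepoints(std::string_view s) {
    int n = 0;
    for (char c : s) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// The text "€5" spelled as explicit bytes, so the example does not depend
// on how this source file or the literals in it are actually encoded.
inline constexpr std::string_view euro_utf8{"\xE2\x82\xAC" "5"};  // UTF-8
inline constexpr std::string_view euro_cp1252{"\x80" "5"};        // Windows-1252

// Same text, different execution encoding, different answer.
static_assert(count_codepoints(euro_utf8) == 2);
static_assert(count_codepoints(euro_cp1252) == 1);
```

If transcoding only happened when the runtime object is created, the
constant-evaluated result and the runtime result of such a function could
disagree for the same literal.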

>
> > * Redefine deprecated, nodiscard, static_assert, etc. to take a new
> grammar, say "diagnostic-string-literal", which would follow all the rules
> of string literals (concatenation, escape sequences and so forth), but would
> NOT be converted to the execution encoding at any point. Note that this
> does not introduce a new encoding, things stay UTF-8.
> > * In the future, static_assert and attributes can accept other forms
> which would take constant expressions of u8string_view (or so I hope, see
> wg21.link/p1953r0). Because all of these things require compiler support
> anyway, parsing has no ambiguity.
> > * In this model, reflecting on [[deprecated("foo")]] would give a UTF-8
> string back, because we decided to make these strings magic for
> convenience and backward compatibility
>
> Sounds about right, minus the "UTF-8" parts, which are private parts of
> the compiler
> not specified by the standard.
>

The encoding of the strings returned by reflection would have to be
specified - the compiler might have to do some conversion from its
internal representation.

>
> > I'm planning to put all of that in a paper but I would like to hear your
> thoughts before doing so.
>
> Since you need phase 7 context to know when to transcode and when not,
> some of the phase 5+6
> machinery probably needs to move to phase 7.
>

Right, we can't actually distinguish the context in phase 5+6 yet.
Gosh, for some reason I didn't identify this issue.

If at phase 7 we want string literals to be encoded except in specific
contexts, does that mean that the wording would have to perform some sort
of reversal?
Moving phase 5 after 7 seems like major surgery, especially as we
established that concatenation and encoding are certainly best left in the
same step.
Wouldn't it require identifying all the places where a string-literal may
appear, which is probably quite a few?
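
To give a rough idea, here is a non-exhaustive sketch of places where the
grammar takes a string-literal that never becomes an array object in the
program. All function names are hypothetical, and the reason string on
nodiscard assumes C++20:

```cpp
// static_assert-declaration: the message is only used for diagnostics.
static_assert(sizeof(char) == 1, "char is always one byte");

// Attribute arguments: also only used for diagnostics.
[[nodiscard("the result is the whole point")]]
constexpr int twice(int x) { return 2 * x; }

[[deprecated("use twice() instead")]]
constexpr int old_twice(int x) { return x + x; }

// linkage-specification: the string names a language linkage.
extern "C" int hypothetical_c_function(int);

// literal-operator-id: the string-literal must be empty.
constexpr long double operator""_km(long double v) { return v * 1000.0L; }

// Ordinary use, for contrast: this one does designate an array object
// and is the case that needs the execution encoding.
inline constexpr const char* greeting = "hello";
```

Header names in #include and the operand of _Pragma are further candidates,
although they are handled before phase 7.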

>
> Jens
>

Received on 2021-01-25 16:14:05