SG16

Subject: Re: String literals and diagnostics
From: Steve Downey (sdowney_at_[hidden])
Date: 2021-01-25 17:28:14


To be clear, it's OK if we can't mandate that, but it would be nice if the
standard was clear that doing something awful wasn't required.

On Mon, Jan 25, 2021, 18:26 Steve Downey <sdowney_at_[hidden]> wrote:

> User Story: As a compiler and library user, I would like to be able to
> stuff the text of the message that arrived in an email from the CI build
> into opengrok and find the source code that emitted it.
> The equivalent case of building within emacs/vscode/vi is largely solved
> already.
>
> On Mon, Jan 25, 2021, 17:46 Corentin via SG16 <sg16_at_[hidden]>
> wrote:
>
>>
>>
>> On Mon, Jan 25, 2021 at 11:13 PM Corentin <corentin.jabot_at_[hidden]>
>> wrote:
>>
>>>
>>>
>>> On Mon, Jan 25, 2021 at 10:35 PM Jens Maurer <Jens.Maurer_at_[hidden]>
>>> wrote:
>>>
>>>> On 25/01/2021 14.01, Corentin via SG16 wrote:
>>>> > So the question really is: is there an intermediate step wherein the
>>>> > string is converted to the execution encoding in phase 5?
>>>> > There is currently nothing in the standard that says that does not
>>>> > happen; all string-literals presumably go through phase 5.
>>>>
>>>> Yes. So any change in this area probably needs EWG input.
>>>>
>>>> > And so, the status quo leads to implementation divergence such that a
>>>> > fix is needed: GCC does the useful thing while MSVC/ICC do the
>>>> > standard-conforming thing: https://godbolt.org/z/MEsbY5
>>>>
>>>> > We also need to consider possible evolutions of the language, notably
>>>> > * diagnostics or compiler output constructed from constant expressions
>>>> >   at compile time: wg21.link/p0596r1
>>>> > * reflection on attributes: https://wg21.link/p1887r1
>>>> > * attributes taking constant-expression parameters, although I don't
>>>> >   know if that has been proposed
>>>> >
>>>> > so, we can imagine something like
>>>> >
>>>> > static_assert(false, std::format(...)); which would be neat indeed.
>>>> > At this point, we would be very much past phase 5 and it becomes
>>>> > critical to have a good model.
>>>>
>>>> The model would be to perform the transcoding to the execution character
>>>> set when a (runtime) object for the string (literal) is created by the
>>>> compiler. This is the same moment when we turn the memory of a
>>>> compile-time std::vector<T> into a runtime data structure.
>>>> It might be hard to differentiate a compile-time string in a std::string
>>>> from some compile-time bytes that happen to exist in a std::vector<char>,
>>>> though. In order to get this right, we probably need some machinery
>>>> to say "here comes a string".
>>>>
>>>
>>> I think this would be a terrible idea because it's observable, i.e. a
>>> function such as the following would return wildly different results
>>> depending on the execution encoding:
>>>
>>> constexpr int count_codepoints(std::string_view);
>>>
>>>>
>>>> > * Redefine deprecated, nodiscard, static_assert, etc. to take a new
>>>> >   grammar term, say "diagnostic-string-literal", which would follow all
>>>> >   the rules of string literals (concatenation, escape sequences and so
>>>> >   forth), but would NOT be converted to the execution encoding at any
>>>> >   point. Note that this does not introduce a new encoding; things stay
>>>> >   UTF-8.
>>>> > * In the future, static_assert and attributes can accept other
>>>> >   forms which would take constant expressions of u8string_view (or so I
>>>> >   hope, see wg21.link/p1953r0). Because all of these things require
>>>> >   compiler support anyway, parsing has no ambiguity.
>>>> > * In this model, reflecting on [[deprecated("foo")]] would give a
>>>> >   UTF-8 string back, because we decided to make these strings magic for
>>>> >   convenience and backward compatibility.
>>>>
>>>> Sounds about right, minus the "UTF-8" parts, which are private parts of
>>>> the compiler not specified by the standard.
>>>>
>>>
>>> The encoding of the strings returned by reflection would have to be
>>> specified - the compiler might have to do some conversion from its
>>> internal representation.
>>>
>>>>
>>>> > I'm planning to put all of that in a paper but I would like to hear
>>>> > your thoughts before doing so.
>>>>
>>>> Since you need phase 7 context to know when to transcode and when not,
>>>> some of the phase 5+6 machinery probably needs to move to phase 7.
>>>>
>>>
>>> Right, we can't actually distinguish the context in phase 5+6 yet.
>>> Gosh, for some reason I didn't identify this issue.
>>>
>>> If at phase 7 we want string literals except in specific contexts...
>>> does that mean that the wording would have to perform some sort of
>>> reversal?
>>> Moving phase 5 after 7 seems like major surgery, especially as we
>>> established that concatenation and encoding are certainly best left in the
>>> same step.
>>> Wouldn't it require identifying all the places where a string-literal
>>> may appear, which is probably quite a few?
>>>
>>
>> Actually...
>> It seems (you will certainly correct me if I am wrong) that
>> string-literals only need to be converted to the execution encoding in
>> places where a (primary-)expression is expected.
>> Neither the preprocessor, nor user-defined literals, nor
>> linkage specifications, nor assembler statements, nor import names should
>> go through phase 5.
>> It would seem then that there is only one place where encoding to the
>> execution encoding should happen, so maybe there would be a way to move part
>> of phase 5 to [expr.prim.literal], but then we are back to the
>> issue of how to separate hex escape sequence replacement from encoding.
>> I guess we could replace \x42 by \x{42} or something like that.
>>
>>
>>
>>>
>>>
>>>
>>>>
>>>> Jens
>>>>
>>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>
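
For reference, the observability concern about count_codepoints in the quoted
thread can be sketched roughly as follows. Only the declaration appears in the
thread; the body here is an assumed implementation that treats its input as
UTF-8, and the two byte sequences are spelled with hex escapes to simulate what
a compiler might store for U+00A9 under a UTF-8 and an ISO 8859-1 execution
encoding, so the result does not depend on compiler flags.

```cpp
#include <string_view>

// Assumed body (not from the thread): count UTF-8 code points by counting
// every byte that is not a continuation byte of the form 10xxxxxx.
constexpr int count_codepoints(std::string_view s) {
    int n = 0;
    for (char c : s) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// U+00A9 (copyright sign) as it might be stored under two execution encodings.
constexpr std::string_view utf8_bytes = "\xC2\xA9";  // simulated UTF-8
constexpr std::string_view latin1_bytes = "\xA9";    // simulated ISO 8859-1

// The same source character yields different compile-time answers.
static_assert(count_codepoints(utf8_bytes) == 1);
static_assert(count_codepoints(latin1_bytes) == 0);
```

If transcoding happened only when a runtime object is created, a constant
expression like this would see one set of bytes while the running program sees
another, which seems to be the objection raised above.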

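The last point in the quoted thread, separating hex escape replacement from
encoding, may be easier to see with a small sketch. It assumes nothing beyond
standard string literal rules; the variable names are made up for illustration.
A hex escape names a code unit value directly, whereas a
universal-character-name names a character that still has to be encoded.

```cpp
#include <cstddef>

// A hex escape is one code unit, taken as-is: 0x42 plus the terminating null.
static_assert(sizeof("\x42") == 2);
static_assert("\x42"[0] == 0x42);

// Two hex escapes are exactly two code units, whatever bytes they spell.
constexpr char raw[] = "\xC3\xA9";
static_assert(sizeof(raw) == 3);
static_assert(static_cast<unsigned char>(raw[0]) == 0xC3);
static_assert(static_cast<unsigned char>(raw[1]) == 0xA9);

// A universal-character-name names a character; in a UTF-8 literal,
// U+00E9 is encoded as the two code units C3 A9.
constexpr std::size_t u8_size = sizeof(u8"\u00E9");
static_assert(u8_size == 3);
static_assert(static_cast<unsigned char>(u8"\u00E9"[0]) == 0xC3);
```

Because the hex-escaped bytes must survive untouched, any step that encodes
the rest of the literal later has to know which parts came from such escapes.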


SG16 list run by sg16-owner@lists.isocpp.org