Re: [SG16-Unicode] P1208R3 / source_location

From: Corentin <corentin.jabot_at_[hidden]>
Date: Tue, 19 Feb 2019 14:37:09 -1000
It kinda is, but in the general case the compiler can get a useful encoding
from the source code, not from the source file.
It's mostly an issue with filesystems with no or poor encoding support.

I don't believe the observable behavior will be widely different from
__FILE__ and __func__ in practical terms.
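
To make the comparison with __FILE__ and __func__ concrete, here is a minimal sketch, assuming a C++20 library that ships std::source_location as it was eventually standardized, with a fallback to the older facilities otherwise. CallSite, copy_ntbs, capture, CAPTURE_CALL_SITE and sample_function are hypothetical names used only for illustration. The text returned for the function name is implementation-defined in both cases, so the two paths may format it differently.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#if defined(__has_include)
#  if __has_include(<source_location>)
#    include <source_location>
#  endif
#endif

// Hypothetical record type for this sketch; not part of P1208.
struct CallSite {
    std::string file;
    std::string function;
    std::uint_least32_t line;
};

// Both facilities hand back a null-terminated char string, so strlen is
// well-defined. Nothing here assumes a particular encoding.
inline std::string copy_ntbs(const char* s) {
    assert(s != nullptr);
    return std::string(s, std::strlen(s));
}

#if defined(__cpp_lib_source_location)
// C++20 path: the default argument is evaluated at the call site.
inline CallSite capture(
    const std::source_location loc = std::source_location::current()) {
    return CallSite{copy_ntbs(loc.file_name()),
                    copy_ntbs(loc.function_name()),
                    loc.line()};
}
#  define CAPTURE_CALL_SITE() capture()
#else
// Fallback: the macro and predefined variable that source_location is
// meant to replace.
#  define CAPTURE_CALL_SITE()                                   \
      CallSite{copy_ntbs(__FILE__), copy_ntbs(__func__),        \
               static_cast<std::uint_least32_t>(__LINE__)}
#endif

inline CallSite sample_function() {
    return CAPTURE_CALL_SITE();
}
```

Calling sample_function() yields a CallSite whose strings are suitable for logging; whether the file string can be used to reopen the file is a separate question, as discussed in the quoted messages.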



On Tue, 19 Feb 2019 at 14:31 Robert Douglas <rwdougla_at_[hidden]> wrote:

> So filename and functionname would necessarily have different encodings?
> Does that not seem awful?
>
> On Tue, Feb 19, 2019, 6:25 PM Axel Naumann <Axel.Naumann_at_[hidden]> wrote:
>
>> Thanks everyone, this is what I'll take to Core.
>> Axel.
>>
>> On 19.02.19 13:58, Corentin wrote:
>>
>> After talking with Tom, I'd like to modify function_name to be an NTMBS, as
>> it is something we can actually guarantee, and I don't think __func__
>> should constrain the design of source location. It would be consistent with,
>> and satisfy, the NB comment (whose resolution was adopted in that direction
>> this morning).
>>
>> Tom convinced me that filename cannot and should not be an NTMBS.
>>
>>
>> On Tue, 19 Feb 2019 at 13:22 Robert Douglas <rwdougla_at_[hidden]> wrote:
>>
>>> Agree.
>>>
>>> On Tue, Feb 19, 2019 at 5:17 PM Tom Honermann <tom_at_[hidden]> wrote:
>>>
>>>> On 2/18/19 1:17 PM, Robert Douglas wrote:
>>>>
>>>> Historical footnote, these are intended to be as drop-in as possible
>>>> for existing facilities. __FILE__ is a "character string literal," which
>>>> gets its null termination in phase 7. Since we are accessing these at
>>>> run-time, we should thus expect these to be NTBS. Changes to this
>>>> expectation would be a deviation from these being a drop-in replacement to
>>>> __FILE__ and __func__. Note that [dcl.fct.def.general]
>>>> p 8 defines __func__ as an implementation-defined string as if static
>>>> const char __func__[] = "function-name"; which implies, also, an
>>>> NTBS. This is the reasoning for NTBS. To do otherwise would make this
>>>> feature deviate from __FILE__ and __func__, which it is designed to replace.
>>>>
>>>> Agreed. Certainly guaranteeing that these have a null terminator is
>>>> required given that file_name() returns const char*. I don't agree with
>>>> associating these with NTMBSs though since multi-byte has encoding
>>>> implications.
>>>>
>>>> Tom.
>>>>
>>>>
>>>>
>>>> On Mon, Feb 18, 2019 at 11:20 AM Corentin <corentin.jabot_at_[hidden]>
>>>> wrote:
>>>>
>>>>> Quick reply: display only, no expectation that the file can be opened, or
>>>>> exists, or is a file. It's purely informative. But there is an expectation
>>>>> that it can be displayed, the main use case being logging. Otherwise I
>>>>> agree with you.
>>>>>
>>>>> On Mon, Feb 18, 2019, 7:16 AM Tom Honermann <tom_at_[hidden]> wrote:
>>>>>
>>>>>>
>>>>>> On Feb 18, 2019, at 10:04 AM, Corentin <corentin.jabot_at_[hidden]>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Very good points.
>>>>>> Wouldn't it be sufficient to specify that the strings are NTMBS
>>>>>> encoded using the execution character set?
>>>>>>
>>>>>> source_location currently avoids making any assumption about how
>>>>>> these strings are formed, including that they are derived from a source
>>>>>> file.
>>>>>> So since the value is implementation-defined, so should be the way
>>>>>> it's constructed.
>>>>>> However, it is reasonable to assume that these things are valid text
>>>>>> and therefore have a known encoding.
>>>>>>
>>>>>> Adding Tom, because this is borderline SG16 territory.
>>>>>>
>>>>>>
>>>>>> This isn’t borderline as we have (recently) requested review of
>>>>>> anything involving file names.
>>>>>>
>>>>>>
>>>>>>
>>>>>> @Tom: Do you want to see source_location this week knowing that I'd
>>>>>> hope it would get through LWG before the end of the week?
>>>>>> Or do you think having function_name / filename as
>>>>>> multibyte strings encoded using the execution character set is reasonable?
>>>>>> The alternatives I see are
>>>>>>
>>>>>> - Leave it unspecified
>>>>>> - Force a specific character set... which the world is not ready
>>>>>> for
>>>>>>
>>>>>> I think there is a higher level question to answer. Are the provided
>>>>>> file names display only, or should one expect to be able to open the file
>>>>>> using the provided name?
>>>>>>
>>>>>> If they are display only, then we can specify an encoding for them
>>>>>> similarly to what is done for member functions of std::filesystem::path. In
>>>>>> this case, we must explicitly acknowledge that the names do not roundtrip
>>>>>> through the filesystem (though typically will in practice). Note that, on Windows,
>>>>>> file names cannot be represented accurately using char based strings, so
>>>>>> unless we want to add wchar_t support, these names will be technically
>>>>>> display only.
>>>>>>
>>>>>> If they are potentially not display only, then we can’t associate an
>>>>>> encoding and the names are bags-of-bytes. This is a limitation of POSIX.
>>>>>> But then we need wchar_t support for Windows.
>>>>>>
>>>>>> In San Diego, the guidance we gave for the stacktrace proposal is
>>>>>> that file names are implementation-defined bags-of-bytes. If we
>>>>>> advised otherwise for source location, we would be giving inconsistent
>>>>>> guidance.
>>>>>>
>>>>>> I think we should discuss this in SG16 this week. Not necessarily to
>>>>>> propose changes for the proposal, but to solidify our collective thinking
>>>>>> around file names.
>>>>>>
>>>>>> Tom.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Corentin
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, 18 Feb 2019 at 03:56 Axel Naumann <Axel.Naumann_at_[hidden]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Robert,
>>>>>>>
>>>>>>> Regarding your P1208R3:
>>>>>>>
>>>>>>> Nit: it's titled "D1208R3", it doesn't mention email addresses.
>>>>>>>
>>>>>>> Not-so-nit: a NB comment on the reflection TS asks to not use NTBS
>>>>>>> but NTMBS and "Where NTBS is mentioned in the document under ballot,
>>>>>>> the encoding used for the string’s value is unspecified." Jens agrees
>>>>>>> that the proposed solution should be applied: "Specify that the
>>>>>>> strings are first formed using the basic source character set (with
>>>>>>> universal-character-names as necessary) then mapped in the manner
>>>>>>> applied to string literals with no encoding prefix in phases 5 and 6
>>>>>>> of translation."
>>>>>>>
>>>>>>> I would very much hope that both changes are also applied to
>>>>>>> P1208R3. I call this out explicitly in our recommended NB comment
>>>>>>> response paper.
>>>>>>>
>>>>>>> Cheers, Axel.
>>>>>>>
>>>>>>
>>>>
>>
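
Tom's distinction between display-only names and names that can round-trip through the filesystem can be illustrated with std::filesystem::path, which the thread cites as precedent. This is only a sketch assuming C++17; native_names_are_char, for_display and for_reopening are hypothetical helper names, not anything proposed in P1208.

```cpp
#include <filesystem>
#include <string>
#include <type_traits>

namespace fs = std::filesystem;

// True where the native path representation is char-based (typically POSIX),
// false where it is wchar_t-based (typically Windows).
constexpr bool native_names_are_char =
    std::is_same_v<fs::path::value_type, char>;

// Display-oriented view: a char-based string in the native narrow encoding.
// Where the native representation is wchar_t, this involves a conversion
// that is not guaranteed to round-trip for every file name.
inline std::string for_display(const fs::path& p) {
    return p.string();
}

// Round-trip-oriented view: the native representation, unchanged.
// The returned reference is only valid while p is alive.
inline const fs::path::string_type& for_reopening(const fs::path& p) {
    return p.native();
}
```

A const char* accessor such as file_name() can only ever offer the first kind of view on platforms where native_names_are_char is false, which is the sense in which such names are "technically display only".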

Received on 2019-02-20 01:37:25