Subject: Re: [SG16-Unicode] P1208R3 / source_location
From: Steve Downey (sdowney_at_[hidden])
Date: 2019-02-19 18:35:32


They come from different sources. Function name is in the source encoding,
and the compiler knows that encoding. The file name is in the host
environment, and we don't know, and in some cases can't know, what that is.
Display names may not work for opening a file again.
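A minimal sketch of that distinction, using the existing __FILE__ and __func__ facilities rather than the proposed source_location. The names call_site, sample_function and display_safe are hypothetical and appear in neither P1208 nor the standard. The sketch assumes only that both strings are null-terminated, and it treats the file name as bytes of unknown encoding that are made safe for logging, not as something to reopen:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical holder for the two names; not part of P1208.
struct call_site {
    const char* file;      // spelled by the host environment; encoding unknown here
    const char* function;  // derived from an identifier in the source
};

inline call_site sample_function() {
    // Both are null-terminated: __func__ behaves like a static const char array,
    // and __FILE__ is a character string literal.
    assert(__func__[sizeof(__func__) - 1] == '\0');
    assert(__FILE__[sizeof(__FILE__) - 1] == '\0');
    return call_site{__FILE__, __func__};
}

// Display-only view: every byte outside printable ASCII becomes '?'.
// This is lossy on purpose, so the result may not name the original file.
inline std::string display_safe(const char* bytes) {
    std::string out;
    for (const char* p = bytes; *p != '\0'; ++p) {
        const unsigned char c = static_cast<unsigned char>(*p);
        out += (c >= 0x20 && c < 0x7F) ? static_cast<char>(c) : '?';
    }
    return out;
}
```

A logger could print display_safe(site.file) next to site.function. The replacement with '?' is just one possible policy, chosen here because it makes no assumption about the file name's encoding.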

On Tue, Feb 19, 2019, 14:31 Robert Douglas <rwdougla_at_[hidden]> wrote:

> So filename and functionname would necessarily have different encodings?
> Does that not seem awful?
>
> On Tue, Feb 19, 2019, 6:25 PM Axel Naumann <Axel.Naumann_at_[hidden]> wrote:
>
>> Thanks everyone, this is what I'll take to Core.
>> Axel.
>>
>> On 19.02.19 13:58, Corentin wrote:
>>
>> After talking with Tom, I'd like to modify function_name to be an NTMBS, as
>> it is something we can actually guarantee, and I don't think __func__
>> should constrain the design of source location. It would be consistent with
>> and satisfy the NB comment (whose resolution was adopted in that direction
>> this morning).
>>
>> Tom convinced me that filename cannot and should not be a NTMBS
>>
>>
>> On Tue, 19 Feb 2019 at 13:22 Robert Douglas <rwdougla_at_[hidden]> wrote:
>>
>>> Agree.
>>>
>>> On Tue, Feb 19, 2019 at 5:17 PM Tom Honermann <tom_at_[hidden]> wrote:
>>>
>>>> On 2/18/19 1:17 PM, Robert Douglas wrote:
>>>>
>>>> Historical footnote, these are intended to be as drop-in as possible
>>>> for existing facilities. __FILE__ is a "character string literal," which
>>>> gets its null termination in phase 7. Since we are accessing these at
>>>> run-time, we should thus expect these to be NTBS. Changes to this
>>>> expectation would be a deviation from these being a drop-in replacement to
>>>> __FILE__ and __func__. Note that [dcl.fct.def.general]
>>>> p 8 defines __func__ as an implementation-defined string as if static
>>>> const char __func__[] = "function-name "; which implies, also, an
>>>> NTBS. This is the reasoning for NTBS. To do otherwise, would deviate this
>>>> feature from __FILE__ and __func__, which it is designed to replace.
>>>>
>>>> Agreed. Certainly guaranteeing that these have a null terminator is
>>>> required given that file_name() returns const char*. I don't agree with
>>>> associating these with NTMBSs though since multi-byte has encoding
>>>> implications.
>>>>
>>>> Tom.
>>>>
>>>>
>>>>
>>>> On Mon, Feb 18, 2019 at 11:20 AM Corentin <corentin.jabot_at_[hidden]>
>>>> wrote:
>>>>
>>>>> Quick reply: display only, no expectation that the file can be opened, or
>>>>> exists, or is a file. It's purely informative. But there is an expectation
>>>>> that it can be displayed, the main use case being logging. Otherwise I agree with you.
>>>>>
>>>>> On Mon, Feb 18, 2019, 7:16 AM Tom Honermann <tom_at_[hidden]> wrote:
>>>>>
>>>>>>
>>>>>> On Feb 18, 2019, at 10:04 AM, Corentin <corentin.jabot_at_[hidden]>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Very good points.
>>>>>> Wouldn't it be sufficient to specify that the strings are NTMBS
>>>>>> encoded using the execution character set?
>>>>>>
>>>>>> source_location currently avoids making any assumption about how
>>>>>> these strings are formed, including that they are derived from a source
>>>>>> file.
>>>>>> So since the value is implementation-defined, so should be the way
>>>>>> it's constructed.
>>>>>> However, it is reasonable to assume that these things are valid text
>>>>>> and therefore have a known encoding.
>>>>>>
>>>>>> Adding Tom, because this is borderline SG16 territory.
>>>>>>
>>>>>>
>>>>>> This isn’t borderline as we have (recently) requested review of
>>>>>> anything involving file names.
>>>>>>
>>>>>>
>>>>>>
>>>>>> @Tom: Do you want to see source_location this week knowing that I'd
>>>>>> hope it would get through LWG before the end of the week?
>>>>>> Or do you think having function_name / filename as
>>>>>> multi-byte strings encoded using the execution character set is reasonable?
>>>>>> The alternatives I see are
>>>>>>
>>>>>> - Leave it unspecified
>>>>>> - Force a specific character set... which the world is not ready
>>>>>> for
>>>>>>
>>>>>> I think there is a higher level question to answer. Are the provided
>>>>>> file names display only, or should one expect to be able to open the file
>>>>>> using the provided name?
>>>>>>
>>>>>> If they are display only, then we can specify an encoding for them
>>>>>> similarly to what is done for member functions of std::filesystem::path. In
>>>>>> this case, we must explicitly acknowledge that the names do not roundtrip
>>>>>> through the filesystem (though typically will in practice). Note that, on Windows,
>>>>>> file names cannot be represented accurately using char based strings, so
>>>>>> unless we want to add wchar_t support, these names will be technically
>>>>>> display only.
>>>>>>
>>>>>> If they are potentially not display only, then we can’t associate an
>>>>>> encoding and the names are bags-of-bytes. This is a limitation of POSIX.
>>>>>> But then we need wchar_t support for Windows.
>>>>>>
>>>>>> In San Diego, the guidance we gave for the stacktrace proposal is
>>>>>> that file names are implementation defined bags-of-bytes. If we
>>>>>> advised otherwise for source location, we would be giving inconsistent
>>>>>> guidance.
>>>>>>
>>>>>> I think we should discuss this in SG16 this week. Not necessarily to
>>>>>> propose changes for the proposal, but to solidify our collective thinking
>>>>>> around file names.
>>>>>>
>>>>>> Tom.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Corentin
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, 18 Feb 2019 at 03:56 Axel Naumann <Axel.Naumann_at_[hidden]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Robert,
>>>>>>>
>>>>>>> Regarding your P1208R3:
>>>>>>>
>>>>>>> Nit: it's titled "D1208R3", it doesn't mention email addresses.
>>>>>>>
>>>>>>> Not-so-nit: a NB comment on the reflection TS asks to not use NTBS
>>>>>>> but
>>>>>>> NTMBS and "Where NTBS is mentioned in the document under ballot, the
>>>>>>> encoding used for the string’s value is unspecified." Jens agrees
>>>>>>> that
>>>>>>> the proposed solution should be applied: "Specify that the strings
>>>>>>> are
>>>>>>> first formed using the basic source character set (with
>>>>>>> universal-character-names as necessary) then mapped in the manner
>>>>>>> applied to string literals with no encoding prefix in phases 5 and 6
>>>>>>> of
>>>>>>> translation."
>>>>>>>
>>>>>>> I would very much hope that both changes are also applied to
>>>>>>> P1208R3. I
>>>>>>> call this out explicitly in our recommended NB comment response
>>>>>>> paper.
>>>>>>>
>>>>>>> Cheers, Axel.
>>>>>>>
>>>>>>
>>>>