SG16

Subject: Re: [SG16-Unicode] P1208R3 / source_location
From: Steve Downey (sdowney_at_[hidden])
Date: 2019-02-20 12:17:09


I believe it will work as well as it does today, which unfortunately
occasionally means quite badly. The guidance, I suppose, is not to give
source files names outside the basic printable character set.
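Following that guidance, a build could flag source file names that contain anything beyond the printable basic source character set. The following is a minimal sketch. It assumes an ASCII-compatible narrow encoding, and `is_basic_printable_name` is a hypothetical helper, not part of any proposal. It accepts letters, digits, space, and the 29 graphic punctuation characters listed in [lex.charset] (C++20).

```cpp
#include <string_view>

// Hypothetical helper: true if every byte of `name` is a printable member of
// the C++20 basic source character set (52 letters, 10 digits, space, and the
// 29 graphic punctuation characters). Assumes an ASCII-compatible encoding,
// so the letter and digit ranges below are contiguous.
constexpr bool is_basic_printable_name(std::string_view name) {
    constexpr std::string_view punct = "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    for (char c : name) {
        bool alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                     (c >= '0' && c <= '9');
        if (!alnum && c != ' ' && punct.find(c) == std::string_view::npos)
            return false;  // e.g. '$', '@', or any byte of a UTF-8 sequence
    }
    return true;
}
```

Names that pass this check avoid the encoding questions discussed below, because every implementation spells them identically.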

On Wed, Feb 20, 2019, 03:14 Robert Douglas <rwdougla_at_[hidden]> wrote:

> Will this prevent usage of printf ("func: %s", sl.function_name ()); ?
>
> On Tue, Feb 19, 2019, 6:37 PM Corentin <corentin.jabot_at_[hidden]> wrote:
>
>> It kind of is. In the general case, the compiler can determine a useful
>> encoding for the source code but not for the source file's name.
>> It's mostly an issue with filesystems with no or poor encoding support.
>>
>> I don't believe the observable behavior will be very different from
>> __FILE__ and __func__ in practical terms.
>>
>>
>>
>> On Tue, 19 Feb 2019 at 14:31 Robert Douglas <rwdougla_at_[hidden]> wrote:
>>
>>> So file_name and function_name would necessarily have different
>>> encodings? Doesn't that seem awful?
>>>
>>> On Tue, Feb 19, 2019, 6:25 PM Axel Naumann <Axel.Naumann_at_[hidden]> wrote:
>>>
>>>> Thanks everyone, this is what I'll take to Core.
>>>> Axel.
>>>>
>>>> On 19.02.19 13:58, Corentin wrote:
>>>>
>>>> After talking with Tom, I'd like to change function_name to be an
>>>> NTMBS, as that is something we can actually guarantee, and I don't think
>>>> __func__ should constrain the design of source_location. It would be
>>>> consistent with, and satisfy, the NB comment (whose resolution was adopted
>>>> in that direction this morning).
>>>>
>>>> Tom convinced me that file_name cannot and should not be an NTMBS.
>>>>
>>>>
>>>> On Tue, 19 Feb 2019 at 13:22 Robert Douglas <rwdougla_at_[hidden]> wrote:
>>>>
>>>>> Agree.
>>>>>
>>>>> On Tue, Feb 19, 2019 at 5:17 PM Tom Honermann <tom_at_[hidden]>
>>>>> wrote:
>>>>>
>>>>>> On 2/18/19 1:17 PM, Robert Douglas wrote:
>>>>>>
>>>>>> Historical footnote: these are intended to be as close to drop-in
>>>>>> replacements for existing facilities as possible. __FILE__ is a
>>>>>> "character string literal," which gets its null terminator in phase 7.
>>>>>> Since we are accessing these at run time, we should therefore expect them
>>>>>> to be NTBSs. Changing this expectation would make them deviate from being
>>>>>> drop-in replacements for __FILE__ and __func__. Note that
>>>>>> [dcl.fct.def.general] p8 defines __func__ as an implementation-defined
>>>>>> string, as if declared as static const char __func__[] = "function-name";
>>>>>> which also implies an NTBS. This is the reasoning for NTBS. Doing
>>>>>> otherwise would make this feature deviate from __FILE__ and __func__,
>>>>>> which it is designed to replace.
>>>>>>
>>>>>> Agreed. Guaranteeing that these have a null terminator is certainly
>>>>>> required, given that file_name() returns const char*. I don't agree with
>>>>>> associating these with NTMBSs, though, since "multibyte" has encoding
>>>>>> implications.
>>>>>>
>>>>>> Tom.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 18, 2019 at 11:20 AM Corentin <corentin.jabot_at_[hidden]>
>>>>>> wrote:
>>>>>>
>>>>>>> Quick reply: display only. There is no expectation that the file can be
>>>>>>> opened, that it exists, or that it is even a file; it's purely
>>>>>>> informative. There is, however, an expectation that it can be displayed,
>>>>>>> the main use case being logging. Otherwise I agree with you.
>>>>>>>
>>>>>>> On Mon, Feb 18, 2019, 7:16 AM Tom Honermann <tom_at_[hidden]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Feb 18, 2019, at 10:04 AM, Corentin <corentin.jabot_at_[hidden]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Very good points.
>>>>>>>> Wouldn't it be sufficient to specify that the strings are NTMBS
>>>>>>>> encoded using the execution character set?
>>>>>>>>
>>>>>>>> source_location currently avoids making any assumption about how
>>>>>>>> these strings are formed, including that they are derived from a source
>>>>>>>> file.
>>>>>>>> Since the value is implementation-defined, so should be the way
>>>>>>>> it's constructed.
>>>>>>>> However, it is reasonable to assume that these things are valid
>>>>>>>> text and therefore have a known encoding.
>>>>>>>>
>>>>>>>> Adding Tom, because this is borderline SG16 territory.
>>>>>>>>
>>>>>>>>
>>>>>>>> This isn’t borderline, as we have (recently) requested review of
>>>>>>>> anything involving file names.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> @Tom: Do you want to see source_location this week, knowing that I
>>>>>>>> hope it will get through LWG before the end of the week?
>>>>>>>> Or do you think having function_name / file_name as multibyte
>>>>>>>> strings encoded using the execution character set is reasonable?
>>>>>>>> The alternatives I see are:
>>>>>>>>
>>>>>>>> - Leave it unspecified
>>>>>>>> - Force a specific character set... which the world is not
>>>>>>>> ready for
>>>>>>>>
>>>>>>>> I think there is a higher level question to answer. Are the
>>>>>>>> provided file names display only, or should one expect to be able to open
>>>>>>>> the file using the provided name?
>>>>>>>>
>>>>>>>> If they are display only, then we can specify an encoding for them
>>>>>>>> similarly to what is done for member functions of std::filesystem::path. In
>>>>>>>> this case, we must explicitly acknowledge that the names do not roundtrip
>>>>>>>> through the filesystem (though they typically will in practice). Note that, on Windows,
>>>>>>>> file names cannot be represented accurately using char-based strings, so
>>>>>>>> unless we want to add wchar_t support, these names will technically be
>>>>>>>> display only.
>>>>>>>>
>>>>>>>> If they are potentially not display only, then we can’t associate
>>>>>>>> an encoding and the names are bags-of-bytes. This is a limitation of POSIX.
>>>>>>>> But then we need wchar_t support for Windows.
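The "display only" option can be made concrete with a hypothetical `display_name` helper. It treats a name as a bag of bytes and escapes every byte outside printable ASCII as `\xHH`. This is a sketch assuming an ASCII-compatible narrow encoding, not a proposed specification. Backslashes are deliberately left unescaped, so the rendering is lossy. That shows why a display string cannot be relied on to reopen the file.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: render a file name, known only as a bag of bytes,
// for display. Printable ASCII (0x20..0x7E) is copied; every other byte
// becomes \xHH. The output is always displayable but not reversible.
std::string display_name(std::string_view raw) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(raw.size());
    for (char ch : raw) {
        unsigned char b = static_cast<unsigned char>(ch);
        if (b >= 0x20 && b < 0x7F) {
            out += ch;             // printable ASCII: copy unchanged
        } else {
            out += "\\x";          // any other byte: two uppercase hex digits
            out += hex[b >> 4];
            out += hex[b & 0x0F];
        }
    }
    return out;
}
```

For example, the UTF-8 bytes of `café.cpp` display as `caf\xC3\xA9.cpp`. A name that literally contains that backslash text displays exactly the same way, so the names do not round-trip.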
>>>>>>>>
>>>>>>>> In San Diego, the guidance we gave for the stacktrace proposal is
>>>>>>>> that file names are implementation-defined bags-of-bytes. If we
>>>>>>>> advised otherwise for source location, we would be giving inconsistent
>>>>>>>> guidance.
>>>>>>>>
>>>>>>>> I think we should discuss this in SG16 this week. Not necessarily
>>>>>>>> to propose changes for the proposal, but to solidify our collective
>>>>>>>> thinking around file names.
>>>>>>>>
>>>>>>>> Tom.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Corentin
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 18 Feb 2019 at 03:56 Axel Naumann <Axel.Naumann_at_[hidden]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Robert,
>>>>>>>>>
>>>>>>>>> Regarding your P1208R3:
>>>>>>>>>
>>>>>>>>> Nit: it's titled "D1208R3", and it doesn't list email addresses.
>>>>>>>>>
>>>>>>>>> Not-so-nit: a NB comment on the reflection TS asks to not use NTBS but
>>>>>>>>> NTMBS and "Where NTBS is mentioned in the document under ballot, the
>>>>>>>>> encoding used for the string’s value is unspecified." Jens agrees that
>>>>>>>>> the proposed solution should be applied: "Specify that the strings are
>>>>>>>>> first formed using the basic source character set (with
>>>>>>>>> universal-character-names as necessary) then mapped in the manner
>>>>>>>>> applied to string literals with no encoding prefix in phases 5 and 6 of
>>>>>>>>> translation."
>>>>>>>>>
>>>>>>>>> I would very much hope that both changes are also applied to P1208R3. I
>>>>>>>>> call this out explicitly in our recommended NB comment response paper.
>>>>>>>>>
>>>>>>>>> Cheers, Axel.
>>>>>>>>>
>>>>>>>>
>>>>>>



SG16 list run by herb.sutter at gmail.com