SG16

Subject: Re: [SG16-Unicode] P1208R3 / source_location
From: Axel Naumann (Axel.Naumann_at_[hidden])
Date: 2019-02-19 18:34:26


Hi,

I believe this awfulness reflects reality.

Use ASCII printable characters and all will be fine? :)

Axel.

On 19.02.19 14:31, Robert Douglas wrote:
> So filename and functionname would necessarily have different
> encodings? Does that not seem awful?
>
> On Tue, Feb 19, 2019, 6:25 PM Axel Naumann <Axel.Naumann_at_[hidden]
> <mailto:Axel.Naumann_at_[hidden]>> wrote:
>
> Thanks everyone, this is what I'll take to Core.
> Axel.
>
> On 19.02.19 13:58, Corentin wrote:
>> After talking with Tom, I'd like to modify function_name to be an
>> NTMBS, as it is something we can actually guarantee, and I don't
>> think __func__ should constrain the design of source location. It
>> would be consistent with, and satisfy, the NB comment (whose
>> resolution was adopted in that direction this morning).
>>
>> Tom convinced me that filename cannot and should not be an NTMBS.
>>
>>
>> On Tue, 19 Feb 2019 at 13:22 Robert Douglas <rwdougla_at_[hidden]
>> <mailto:rwdougla_at_[hidden]>> wrote:
>>
>> Agree.
>>
>> On Tue, Feb 19, 2019 at 5:17 PM Tom Honermann
>> <tom_at_[hidden] <mailto:tom_at_[hidden]>> wrote:
>>
>> On 2/18/19 1:17 PM, Robert Douglas wrote:
>>> Historical footnote: these are intended to be as drop-in
>>> as possible for existing facilities. __FILE__ is a
>>> "character string literal," which gets its null
>>> termination in phase 7. Since we are accessing these at
>>> run-time, we should thus expect these to be NTBS.
>>> Changes to this expectation would be a deviation from
>>> these being a drop-in replacement for __FILE__ and
>>> __func__. Note that [dcl.fct.def.general]
>>> p8 defines __func__ as an implementation-defined
>>> string, as if static const char __func__[] =
>>> "function-name"; which implies, also, an NTBS. This is
>>> the reasoning for NTBS. To do otherwise would deviate
>>> this feature from __FILE__ and __func__, which it is
>>> designed to replace.
>>
>> Agreed.  Certainly guaranteeing that these have a null
>> terminator is required given that file_name() returns
>> const char*.  I don't agree with associating these with
>> NTMBSs though since multi-byte has encoding implications.
>>
>> Tom.
>>
>>>
>>>
>>> On Mon, Feb 18, 2019 at 11:20 AM Corentin
>>> <corentin.jabot_at_[hidden]
>>> <mailto:corentin.jabot_at_[hidden]>> wrote:
>>>
>>> Quick reply: display only. There is no expectation that the file
>>> can be opened, or exists, or is a file; it's purely
>>> informative. But there is an expectation that it can be displayed,
>>> the main use case being logging. Otherwise I agree
>>> with you.
>>>
>>> On Mon, Feb 18, 2019, 7:16 AM Tom Honermann
>>> <tom_at_[hidden] <mailto:tom_at_[hidden]>> wrote:
>>>
>>>
>>> On Feb 18, 2019, at 10:04 AM, Corentin
>>> <corentin.jabot_at_[hidden]
>>> <mailto:corentin.jabot_at_[hidden]>> wrote:
>>>
>>>>
>>>> Very good points. 
>>>> Wouldn't it be sufficient to specify that the
>>>> strings are NTMBS encoded using the execution
>>>> character set?
>>>> source_location currently avoids making any
>>>> assumption about how these strings are formed,
>>>> including that they are derived from a source file.
>>>> So since the value is implementation-defined,
>>>> so should be the way it's constructed. 
>>>> However, it is reasonable to assume that these
>>>> things are valid text and therefore have a
>>>> known encoding.
>>>>
>>>> Adding Tom, because this is borderline SG16
>>>> territory. 
>>>
>>> This isn’t borderline as we have (recently)
>>> requested review of anything involving file names. 
>>>
>>>>
>>>>
>>>> @Tom: Do you want to see source_location this
>>>> week knowing that I'd hope it would get through
>>>> LWG before the end of the week?
>>>> Or do you think having function_name / filename
>>>> as multi-byte strings encoded using the
>>>> execution character set is reasonable?
>>>> The alternatives I see are:
>>>>
>>>> * Leave it unspecified
>>>> * Force a specific character set... which the
>>>> world is not ready for
>>>>
>>> I think there is a higher level question to
>>> answer. Are the provided file names display
>>> only, or should one expect to be able to open
>>> the file using the provided name?
>>>
>>> If they are display only, then we can specify an
>>> encoding for them similarly to what is done for
>>> member functions of std::filesystem::path. In
>>> this case, we must explicitly acknowledge that
>>> the names do not roundtrip through the
>>> filesystem (though typically will in practice).
>>> Note that, on Windows, file names cannot be
>>> represented accurately using char based strings,
>>> so unless we want to add wchar_t support, these
>>> names will be technically display only. 
>>>
>>> If they are potentially not display only, then
>>> we can’t associate an encoding and the names are
>>> bags-of-bytes. This is a limitation of POSIX.
>>> But then we need wchar_t support for Windows. 
>>>
>>> In San Diego, the guidance we gave for the
>>> stacktrace proposal is that file names are
>>> implementation-defined bags-of-bytes. If we
>>> advised otherwise for source location, we would
>>> be giving inconsistent guidance.
>>>
>>> I think we should discuss this in SG16 this
>>> week. Not necessarily to propose changes for the
>>> proposal, but to solidify our collective
>>> thinking around file names. 
>>>
>>> Tom. 
>>>>
>>>> Thanks, 
>>>> Corentin
>>>>
>>>>
>>>>
>>>> On Mon, 18 Feb 2019 at 03:56 Axel Naumann
>>>> <Axel.Naumann_at_[hidden]
>>>> <mailto:Axel.Naumann_at_[hidden]>> wrote:
>>>>
>>>> Hi Robert,
>>>>
>>>> Regarding your P1208R3:
>>>>
>>>> Nit: it's titled "D1208R3", and it doesn't
>>>> mention email addresses.
>>>>
>>>> Not-so-nit: an NB comment on the reflection
>>>> TS asks to not use NTBS but
>>>> NTMBS and "Where NTBS is mentioned in the
>>>> document under ballot, the
>>>> encoding used for the string’s value is
>>>> unspecified." Jens agrees that
>>>> the proposed solution should be applied:
>>>> "Specify that the strings are
>>>> first formed using the basic source
>>>> character set (with
>>>> universal-character-names as necessary)
>>>> then mapped in the manner
>>>> applied to string literals with no encoding
>>>> prefix in phases 5 and 6 of
>>>> translation."
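The wording quoted above means the string behaves as if it had been written with basic source characters plus universal-character-names and then encoded like an unprefixed literal. A small C++ illustration, assuming a hypothetical function whose name is "café": the stored bytes follow the ordinary literal (execution) encoding, so sizeof is 6 with UTF-8 (the accented e takes two bytes, plus the terminator) and 5 with a single-byte encoding such as ISO-8859-1. The variable names are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Spelled with basic source characters plus a universal-character-name,
// then encoded exactly like an unprefixed string literal.
inline constexpr char function_name_text[] = "caf\u00E9";

// Byte count, terminator included; it depends on the ordinary literal
// encoding chosen by the implementation, not on the source file's encoding.
inline constexpr std::size_t function_name_bytes = sizeof(function_name_text);
```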
>>>>
>>>> I would very much hope that both changes
>>>> are also applied to P1208R3. I
>>>> call this out explicitly in our recommended
>>>> NB comment response paper.
>>>>
>>>> Cheers, Axel.
>>>>
>>
>



SG16 list run by herb.sutter at gmail.com