SG16

Subject: Re: [SG16-Unicode] P1208R3 / source_location
From: Robert Douglas (rwdougla_at_[hidden])
Date: 2019-02-18 12:17:44


Historical footnote: these are intended to be as drop-in as possible for
existing facilities. __FILE__ is a "character string literal," which gets
its null termination in phase 7. Since we are accessing these at run-time,
we should thus expect these to be NTBS. Changing this expectation would
be a deviation from these being a drop-in replacement for __FILE__ and
__func__. Note that [dcl.fct.def.general] p8 defines __func__ as an
implementation-defined string, as if by static
const char __func__[] = "function-name"; which also implies an NTBS.
This is the reasoning for NTBS. To do otherwise would make this feature
deviate from __FILE__ and __func__, which it is designed to replace.

On Mon, Feb 18, 2019 at 11:20 AM Corentin <corentin.jabot_at_[hidden]> wrote:

> Quick reply : display only, no expectation the file can be open, or
> exists, or is a file. It's purely informative. But expectation it can be
> displayed, the main use cases being logging. Otherwise I agree with you.
>
> On Mon, Feb 18, 2019, 7:16 AM Tom Honermann <tom_at_[hidden]> wrote:
>
>>
>> On Feb 18, 2019, at 10:04 AM, Corentin <corentin.jabot_at_[hidden]> wrote:
>>
>>
>> Very good points.
>> Wouldn't it be sufficient to specify that the strings are NTMBS encoded
>> using the execution character set?
>>
>> source_location currently avoids making any assumption about how these
>> strings are formed, including that they are derived from a source file.
>> So since the value is implementation-defined, so should be the way it's
>> constructed.
>> However, it is reasonable to assume that these things are valid text and
>> therefore have a known encoding.
>>
>> Adding Tom, because this is borderline SG16 territory.
>>
>>
>> This isn’t borderline as we have (recently) requested review of anything
>> involving file names.
>>
>>
>>
>> @Tom: Do you want to see source_location this week knowing that I'd hope
>> it would get through LWG before the end of the week?
>> Or do you think having function_name / filename as multi-bytes strings
>> encoded using the execution character set is reasonable?
>> The alternatives I see are
>>
>> - Leave it unspecified
>> - Force a specific character set... which the world is not ready for
>>
>> I think there is a higher level question to answer. Are the provided file
>> names display only, or should one expect to be able to open the file using
>> the provided name?
>>
>> If they are display only, then we can specify an encoding for them
>> similarly to what is done for member functions of std::filesystem::path. In
>> this case, we must explicitly acknowledge that the names do not roundtrip
>> through the filesystem (though typically will in practice). Note that, on Windows,
>> file names cannot be represented accurately using char based strings, so
>> unless we want to add wchar_t support, these names will be technically
>> display only.
>>
>> If they are potentially not display only, then we can’t associate an
>> encoding and the names are bags-of-bytes. This is a limitation of POSIX.
>> But then we need wchar_t support for Windows.
>>
>> In San Diego, the guidance we gave for the stacktrace proposal is that
>> file names are implementation defined bags-of-bytes. If we advised
>> otherwise for source location, we would be giving inconsistent guidance.
>>
>> I think we should discuss this in SG16 this week. Not necessarily to
>> propose changes for the proposal, but to solidify our collective thinking
>> around file names.
>>
>> Tom.
>>
>>
>> Thanks,
>> Corentin
>>
>>
>>
>> On Mon, 18 Feb 2019 at 03:56 Axel Naumann <Axel.Naumann_at_[hidden]> wrote:
>>
>>> Hi Robert,
>>>
>>> Regarding your P1208R3:
>>>
>>> Nit: it's titled "D1208R3", it doesn't mention email addresses.
>>>
>>> Not-so-nit: an NB comment on the reflection TS asks to not use NTBS but
>>> NTMBS and "Where NTBS is mentioned in the document under ballot, the
>>> encoding used for the string’s value is unspecified." Jens agrees that
>>> the proposed solution should be applied: "Specify that the strings are
>>> first formed using the basic source character set (with
>>> universal-character-names as necessary) then mapped in the manner
>>> applied to string literals with no encoding prefix in phases 5 and 6 of
>>> translation."
>>>
>>> I would very much hope that both changes are also applied to P1208R3. I
>>> call this out explicitly in our recommended NB comment response paper.
>>>
>>> Cheers, Axel.
>>>
>>



SG16 list run by herb.sutter at gmail.com