Subject: Re: [SG16-Unicode] P1208R3 / source_location
From: Tom Honermann (tom_at_[hidden])
Date: 2019-02-19 17:17:07

On 2/18/19 1:17 PM, Robert Douglas wrote:
> Historical footnote, these are intended to be as drop-in as possible
> for existing facilities. __FILE__ is a "character string literal,"
> which gets its null termination in phase 7. Since we are accessing
> these at run-time, we should thus expect these to be NTBS. Changes to
> this expectation would be a deviation from these being a drop-in
> replacement to __FILE__ and __func__. Note that [dcl.fct.def.general]
> p 8 defines __func__ as an implementation-defined string as if static
> const char __func__[] = "function-name "; which also implies an
> NTBS. This is the reasoning for NTBS. Doing otherwise would make this
> feature deviate from __FILE__ and __func__, which it is designed to replace.

Agreed.  Certainly guaranteeing that these have a null terminator is
required given that file_name() returns const char*.  I don't agree with
associating these with NTMBSs though since multi-byte has encoding
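
A minimal sketch of the null-termination point above, using only __FILE__ and __func__ (the facilities source_location is meant to replace). The helper names file_literal_strlen, file_literal_sizeof, and current_function_name are hypothetical and exist only for illustration. The value of __func__ is implementation-defined, so the sketch makes no claim about its exact contents.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length of __FILE__ found by scanning for the terminating null character.
inline std::size_t file_literal_strlen() {
    return std::strlen(__FILE__);
}

// Size of the __FILE__ array object; a string literal's array includes
// the terminating null appended during translation.
inline std::size_t file_literal_sizeof() {
    return sizeof(__FILE__);
}

// __func__ behaves like a static const char array, so it converts to
// const char* and can be read as a null-terminated string.
inline std::string current_function_name() {
    return __func__;
}
```

If the strings are null-terminated as described, file_literal_sizeof() is exactly one greater than file_literal_strlen(). Nothing here says anything about the encoding of the bytes before the terminator.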


> On Mon, Feb 18, 2019 at 11:20 AM Corentin <corentin.jabot_at_[hidden]
> <mailto:corentin.jabot_at_[hidden]>> wrote:
> Quick reply: display only, no expectation that the file can be opened,
> or exists, or is a file. It's purely informative. But there is an
> expectation that it can be displayed, the main use case being logging.
> Otherwise I agree with you.
> On Mon, Feb 18, 2019, 7:16 AM Tom Honermann <tom_at_[hidden]
> <mailto:tom_at_[hidden]>> wrote:
> On Feb 18, 2019, at 10:04 AM, Corentin
> <corentin.jabot_at_[hidden] <mailto:corentin.jabot_at_[hidden]>>
> wrote:
>> Very good points.
>> Wouldn't it be sufficient to specify that the strings are
>> NTMBS encoded using the execution character set?
>> source_location currently avoids making any assumption about
>> how these strings are formed, including that they are derived
>> from a source file.
>> So since the value is implementation-defined, so should be
>> the way it's constructed.
>> However, it is reasonable to assume that these things are
>> valid text and therefore have a known encoding.
>> Adding Tom, because this is borderline SG16 territory.
> This isn’t borderline as we have (recently) requested review
> of anything involving file names.
>> @Tom: Do you want to see source_location this week knowing
>> that I'd hope it would get through LWG before the end of the
>> week?
>> Or do you think having function_name / filename as
>> multi-byte strings encoded using the execution character set
>> is reasonable?
>> The alternatives I see are
>> * Leave it unspecified
>> * Force a specific character set... which the world is not
>> ready for
> I think there is a higher level question to answer. Are the
> provided file names display only, or should one expect to be
> able to open the file using the provided name?
> If they are display only, then we can specify an encoding for
> them similarly to what is done for member functions of
> std::filesystem::path. In this case, we must explicitly
> acknowledge that the names do not roundtrip through the
> filesystem (though typically will in practice). Note that, on
> Windows, file names cannot be represented accurately using
> char based strings, so unless we want to add wchar_t support,
> these names will be technically display only.
> If they are potentially not display only, then we can’t
> associate an encoding and the names are bags-of-bytes. This is
> a limitation of POSIX. But then we need wchar_t support for
> Windows.
> In San Diego, the guidance we gave for the stacktrace proposal
> is that file names are implementation-defined bags-of-bytes.
> If we advised otherwise for source location, we would be
> giving inconsistent guidance.
> I think we should discuss this in SG16 this week. Not
> necessarily to propose changes for the proposal, but to
> solidify our collective thinking around file names.
> Tom.
>> Thanks,
>> Corentin
>> On Mon, 18 Feb 2019 at 03:56 Axel Naumann
>> <Axel.Naumann_at_[hidden] <mailto:Axel.Naumann_at_[hidden]>> wrote:
>> Hi Robert,
>> Regarding your P1208R3:
>> Nit: it's titled "D1208R3", it doesn't mention email
>> addresses.
>> Not-so-nit: an NB comment on the reflection TS asks to not use NTBS
>> but NTMBS and "Where NTBS is mentioned in the document under ballot,
>> the encoding used for the string’s value is unspecified." Jens agrees
>> that the proposed solution should be applied: "Specify that the
>> strings are first formed using the basic source character set (with
>> universal-character-names as necessary) then mapped in the manner
>> applied to string literals with no encoding prefix in phases 5 and 6
>> of translation."
>> I would very much hope that both changes are also applied to P1208R3.
>> I call this out explicitly in our recommended NB comment response
>> paper.
>> Cheers, Axel.
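
The quoted thread distinguishes file names that are display only from file names that are bags-of-bytes with no associated encoding. As a rough sketch of how a logger might cope with the second case, the function below makes any null-terminated byte string safe to display without assuming an encoding. The name escape_for_display and the \xHH escape format are assumptions made for this illustration and are not part of P1208R3 or any proposal discussed here.

```cpp
#include <string>

// Hypothetical helper: copies printable ASCII bytes unchanged and writes
// every other byte (and the backslash itself) as \xHH, so the result can
// be logged without knowing the encoding of the input.
inline std::string escape_for_display(const char* bytes) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (const char* p = bytes; *p != '\0'; ++p) {
        const unsigned char c = static_cast<unsigned char>(*p);
        if (c >= 0x20 && c < 0x7F && c != '\\') {
            out += static_cast<char>(c);   // printable ASCII, kept as-is
        } else {
            out += "\\x";                  // opaque byte, shown as hex
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

The escaped text is for display only. It does not round-trip through the filesystem, which matches the limitation noted in the thread for display-only names.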

SG16 list run by herb.sutter at gmail.com