sg16

Re: [SG16-Unicode] P1208R3 / source_location

From: Robert Douglas <rwdougla_at_[hidden]>
Date: Tue, 19 Feb 2019 17:22:09 -0600
Agree.

On Tue, Feb 19, 2019 at 5:17 PM Tom Honermann <tom_at_[hidden]> wrote:

> On 2/18/19 1:17 PM, Robert Douglas wrote:
>
> Historical footnote: these are intended to be as drop-in as possible for
> existing facilities. __FILE__ is a "character string literal," which gets
> its null termination in phase 7. Since we are accessing these at run-time,
> we should thus expect these to be NTBS. Changes to this expectation would
> be a deviation from these being a drop-in replacement for __FILE__ and
> __func__. Note that [dcl.fct.def.general] p8 defines __func__ as an
> implementation-defined string, as if by static
> const char __func__[] = "function-name"; which also implies an NTBS.
> This is the reasoning for NTBS. Doing otherwise would make this feature
> deviate from __FILE__ and __func__, which it is designed to replace.
>
> Agreed. Certainly guaranteeing that these have a null terminator is
> required given that file_name() returns const char*. I don't agree with
> associating these with NTMBSs though since multi-byte has encoding
> implications.
>
> Tom.
>
>
>
> On Mon, Feb 18, 2019 at 11:20 AM Corentin <corentin.jabot_at_[hidden]>
> wrote:
>
>> Quick reply: display only, with no expectation that the file can be
>> opened, exists, or is even a file. It's purely informative. But there is an
>> expectation that it can be displayed, the main use case being logging.
>> Otherwise I agree with you.
>>
>> On Mon, Feb 18, 2019, 7:16 AM Tom Honermann <tom_at_[hidden]> wrote:
>>
>>>
>>> On Feb 18, 2019, at 10:04 AM, Corentin <corentin.jabot_at_[hidden]> wrote:
>>>
>>>
>>> Very good points.
>>> Wouldn't it be sufficient to specify that the strings are NTMBS encoded
>>> using the execution character set?
>>>
>>> source_location currently avoids making any assumption about how these
>>> strings are formed, including that they are derived from a source file.
>>> So since the value is implementation-defined, so should be the way it's
>>> constructed.
>>> However, it is reasonable to assume that these things are valid text and
>>> therefore have a known encoding.
>>>
>>> Adding Tom, because this is borderline SG16 territory.
>>>
>>>
>>> This isn’t borderline as we have (recently) requested review of anything
>>> involving file names.
>>>
>>>
>>>
>>> @Tom: Do you want to see source_location this week knowing that I'd hope
>>> it would get through LWG before the end of the week?
>>> Or do you think having function_name / filename as multibyte strings
>>> encoded using the execution character set is reasonable?
>>> The alternatives I see are
>>>
>>> - Leave it unspecified
>>> - Force a specific character set... which the world is not ready for
>>>
>>> I think there is a higher level question to answer. Are the provided
>>> file names display only, or should one expect to be able to open the file
>>> using the provided name?
>>>
>>> If they are display only, then we can specify an encoding for them
>>> similarly to what is done for member functions of std::filesystem::path. In
>>> this case, we must explicitly acknowledge that the names do not roundtrip
>>> through the filesystem (though typically will in practice). Note that, on Windows,
>>> file names cannot be represented accurately using char based strings, so
>>> unless we want to add wchar_t support, these names will be technically
>>> display only.
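
The distinction between a display form and the native form can be sketched with std::filesystem::path itself. This is an illustration only, assuming C++17; display_name and native_is_wide are hypothetical names introduced here, not part of any proposal.

```cpp
#include <filesystem>
#include <string>
#include <type_traits>

namespace fs = std::filesystem;

// Display form: a char-based string obtained through path::string(),
// which converts from the native representation to the native narrow
// encoding. That conversion may be lossy where the native form is wide.
std::string display_name(const fs::path& p) {
    return p.string();
}

// Native form: path::value_type is char on POSIX systems and wchar_t on
// Windows, which is why a const char* name may not round-trip there.
constexpr bool native_is_wide =
    std::is_same_v<fs::path::value_type, wchar_t>;
```

A logger would use something like display_name(), while code that needs to reopen the file would keep the fs::path (or its native() string) instead.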
>>>
>>> If they are potentially not display only, then we can’t associate an
>>> encoding and the names are bags-of-bytes. This is a limitation of POSIX.
>>> But then we need wchar_t support for Windows.
>>>
>>> In San Diego, the guidance we gave for the stacktrace proposal is that
>>> file names are implementation-defined bags-of-bytes. If we advised
>>> otherwise for source_location, we would be giving inconsistent guidance.
>>>
>>> I think we should discuss this in SG16 this week. Not necessarily to
>>> propose changes for the proposal, but to solidify our collective thinking
>>> around file names.
>>>
>>> Tom.
>>>
>>>
>>> Thanks,
>>> Corentin
>>>
>>>
>>>
>>> On Mon, 18 Feb 2019 at 03:56 Axel Naumann <Axel.Naumann_at_[hidden]> wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> Regarding your P1208R3:
>>>>
>>>> Nit: it's titled "D1208R3", and it doesn't mention email addresses.
>>>>
>>>> Not-so-nit: a NB comment on the reflection TS asks to not use NTBS but
>>>> NTMBS and "Where NTBS is mentioned in the document under ballot, the
>>>> encoding used for the string’s value is unspecified." Jens agrees that
>>>> the proposed solution should be applied: "Specify that the strings are
>>>> first formed using the basic source character set (with
>>>> universal-character-names as necessary) then mapped in the manner
>>>> applied to string literals with no encoding prefix in phases 5 and 6 of
>>>> translation."
>>>>
>>>> I would very much hope that both changes are also applied to P1208R3. I
>>>> call this out explicitly in our recommended NB comment response paper.
>>>>
>>>> Cheers, Axel.
>>>>
>>>
>

Received on 2019-02-20 00:22:22