Re: [SG16-Unicode] P1208R3 / source_location

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 19 Feb 2019 18:13:06 -0500
On 2/18/19 12:51 PM, Axel Naumann wrote:
> Hi Tom,
>
>> In San Diego, the guidance we gave for the stacktrace proposal is that
>> file names are implementation defined bags-of-bytes.
> How does that compare to
>
>>> "Specify that the strings are
>>> first formed using the basic source character set (with
>>> universal-character-names as necessary) then mapped in the manner
>>> applied to string literals with no encoding prefix in phases 5
>>> and 6 of translation."
> Or in other words, what does the wording look like for SG16's guidance?

We didn't offer wording for the stack trace proposal previously. But in
reviewing our notes, I do see that we explicitly requested aligning
behavior with source_location, so that's good!
(http://wiki.edg.com/bin/view/Wg21sandiego2018/D0881R3)

The existing wording for __FILE__
(http://eel.is/c++draft/cpp.predefined#1.3) doesn't specify encoding,
but null termination is implied. The wording currently in P0881R3 for
std::stack_frame::source_file() is likewise lax, though null termination
doesn't matter since the return type is std::string. Not a lot to draw
on for existing wording.

I think the wording above goes beyond what is required. If we start
with the assumption that file names are (sometimes, on some platforms)
sequences of bytes without an associated encoding, then trying to
describe such names in terms of the basic source character set becomes a
little strange. I think we're best off just stating that the contents
are implementation-defined and leaving it to implementations to be sensible.
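
To illustrate the bags-of-bytes view, here is a minimal sketch of how a
consumer might show such a name without assuming any encoding. The helper
name escape_for_display is hypothetical; it is not taken from P1208R3,
P0881R3, or any implementation, and the choice to escape everything outside
printable ASCII is only an assumption made for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of any proposal): render an opaque,
// null-terminated file name for display without assuming an encoding.
// Printable ASCII bytes are copied as-is; every other byte, and the
// backslash itself, is written as \xNN so the output stays unambiguous.
inline std::string escape_for_display(const char* name) {
    assert(name != nullptr);
    std::string out;
    for (const char* p = name; *p != '\0'; ++p) {
        const unsigned char byte = static_cast<unsigned char>(*p);
        if (byte >= 0x20 && byte < 0x7F && byte != '\\') {
            out += static_cast<char>(byte);
        } else {
            char buf[5];  // backslash, 'x', two hex digits, terminator
            std::snprintf(buf, sizeof buf, "\\x%02X",
                          static_cast<unsigned>(byte));
            out += buf;
        }
    }
    return out;
}
```

A caller could pass __FILE__ (or, presumably, whatever file name string
source_location ends up exposing) to this helper for logging, while keeping
the original bytes untouched for any attempt to open the file.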

Tom.

>
> Axel.
>
> On 18.02.19 07:16, Tom Honermann wrote:
>> On Feb 18, 2019, at 10:04 AM, Corentin <corentin.jabot_at_[hidden]> wrote:
>>
>>> Very good points.
>>> Wouldn't it be sufficient to specify that the strings are NTMBS
>>> encoded using the execution character set?
>>> source_location currently avoids making any assumption about how these
>>> strings are formed, including that they are derived from a source file.
>>> So since the value is implementation-defined, so should be the way
>>> it's constructed.
>>> However, it is reasonable to assume that these things are valid text
>>> and therefore have a known encoding.
>>>
>>> Adding Tom, because this is borderline SG16 territory.
>> This isn’t borderline as we have (recently) requested review of anything
>> involving file names.
>>
>>>
>>> @Tom: Do you want to see source_location this week knowing that I'd
>>> hope it would get through LWG before the end of the week?
>>> Or do you think having function_name / filename as multibyte strings
>>> encoded using the execution character set is reasonable?
>>> The alternatives I see are
>>>
>>> * Leave it unspecified
>>> * Force a specific character set... which the world is not ready for
>> I think there is a higher level question to answer. Are the provided
>> file names display only, or should one expect to be able to open the
>> file using the provided name?
>>
>> If they are display only, then we can specify an encoding for them
>> similarly to what is done for member functions of std::filesystem::path.
>> In this case, we must explicitly acknowledge that the names do not
>> roundtrip through the filesystem (though typically will in practice).
>> Note that, on Windows, file names cannot be represented accurately using
>> char based strings, so unless we want to add wchar_t support, these
>> names will be technically display only.
>>
>> If they are potentially not display only, then we can’t associate an
>> encoding and the names are bags-of-bytes. This is a limitation of POSIX.
>> But then we need wchar_t support for Windows.
>>
>> In San Diego, the guidance we gave for the stacktrace proposal is that
>> file names are implementation defined bags-of-bytes. If we advised
>> otherwise for source location, we would be giving inconsistent guidance.
>>
>> I think we should discuss this in SG16 this week. Not necessarily to
>> propose changes for the proposal, but to solidify our collective
>> thinking around file names.
>>
>> Tom.
>>> Thanks,
>>> Corentin
>>>
>>>
>>>
>>> On Mon, 18 Feb 2019 at 03:56 Axel Naumann <Axel.Naumann_at_[hidden]> wrote:
>>>
>>> Hi Robert,
>>>
>>> Regarding your P1208R3:
>>>
>>> Nit: it's titled "D1208R3", it doesn't mention email addresses.
>>>
>>> Not-so-nit: a NB comment on the reflection TS asks to not use NTBS but
>>> NTMBS and "Where NTBS is mentioned in the document under ballot, the
>>> encoding used for the string’s value is unspecified." Jens agrees that
>>> the proposed solution should be applied: "Specify that the strings are
>>> first formed using the basic source character set (with
>>> universal-character-names as necessary) then mapped in the manner
>>> applied to string literals with no encoding prefix in phases 5 and 6
>>> of translation."
>>>
>>> I would very much hope that both changes are also applied to P1208R3.
>>> I call this out explicitly in our recommended NB comment response paper.
>>>
>>> Cheers, Axel.
>>>

Received on 2019-02-20 00:13:12