SG16

Subject: Re: [SG16-Unicode] P1208R3 / source_location
From: Tom Honermann (tom_at_[hidden])
Date: 2019-02-20 12:37:24


On 2/20/19 8:14 AM, Robert Douglas wrote:
> Will this prevent usage of printf ("func: %s", sl.function_name ());   ?

Can you elaborate on why you think this might be an issue? printf
expects an NTMBS.  Some interesting results might be produced for
functions with names outside the basic source character set, but how
those are handled is necessarily implementation-defined.

void f\U0001F412() {}
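
A minimal sketch of the printf-style use in question, assuming a C++20 library that provides <source_location>. The helper names format_func and report are hypothetical, and the __func__ fallback is only there for older toolchains; it is not something P1208R3 specifies. The text returned by function_name() is implementation-defined, so the sketch relies only on it being null-terminated.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>   // defines __cpp_lib_source_location when available
#  endif
#endif
#if defined(__cpp_lib_source_location)
#  include <source_location>
#endif

// Hypothetical helper: formats a function name with "%s", as the printf
// call in the question would. The argument only has to be null-terminated;
// its encoding is whatever the implementation chose.
inline std::string format_func(const char* name) {
    assert(name != nullptr);
    char buf[256];
    std::snprintf(buf, sizeof buf, "func: %s", name);  // truncates long names
    return buf;
}

// Hypothetical caller whose own name gets reported.
inline std::string report() {
#if defined(__cpp_lib_source_location)
    const std::source_location sl = std::source_location::current();
    return format_func(sl.function_name());
#else
    return format_func(__func__);  // assumption: pre-C++20 fallback
#endif
}
```

Calling report() yields a string that starts with "func: ", followed by whatever the implementation reports for the enclosing function.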

Tom.

>
> On Tue, Feb 19, 2019, 6:37 PM Corentin <corentin.jabot_at_[hidden]
> <mailto:corentin.jabot_at_[hidden]>> wrote:
>
> It kinda is but the compiler can get a useful encoding from the
> source code but not from the source file, in the general case.
> It's mostly an issue with filesystems with no or poor encoding
> support.
>
> I don't believe the observable behavior will be widely different
> from __FILE__ and __func__ in practical terms
>
>
>
> On Tue, 19 Feb 2019 at 14:31 Robert Douglas <rwdougla_at_[hidden]
> <mailto:rwdougla_at_[hidden]>> wrote:
>
> So filename and functionname would necessarily have different
> encodings? Does that not seem awful?
>
> On Tue, Feb 19, 2019, 6:25 PM Axel Naumann
> <Axel.Naumann_at_[hidden] <mailto:Axel.Naumann_at_[hidden]>> wrote:
>
> Thanks everyone, this is what I'll take to Core.
> Axel.
>
> On 19.02.19 13:58, Corentin wrote:
>> After talking with Tom, I'd like to modify function_name
>> to be an NTMBS as it is something we can actually
>> guarantee and I don't think __func__ should constrain the
>> design of source location. It would be consistent with
>> the TS and satisfy the NB comment (whose resolution was adopted
>> in that direction this morning)
>>
>> Tom convinced me that filename cannot and should not be an
>> NTMBS
>>
>>
>> On Tue, 19 Feb 2019 at 13:22 Robert Douglas
>> <rwdougla_at_[hidden] <mailto:rwdougla_at_[hidden]>> wrote:
>>
>> Agree.
>>
>> On Tue, Feb 19, 2019 at 5:17 PM Tom Honermann
>> <tom_at_[hidden] <mailto:tom_at_[hidden]>> wrote:
>>
>> On 2/18/19 1:17 PM, Robert Douglas wrote:
>>> Historical footnote, these are intended to be as
>>> drop-in as possible for existing facilities.
>>> __FILE__ is a "character string literal," which
>>> gets its null termination in phase 7. Since we
>>> are accessing these at run-time, we should thus
>>> expect these to be NTBS. Changes to this
>>> expectation would be a deviation from these
>>> being a drop-in replacement to __FILE__ and
>>> __func__. Note that [dcl.fct.def.general]
>>>  p 8 defines __func__ as an
>>> implementation-defined string as if static const
>>> char __func__[] = "function-name"; which
>>> implies, also, an NTBS. This is the reasoning
>>> for NTBS. Doing otherwise would make this
>>> feature deviate from __FILE__ and __func__, which
>>> it is designed to replace.
>>
>> Agreed.  Certainly guaranteeing that these have a
>> null terminator is required given that
>> file_name() returns const char*.  I don't agree
>> with associating these with NTMBSs though since
>> multi-byte has encoding implications.
>>
>> Tom.
>>
>>>
>>>
>>> On Mon, Feb 18, 2019 at 11:20 AM Corentin
>>> <corentin.jabot_at_[hidden]
>>> <mailto:corentin.jabot_at_[hidden]>> wrote:
>>>
>>> Quick reply: display only, no expectation
>>> the file can be opened, or exists, or is a
>>> file. It's purely informative. But there is an
>>> expectation it can be displayed, the main
>>> use case being logging. Otherwise I agree
>>> with you.
>>>
>>> On Mon, Feb 18, 2019, 7:16 AM Tom Honermann
>>> <tom_at_[hidden]
>>> <mailto:tom_at_[hidden]>> wrote:
>>>
>>>
>>> On Feb 18, 2019, at 10:04 AM, Corentin
>>> <corentin.jabot_at_[hidden]
>>> <mailto:corentin.jabot_at_[hidden]>> wrote:
>>>
>>>>
>>>> Very good points.
>>>> Wouldn't it be sufficient to specify
>>>> that the strings are NTMBS encoded
>>>> using the execution character set?
>>>> source_location currently avoids making
>>>> any assumption about how these strings
>>>> are formed, including that they are
>>>> derived from a source file.
>>>> So since the value is
>>>> implementation-defined, so should be
>>>> the way it's constructed.
>>>> However, it is reasonable to assume
>>>> that these things are valid text and
>>>> therefore have a known encoding.
>>>>
>>>> Adding Tom, because this is borderline
>>>> SG16 territory.
>>>
>>> This isn’t borderline as we have
>>> (recently) requested review of anything
>>> involving file names.
>>>
>>>>
>>>>
>>>> @Tom: Do you want to see
>>>> source_location this week knowing that
>>>> I'd hope it would get through LWG
>>>> before the end of the week?
>>>> Or do you think having function_name /
>>>> filename as multi-byte strings encoded
>>>> using the execution character set is
>>>> reasonable?
>>>> The alternatives I see are
>>>>
>>>> * Leave it unspecified
>>>> * Force a specific character set...
>>>> which the world is not ready for
>>>>
>>> I think there is a higher level question
>>> to answer. Are the provided file names
>>> display only, or should one expect to be
>>> able to open the file using the provided
>>> name?
>>>
>>> If they are display only, then we can
>>> specify an encoding for them similarly
>>> to what is done for member functions of
>>> std::filesystem::path. In this case, we
>>> must explicitly acknowledge that the
>>> names do not roundtrip through the
>>> filesystem (though typically will in
>>> practice). Note that, on Windows, file
>>> names cannot be represented accurately
>>> using char based strings, so unless we
>>> want to add wchar_t support, these names
>>> will be technically display only.
>>>
>>> If they are potentially not display
>>> only, then we can’t associate an
>>> encoding and the names are
>>> bags-of-bytes. This is a limitation of
>>> POSIX. But then we need wchar_t support
>>> for Windows.
>>>
>>> In San Diego, the guidance we gave for
>>> the stacktrace proposal is that file
>>> names are implementation-defined
>>> bags-of-bytes. If we advised otherwise
>>> for source location, we would be giving
>>> inconsistent guidance.
>>>
>>> I think we should discuss this in SG16
>>> this week. Not necessarily to propose
>>> changes for the proposal, but to
>>> solidify our collective thinking around
>>> file names.
>>>
>>> Tom.
>>>>
>>>> Thanks,
>>>> Corentin
>>>>
>>>>
>>>>
>>>> On Mon, 18 Feb 2019 at 03:56 Axel
>>>> Naumann <Axel.Naumann_at_[hidden]
>>>> <mailto:Axel.Naumann_at_[hidden]>> wrote:
>>>>
>>>> Hi Robert,
>>>>
>>>> Regarding your P1208R3:
>>>>
>>>> Nit: it's titled "D1208R3", it
>>>> doesn't mention email addresses.
>>>>
>>>> Not-so-nit: an NB comment on the
>>>> reflection TS asks to not use NTBS but
>>>> NTMBS and "Where NTBS is mentioned
>>>> in the document under ballot, the
>>>> encoding used for the string’s
>>>> value is unspecified." Jens agrees that
>>>> the proposed solution should be
>>>> applied: "Specify that the strings are
>>>> first formed using the basic source
>>>> character set (with
>>>> universal-character-names as
>>>> necessary) then mapped in the manner
>>>> applied to string literals with no
>>>> encoding prefix in phases 5 and 6 of
>>>> translation."
>>>>
>>>> I would very much hope that both
>>>> changes are also applied to P1208R3. I
>>>> call this out explicitly in our
>>>> recommended NB comment response paper.
>>>>
>>>> Cheers, Axel.
>>>>
>>
>



SG16 list run by herb.sutter at gmail.com