On Tue, Feb 19, 2019 at 5:17 PM Tom Honermann <tom@honermann.net> wrote:

On 2/18/19 1:17 PM, Robert Douglas wrote:

Historical footnote: these are intended to be as drop-in as possible for existing facilities. __FILE__ is a "character string literal," which gets its null termination in phase 7. Since we are accessing these at run-time, we should thus expect these to be NTBS. Changing this expectation would mean these are no longer a drop-in replacement for __FILE__ and __func__. Note that [dcl.fct.def.general] p8 defines __func__ as an implementation-defined string, as if by static const char __func__[] = "function-name"; which also implies an NTBS. This is the reasoning for NTBS: doing otherwise would make this feature deviate from __FILE__ and __func__, which it is designed to replace.
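
A minimal sketch of the NTBS expectation described above, using only the existing facilities (__FILE__ and __func__). The helper names are hypothetical and chosen for illustration; the value of __func__ is implementation-defined, so the sketch only relies on it being null-terminated.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: checks that __FILE__, a character string literal,
// ends with the null character appended during translation.
inline bool file_macro_is_null_terminated() {
    constexpr std::size_t n = sizeof(__FILE__);  // array size, including '\0'
    return __FILE__[n - 1] == '\0' && std::strlen(__FILE__) <= n - 1;
}

// Hypothetical helper: __func__ behaves as if declared
//   static const char __func__[] = "function-name";
// so passing it to std::strlen is well-defined.
inline std::size_t func_name_length() {
    return std::strlen(__func__);
}
```

Code that passes these strings to C-style APIs such as std::strlen or std::printf depends on exactly this termination guarantee.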

Agreed. Certainly guaranteeing that these have a null terminator is required given that file_name() returns const char*. I don't agree with associating these with NTMBSs though since multi-byte has encoding implications.

Tom.

On Mon, Feb 18, 2019 at 11:20 AM Corentin <corentin.jabot@gmail.com> wrote:

Quick reply: display only. There is no expectation that the file can be opened, or exists, or is even a file. It's purely informative. But there is an expectation that it can be displayed, the main use case being logging. Otherwise I agree with you.

On Mon, Feb 18, 2019, 7:16 AM Tom Honermann <tom@honermann.net> wrote:

On Feb 18, 2019, at 10:04 AM, Corentin <corentin.jabot@gmail.com> wrote:

Very good points.

Wouldn't it be sufficient to specify that the strings are NTMBS encoded using the execution character set?

source_location currently avoids making any assumption about how these strings are formed, including that they are derived from a source file.

So since the value is implementation-defined, so should be the way it's constructed.

However, it is reasonable to assume that these things are valid text and therefore have a known encoding.

Adding Tom, because this is borderline SG16 territory.

This isn’t borderline as we have (recently) requested review of anything involving file names.

@Tom: Do you want to see source_location this week knowing that I'd hope it would get through LWG before the end of the week?

Or do you think having function_name / filename as multibyte strings encoded using the execution character set is reasonable?

The alternatives I see are:

Leave it unspecified

Force a specific character set... which the world is not ready for

I think there is a higher level question to answer. Are the provided file names display only, or should one expect to be able to open the file using the provided name?

If they are display only, then we can specify an encoding for them similarly to what is done for member functions of std::filesystem::path. In this case, we must explicitly acknowledge that the names do not roundtrip through the filesystem (though they typically will in practice). Note that, on Windows, file names cannot be represented accurately using char-based strings, so unless we want to add wchar_t support, these names will be technically display only.

If they are potentially not display only, then we can’t associate an encoding and the names are bags-of-bytes. This is a limitation of POSIX. But then we need wchar_t support for Windows.
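
A rough sketch of the two interpretations, assuming C++17 and a file name delivered as const char* (as with __FILE__). Both helper names are hypothetical. The first treats the name as display-only text; the second hands the same bytes to std::filesystem::path, whose constructor interprets char input in the native narrow encoding, which is where a round trip can fail on Windows.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: display-only use. The bytes are copied into a log
// line as-is and never used to open a file.
inline std::string format_for_log(const char* file_name, unsigned line) {
    return std::string(file_name) + ":" + std::to_string(line);
}

// Hypothetical helper: non-display use. The char sequence is interpreted in
// the native narrow encoding; on POSIX no conversion takes place, on Windows
// the name is converted to the wchar_t-based native format.
inline std::filesystem::path as_path(const char* file_name) {
    return std::filesystem::path(file_name);
}
```

Whether the result of as_path names the original file is exactly the question raised above; the sketch does not settle it.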

In San Diego, the guidance we gave for the stacktrace proposal was that file names are implementation-defined bags of bytes. If we advised otherwise for source_location, we would be giving inconsistent guidance.

I think we should discuss this in SG16 this week. Not necessarily to propose changes for the proposal, but to solidify our collective thinking around file names.

Tom.

Thanks,

Corentin

On Mon, 18 Feb 2019 at 03:56 Axel Naumann <Axel.Naumann@cern.ch> wrote:

Hi Robert,

Regarding your P1208R3:

Nit: it's titled "D1208R3", and it doesn't mention email addresses.

Not-so-nit: an NB comment on the reflection TS asks to not use NTBS but
NTMBS and "Where NTBS is mentioned in the document under ballot, the
encoding used for the string’s value is unspecified." Jens agrees that
the proposed solution should be applied: "Specify that the strings are
first formed using the basic source character set (with
universal-character-names as necessary) then mapped in the manner
applied to string literals with no encoding prefix in phases 5 and 6 of
translation."
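
A small sketch of what the quoted wording implies, assuming a string spelled with a universal-character-name and no encoding prefix. The helper name is hypothetical. The number of bytes the literal occupies depends on the execution character set chosen by the implementation (for example, two bytes if it is UTF-8, one if it is Latin-1), which is why such a string is an NTMBS rather than a plain NTBS.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes used to encode U+00E9 in an ordinary
// string literal after the phase 5/6 mapping, excluding the terminating null.
// The result is implementation-dependent.
inline std::size_t encoded_length_of_e_acute() {
    return sizeof("\u00e9") - 1;
}
```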

I would very much hope that both changes are also applied to P1208R3. I
call this out explicitly in our recommended NB comment response paper.

Cheers, Axel.