Very good points.
Wouldn't it be sufficient to specify that the strings are NTMBS encoded using the execution character set?
source_location currently avoids making any assumption about how these strings are formed, including that they are derived from a source file.
Since the value is implementation-defined, the way it is constructed should be implementation-defined too.
However, it is reasonable to assume that these things are valid text and therefore have a known encoding.
Adding Tom, because this is borderline SG16 territory.
This isn’t borderline, as we (recently) requested review of anything involving file names.
@Tom: Do you want to see source_location this week knowing that I'd hope it would get through LWG before the end of the week?
Or do you think having function_name / file_name as multibyte strings encoded using the execution character set is reasonable?
The alternatives I see are:
- Leave it unspecified
- Force a specific character set... which the world is not ready for
I think there is a higher level question to answer. Are the provided file names display only, or should one expect to be able to open the file using the provided name?
If they are display only, then we can specify an encoding for them similarly to what is done for member functions of std::filesystem::path. In this case, we must explicitly acknowledge that the names do not roundtrip through the filesystem (though they typically will in practice). Note that, on Windows, file names cannot be represented accurately using char-based strings, so unless we want to add wchar_t support, these names will technically be display only.
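To make the std::filesystem::path comparison concrete, here is a minimal sketch, assuming a C++17 implementation. The names native_is_wide and display_name are hypothetical and exist only for illustration. path::string() converts from the native representation to a narrow string in the native narrow encoding, which is why the result is best treated as display only where the native representation is wchar_t based.

```cpp
#include <filesystem>
#include <string>
#include <type_traits>

namespace fs = std::filesystem;

// Hypothetical flag: true where the native path representation is
// wchar_t based (as on Windows), false where it is char based (as on POSIX).
inline constexpr bool native_is_wide =
    std::is_same_v<fs::path::value_type, wchar_t>;

// Hypothetical helper: a narrow, display-oriented rendering of a path.
// Where native_is_wide is true this involves a conversion, so the result
// is not guaranteed to name the same file if handed back to the filesystem.
std::string display_name(const fs::path& p) {
    return p.string();
}
```

For a plain ASCII name such as "a.cpp" the conversion is an identity on the implementations I know of; the interesting cases are names outside the narrow encoding's repertoire.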
If they are potentially not display only, then we can’t associate an encoding and the names are bags-of-bytes. This is a limitation of POSIX. But then we need wchar_t support for Windows.
In San Diego, the guidance we gave for the stacktrace proposal was that file names are implementation-defined bags-of-bytes. If we advised otherwise for source_location, we would be giving inconsistent guidance.
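As an illustration of what "bags-of-bytes" means for a consumer, here is a minimal sketch of a logging helper that assumes nothing about the encoding of the name. The function escape_for_display is hypothetical, and the sketch assumes an ASCII-compatible execution character set so that the range check below picks out printable ASCII.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: renders a name of unknown encoding for a log line.
// Printable ASCII bytes are copied through; every other byte is written
// as \xHH. The output is for display only and is not meant to be reversed.
std::string escape_for_display(std::string_view bytes) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : bytes) {
        if (c >= 0x20 && c < 0x7F) {
            out += static_cast<char>(c);
        } else {
            out += "\\x";
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

On an implementation that provides it, the same helper could be applied to the pointer returned by source_location's file_name(); nothing in the sketch depends on where the bytes came from.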
I think we should discuss this in SG16 this week. Not necessarily to propose changes for the proposal, but to solidify our collective thinking around file names.
Regarding your P1208R3:
Nit: it's titled "D1208R3", and it doesn't mention email addresses.
Not-so-nit: an NB comment on the reflection TS asks to use NTMBS rather than
NTBS, noting: "Where NTBS is mentioned in the document under ballot, the
encoding used for the string’s value is unspecified." Jens agrees that
the proposed solution should be applied: "Specify that the strings are
first formed using the basic source character set (with
universal-character-names as necessary) then mapped in the manner
applied to string literals with no encoding prefix in phases 5 and 6 of
I would very much hope that both changes are also applied to P1208R3. I
call this out explicitly in our recommended NB comment response paper.
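For readers less familiar with that mapping, here is a small sketch of what it does to an ordinary string literal with no encoding prefix. The names sample_name and sample_name_bytes are hypothetical. The literal is written with a universal-character-name, and the number of bytes it occupies depends on the execution character set: U+00E9 takes two bytes under UTF-8 and one under ISO 8859-1.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical example name: a narrow literal with no encoding prefix,
// spelled with a universal-character-name for U+00E9.
inline const char* sample_name() {
    return "caf\u00E9.cpp";
}

// Number of bytes (not characters) in the mapped string. The value is
// determined by the execution character set chosen by the implementation.
inline std::size_t sample_name_bytes() {
    return std::strlen(sample_name());
}
```

If source_location strings were specified this way, a consumer could interpret them exactly as it already interprets narrow literals in the same translation unit.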