SG16 will hold a meeting tomorrow, Wednesday, July 31st, at 19:30 UTC (timezone conversion).
The agenda follows.
- P3068R2: Allowing exception throwing in constant-evaluation.
- LWG issue 4087: Standard exception messages have unspecified encoding.
LEWG has requested that we review P3068R2 with respect to std::exception and related types and encoding concerns for the message provided by the what() member function. The concerns are effectively the same as those reported in LWG 4087, but in the special case of constant evaluation.
We discussed LWG 4087 during the 2024-06-12 SG16 meeting. Unfortunately, I still haven't published the meeting summary for that meeting (work, life, burnout), so that link isn't helpful right now. I'll respond to this email with a copy of the (excellent) minutes that Eddie Nolan took for that meeting. We spent much of that meeting discovering what the status quo is with regard to the standard wording. We didn't poll any direction. The status quo appears to be:
- what() returns an implementation-defined NTBS per [exception]p5.
- what() permits return of an NTMBS per [exception]p6.
- The NTMBS encoding is dependent on the C++ locale; it is the encoding that the std::codecvt<wchar_t, char, std::mbstate_t> facet uses on the char side of the conversion per the reference in [exception]p6.
- There is no guarantee that the C++ locale has not changed in between construction of an exception object and a call to what() for that same object.
- The postconditions of the std::exception copy constructor and assignment operator and the constructors of the exception classes declared in <stdexcept> all require that what() return a pointer to an exact copy of the what_arg string provided when the exception object was constructed; no transcoding is permitted. The postconditions of std::filesystem::filesystem_error are similar per [fs.filesystem.error.members].
- We might be able to strengthen the requirements for handling of encodings for std::filesystem::filesystem_error::what() specifically; normative encouragement is present per [fs.filesystem.error.members]p7.
The status quo suggests that, for the purposes of std::format(), the string returned by what() should be treated as containing (possibly ill-formed) text in the NTMBS encoding of the current C++ locale (or perhaps an explicitly provided std::locale argument).
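The "exact copy" postcondition in the list above can be checked directly. This is a minimal sketch; the helper name what_round_trips is made up for illustration, and it only exercises std::runtime_error, not every class in <stdexcept>.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true when what() is byte-for-byte equal
// to the constructor argument, and when a copy preserves those bytes.
bool what_round_trips(const std::string& what_arg) {
    std::runtime_error err(what_arg);
    std::runtime_error copy(err);  // the copy must report the same string
    return std::strcmp(err.what(), what_arg.c_str()) == 0 &&
           std::strcmp(copy.what(), err.what()) == 0;
}
```

The bytes pass through untouched whatever encoding they are in, which is why any encoding guarantee on what() would also constrain what callers may pass to the constructors.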
With respect to P3068R2, there is currently no notion of a locale dependent NTMBS encoding during constant evaluation. We'll need to discuss the ramifications of this, presumably identify an encoding to use instead (presumably the ordinary literal encoding), and determine how to adjust wording accordingly.
Here are the rough minutes from the 2024-06-12 SG16 meeting for reference. Thank you again to Eddie Nolan for capturing these!
Attendance:
- (SD) Steve Downey
- (JM) Jens Maurer
- (TH) Tom Honermann
- (VZ) Victor Zverovich
- (MD) Mark de Wever
- (BG) Braden Ganetsky
- (NO) Nathan Owen
SD: Our agenda is to discuss the LWG issues. We'll be discussing
4070, 4087, and 4090.
SD: (reading the issue) If CharT is char, path::value_type is
wchar_t, and the literal encoding is UTF-8, then the escaped path
is transcoded from the native encoding for wide character strings
to UTF-8 with maximal subparts of ill-formed subsequences
substituted with U+FFFD replacement character per the Unicode
Standard [...]. Otherwise, transcoding is implementation-defined.
This seems to mean that the Unicode substitutions are only done
for an escaped path, i.e. when the ? option is used. Otherwise,
the form of transcoding is completely implementation-defined.
However, this makes no sense. An escaped string will have no
ill-formed subsequences, because they will already have been
replaced.
So only unescaped strings can have ill-formed sequences by the
time we do transcoding to char, but whether or not any U+FFFD
substitution occurs is just implementation-defined.
I believe we want to specify the substitutions are done when
transcoding an unescaped path (and it doesn't matter whether we
specify it for escaped paths, because it's a no-op if escaping
happens first, as is apparently intended).
It does matter whether we escape first or perform substitutions
first. If we escape first then every code unit in an ill-formed
sequence is individually escaped as \x{hex-digit-sequence}. So an
ill-formed sequence of two wchar_t values will be escaped as two
\x{...} strings, which are then transcoded to UTF-8. If we
transcode (with substitutions) first then the entire ill-formed
sequence is replaced with a single replacement character, which
will then be escaped as \x{fffd}. SG16 should be asked to confirm
that escaping first is intended, so that an escaped string shows
the original invalid code units. For a non-escaped string, we want
the ill-formed sequence to be formatted as �, which the proposed
resolution tries to ensure.
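The difference between the two orders can be sketched in a simplified model. Assumptions: the input here is UTF-8 bytes rather than the wchar_t sequence the issue talks about, only ill-formed code units are escaped (a real escaped string also adds quotes and escapes other characters), and the names Chunk, scan, escape_ill_formed, and substitute_ill_formed are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Chunk { std::size_t len; bool well_formed; };

// Length of the well-formed UTF-8 sequence starting at s[i], or of the
// maximal subpart of an ill-formed one (Unicode Standard, Table 3-7).
Chunk scan(const std::string& s, std::size_t i) {
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    const unsigned char lead = byte(i);
    if (lead < 0x80) return {1, true};
    std::size_t trailing = 0;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the next byte
    if (lead >= 0xC2 && lead <= 0xDF) {
        trailing = 1;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        trailing = 2;
        if (lead == 0xE0) lo = 0xA0;
        if (lead == 0xED) hi = 0x9F;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        trailing = 3;
        if (lead == 0xF0) lo = 0x90;
        if (lead == 0xF4) hi = 0x8F;
    } else {
        return {1, false};  // stray continuation byte or invalid lead
    }
    std::size_t len = 1;
    for (std::size_t k = 0; k < trailing; ++k) {
        if (i + len >= s.size()) return {len, false};
        const unsigned char c = byte(i + len);
        if (c < lo || c > hi) return {len, false};
        ++len;
        lo = 0x80;
        hi = 0xBF;
    }
    return {len, true};
}

// Escape first: every code unit of an ill-formed sequence becomes \x{hh}.
std::string escape_ill_formed(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        const Chunk c = scan(s, i);
        for (std::size_t k = 0; k < c.len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (c.well_formed) {
                out += s[i + k];
            } else {
                out += "\\x{";
                out += digits[b >> 4];
                out += digits[b & 0xF];
                out += '}';
            }
        }
        i += c.len;
    }
    return out;
}

// Substitute first: each maximal subpart becomes one U+FFFD.
std::string substitute_ill_formed(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        const Chunk c = scan(s, i);
        if (c.well_formed) out.append(s, i, c.len);
        else out += "\xEF\xBF\xBD";  // U+FFFD encoded as UTF-8
        i += c.len;
    }
    return out;
}
```

For a truncated three-byte sequence (E2 82), escaping first keeps both original code units visible, while substituting first collapses them into a single replacement character.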
VZ: As an author of the paper I'd like to confirm that it's indeed
intended to first do escaping and then do transcoding. That's why
the wording is the way it is. I agree with Jonathan that it misses
the important bit for non-escaped paths. I think the resolution
is mostly correct, except I think Tom commented in the email that
the second part of the resolution, which is new to me, is a little
bit incorrect. I think what we want is to kind of invert the
condition there, but this does something completely different.
TH: I think what we want to say there is just "and the literal
encoding is not UTF-8". wchar_t encoding is still implementation
defined so there's still an implementation defined aspect there. I
don't think we need to add anything to the
implementation-definedness.
SD: What we're saying is that if you're fully in Unicode, there's
no implementation defined behavior, we're completely mandating the
behavior.
TH: Specifically, when the literal encoding is UTF-8.
SD: And this is an index entry that just links back, so it's just
trying to identify-- this is just an index entry, there isn't any
larger context. It's trying to describe it well enough so someone
looking at the implementation-defined behaviors can find it.
SD: I'll admit I haven't really thought about this a lot.
VZ: I agree with Tom, this is a mistake in the table of
implementation defined behavior. It should do what Tom says-- we
should replace, "not converting from wchar_t to UTF-8" with "when
the literal encoding is not UTF-8." And the first part, I think,
is correct.
SD: Okay. So in the text of the standard itself we want to
basically strike "escaped path" and replace it with that
"(possibly escaped) string"
VZ: That part is fine.
SD: But defining the implementation-defined behavior is not
correct.
VZ: They should just take the wording as it is and put it in the
index.
SD: All the wordings in that index are very short summaries of
what the implementation-defined category is.
JM: It just tries to give a headline. It shouldn't be wrong but
it's not necessarily complete. As long as we satisfy that, it's
good enough.
TH: Victor, it sounds like you have a good handle on it. Want to
paste the recommendation in chat?
VZ: That's what I'm typing.
VZ (in chat): "the literal encoding is not UTF-8" instead of "not
converting from wchar_t to UTF-8"
SD: That probably covers any interesting case.
JM: It's not fully right because it's not implementation-defined
only if CharT is char and path::value_type is wchar_t.
Presumably if CharT is char16_t, everything's also implementation
defined but I don't know whether that's a possibility.
JM: And it talks about literal encoding when it might want to talk
about ordinary literal encodings. Is it talking about the literal
encoding for wide strings or for char? But that question's not on
our plate.
JM: Because the wide literal encoding could be UTF-16 or
something, or UCS-2 or whatever.
JM: So I like Victor's suggestion for the implementation defined
behavior index.
JM: We already have "literal encoding" in the normative text, so
if it's ambiguous there it should be the same ambiguity in the
implementation-defined index.
TH: Should it say "ordinary literal encoding?"
JM: Maybe but that's not the question of this issue.
SD: There are many places we've already made this mistake.
Cleaning it up should be a one-time thing where we go through and
clarify whether we actually mean ordinary or literal encoding. The
sense I'm getting is that we want to change
JM: All we're doing here is correctly quoting the normative text.
SD: So the resolution is we accept the resolution for clause 1,
and for the second part accept Victor's recommendation.
JM: And what was the concern why this doesn't work? Because what
we have here is the text about the literal encoding thing, right?
Let me see Tom's email.
SD: This doesn't constrain what an implementation can define it to
be-- they could perfectly well convert to UTF-8 when the ordinary
literal encoding's not UTF-8 but it's up to implementations to
serve their users.
JM: Yes, okay, great. So, Tom, are you happy with not introducing
"ordinary" for the sake of quoting the text, or should we make
this a bigger issue?
TH: No, I'm fine with that, like Steve said, we can do a separate
cleanup issue or file an LWG issue.
JM: So the green text should be "and the literal encoding is not
UTF-8."
TH: Yes, that sounds good.
SD: Moving on to 4087.
VZ: std::exception is one of the few remaining standard types
that isn't formattable. I looked into it and found the problem
that we don't actually specify what encoding the string returned
by what() is in. We just say that it's something that can be
converted to
wstring somehow. Which is very vague. So it's impossible to
implement a formatter properly because you don't know the encoding
to convert from or whether conversion is needed. I gave an example
with a path, but it's a more general problem-- path is one of the
most obvious and outrageous cases because, as part of the path,
you can get the filename. So the exception encoding has one
encoding and you get a filename in a possibly different encoding
and try to format it with the literal encoding and you get three
encodings in one message-- simply a mess. My proposed resolution
is incomplete-- it's just a first attempt to propose something to
start the discussion. I'm saying it should probably be compatible
with the ordinary literal encoding. That's what people normally
do, combine it with literal strings and output. I had an email
forwarded to SG16 which had 4 options which nicely summarize what
we can choose. I think Tom separately suggested using the locale
encoding.
SD: I would expect this, barring any external constraints, to be
in the current execution encoding. Which isn't necessarily the
literal encoding. That is a common source of broken text, but that
is the state of the world. If I'm handed a char* and no other
information, it's the execution encoding.
VZ: At the very least we should specify the encoding. Now it
doesn't say anything.
JM: Fully agreed.
SD: Especially because this is instructing end users what they
should be stuffing in these things.
TH: Does multibyte not imply the locale encoding?
JM: It doesn't.
TH: Because we have the association with mbstowcs.
TH: This has always been very vaguely specified.
SD: Does NTBS include multibyte?
JM: Yes. Well, no, wait. The other way around, I thought. Wait
wait wait. A null-terminated byte string, NTBS, is a char sequence
whose highest addressed element with defined content has value 0.
No other element has value 0. An NTMBS is an NTBS that has a
sequence of valid multibyte characters. So an NTBS is everything,
an NTMBS is one that has valid multibyte characters.
TH: Whatever those are.
JM: Now the question is what this NTBS-- it's in the C standard.
TH: mbstowcs.
JM: The mbstowcs function converts a sequence of multibyte
characters that begins in the initial shift state-- it just
returns a null-terminated byte sequence-- the conversion function
into a sequence of corresponding wide characters. Each is
converted as if by a call to mbtowc function. Except that the
conversion state of the mbtowc function is not affected. So for
the specific conversion it defers to the other function.
SD: This does seem overall to be just a whole class of interesting
ways of producing broken text. The whole what() facility,
assembling user-specified data with string literals and doing
something to them in the hopes that someone can reconstruct
something intelligible.
JM: The heading for this mbtowc function says, the behavior of the
multibyte character functions is affected by the LC_CTYPE category
of the current locale. Apparently LC_CTYPE can change what a
multibyte character sequence is. That means, essentially, the
definition of what a multibyte character string is is dependent on
the LC_CTYPE locale category because the definition of a multibyte
character sequence says it must be a valid sequence, and the
locale tells me what's valid and what's not. Presumably that means
it's actually the global locale or thread-related locale.
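The locale dependence described above can be observed with the C conversion functions. This is a minimal sketch; is_valid_multibyte is a hypothetical helper name, and it uses std::mbrtowc, which consults the LC_CTYPE category of the current C locale (not the C++ global locale).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: true if s is a sequence of valid multibyte
// characters under the *current* C locale's LC_CTYPE category, ending
// in the initial shift state. The answer can change after a call to
// std::setlocale.
bool is_valid_multibyte(const char* s) {
    std::mbstate_t state{};
    std::size_t left = std::strlen(s);
    while (left > 0) {
        wchar_t wc;
        const std::size_t r = std::mbrtowc(&wc, s, left, &state);
        if (r == static_cast<std::size_t>(-1) ||   // invalid sequence
            r == static_cast<std::size_t>(-2) ||   // truncated sequence
            r == 0) {                              // defensive: unexpected NUL
            return false;
        }
        s += r;
        left -= r;
    }
    return std::mbsinit(&state) != 0;
}
```

The same byte string may be a valid NTMBS under one locale and merely an NTBS under another, so whether bytes above 0x7F are accepted here depends on the platform and the locale in effect.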
SD: Or in our current terminology, the execution encoding.
JM: Which is unfortunate, because usually C++ tries to make the
locale explicit in the interface. In iostreams you can imbue the
locale of your choice, you don't need global state which is broken
by design.
SD: Plus the built in race condition during the exception.
TH: Passing in a locale wouldn't work because the message is
constructed much earlier.
JM: You want to pass it in at the place the exception is
generated, not when the what function is called.
TH: But the locale could have changed when you invoke what().
JM: At least it's well-defined. If you call what() and can't make
sense out of it, then it's your fault. But it's hypothetical,
because there's no way to pass a locale at the point where the
filesystem error is generated. Is there locale stuff on filesystem?
TH: There is.
VZ: No, it just says "a system-specific encoding."
TH: I think some of the functions do actually take a locale, it's
used to do a conversion to the encoding you're talking about.
JM: The example in the issue where file size is being queried
doesn't seem like somewhere a locale fits in.
SD: This is an example of the general problem.
JM: Looking at the example, there's nowhere to pass in a locale.
No one expects to pass a locale to a file size query function.
TH: The only way you get non-mojibake out is if the global locale
was consistent from the time the message was created to when it
was received.
VZ: Did we figure out that NTBS is always in the global locale?
JM: NTMBS is in the global locale.
SD: Unless it's specifically a string literal.
VZ: But what we have is NTBS.
TH: The remarks say NTMBS but the text says NTBS. It's not
consistent.
JM: The returns says NTBS, which is any kind of null-terminated
byte sequence. The remarks say, we already told you earlier that
an NTMBS is a valid NTBS. That just gives permission to the
implementation to give you an NTMBS as opposed to just an NTBS.
What we can do is, for the case of an NTMBS, where we already say
it's suitable for conversion and display as a wstring, we might
want to clarify that it was suitable at the time of construction
of the exception for wstring, because it needs to evaluate
LC_CTYPE at construction time, not when you call what(). That's
what's missing from the remarks, otherwise we already have the
cross-reference to codecvt so we already know what's happening. For
the returns part, which are the minimum requirements, we haven't
solved anything. So far the standard has even refrained from
telling you it must be a multibyte string if multibyte strings are
on your platform. An implementation can return an ASCII only
string even if it could return a multibyte string. We have two
things we should do. One is to clarify the remarks with respect to
when the suitable for conversion and display holds. That holds
only immediately after construction and not later (or we restrict
changing LC_CTYPE or whatever). What we do for plain NTBSs--
I don't know. Maybe the best thing is not to talk about it.
TH: For solving Victor's problem, we have two concerns. The file
path, incorporating it into a message. That's one issue. As for
taking this NTBS that comes out and getting it formatted, we can
specify "as if using the C function". We can just specify to use
the global locale and you get what you get. Sometimes there might
be some weird translations.
VZ: To clarify, by global locale we mean global C locale? There's
multiple sets of locales. The C++ locale and the C locale. You can
separately set both of them, they're unrelated.
SD: You can change various parts of the locale bits independently.
JM: No, we're talking about the function call set_locale. There's
a C variant of the global set_locale, that takes a C variant, and
there's an equivalent for the C++ locale structures, which sets an
independent locale state.
TH: It may also set the C locale.
JM: But it's not required to. At the start of the program you can
expect that they're the same. The question is, which one do we
take. Presumably the C++ locale.
TH: Except that we have the reference to the conversion functions
which are C-based and use the C locale.
JM: Where?
TH: Maybe I misunderstood before. mbstowcs?
JM: That's not what we do here. The actual cross reference is to
the codecvt facet. The class codecvt is for use when converting
from one character encoding to another... . We have ctype,
wchar_t, and mbstate. Presumably wchar_t is the internal encoding,
the external encoding is char, and the state_t is an mbstate which
is a transformation. codecvt converts between the native character
set for ordinary and wide characters.
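The codecvt cross-reference discussed here can be made concrete. This is a sketch under assumptions: widen_what is a hypothetical helper, the caller chooses which std::locale to interpret the bytes with (the standard does not say this is how what() must be consumed), and any conversion failure is reported by throwing.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>
#include <exception>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts e.what() to a wide string using the
// std::codecvt<wchar_t, char, std::mbstate_t> facet of loc, i.e. it
// interprets the bytes in that locale's char-side encoding.
std::wstring widen_what(const std::exception& e, const std::locale& loc) {
    using Cvt = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Cvt& cvt = std::use_facet<Cvt>(loc);

    const char* from = e.what();
    const char* from_end = from + std::strlen(from);
    const char* from_next = from;

    // One wchar_t per input char is always enough room.
    std::wstring out(static_cast<std::size_t>(from_end - from), L'\0');
    wchar_t* to = &out[0];
    wchar_t* to_next = to;

    std::mbstate_t state{};
    const Cvt::result r =
        cvt.in(state, from, from_end, from_next, to, to + out.size(), to_next);
    if (r != Cvt::ok) {
        throw std::runtime_error("what() is not valid in the given locale");
    }
    out.resize(static_cast<std::size_t>(to_next - to));
    return out;
}
```

Passing std::locale() uses a copy of the global C++ locale as it is at the time of the call, which may differ from the locale in effect when the exception was constructed.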
TH: This might be a case where it'd be good to try to ... some
implementations and set the C and C++ locales differently and see
which one you get.
JM: Where do you want to get what?
TH: Produce an exception object but have locale set.. but we don't
specify
JM: We want to specify transcoding..
TH: There is transcoding of file paths on the Windows side.
JM: This text talks about the OS-dependent current encoding for
path names which in this case is CP1251.
VZ: I think path is a red herring because it has its own unrelated
transcoding. What we need to specify is, what's the target
encoding for exception? And specify what the output of the path
method should be converted into. What Tom is suggesting is to look
at what path does, that's not correct.
SD: For anyone producing this string, what should they be trying
to do? I think they should be targeting the current execution
encoding as defined by locale.
EN: Which one?
JM: The global C++ locale. No reference to the C locale in the
cross reference. It says locale codecvt, which you get from the
global C++ locale.
VZ: I have a question to Jens. Clarify: the what() makes sense
after you construct, because locale can change
SD: At the point of construction.
VZ: Locale can be changed asynchronously-- what do you mean by
that?
SD: That you have a problem if someone does that.
JM: Well, no. Where's the global C++ locale query function? Is
that the default ctor of the locale class?
TH: Maybe? There might also be a global static factory function.
JM: Yes, there's a classic thing (useless) and a global locale
function...
JM: If we have a named locale, you get the C locale set to the
same thing, otherwise all bets are off. locale() is the
constructor of the locale class which gets you a copy of the
global C++ locale. Race conditions aren't relevant here-- it's the
ctor of a class, no special rules on race conditions, you can call
the locale global setting function unsynchronized and the stdlib
has to deal with it.
SD: You have a logical race condition between starting this
process and who interprets it, but that's baked in.
VZ: But the ctor might have multiple arguments, what if the locale
changes in between?
JM: Your program is broken.
VZ: Why?
JM: Because we don't prevent anyone from changing the locale
midway. The best atomicity guarantee is the default ctor of
locale. If you call it multiple times in close proximity and get
different results, tough luck.
TH: So you're supposed to acquire your own copy and reuse it.
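Acquiring a copy of the locale and reusing it can be sketched as follows. Assumptions: tick_punct and snapshot_is_stable are names invented for this example, and the custom facet exists only to make the replacement global locale distinguishable from the original.

```cpp
#include <cassert>
#include <locale>

// Illustrative facet so the new global locale differs observably.
struct tick_punct : std::numpunct<char> {
    char do_thousands_sep() const override { return '\''; }
};

// Takes a snapshot of the global C++ locale, replaces the global
// locale, and checks that the snapshot still behaves as before.
bool snapshot_is_stable() {
    const std::locale snapshot;  // default ctor: copy of the global locale
    const char before =
        std::use_facet<std::numpunct<char>>(snapshot).thousands_sep();

    // The replacement locale has no name, so its effect on the C
    // locale, if any, is implementation-defined.
    std::locale::global(std::locale(snapshot, new tick_punct));

    const char from_snapshot =
        std::use_facet<std::numpunct<char>>(snapshot).thousands_sep();
    const char from_global =
        std::use_facet<std::numpunct<char>>(std::locale()).thousands_sep();

    std::locale::global(snapshot);  // restore the original global locale
    return from_snapshot == before && from_global == '\'';
}
```

Code that needs a consistent view across several steps can pass the snapshot around instead of calling std::locale() repeatedly.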
VZ: We should specify that somehow, that it's in one locale and
not in multiple locales.
SD: That's instructing programmers not to do broken things.
VZ: One exception object with potentially multiple things it needs
locale for.
JM: What multiple things? It will use the locale in effect when
initiating the ctor call of the exceptions. For user exceptions
that, e.g., combine system_errors into one exception, all we can
do is throw up our hands. We can't even query which encodings were
used. Changing the global locale is a bad idea and as much as we
should be able to convey that, we should do that.
SD: The best we can do is, for exception definitions, what does
what() return? An NTBS to be interpreted in the locale that was in
effect when the exception was constructed.
JM: I don't know about this NTBS part. I'm pretty sure, if we have
an implementation that is an NTMBS, then that should be in the
locale at the time of construction. That's the easy part. If we
just have an NTBS, that is not a multibyte string, which isn't
Victor's example, by the way, because it combines UTF-8 with an
odd encoding in the exception string, but for the NTBS case where
the implementation chooses not to provide an NTMBS, just an NTBS,
which doesn't have enough capabilities to represent the union of
the characters in the explanatory string and the path, I don't
know what to do.
JM: Maybe all we can do is say, an implementation defined NTBS,
and stop there, and you get what you get, but you can give a
remarks recommendation for what happens for the multibyte case. If
the implementation tries to be helpful to you, you should get
something we know how to interpret, but it's in principle QOI.
Maybe you're on a small system where there's no practical choice
of encoding, so the NTBS of your system is all that counts.
SD: It's possible that an NTMBS is still a single byte encoding.
It's about who's promising what and when. But I agree. In the
remarks when we clarify this, we can say if someone hasn't handed
you a string in the locale when the string was constructed,
they're breaking your contract.
JM: So do we have agreement that we fine tune the NTMBS wording
because we have machinery how to interpret NTMBS strings, and we
leave the guarantee alone because there isn't any guarantee?
SD: The guarantee is that it's null terminated.
JM: Which is not very useful. But again, the remarks say, not as
clearly as they could, they say this is how you can be helpful to
your users. And it's implementation defined so presumably you can
ask your implementer.
SD: I propose that I'll take on drafting something after this
meeting, that we can propose as the resolution.
VZ: It's not sufficient. The core of the issue is you can't say
anything about what() and we're not fixing that.
SD: That's the state of the world. The remarks are guiding QOI.
VZ: It's broken and we're keeping it broken. We're not doing
anything.
SD: I don't see a way of telling everyone generally, because
there's user data that can show up in what(). The file name can be
misencoded. So there's no way to guarantee that this can be put
into a properly encoded string of anything.
TH: But what we can say is that for the purposes of std::format,
if you call what() on an exception object, interpret it as an
NTMBS, and for anything that doesn't convert you do escaping.
SD: Yes. When trying to format one of these strings, you're going
to have to be suspicious because it's foreign data. It's an NTMBS
in the execution encoding, do your best to produce output in the
requested format. But what() should be in the execution encoding.
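The approach of treating what() as an NTMBS and escaping anything that does not convert could look roughly like this. It is only a sketch: escape_what is a hypothetical function, not a proposed std::formatter, it validates against the current C locale via std::mbrtowc, and it copies valid characters through unchanged instead of transcoding them to a target encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: copies valid multibyte characters from what()
// and replaces each byte that does not decode with \x{hh}.
std::string escape_what(const std::exception& e) {
    static const char digits[] = "0123456789abcdef";
    const char* p = e.what();
    std::size_t left = std::strlen(p);
    std::mbstate_t state{};
    std::string out;
    while (left > 0) {
        wchar_t wc;
        const std::size_t r = std::mbrtowc(&wc, p, left, &state);
        const bool bad = r == static_cast<std::size_t>(-1) ||
                         r == static_cast<std::size_t>(-2) || r == 0;
        if (bad) {
            // Treat the text as foreign data: escape one byte, then
            // resynchronize from the initial shift state.
            const unsigned char b = static_cast<unsigned char>(*p);
            out += "\\x{";
            out += digits[b >> 4];
            out += digits[b & 0xF];
            out += '}';
            ++p;
            --left;
            state = std::mbstate_t{};
        } else {
            out.append(p, r);
            p += r;
            left -= r;
        }
    }
    return out;
}
```

Nothing here changes what the exception classes accept or return; all of the suspicion lives on the formatting side.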
EN: Couldn't we forbid an NTBS that's not an NTMBS?
JM: Do we have an overview of standard library implementations? If
we tighten the rules on what what() can return, we tighten the
rules on what can be passed as ctor args to e.g. logic_error.
Because what() returns byte-wise what was passed in. So no
transcoding can happen in the ctor. But now that we require a
valid MBS in what() we require a valid MBS as the ctor argument to the
exceptions. Those people don't care about encoding-- they just
want to say what they put in they get out. Do we want to
invalidate them?
EN: Seems convincing that we shouldn't.
SD: I think that, first off, if the execution encoding and
ordinary literal encoding aren't compatible, you have deep
problems producing any output whatsoever. Hello world starts to
fail.
JM: That's all fine but the point is my program has library UB if
I violate the preconditions of a library function. That's not a
good place to be in.
TH: I strongly agree. We don't want to invalidate any user code.
SD: Not in the exception or what parts. In producing a formatter,
things are in play.
JM: The formatter has all rights to say, I expect the what string
to be an NTMBS. That is totally fine and good. Then you convert
from that NTMBS to whatever you want and go from there. That seems
plausible. But that's the formatter at the point it wants to
output that stuff. It needs to understand the details. We can
strengthen the words to say something like, we recommend that
implementations when constructing the what string on your own, as
opposed to a user one, should create a valid NTMBS. I don't think
we can do more than that. Life would be so much better if we just
said UTF-8 everywhere, but that's not our life.
VZ: I think there was some mischaracterization of the example.
Something like, because it's path we can't do anything. In fact we
can do a lot to improve the situation. We can have all the info,
even though we don't now. If we know the encoding of the path, we
can get a perfect output even in the constraints of the current
system. You can display arbitrary binary data through escaping.
JM: Do you want to expose the filesystem path itself in the
exception object so the formatter can use it?
VZ: It's already exposed as part of the message and should be
aligned with the rest of the text, not in a collection of the
text.
JM: We agree that what() should not be in multiple encodings. It
*should* as in implementation recommended practice, definitely.
That's what we're trying to formulate.
SD: I think the phrase here, in the native format, is woefully
vague, and a source of confusion as part of this. As Jens already
pointed out, there are other exceptions which just take a string
and copy it, which have all the same potential issues as Victor
identified. This is more remediable by an implementer.
JM: We can address the filesystem issue, I think, at least we can
push implementations in the right direction. I'm not sure we can
do anything reasonable for exceptions as a whole.
TH: I agree. Victor, I think if a path is being included in the
message, we'd want to reinterpret it and escape it using the
mechanism std::format uses. Can we get away with convincing
implementers to change existing code?
JM: Certainly, it's an untenable situation that we have one NTMBS
that uses two encodings inside. That will never ever work.
VZ: We have an implementer here. Mark, would you be willing to
change exception messages?
MD: I'm not sure we want to. This might have implications on
users. Would be good to investigate further. With libc++ we are
typically UTF-8 only which makes our life easier. On Windows
people have multiple encodings.
JM: The filesystem_error members take not only two components but
actually three components. A what arg (a literal string), a path,
and an error code. We have three components: user-defined (unknown
encoding), path (known), error_code converted to system_error
string (known to implementation). The problem is slightly larger
than we thought. We also need to require from users that their
what() arg is of the right kind, which is probably okay for
filesystem_error because we have good reasons to require that at
that point.
SD: I'll do some drafting work about the remarks.
JM: And there also needs to be a change in the guarantee of the
filesystem_error what. It just says that what() returns an NTBS.
We can tighten that part and add preconditions to the what() arg
of the ctors. We might be able to convince implementers to improve
that, specifically for filesystem_error.
JM: Introducing 4090. We have std::format, we have variants of
std::format that take an explicit std::locale parameter. Let's
focus on those. Then we have an L format specifier that uses the
locale you passed in. We have a statement, "For integral types,
the locale-specific form causes the context's locale to be used to
insert the appropriate digit group separator characters." There's
probably something similar for floating point. We have several
options to get that promise done and remember that locales are
user-configurable and therefore the user actually sees which
virtual functions are being invoked. For iostreams we have
specific rules under which circumstances functions are being
invoked. When outputting numbers num_put is invoked. We don't say
this here, we should say something. Is it enough for users to
override the num_put facility to get different formatting, or do
they also, or instead, need to replace the numpunct facility? There are
also _byname facets, are those relevant? And so that's the
fundamental question here. The problem is that the num_put
facility may not actually insert the appropriate digit group
separator characters even though numpunct may specify which ones
are appropriate, because the user may ignore numpunct and do
num_put the way I want. numpunct also allows only single-byte
characters as digit separators. If we have UTF-8 and some Asian
locale, we could do interesting separator characters. We can't do
that with numpunct. There's a practical benefit of requiring a
call to num_put.
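The two routes to digit grouping can be compared side by side. This is a sketch, not a description of any shipping std::format implementation: apostrophe_punct, group_digits, and via_ostream are names invented here. group_digits consults numpunct directly, while via_ostream goes through iostreams and therefore through num_put.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>

// Illustrative facet: groups of three digits separated by an apostrophe.
struct apostrophe_punct : std::numpunct<char> {
    char do_thousands_sep() const override { return '\''; }
    std::string do_grouping() const override { return "\3"; }
};

// Route 1: read grouping() and thousands_sep() from numpunct and insert
// the separators by hand.
std::string group_digits(unsigned long long value, const std::locale& loc) {
    const auto& punct = std::use_facet<std::numpunct<char>>(loc);
    const std::string grouping = punct.grouping();
    const char sep = punct.thousands_sep();
    const std::string digits = std::to_string(value);

    std::string reversed;
    std::size_t group = 0;  // index into grouping; the last entry repeats
    int in_group = 0;       // digits emitted in the current group
    for (auto it = digits.rbegin(); it != digits.rend(); ++it) {
        if (!grouping.empty()) {
            const char size = grouping[group];
            // A size <= 0 or CHAR_MAX means no further grouping.
            if (size > 0 && size != CHAR_MAX && in_group == size) {
                reversed += sep;
                in_group = 0;
                if (group + 1 < grouping.size()) ++group;
            }
        }
        reversed += *it;
        ++in_group;
    }
    return std::string(reversed.rbegin(), reversed.rend());
}

// Route 2: let iostreams do it; operator<< dispatches to num_put.
std::string via_ostream(unsigned long long value, const std::locale& loc) {
    std::ostringstream os;
    os.imbue(loc);
    os << value;
    return os.str();
}
```

With the default num_put both routes agree. A user who replaces num_put would only be observed by the second route, which is the user-visible difference the issue asks to pin down.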
SD: I could make the digit separator a half-width comma.
JM: Something like that.
TH: We could find out what implementations are doing.
SD: Because these are user-creatable, they're user-perceivable.
JM: User-observable, so we need to be precise or expressly
imprecise.
SD: Either tell users precisely what's going to happen or tell
them to bring it up with their implementers.
JM: Or tell them they're doing something wrong and invoking UB.
JM: Tom suggested we should have an implementation survey of
existing std::format implementations.
TH: I'm looking at the msft implementation, was hoping Mark would
know offhand.
MD: I don't know offhand, I can take a look. It's different from
streams.
JM: Iostreams requires num_put. num_put just uses character
ranges, not streams, right?
TH: The msft implementation does use numpunct somewhere. In an
internal function called write_integral. And no uses of num_put.
JM: And num_put takes a reference to an ios_base as a parameter,
which is an alien concept to construct in a formatter, but not
something that'd be impossible to construct. But it does take an
iter_type, which is an output iterator, which is a template
parameter, which by default is an ostreambuf_iterator, but that's
presumably configurable-- except that then we have to tell the
world what we actually use if not an ostreambuf_iterator. So we
might conclude that num_put is too tied to iostreams to bother
with in the format context.
TH: Has std::format been implemented in a shipping libstdc++?
MD: Yes in 13. Not complete, 14 has more improvements.
TH: libstdc++ seems to use numpunct as well. Lots of uses of
numpunct and no uses of num_put.
JM: I agree. Looks like numpunct wins.
SD: We should specify that it's doing numpunct. That does mean
that you only get a char type for it.
JM: Well, we already know the locale interface is broken. If we
come up with a better way, maybe we'll have a new overload of
std::format.
MD: IMO, this opinion should also address floating point and
boolean values (the true name and the false name).
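The other numpunct members mentioned here can be illustrated the same way. Assumptions: yes_no_punct and show are hypothetical names, and the example goes through iostreams only to make the facet's effect visible; it says nothing about how std::format obtains these values.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Illustrative facet overriding the boolean names and decimal point.
struct yes_no_punct : std::numpunct<char> {
    std::string do_truename() const override { return "yes"; }
    std::string do_falsename() const override { return "no"; }
    char do_decimal_point() const override { return ','; }
};

// Formats a bool and a double with the given locale imbued.
std::string show(bool flag, double number, const std::locale& loc) {
    std::ostringstream os;
    os.imbue(loc);
    os << std::boolalpha << flag << ' ' << number;
    return os.str();
}
```

Calling show(true, 1.5, loc) with this facet installed in loc produces the locale-specific names and decimal point, which is the information a formatter would need numpunct for beyond digit grouping.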
TH: Jens, will you offer a PR for your issue?
JM: I don't know, why? It's broken, you can keep all the pieces.
I'm not supplying glue.
TH: Well, but in terms of actually specifying that numpunct is
used?
JM: Yes, so?
TH: What do you think we should do with the issue you filed?
JM: We should tell LWG that SG16 resolved that after
implementation review, numpunct is the winner and use of numpunct
is explicitly specified, as is true type, false type, and for
floating point numpunct is the only thing. We can pass that on as
prose text and someone can morph it into a PR if they want to.
Tom.