Date: Tue, 27 Jan 2026 16:38:39 -0500
SG16 will hold a meeting *tomorrow*, Wednesday, January 28th, at 19:30
UTC (timezone conversion
<https://www.timeanddate.com/worldclock/converter.html?iso=20260128T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>).
The agenda is:
* P3876R0: Extending <charconv> support to more character types
<https://wg21.link/p3876r0>.
* P3904R0: When paths go WTF: making formatting lossless
<https://wg21.link/p3904r0>.
We briefly started discussing *P3876R0* during the 2026-01-14 SG16
meeting. I haven't published notes for that meeting because I lacked
edit access to the new 2026 telecons wiki
<https://wiki.isocpp.org/2026_Telecons>. I seem to have access now, so
I'll try to get those notes published today. There wasn't much time for
discussion, so there isn't much to summarize beyond Jan's presentation
of the paper.
*P3904R0* seeks to preserve the values of code units that are not part
of a well-formed Unicode code unit sequence (e.g., a lone surrogate)
when formatting std::filesystem::path objects for the ordinary literal
encoding when that encoding is UTF-8. The idea is, given an ill-formed
code unit sequence (e.g., L'\xD800'), rather than encoding a U+FFFD
replacement character, to encode the code unit value in WTF-8
<https://wtf-8.codeberg.page/>, an extension of UTF-8 that encodes lone
surrogate code points as if they were valid Unicode scalar values. This
transformation has the downside of producing text that is not
well-formed UTF-8 (substituting a replacement character ensures
well-formed UTF-8), but has the upside of preserving invalid code unit
sequences in a way that allows the original path to be recovered. Note
that common filesystems that use 16-bit code units, such as on Windows,
do not require filesystem paths to be well-formed UTF-16. Also note that
std::format() and std::print() support use of the "?" formatting option
to produce a value-preserving rendering of ill-formed code unit
sequences; given a lone surrogate such as U+D800, use of that option
would produce "\u{d800}" instead of a replacement character today. We
therefore have a way to do round-trip preservation of filesystem paths
today (but not via WTF-8; at least not without an additional explicit
translation step).
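To make the trade-off concrete, here is a rough sketch of the two
behaviors for UTF-16 input. It is not taken from the paper; the names
transcode_utf16, append_generalized_utf8, and lone_surrogate_policy are
hypothetical and exist only for illustration. With the "preserve"
policy a lone surrogate is encoded as a three-byte sequence (WTF-8);
with the "replace" policy it becomes U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: append the generalized UTF-8 encoding of cp to out.
// Unlike strict UTF-8, surrogate code points (U+D800..U+DFFF) are accepted.
inline void append_generalized_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical policy switch: substitute U+FFFD, or keep the surrogate value.
enum class lone_surrogate_policy { replace, preserve };

// Hypothetical transcoder from potentially ill-formed UTF-16 to either
// well-formed UTF-8 (replace) or WTF-8 (preserve).
inline std::string transcode_utf16(std::u16string_view in,
                                   lone_surrogate_policy policy) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool is_high = cp >= 0xD800 && cp <= 0xDBFF;
        const bool is_low = cp >= 0xDC00 && cp <= 0xDFFF;
        if (is_high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Well-formed surrogate pair: combine into one scalar value.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if ((is_high || is_low) &&
                   policy == lone_surrogate_policy::replace) {
            // Lone surrogate: lossy substitution keeps the output valid UTF-8.
            cp = 0xFFFD;
        }
        append_generalized_utf8(out, cp);
    }
    return out;
}
```

For the code units 'a', 0xD800, 'b', the "preserve" policy yields the
bytes 61 ED A0 80 62 and the "replace" policy yields 61 EF BF BD 62.
Only the former lets the original 0xD800 be recovered.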
Tom.
Received on 2026-01-27 21:38:43
