Date: Wed, 11 Feb 2026 14:23:55 -0500
This meeting is starting in 10 minutes.
Tom.
On 2/10/26 10:02 PM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a meeting *tomorrow*, Wednesday, February 11th, at
> 19:30 UTC (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20260211T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>).
>
> The agenda is:
>
> * P3876R0: Extending <charconv> support to more character types
> <https://wg21.link/p3876r0>.
> * P3904R0: When paths go WTF: making formatting lossless
> <https://wg21.link/p3904r0>.
>
> This is the same agenda as our last meeting on 2026-01-28
> <https://wiki.isocpp.org/2026_Telecons:SG16Teleconference2026-01-28>.
> As you may recall, we spent the entire meeting discussing *P3876R0*,
> but did not quite conclude the discussion. We took six polls, all of
> which confirmed the direction of the paper. In this meeting, we'll
> review the wording and, if all goes well, poll to forward.
>
> *P3904R0* seeks to preserve the values of code units that are not part
> of a well-formed Unicode code unit sequence (e.g., a lone surrogate)
> when formatting std::filesystem::path objects for the ordinary literal
> encoding when that encoding is UTF-8. The idea is, given an ill-formed
> code unit sequence (e.g., L'\xD800'), rather than encoding a U+FFFD
> replacement character, to encode the code unit value in WTF-8
> <https://wtf-8.codeberg.page/>, an extension of UTF-8 that encodes
> lone surrogate code points as if they were valid Unicode scalar
> values. This transformation has the downside of producing text that is
> not well-formed UTF-8 (substituting a replacement character ensures
> well-formed UTF-8), but has the upside of preserving invalid code unit
> sequences in a way that allows the original path to be recovered. Note
> that common filesystems that use 16-bit code units, such as on
> Windows, do not require filesystem paths to be well-formed UTF-16.
> Also note that std::format() and std::print() support use of the "?"
> formatting option to produce a value-preserving rendering of
> ill-formed code unit sequences; given a lone surrogate such as U+D800,
> use of that option produces "\u{d800}" instead of a replacement
> character. We therefore have a way to do round-trip preservation
> of filesystem paths today (but not via WTF-8; at least not without an
> additional explicit translation step).
>
> Tom.
>
>
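
For anyone unfamiliar with WTF-8, the encoding step described in the announcement above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name to_wtf8 is hypothetical and is not taken from P3904R0 or from the standard library, and the input is assumed to be a sequence of 16-bit code units that may contain lone surrogates. Surrogate pairs are combined into a single supplementary code point; a lone surrogate is encoded with the ordinary three-byte UTF-8 pattern instead of being replaced by U+FFFD.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only; not from P3904R0 or the
// standard library. Converts potentially ill-formed UTF-16 to WTF-8.
std::string to_wtf8(std::u16string_view in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // A lead surrogate followed by a trail surrogate forms a valid
        // pair; combine them into one supplementary code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            char32_t trail = in[i + 1];
            if (trail >= 0xDC00 && trail <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (trail - 0xDC00);
                ++i;  // the trail surrogate has been consumed
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // Lone surrogates (U+D800..U+DFFF) land here and are encoded
            // as if they were scalar values; this is the WTF-8 extension.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this sketch, a lone U+D800 becomes the bytes ED A0 80, which is not well-formed UTF-8 but can be decoded back to the original code unit. Substituting U+FFFD would instead yield EF BF BD and lose the original value.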
Received on 2026-02-11 19:24:01
