Date: Wed, 25 Oct 2023 20:34:18 +0200
It's the usual DST problem. The meeting follows US DST rules, which change
over on different dates, so once or twice in March and October the meetings
are an hour off.
On Wed, Oct 25, 2023 at 8:32 PM Peter Brett via SG16 <sg16_at_[hidden]>
wrote:
> Slightly confused… I thought our meetings were usually at 19:30:00 UTC but
> maybe this one is not.
>
>
>
> Peter
>
>
>
> *From:* SG16 <sg16-bounces_at_[hidden]> *On Behalf Of *Tom Honermann
> via SG16
> *Sent:* 25 October 2023 15:51
> *To:* sg16_at_[hidden]; Alisdair Meredith <alisdairm_at_[hidden]>;
> Jonathan Wakely <cxx_at_[hidden]>; Charles Barto <chbarto_at_[hidden]>;
> Mark de Wever <koraq_at_[hidden]>
> *Cc:* Tom Honermann <tom_at_[hidden]>
> *Subject:* Re: [SG16] Agenda for the 2023-10-25 SG16 telecon
>
> This is your friendly reminder that we are meeting today, in about 4 1/2
> hours.
>
> Tom.
>
> On 10/24/23 1:11 AM, Tom Honermann via SG16 wrote:
>
> SG16 will hold a telecon on Wednesday, October 25th, at 19:30 UTC (timezone
> conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20231025T193000&p1=1440&p2=tz_pt&p3=tz_mt&p4=tz_ct&p5=tz_et&p6=tz_cest>
> ).
>
> The agenda follows.
>
> - charN_t, char_traits, codecvt, and iostreams:
>
>
> - P2873R0: Remove Deprecated Locale Category Facets For Unicode from
> C++26
> <https://wg21.link/p2873r0>
> - LWG 3767: codecvt<charN_t, char8_t, mbstate_t> incorrectly added
> to locale
> <https://wg21.link/lwg3767>
> - LWG 2959: char_traits<char16_t>::eof is a valid UTF-16 code unit
> <https://wg21.link/lwg2959>
>
>
> - SG16 #32: std::char_traits<char16_t>::eof() requires uint_least16_t
> to be larger than 16 bits
> <https://github.com/sg16-unicode/sg16/issues/32>
>
>
> - SG16 #33: A correct codecvt facet that works with basic_filebuf
> can't do UTF conversions
> <https://github.com/sg16-unicode/sg16/issues/33>
>
> Hang on, this is going to be a bumpy ride.
>
> When char16_t and char32_t were added for C++11, the standard library was
> extended to support corresponding specializations of std::char_traits (
> [char.traits.general]p1
> <http://eel.is/c++draft/char.traits.general#1>)
> and std::basic_string ([string.classes.general]p1
> <http://eel.is/c++draft/string.classes#general-1>).
> Curiously, type aliases were added for specializations of the std::fpos (
> [iosfwd.syn]
> <http://eel.is/c++draft/iosfwd.syn#lib:fpos>)
> class template (but only in the synopsis) and support for these types was
> added for the std::codecvt ([tab:locale.category.facets]
> <http://eel.is/c++draft/locale.category#tab:locale.category.facets>)
> and std::codecvt_byname ([tab:locale.spec]
> <http://eel.is/c++draft/locale.category#tab:locale.spec>)
> locale facets, but not for any of the other locale facets nor for iostreams
> in general. Support for these types was added to std::basic_string_view (
> [string.view.synop]
> <http://eel.is/c++draft/string.view.synop>)
> and std::filesystem::path ([fs.path.type.cvt]p2
> <http://eel.is/c++draft/fs.path.type.cvt#2>)
> in C++17, but no additional support was ever extended to iostreams. The
> status quo is thus that the standard requires implementations to provide
> some fragments (std::fpos, std::codecvt, and std::codecvt_byname) of
> iostream support for these types despite there being no use of these type
> aliases and specializations in the standard; implementations are not
> required to support streams of char16_t or char32_t.
>
> std::char_traits is used by both the string library (e.g.,
> std::basic_string) and iostreams. However, the string library only
> depends on some of the std::char_traits members; it does not make use of
> the int_type member type alias nor any of the member functions that
> depend on that type (eof(), not_eof(), to_char_type(), to_int_type(),
> eq_int_type()). Per LWG 2959
> <https://wg21.link/lwg2959>
> and SG16 #32
> <https://github.com/sg16-unicode/sg16/issues/32>,
> the specified std::char_traits<char16_t> specialization has a defect; all
> char16_t values are valid code unit values, but the int_type member type
> alias is defined as uint_least16_t (the same underlying type as char16_t)
> and it is thus unable to hold a distinct value for EOF. The obvious fix is
> to use a larger type for int_type, but that would result in an ABI break.
> I recently asked the ABI review group if there are any known tricks they
> could deploy to mitigate an ABI break, but no direct solutions were
> identified; a suggestion to provide an alternative type for
> std::char_traits<char16_t> that programmers would have to explicitly use
> instead of the broken specialization was offered. That is an option, but
> since the problematic int_type member is not actually used by any
> functionality the standard requires implementors to provide, an ABI break
> in this case might have little practical consequence.
>
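The int_type defect described above can be checked directly. The following is a minimal sketch, assuming a C++17 compiler; the helper names (Traits16, int_type_has_no_spare_values, eof_aliases_a_code_unit, narrow_int_type_is_wider) are hypothetical and exist only for illustration. It uses plain casts rather than to_int_type()/to_char_type(), since implementations are free to special-case those.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

using Traits16 = std::char_traits<char16_t>;

// The standard specifies int_type for char16_t as uint_least16_t.
static_assert(std::is_same_v<Traits16::int_type, std::uint_least16_t>);

// True when int_type is no larger than char16_t, so every int_type value
// is already taken by some code unit value.
constexpr bool int_type_has_no_spare_values() {
    return sizeof(Traits16::int_type) == sizeof(char16_t);
}

// True when the EOF sentinel survives a plain cast through char16_t,
// i.e. when it has the same value as some char16_t code unit.
constexpr bool eof_aliases_a_code_unit() {
    constexpr auto eof_value = Traits16::eof();
    return static_cast<Traits16::int_type>(
               static_cast<char16_t>(eof_value)) == eof_value;
}

// For contrast: int_type for char is int. Assumption: int is wider than
// char, which holds on mainstream platforms, leaving room for EOF.
constexpr bool narrow_int_type_is_wider() {
    return sizeof(std::char_traits<char>::int_type) > sizeof(char);
}
```

If all three helpers return true, the char specialization has spare values for EOF while the char16_t specialization does not, which is the situation LWG 2959 and SG16 #32 describe.
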
> When char8_t was added for C++20 via P0482R6 (char8_t: A type for UTF-8
> characters and strings)
> <https://wg21.link/p0482>,
> I failed to understand the intended purpose for which std::codecvt was
> added to the standard. My impression of it at the time was that it was a
> poorly designed general transcoding facility; I failed to appreciate its
> significance as a locale facet as used by iostreams. This resulted in two
> mistakes:
>
> 1. I deprecated the following specializations (and their use as locale
> category facets):
> std::codecvt<char16_t, char, std::mbstate_t>
> std::codecvt<char32_t, char, std::mbstate_t>
> std::codecvt_byname<char16_t, char, std::mbstate_t>
> std::codecvt_byname<char32_t, char, std::mbstate_t>
> 2. I added the following specializations as required locale category
> facets (adding the specializations themselves is arguably not a mistake,
> but adding them as locale category facets is):
> std::codecvt<char16_t, char8_t, std::mbstate_t>
> std::codecvt<char32_t, char8_t, std::mbstate_t>
> std::codecvt_byname<char16_t, char8_t, std::mbstate_t>
> std::codecvt_byname<char32_t, char8_t, std::mbstate_t>
>
> Note that std::codecvt facets are only used by std::basic_filebuf which
> only ever converts to and from elements of type char; the facets that
> convert to and from char8_t are not substitutable for that purpose.
>
> P2873R0
> <https://wg21.link/p2873r0>,
> which SG16 already approved (or, rather, did not object to) during the 2023-05-24
> SG16 meeting
> <https://github.com/sg16-unicode/sg16-meetings#may-24th-2023>,
> now seeks to remove the deprecated specializations. LWG 3767
> <https://wg21.link/lwg3767>
> tracks addressing the incorrect addition of the char8_t specializations
> as locale facets.
>
> Arguably, P0482R6
> <https://wg21.link/p0482>
> should have added the following specializations as locale facets:
>
> - std::codecvt<char8_t, char, std::mbstate_t>
> - std::codecvt_byname<char8_t, char, std::mbstate_t>
>
> The only specification for std::codecvt_byname in the standard is the
> synopsis in [locale.codecvt.byname]
> <http://eel.is/c++draft/locale.codecvt.byname>;
> there is no other wording present.
>
> As mentioned, the standard does not require implementations to provide
> iostream support for the charN_t types. However, implementations may do
> so as an extension. If they do, then, per [filebuf.general]p7
> <http://eel.is/c++draft/input.output#filebuf.general-7>,
> specializations of std::codecvt<charN_t, char, std::mbstate_t> are
> required to be available via a call to std::use_facet() for the imbued
> locale. In that case, per the standard, the status of the necessary
> specializations is:
>
> - std::codecvt<char8_t, char, std::mbstate_t> # Not specified.
> - std::codecvt<char16_t, char, std::mbstate_t> # Deprecated.
> - std::codecvt<char32_t, char, std::mbstate_t> # Deprecated.
>
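The facet lookup that [filebuf.general]p7 relies on can be sketched as below. This is only an illustration, assuming the classic "C" locale and the char and wchar_t specializations that the standard does require; the helper names file_conversion_facet_available and file_conversion_is_noop are hypothetical. Instantiating the helpers with a charN_t type is where the "not specified" and "deprecated" statuses listed above would come into play.

```cpp
#include <cwchar>
#include <locale>

// Reports whether the locale carries the codecvt facet that a
// basic_filebuf<CharT> would fetch to convert to and from char.
template <class CharT>
bool file_conversion_facet_available(const std::locale& loc) {
    return std::has_facet<std::codecvt<CharT, char, std::mbstate_t>>(loc);
}

// Reports whether that facet declares itself a no-op conversion.
// use_facet throws std::bad_cast if the facet is missing.
template <class CharT>
bool file_conversion_is_noop(const std::locale& loc) {
    return std::use_facet<std::codecvt<CharT, char, std::mbstate_t>>(loc)
        .always_noconv();
}
```

For example, file_conversion_facet_available<wchar_t>(std::locale::classic()) should report true, and file_conversion_is_noop<char>(std::locale::classic()) should report true because the char-to-char facet performs no conversion.
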
> If it is desirable to provide a better foundation for iostream support of
> the charN_t types, either for a future version of the standard, or for
> implementations that want to provide such support as an extension, we could
> undeprecate the previously deprecated specializations and add the missing
> one for char8_t. Since iostreams does not support charN_t in the standard
> today and since the char16_t and char32_t specializations have already
> been deprecated for two release cycles, perhaps it is even reasonable to
> change their behavior so that they convert to and from the locale encoding
> rather than UTF-8. This would remove the existing inconsistency with the
> corresponding char and wchar_t specializations that was part of the
> motivation for their deprecation in the first place (see the discussion of
> codecvt in the Motivation section of P0482R6
> <https://wg21.link/p0482r6#motivation>
> ).
>
> However, an endeavor to improve the situation for iostreams and charN_t next
> runs into SG16 #33
> <https://github.com/sg16-unicode/sg16/issues/33>;
> std::basic_fstream does not support the UTF-8 and UTF-16 encodings for
> the "internal" side of a std::codecvt conversion because
> std::basic_filebuf requires that, per [locale.codecvt.virtuals]p4
> <http://eel.is/c++draft/locale.codecvt#virtuals-4>
> and its related footnote
> <http://eel.is/c++draft/locale.codecvt#footnote-246>,
> "internal" characters are mapped 1-N to "external" characters. This is an
> existing issue for std::basic_fstream<wchar_t> with UTF-16 data.
>
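To see why the 1-N requirement rules out UTF-16 (and UTF-8) on the internal side, it may help to count code units. The sketch below is illustrative only: the function names are hypothetical, and the input is assumed to be a valid Unicode scalar value. A code point above U+FFFF needs two internal UTF-16 code units, and neither unit on its own corresponds to a whole sequence of external characters.

```cpp
#include <cstddef>

// Number of UTF-16 code units needed for one Unicode scalar value.
constexpr std::size_t utf16_length(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// Number of UTF-8 code units (bytes) needed for one Unicode scalar value.
constexpr std::size_t utf8_length(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// A 1-N mapping converts each internal character by itself, so it can
// only handle code points that fit in a single internal UTF-16 code unit.
constexpr bool fits_one_to_n_with_utf16_internal(char32_t cp) {
    return utf16_length(cp) == 1;
}
```

With these definitions, U+20AC fits the 1-N model (one UTF-16 code unit, three UTF-8 bytes), while U+1F600 does not (two UTF-16 code units that jointly map to four UTF-8 bytes).
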
> The Microsoft and libstdc++ standard library implementations appear to
> support iostreams with charN_t types; at least on the surface. Libc++
> intentionally does not provide definitions for charN_t specializations of
> locale facets that are not required by the standard and this suffices for
> basic usage to provoke compilation errors. I have not yet investigated to
> what extent the Microsoft and libstdc++ implementations work as might be
> expected. My impression is that, where they do produce expected results, it
> is serendipity at work. See https://godbolt.org/z/6T7hebY33
> for a bit of fun (testing on Windows requires changes to use an actual zero
> valued file since Windows doesn't provide a builtin analog for /dev/zero,
> but in that case, MSVC produces an executable that behaves as might be
> expected).
>
> I haven't looked hard, but I have not yet identified any code in the wild
> that uses iostreams with charN_t types. One would think that, if any
> project did, it would be ICU. I confirmed that ICU, despite its use of
> char16_t, makes no attempt to use it with iostreams.
>
> So where is this all going? I see three general options that can be
> pursued to resolve these various issues.
>
> 1. We can fix these issues, despite the acknowledged ABI impact, so
> that the standard no longer actively hinders support for iostreams with the
> charN_t types. Optionally, we could further explore requiring such
> support in the standard (doing so would require adding charN_t support
> to more locale facets).
> 2. We can declare that iostreams will never support the charN_t types
> in the standard and deprecate and remove the fragments of such support that
> are present. Implementations could of course provide support as an
> extension if they so desire.
> 3. We can admit things are broken, choose to do nothing about it, and
> close the related LWG issues while chanting sorry-not-sorry.
>
> The above issues are sufficiently complicated that I believe a paper is
> warranted regardless of the direction that we favor. I'm signing up to
> write that paper since I'm responsible for some of the mess. I do not
> intend to poll any directions in this meeting; rather, the focus is to
> ensure that the issues are well understood, to discuss decisions we could
> make and their potential consequences, and to generally collect information
> that will lead to a better paper.
>
> Responses provided before the meeting to identify other existing related
> issues or considerations would be appreciated. Ideal responses do not
> include the phrase "burn it all to the ground".
>
> Tom.
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2023-10-25 18:34:30