Re: Agenda for the 2023-10-25 SG16 telecon

From: Peter Bindels <peterbindels_at_[hidden]>
Date: Wed, 25 Oct 2023 20:54:27 +0200
Right, it's next week. That had me confused.

On Wed, Oct 25, 2023 at 8:52 PM Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

>
> On 25/10/2023 20.32, Peter Brett via SG16 wrote:
> > Slightly confused… I thought our meetings were usually at 19:30:00 UTC
> but maybe this one is not.
>
> This one is, too. And there is no US/Europe summer time confusion right
> now.
>
> Jens
>
>
> >
> >
> > Peter
> >
> >
> >
> > *From:*SG16 <sg16-bounces_at_[hidden]> *On Behalf Of *Tom
> Honermann via SG16
> > *Sent:* 25 October 2023 15:51
> > *To:* sg16_at_[hidden]; Alisdair Meredith <alisdairm_at_[hidden]>;
> Jonathan Wakely <cxx_at_[hidden]>; Charles Barto <chbarto_at_[hidden]>;
> Mark de Wever <koraq_at_[hidden]>
> > *Cc:* Tom Honermann <tom_at_[hidden]>
> > *Subject:* Re: [SG16] Agenda for the 2023-10-25 SG16 telecon
> >
> > This is your friendly reminder that we are meeting today, in about 4 1/2
> hours.
> >
> > Tom.
> >
> > On 10/24/23 1:11 AM, Tom Honermann via SG16 wrote:
> >
> > SG16 will hold a telecon on Wednesday, October 25th, at 19:30 UTC
> (timezone conversion <
> https://urldefense.com/v3/__https:/www.timeanddate.com/worldclock/converter.html?iso=20231025T193000&p1=1440&p2=tz_pt&p3=tz_mt&p4=tz_ct&p5=tz_et&p6=tz_cest__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXc78d0cpg$
> >).
> >
> > The agenda follows.
> >
> > * charN_t, char_traits, codecvt, and iostreams:
> >
> > o P2873R0: Remove Deprecated Locale Category Facets For
> Unicode from C++26 <
> https://urldefense.com/v3/__https:/wg21.link/p2873r0__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcIsUFgw0$
> >
> > o LWG 3767: codecvt<charN_t, char8_t, mbstate_t> incorrectly
> added to locale <
> https://urldefense.com/v3/__https:/wg21.link/lwg3767__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcEtMazYQ$
> >
> > o LWG 2959: char_traits<char16_t>::eof is a valid UTF-16 code
> unit <
> https://urldefense.com/v3/__https:/wg21.link/lwg2959__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcuUszTaI$
> >
> >
> > + SG16 #32: std::char_traits<char16_t>::eof() requires
> uint_least16_t to be larger than 16 bits <
> https://urldefense.com/v3/__https:/github.com/sg16-unicode/sg16/issues/32__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcmm60T-E$
> >
> >
> > o SG16 #33: A correct codecvt facet that works with
> basic_filebuf can't do UTF conversions <
> https://urldefense.com/v3/__https:/github.com/sg16-unicode/sg16/issues/33__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcHyyOy8w$
> >
> >
> > Hang on, this is going to be a bumpy ride.
> >
> > When char16_t and char32_t were added for C++11, the standard
> library was extended to support corresponding specializations of
> std::char_traits ([char.traits.general]p1 <
> https://urldefense.com/v3/__http:/eel.is/c**Adraft/char.traits.general*1__;Kysj!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcRhrE-aA$>)
> and std::basic_string ([string.classes.general]p1 <
> https://urldefense.com/v3/__http:/eel.is/c**Adraft/string.classes*general-1__;Kysj!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcc1XPjkE$>).
> Curiously, type aliases were added for specializations of the std::fpos
> ([iosfwd.syn] <
> https://urldefense.com/v3/__http:/eel.is/c**Adraft/iosfwd.syn*lib:fpos__;Kysj!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXctBcLacg$>)
> class template (but only in the synopsis) and support for these types was
> added for the std::codecvt ([tab:locale.category.facets]
> > <
> https://urldefense.com/v3/__http:/eel.is/c**Adraft/locale.category*tab:locale.category.facets__;Kysj!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcweUEDJM$>)
> and std::codecvt_byname ([tab:locale.spec] <
> https://urldefense.com/v3/__http:/eel.is/c**Adraft/locale.category*tab:locale.spec__;Kysj!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcJUdH4Dw$>)
> locale facets, but not for any of the other locale facets nor for iostreams
> in general. Support for these types was added to std::basic_string_view
> ([string.view.synop] <
> https://urldefense.com/v3/__http:/eel.is/c**Adraft/string.view.synop__;Kys!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXctpkgvDQ$>)
> and std::filesystem::path ([fs.path.type.cvt]p2
> > <
> https://urldefense.com/v3/__http:/eel.is/c**Adraft/fs.path.type.cvt*2__;Kysj!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcgpJF2w8$>)
> in C++17, but no additional support was ever extended to iostreams. The
> status quo is thus that the standard requires implementations to provide
> some fragments (std::fpos, std::codecvt, and std::codecvt_byname) of
> iostream support for these types despite there being no use of these type
> aliases and specializations in the standard; implementations are not
> required to support streams of char16_t or char32_t.
> >
> > std::char_traits is used by both the string library (e.g.,
> std::basic_string) and iostreams. However, the string library only depends
> on some of the std::char_traits members; it does not make use of the
> int_type member type alias nor any of the member functions that depend on
> that type (eof(), not_eof(), to_char_type(), to_int_type(),
> eq_int_type()). Per LWG 2959 <
> https://urldefense.com/v3/__https:/wg21.link/lwg2959__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcuUszTaI$>
> and SG16 #32 <
> https://urldefense.com/v3/__https:/github.com/sg16-unicode/sg16/issues/32__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcmm60T-E$>,
> the specified std::char_traits<char16_t> specialization has a defect; all
> char16_t values are valid code unit values, but the int_type member type
> alias is defined as uint_least16_t (the same underlying type as char16_t)
> and it is thus unable to hold a distinct value for
> > EOF. The obvious fix is to use a larger type for int_type, but that
> would result in an ABI break. I recently asked the ABI review group if
> there are any known tricks they could deploy to mitigate an ABI break, but
> no direct solutions were identified; a suggestion to provide an alternative
> type for std::char_traits<char16_t> that programmers would have to
> explicitly use instead of the broken specialization was offered. That is an
> option, but since the problematic int_type member is not actually used by
> any functionality the standard requires implementors to provide, an ABI
> break in this case might have little practical consequence.
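
A minimal sketch of the int_type problem described above. The helper names (Traits16, eof_is_also_a_code_unit, int_type_has_no_spare_values) are hypothetical and exist only for illustration. The sketch assumes the implementation follows the standard in defining int_type as uint_least16_t.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <type_traits>

using Traits16 = std::char_traits<char16_t>;

// The standard specifies uint_least16_t as int_type for this specialization.
static_assert(std::is_same<Traits16::int_type, std::uint_least16_t>::value,
              "int_type is expected to be uint_least16_t");

// Hypothetical helper: true when eof() survives a round trip through
// char16_t, i.e. the EOF sentinel has the same value as some code unit.
constexpr bool eof_is_also_a_code_unit() {
    return static_cast<Traits16::int_type>(
               static_cast<char16_t>(Traits16::eof())) == Traits16::eof();
}

// Hypothetical helper: true when int_type has no values beyond the range of
// char16_t, so no spare value is left over to act as a distinct EOF.
constexpr bool int_type_has_no_spare_values() {
    return std::numeric_limits<Traits16::int_type>::max() ==
           std::numeric_limits<char16_t>::max();
}
```

Both helpers are expected to return true wherever char16_t and uint_least16_t share a representation, which is the situation LWG 2959 describes. How a given implementation's to_int_type() treats the colliding code unit is not shown here and may differ between standard libraries.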
> >
> > When char8_t was added for C++20 via P0482R6 (char8_t: A type for
> UTF-8 characters and strings) <
> https://urldefense.com/v3/__https:/wg21.link/p0482__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcVwej7AM$>,
> I failed to understand the intended purpose for which std::codecvt was
> added to the standard. My impression of it at the time was that it was a
> poorly designed general transcoding facility; I failed to appreciate its
> significance as a locale facet as used by iostreams. This resulted in two
> mistakes:
> >
> > 1. I deprecated the following specializations (and their use as
> locale category facets):
> > std::codecvt<char16_t, char, std::mbstate_t>
> > std::codecvt<char32_t, char, std::mbstate_t>
> > std::codecvt_byname<char16_t, char, std::mbstate_t>
> > std::codecvt_byname<char32_t, char, std::mbstate_t>
> > 2. I added the following specializations as required locale
> category facets (adding the specializations themselves is arguably not a
> mistake, but adding them as locale category facets is):
> > std::codecvt<char16_t, char8_t, std::mbstate_t>
> > std::codecvt<char32_t, char8_t, std::mbstate_t>
> > std::codecvt_byname<char16_t, char8_t, std::mbstate_t>
> > std::codecvt_byname<char32_t, char8_t, std::mbstate_t>
> >
> > Note that std::codecvt facets are only used by std::basic_filebuf
> which only ever converts to and from elements of type char; the facets that
> convert to and from char8_t are not substitutable for that purpose.
> >
> > P2873R0 <
> https://urldefense.com/v3/__https:/wg21.link/p2873r0__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcIsUFgw0$>,
> which SG16 already approved (or, rather, did not object to) during the
> 2023-05-24 SG16 meeting <
> https://urldefense.com/v3/__https:/github.com/sg16-unicode/sg16-meetings*may-24th-2023__;Iw!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcsRw-LOs$>,
> now seeks to remove the deprecated specializations. LWG 3767 <
> https://urldefense.com/v3/__https:/wg21.link/lwg3767__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcEtMazYQ$>
> tracks addressing the incorrect addition of the char8_t specializations as
> locale facets.
> >
> > Arguably, P0482R6 <
> https://urldefense.com/v3/__https:/wg21.link/p0482__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcVwej7AM$>
> should have added the following specializations as locale facets:
> >
> > * std::codecvt<char8_t, char, std::mbstate_t>
> > * std::codecvt_byname<char8_t, char, std::mbstate_t>
> >
> > The only specification for std::codecvt_byname in the standard is
> the synopsis in [locale.codecvt.byname] <
> https://urldefense.com/v3/__http:/eel.is/c**Adraft/locale.codecvt.byname__;Kys!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcBM3izx8$>;
> there is no other wording present.
> >
> > As mentioned, the standard does not require implementations to
> provide iostream support for the charN_t types. However, implementations
> may do so as an extension. If they do, then, per [filebuf.general]p7 <
> https://urldefense.com/v3/__http:/eel.is/c**Adraft/input.output*filebuf.general-7__;Kysj!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXciFd_Xns$>,
> specializations of std::codecvt<charN_t, char, std::mbstate_t> are required
> to be available via a call to std::use_facet() for the imbued locale. In
> that case, per the standard, the status of the necessary specializations
> is:
> >
> > * std::codecvt<char8_t, char, std::mbstate_t> # Not specified.
> > * std::codecvt<char16_t, char, std::mbstate_t> # Deprecated.
> > * std::codecvt<char32_t, char, std::mbstate_t> # Deprecated.
> >
> > If it is desirable to provide a better foundation for iostream
> support of the charN_t types, either for a future version of the standard,
> or for implementations that want to provide such support as an extension,
> we could undeprecate the previously deprecated specializations and add the
> missing one for char8_t. Since iostreams does not support charN_t in the
> standard today and since the char16_t and char32_t specializations have
> already been deprecated for two release cycles, perhaps it is even
> reasonable to change their behavior so that they convert to and from the
> locale encoding rather than UTF-8. This would remove the existing
> inconsistency with the corresponding char and wchar_t specializations that
> was part of the motivation for their deprecation in the first place (see
> the discussion of codecvt in the Motivation section of P0482R6
> > <
> https://urldefense.com/v3/__https:/wg21.link/p0482r6*motivation__;Iw!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcnSQF2qQ$
> >).
> >
> > However, an endeavor to improve the situation for iostreams and
> charN_t next runs into SG16 #33 <
> https://urldefense.com/v3/__https:/github.com/sg16-unicode/sg16/issues/33__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcHyyOy8w$>;
> std::basic_fstream does not support the UTF-8 and UTF-16 encodings for the
> "internal" side of a std::codecvt conversion because std::basic_filebuf
> requires that, per [locale.codecvt.virtuals]p4 <
> https://urldefense.com/v3/__http:/eel.is/c**Adraft/locale.codecvt*virtuals-4__;Kysj!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcAX8Ip4E$>
> and its related footnote <
> https://urldefense.com/v3/__http:/eel.is/c**Adraft/locale.codecvt*footnote-246__;Kysj!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcf-0h5Wo$>,
> "internal" characters are mapped 1-N to "external" characters. This is an
> existing issue for std::basic_fstream<wchar_t> with
> > UTF-16 data.
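
A small sketch of why a variable-length "internal" encoding does not fit a model in which each internal character maps on its own to one or more external characters. The function names are hypothetical, and the input is assumed to be a Unicode scalar value (at most 0x10FFFF and not a surrogate). The surrogate helpers additionally assume a code point of 0x10000 or above.

```cpp
#include <cstddef>

// Number of UTF-16 code units needed for one scalar value.
constexpr std::size_t utf16_code_units(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// Number of UTF-8 code units (bytes) needed for one scalar value.
constexpr std::size_t utf8_code_units(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Surrogate pair for a supplementary-plane scalar value (cp >= 0x10000).
constexpr char16_t high_surrogate(char32_t cp) {
    return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}
constexpr char16_t low_surrogate(char32_t cp) {
    return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}
```

For U+1F600 the internal UTF-16 form is two code units and the external UTF-8 form is four bytes. Neither surrogate can be converted on its own, so the mapping is from two internal elements to four external ones rather than from one internal element to N external ones.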
> >
> > The Microsoft and libstdc++ standard library implementations appear
> to support iostreams with charN_t types, at least on the surface. Libc++
> intentionally does not provide definitions for charN_t specializations of
> locale facets that are not required by the standard, so even basic usage
> provokes compilation errors. I have not yet investigated to
> what extent the Microsoft and libstdc++ implementations work as might be
> expected. My impression is that, where they do produce expected results, it
> is serendipity at work. See https://godbolt.org/z/6T7hebY33 <
> https://urldefense.com/v3/__https:/godbolt.org/z/6T7hebY33__;!!EHscmS1ygiU1lA!GbI6zf_V7qUem8pWpctE8-woVOtNAr350romjds_uqTac_go-C5sfVwzxT4-6UTYcnGi1wXcECN7sc4$>
> for a bit of fun (testing on Windows requires changes to use an actual zero
> valued file since Windows doesn't provide a builtin analog for /dev/zero,
> but in that case, MSVC produces an executable that behaves as might be
> expected).
> >
> > I haven't looked hard, but I have not yet identified any code in the
> wild that uses iostreams with charN_t types. One would think that, if any
> project did, it would be ICU. I confirmed that ICU, despite its use of
> char16_t, makes no attempt to use it with iostreams.
> >
> > So where is this all going? I see three general options that can be
> pursued to resolve these various issues.
> >
> > 1. We can fix these issues, despite the acknowledged ABI impact, so
> that the standard no longer actively hinders support for iostreams with the
> charN_t types. Optionally, we could further explore requiring such support
> in the standard (doing so would require adding charN_t support to more
> locale facets).
> > 2. We can declare that iostreams will never support the charN_t
> types in the standard and deprecate and remove the fragments of such
> support that are present. Implementations could of course provide support
> as an extension if they so desire.
> > 3. We can admit things are broken, choose to do nothing about it,
> and close the related LWG issues while chanting sorry-not-sorry.
> >
> > The above issues are sufficiently complicated that I believe a paper
> is warranted regardless of the direction that we favor. I'm signing up to
> write that paper since I'm responsible for some of the mess. I do not
> intend to poll any directions in this meeting; rather, the focus is to
> ensure that the issues are well understood, to discuss decisions we could
> make and their potential consequences, and to generally collect information
> that will lead to a better paper.
> >
> > Responses provided before the meeting to identify other existing
> related issues or considerations would be appreciated. Ideal responses do
> not include the phrase "burn it all to the ground".
> >
> > Tom.
> >
> >
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2023-10-25 18:54:39