sg16

Re: Agenda for the 2023-10-25 SG16 telecon

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 25 Oct 2023 14:59:05 -0400
No problem. Looking forward to seeing everyone in ~30 minutes :)

Tom.

On 10/25/23 2:56 PM, Peter Brett via SG16 wrote:
>
> I worked out why I was confused. British Summer Time ends next Sunday.
> However, last Sunday I adjusted the time on my thermostat by 1 hour.
> It turns out that I was adjusting it **from** GMT **to** Summer Time,
> not the other way round.
>
> Sorry for wasting everybody’s time.
>
> Peter
>
> *From:*SG16 <sg16-bounces_at_[hidden]> *On Behalf Of *Peter
> Bindels via SG16
> *Sent:* 25 October 2023 19:54
> *To:* sg16_at_[hidden]
> *Cc:* Peter Bindels <peterbindels_at_[hidden]>
> *Subject:* Re: [SG16] Agenda for the 2023-10-25 SG16 telecon
>
> Right, it's next week. That had me confused.
>
> On Wed, Oct 25, 2023 at 8:52 PM Jens Maurer via SG16
> <sg16_at_[hidden]> wrote:
>
>
> On 25/10/2023 20.32, Peter Brett via SG16 wrote:
> > Slightly confused… I thought our meetings were usually at
> > 19:30:00 UTC but maybe this one is not.
>
> This one is, too. And there is no US/Europe summer time confusion
> right now.
>
> Jens
>
>
> >
> >
> > Peter
> >
> >
> >
> > *From:*SG16 <sg16-bounces_at_[hidden]> *On Behalf Of *Tom
> > Honermann via SG16
> > *Sent:* 25 October 2023 15:51
> > *To:* sg16_at_[hidden]; Alisdair Meredith
> > <alisdairm_at_[hidden]>; Jonathan Wakely <cxx_at_[hidden]>; Charles
> > Barto <chbarto_at_[hidden]>; Mark de Wever <koraq_at_[hidden]>
> > *Cc:* Tom Honermann <tom_at_[hidden]>
> > *Subject:* Re: [SG16] Agenda for the 2023-10-25 SG16 telecon
> >
> > This is your friendly reminder that we are meeting today, in
> > about 4 1/2 hours.
> >
> > Tom.
> >
> > On 10/24/23 1:11 AM, Tom Honermann via SG16 wrote:
> >
> > SG16 will hold a telecon on Wednesday, October 25th, at 19:30 UTC
> > (timezone conversion
> > <https://www.timeanddate.com/worldclock/converter.html?iso=20231025T193000&p1=1440&p2=tz_pt&p3=tz_mt&p4=tz_ct&p5=tz_et&p6=tz_cest>).
> >
> > The agenda follows.
> >
> > * charN_t, char_traits, codecvt, and iostreams:
> >
> >     o P2873R0: Remove Deprecated Locale Category Facets For Unicode
> >       from C++26 <https://wg21.link/p2873r0>
> >     o LWG 3767: codecvt<charN_t, char8_t, mbstate_t> incorrectly
> >       added to locale <https://wg21.link/lwg3767>
> >     o LWG 2959: char_traits<char16_t>::eof is a valid UTF-16 code
> >       unit <https://wg21.link/lwg2959>
> >
> >         + SG16 #32: std::char_traits<char16_t>::eof() requires
> >           uint_least16_t to be larger than 16 bits
> >           <https://github.com/sg16-unicode/sg16/issues/32>
> >
> >     o SG16 #33: A correct codecvt facet that works with basic_filebuf
> >       can't do UTF conversions
> >       <https://github.com/sg16-unicode/sg16/issues/33>
> >
> > Hang on, this is going to be a bumpy ride.
> >
> > When char16_t and char32_t were added for C++11, the standard
> > library was extended to support corresponding specializations of
> > std::char_traits ([char.traits.general]p1
> > <http://eel.is/c++draft/char.traits.general#1>) and
> > std::basic_string ([string.classes.general]p1
> > <http://eel.is/c++draft/string.classes#general-1>). Curiously, type
> > aliases were added for specializations of the std::fpos
> > ([iosfwd.syn] <http://eel.is/c++draft/iosfwd.syn#lib:fpos>) class
> > template (but only in the synopsis) and support for these types was
> > added for the std::codecvt ([tab:locale.category.facets]
> > <http://eel.is/c++draft/locale.category#tab:locale.category.facets>)
> > and std::codecvt_byname ([tab:locale.spec]
> > <http://eel.is/c++draft/locale.category#tab:locale.spec>) locale
> > facets, but not for any of the other locale facets nor for iostreams
> > in general. Support for these types was added to
> > std::basic_string_view ([string.view.synop]
> > <http://eel.is/c++draft/string.view.synop>) and
> > std::filesystem::path ([fs.path.type.cvt]p2
> > <http://eel.is/c++draft/fs.path.type.cvt#2>) in C++17, but no
> > additional support was ever extended to iostreams. The status quo is
> > thus that the standard requires implementations to provide some
> > fragments (std::fpos, std::codecvt, and std::codecvt_byname) of
> > iostream support for these types despite there being no use of these
> > type aliases and specializations in the standard; implementations
> > are not required to support streams of char16_t or char32_t.
> >
> > std::char_traits is used by both the string library (e.g.,
> > std::basic_string) and iostreams. However, the string library only
> > depends on some of the std::char_traits members; it uses neither the
> > int_type member type alias nor any of the member functions that
> > depend on that type (eof(), not_eof(), to_char_type(),
> > to_int_type(), eq_int_type()). Per LWG 2959
> > <https://wg21.link/lwg2959> and SG16 #32
> > <https://github.com/sg16-unicode/sg16/issues/32>, the specified
> > std::char_traits<char16_t> specialization has a defect: all char16_t
> > values are valid code unit values, but the int_type member type
> > alias is defined as uint_least16_t (the same underlying type as
> > char16_t), so it is unable to hold a distinct value for EOF. The
> > obvious fix is to use a larger type for int_type, but that would
> > result in an ABI break. I recently asked the ABI review group if
> > there are any known tricks they could deploy to mitigate an ABI
> > break, but no direct solutions were identified; instead, it was
> > suggested that we provide an alternative to
> > std::char_traits<char16_t> that programmers would have to use
> > explicitly in place of the broken specialization. That is an option,
> > but since the problematic int_type member is not actually used by
> > any functionality the standard requires implementors to provide, an
> > ABI break in this case might have little practical consequence.
> >
> > When char8_t was added for C++20 via P0482R6 (char8_t: A type for
> > UTF-8 characters and strings) <https://wg21.link/p0482>, I failed to
> > understand the intended purpose for which std::codecvt was added to
> > the standard. My impression of it at the time was that it was a
> > poorly designed general transcoding facility; I failed to appreciate
> > its significance as a locale facet as used by iostreams. This
> > resulted in two mistakes:
> >
> > 1. I deprecated the following specializations (and their use as
> >    locale category facets):
> >      std::codecvt<char16_t, char, std::mbstate_t>
> >      std::codecvt<char32_t, char, std::mbstate_t>
> >      std::codecvt_byname<char16_t, char, std::mbstate_t>
> >      std::codecvt_byname<char32_t, char, std::mbstate_t>
> > 2. I added the following specializations as required locale category
> >    facets (adding the specializations themselves is arguably not a
> >    mistake, but adding them as locale category facets is):
> >      std::codecvt<char16_t, char8_t, std::mbstate_t>
> >      std::codecvt<char32_t, char8_t, std::mbstate_t>
> >      std::codecvt_byname<char16_t, char8_t, std::mbstate_t>
> >      std::codecvt_byname<char32_t, char8_t, std::mbstate_t>
> >
> > Note that std::codecvt facets are only used by std::basic_filebuf,
> > which only ever converts to and from elements of type char; the
> > facets that convert to and from char8_t are not substitutable for
> > that purpose.
> >
> > P2873R0 <https://wg21.link/p2873r0>, which SG16 already approved
> > (or, rather, did not object to) during the 2023-05-24 SG16 meeting
> > <https://github.com/sg16-unicode/sg16-meetings#may-24th-2023>, now
> > seeks to remove the deprecated specializations. LWG 3767
> > <https://wg21.link/lwg3767> tracks addressing the incorrect addition
> > of the char8_t specializations as locale facets.
> >
> > Arguably, P0482R6 <https://wg21.link/p0482> should have added the
> > following specializations as locale facets:
> >
> >   * std::codecvt<char8_t, char, std::mbstate_t>
> >   * std::codecvt_byname<char8_t, char, std::mbstate_t>
> >
> > The only specification for std::codecvt_byname in the standard is
> > the synopsis in [locale.codecvt.byname]
> > <http://eel.is/c++draft/locale.codecvt.byname>; there is no other
> > wording present.
> >
> > As mentioned, the standard does not require implementations to
> > provide iostream support for the charN_t types. However,
> > implementations may do so as an extension. If they do, then, per
> > [filebuf.general]p7
> > <http://eel.is/c++draft/input.output#filebuf.general-7>,
> > specializations of std::codecvt<charN_t, char, std::mbstate_t> are
> > required to be available via a call to std::use_facet() for the
> > imbued locale. In that case, per the standard, the status of each
> > necessary specialization is:
> >
> >   * std::codecvt<char8_t, char, std::mbstate_t>   # Not specified.
> >   * std::codecvt<char16_t, char, std::mbstate_t>  # Deprecated.
> >   * std::codecvt<char32_t, char, std::mbstate_t>  # Deprecated.
> >
> > If it is desirable to provide a better foundation for iostream
> > support of the charN_t types, either for a future version of the
> > standard or for implementations that want to provide such support as
> > an extension, we could undeprecate the previously deprecated
> > specializations and add the missing one for char8_t. Since iostreams
> > does not support charN_t in the standard today and since the
> > char16_t and char32_t specializations have already been deprecated
> > for two release cycles, perhaps it is even reasonable to change
> > their behavior so that they convert to and from the locale encoding
> > rather than UTF-8. This would remove the existing inconsistency with
> > the corresponding char and wchar_t specializations that was part of
> > the motivation for their deprecation in the first place (see the
> > discussion of codecvt in the Motivation section of P0482R6
> > <https://wg21.link/p0482r6#motivation>).
> >
> > However, an endeavor to improve the situation for iostreams and
> > charN_t next runs into SG16 #33
> > <https://github.com/sg16-unicode/sg16/issues/33>;
> > std::basic_fstream does not support the UTF-8 and UTF-16 encodings
> > for the "internal" side of a std::codecvt conversion because
> > std::basic_filebuf requires that, per [locale.codecvt.virtuals]p4
> > <http://eel.is/c++draft/locale.codecvt#virtuals-4> and its related
> > footnote <http://eel.is/c++draft/locale.codecvt#footnote-246>,
> > "internal" characters are mapped 1-N to "external" characters. This
> > is an existing issue for std::basic_fstream<wchar_t> with UTF-16
> > data.
> >
> > The Microsoft and libstdc++ standard library implementations appear
> > to support iostreams with charN_t types, at least on the surface.
> > Libc++ intentionally does not provide definitions for charN_t
> > specializations of locale facets that are not required by the
> > standard, and this is enough for basic usage to produce compilation
> > errors. I have not yet investigated to what extent the Microsoft and
> > libstdc++ implementations work as might be expected. My impression
> > is that, where they do produce expected results, it is serendipity
> > at work. See https://godbolt.org/z/6T7hebY33 for a bit of fun
> > (testing on Windows requires changes to use an actual zero-valued
> > file since Windows doesn't provide a builtin analog for /dev/zero,
> > but in that case, MSVC produces an executable that behaves as might
> > be expected).
> >
> > I haven't looked hard, but I have not yet identified any code in the
> > wild that uses iostreams with charN_t types. One would think that,
> > if any project did, it would be ICU. I confirmed that ICU, despite
> > its use of char16_t, makes no attempt to use it with iostreams.
> >
> > So where is this all going? I see three general options that can be
> > pursued to resolve these various issues.
> >
> > 1. We can fix these issues, despite the acknowledged ABI impact, so
> >    that the standard no longer actively hinders support for iostreams
> >    with the charN_t types. Optionally, we could further explore
> >    requiring such support in the standard (doing so would require
> >    adding charN_t support to more locale facets).
> > 2. We can declare that iostreams will never support the charN_t
> >    types in the standard and deprecate and remove the fragments of
> >    such support that are present. Implementations could of course
> >    provide support as an extension if they so desire.
> > 3. We can admit things are broken, choose to do nothing about it,
> >    and close the related LWG issues while chanting sorry-not-sorry.
> >
> > The above issues are sufficiently complicated that I believe a paper
> > is warranted regardless of the direction that we favor. I'm signing
> > up to write that paper since I'm responsible for some of the mess. I
> > do not intend to poll any directions in this meeting; rather, the
> > focus is to ensure that the issues are well understood, to discuss
> > decisions we could make and their potential consequences, and to
> > generally collect information that will lead to a better paper.
> >
> > Responses provided before the meeting to identify other existing
> > related issues or considerations would be appreciated. Ideal
> > responses do not include the phrase "burn it all to the ground".
> >
> > Tom.
> >
> >
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
>

Received on 2023-10-25 18:59:07