Date: Wed, 24 Jun 2026 20:43:54 +0200
On Wed, Jun 24, 2026 at 8:33 PM Tiago Freire <tmiguelf_at_[hidden]> wrote:
> The problem is not UTF-16, the problem is the concept of
> char_traits<>::eof itself.
>
> Algorithms that assume that eof exists are broken, and we have since made
> better libraries that don’t use eof at all.
>
>
>
> We don’t need to fix the lack of char_traits<char16_t>::eof for UTF-16; it
> is not broken. We need to remove char_traits entirely, because that is the
> part that is broken.
>
Sure, but removing char_traits, if it were possible at all, would take decades.
We could decide to deprecate char_traits<T>::eof, but the whole of iostream
is built on top of it. So, as appealing as that would be, it's not a
realistic option.
We can decide to do _nothing_ on the basis that it has been broken for 30+
years, but if a targeted fix can help someone with limited disruption, we
might as well try that.
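
To make the int_type option concrete, here is a minimal sketch of why a
16-bit int_type cannot hold a sentinel that is distinct from every UTF-16
code unit, while a wider one can. It deliberately does not query
std::char_traits, since what eof() returns depends on the implementation.
count_collisions, narrow_collisions and wide_collisions are hypothetical
names made up for this illustration, and the sketch assumes that
std::uint_least16_t is exactly 16 bits wide.

```cpp
#include <cstdint>

// Hypothetical helper, not a standard facility: counts how many of the
// 65536 possible char16_t code units compare equal to `sentinel` once
// converted to the candidate int_type `Int`.
template <class Int>
int count_collisions(Int sentinel) {
    int collisions = 0;
    for (std::uint_least32_t u = 0; u <= 0xFFFF; ++u) {
        const char16_t unit = static_cast<char16_t>(u);
        if (static_cast<Int>(unit) == sentinel) {
            ++collisions;
        }
    }
    return collisions;
}

// A 16-bit int_type (assumed width): the sentinel 0xFFFF is itself a
// code unit, so it cannot be told apart from real input.
inline int narrow_collisions() {
    return count_collisions<std::uint_least16_t>(0xFFFF);
}

// A 32-bit int_type: 0xFFFFFFFF lies outside the code unit range, so no
// code unit maps onto it.
inline int wide_collisions() {
    return count_collisions<std::uint_least32_t>(0xFFFFFFFFu);
}
```

Under those assumptions, any 16-bit sentinel collides with exactly one code
unit, whichever value is picked. The wider type has room to spare, but
widening int_type is precisely the change whose ABI cost is in question.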
>
>
>
>
> *From:* SG16 <sg16-bounces_at_[hidden]> *On Behalf Of *Corentin
> Jabot via SG16
> *Sent:* Wednesday, June 24, 2026 20:15
> *To:* Hubert Tong <hubert.reinterpretcast_at_[hidden]>
> *Cc:* Corentin Jabot <corentinjabot_at_[hidden]>; sg16_at_[hidden];
> dascandy_at_[hidden]; Steve Downey <sg16_at_[hidden]>
> *Subject:* Re: [isocpp-sg16] SG16 Meeting Tomorrow at 3:30 NYC Time
>
>
>
>
>
>
>
> On Wed, Jun 24, 2026 at 5:37 PM Hubert Tong <
> hubert.reinterpretcast_at_[hidden]> wrote:
>
> On Wed, Jun 24, 2026 at 9:50 AM Corentin Jabot via SG16 <
> sg16_at_[hidden]> wrote:
>
>
>
>
>
> On Wed, Jun 24, 2026 at 3:17 AM Steve Downey via SG16 <
> sg16_at_[hidden]> wrote:
>
>
>
> wg21: TODO [[https://github.com/cplusplus/papers/issues/1572][LWG2959]]
> char_traits<char16_t>::eof is a valid UTF-16 code unit
> :sg16:
>
>
>
> We should ask Louis/STL if they are happy to change int_type, or,
> alternatively, explain to LEWG that the only solution is to change int_type,
> and see if they care.
>
>
>
> Only for char16_t? Go beyond that, and I think we'll be back in the
> "committee broke std::string ABI" land.
>
>
>
> Right, both UTF-8 and UTF-32 have invalid values - for example, 0xFF and
> 0xFFFFFFFF can be used as sentinels - so they don't suffer from the issue
> described here. UTF-16 has no such luck: the whole space is used,
> either for code points or for surrogates.
>
>
>
>
>
> -- HT
>
>
Received on 2026-06-24 18:44:16
