Date: Wed, 24 Jun 2026 18:33:48 +0000
The problem is not UTF-16; the problem is the concept of char_traits<>::eof itself.
Algorithms that assume that eof exists are broken, and we have since made better libraries that don’t use eof at all.
We don’t need to fix the lack of a usable char_traits<char16_t>::eof; UTF-16 is not what is broken. We need to remove char_traits entirely, because that is the part that is broken.
From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Corentin Jabot via SG16
Sent: Wednesday, June 24, 2026 20:15
To: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Cc: Corentin Jabot <corentinjabot_at_[hidden]>; sg16_at_[hidden]; dascandy_at_[hidden]; Steve Downey <sg16_at_[hidden]>
Subject: Re: [isocpp-sg16] SG16 Meeting Tomorrow at 3:30 NYC Time
On Wed, Jun 24, 2026 at 5:37 PM Hubert Tong <hubert.reinterpretcast_at_[hidden]> wrote:
On Wed, Jun 24, 2026 at 9:50 AM Corentin Jabot via SG16 <sg16_at_[hidden]> wrote:
On Wed, Jun 24, 2026 at 3:17 AM Steve Downey via SG16 <sg16_at_[hidden]> wrote:
wg21: TODO [[https://github.com/cplusplus/papers/issues/1572][LWG2959]] char_traits<char16_t>::eof is a valid UTF-16 code unit :sg16:
We should ask Louis/STL whether they are happy to change int_type or, alternatively, explain to LEWG that the only solution is to change int_type, and see if they care.
Only for char16_t? Go beyond that, and I think we'll be back in the "committee broke std::string ABI" land.
Right: both UTF-8 and UTF-32 have invalid code-unit values (for example, 0xFF and 0xFFFFFFFF respectively) that can be used as sentinels, so they don't suffer from the issue described here. UTF-16 has no such luck: the whole 16-bit space is used,
either for code points or for surrogates.
-- HT
Received on 2026-06-24 18:33:52
