The problem is not UTF-16; the problem is the concept of char_traits<>::eof itself.

Algorithms that assume that eof exists are broken, and we have since made better libraries that don’t use eof at all.

 

We don’t need to fix the lack of a char_traits<char16_t>::eof value for UTF-16; UTF-16 is not broken. We need to remove char_traits entirely, because that is the part that is broken.
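
To make the contrast concrete, here is a minimal sketch of the two styles. The names Utf16Reader, count_units and count_units_with_eof are hypothetical, invented for this illustration; they are not taken from any existing library. The first style signals end of input out of band with std::optional, so every 16-bit value remains available as data. The second compares each code unit against char_traits<char16_t>::eof(), whose value is implementation-defined, so it may mistake a real code unit for end of input.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical reader: end of input is reported as std::nullopt,
// not as a reserved code unit value.
class Utf16Reader {
public:
    explicit Utf16Reader(std::u16string_view input) : input_(input) {}

    // Returns the next code unit, or std::nullopt at end of input.
    std::optional<char16_t> next() {
        if (pos_ == input_.size()) return std::nullopt;
        return input_[pos_++];
    }

private:
    std::u16string_view input_;
    std::size_t pos_ = 0;
};

// Counts code units using the out-of-band end marker.
// The loop tests has_value(), so a code unit of value 0 is still counted.
inline std::size_t count_units(std::u16string_view s) {
    Utf16Reader reader(s);
    std::size_t n = 0;
    while (reader.next()) ++n;
    return n;
}

// Counts code units but treats any unit equal to eof() as end of input.
// The value of eof() is implementation-defined; where it coincides with
// a code unit present in the data, this stops early.
inline std::size_t count_units_with_eof(std::u16string_view s) {
    using traits = std::char_traits<char16_t>;
    std::size_t n = 0;
    for (char16_t c : s) {
        if (traits::eq_int_type(traits::to_int_type(c), traits::eof())) break;
        ++n;
    }
    return n;
}
```

For an input of three code units 'a', 0xFFFF, 'b', count_units always returns 3. What count_units_with_eof returns depends on the value the implementation chose for eof().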


From: SG16 <sg16-bounces@lists.isocpp.org> On Behalf Of Corentin Jabot via SG16
Sent: Wednesday, June 24, 2026 20:15
To: Hubert Tong <hubert.reinterpretcast@gmail.com>
Cc: Corentin Jabot <corentinjabot@gmail.com>; sg16@lists.isocpp.org; dascandy@gmail.com; Steve Downey <sg16@sdowney.dev>
Subject: Re: [isocpp-sg16] SG16 Meeting Tomorrow at 3:30 NYC Time


On Wed, Jun 24, 2026 at 5:37PM Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:

On Wed, Jun 24, 2026 at 9:50AM Corentin Jabot via SG16 <sg16@lists.isocpp.org> wrote:

 

 

On Wed, Jun 24, 2026 at 3:17AM Steve Downey via SG16 <sg16@lists.isocpp.org> wrote:

 

  wg21: TODO [[https://github.com/cplusplus/papers/issues/1572][LWG2959]] char_traits<char16_t>::eof is a valid UTF-16 code unit :sg16:

 

We should ask Louis/STL if they are happy to change int_type, or, alternatively, explain to LEWG that the only solution is to change int_type, and see if they care.

 

Only for char16_t? Go beyond that, and I think we'll be back in the "committee broke std::string ABI" land.

 

Right, both UTF-8 and UTF-32 have invalid values (for example, 0xFF and 0xFFFFFFFF can be used as sentinels), so they don't suffer from the issue described here. UTF-16 has no such luck: the whole 16-bit space is used, either for code points or for surrogates.
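
A rough sketch of that asymmetry, using hypothetical helper names (can_appear_in_utf8, can_appear_in_utf16, can_appear_in_utf32) made up for this illustration. Each predicate answers whether a given code unit value can occur anywhere in well-formed text of that encoding. A value for which the answer is "no" is free to serve as a sentinel.

```cpp
#include <cstdint>
#include <string>

// The bytes 0xC0, 0xC1 and 0xF5..0xFF never occur in well-formed UTF-8.
constexpr bool can_appear_in_utf8(std::uint8_t u) {
    return u != 0xC0 && u != 0xC1 && u < 0xF5;
}

// UTF-32 code units are scalar values: at most 0x10FFFF, surrogates excluded.
constexpr bool can_appear_in_utf32(std::uint32_t u) {
    return u <= 0x10FFFF && !(u >= 0xD800 && u <= 0xDFFF);
}

// Every 16-bit value is either a BMP code unit or a surrogate code unit,
// so no value is left over to act as a sentinel.
constexpr bool can_appear_in_utf16(std::uint16_t) {
    return true;
}

static_assert(!can_appear_in_utf8(0xFF), "0xFF is free in UTF-8");
static_assert(!can_appear_in_utf32(0xFFFFFFFF), "0xFFFFFFFF is free in UTF-32");
static_assert(can_appear_in_utf16(0xFFFF), "0xFFFF is in use in UTF-16");

// Whatever 16-bit value an implementation picks for eof(), it is a value
// that can appear in UTF-16 data.
static_assert(can_appear_in_utf16(std::char_traits<char16_t>::eof()),
              "eof() collides with a UTF-16 code unit");
```

The last assertion holds trivially, and that is the point: with a 16-bit int_type there is no value that eof() could take without colliding with real data.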

 

 

-- HT