Hi Tom,
I saw that Jon bumped this on the issue tracker. I’m loath to spend much time on this;
char_traits is just completely broken.
Unfortunately, the brokenness propagates from
char_traits to other standard library facilities, so we probably need to talk about it.
To pick an example at random: if we take the deprecate-and-replace strategy, we immediately have to tackle
basic_istream::peek() and
basic_istream::get(), which are defined to return a
traits::int_type value. Would we end up in our classic “broken defaults” situation, where users should always call
basic_istream::peek_eof() but the broken function has the pretty name?
I would like the ABI Review Group to look at this to get some specific feedback on the impact of increasing the size of
std::char_traits<char16_t>::int_type to
std::uint_least32_t. This seems
to be by far the simplest and preferable option, if it’s available.
Best regards,
Peter
From: SG16 <sg16-bounces@lists.isocpp.org>
On Behalf Of Tom Honermann via SG16
Sent: 30 May 2023 18:02
SG16 has had an
issue tracking a design defect with the
std::char_traits<char16_t> specialization since 2018. The issue was originally reported by Jonathan Wakely as
LWG 2959. Jonathan recently created an
LWG GitHub tracking issue and assigned it to SG16. I'll schedule this for discussion at a future SG16 telecon, but would like to discuss some options on the mailing list first. I encourage reading the comments in the
SG16 issue before proceeding.
Briefly, the design defect is that std::char_traits<char16_t>::int_type is specified ([char.traits.specializations.char16.t])
to be std::uint_least16_t. The problem is that all 16-bit values are valid code units in UTF-16, so there is no value left to indicate the EOF condition that
int_type is intended to be used for (unless
uint_least16_t happens to be wider than 16 bits, which is not the case for major implementations).
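To make the defect concrete, here is a minimal sketch that probes it. The helper names (traits16, int_type_has_spare_value, code_unit_survives_round_trip) are made up for illustration and are not part of any library. It assumes only the standard std::char_traits<char16_t> members to_int_type, to_char_type, eq_int_type, and eof.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

using traits16 = std::char_traits<char16_t>;

// True only if int_type can represent some value above 0xFFFF, that is, a
// value that can never be a UTF-16 code unit and so could serve as EOF.
constexpr bool int_type_has_spare_value() {
    return std::numeric_limits<traits16::int_type>::max() > 0xFFFFu;
}

// Converts a code unit to int_type and back. Returns true only if the
// intermediate value is distinguishable from eof() and the original code
// unit is recovered unchanged.
bool code_unit_survives_round_trip(char16_t c) {
    const traits16::int_type i = traits16::to_int_type(c);
    return !traits16::eq_int_type(i, traits16::eof()) &&
           traits16::to_char_type(i) == c;
}
```

On implementations where uint_least16_t is exactly 16 bits wide, I would expect code_unit_survives_round_trip(char16_t(0xFFFF)) to return false: either the code unit compares equal to eof(), or the implementation remaps it to a different value to avoid the collision. Ordinary code units such as u'A' are unaffected.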
There does not appear to be a way to fix this problem without causing an ABI break; we can't just change the
int_type type alias to use a larger type. However, we could bring the issue to the ABI Review Group (ARG) to see if they know of some black magic that could be useful.
This same problem occurs for std::char_traits<wchar_t> when both
wchar_t and
wint_t (the specified target of the
int_type member type alias;
[char.traits.specializations.wchar.t]) are 16-bit types and the wide character encoding is UTF-16. This is the case for Microsoft's implementation.
Absent a way to fix the problem directly, it seems a deprecate-and-replace strategy will be required. I haven't thoroughly researched this, but a possible approach is to deprecate the existing
int_type member (for all of the
std::char_traits specializations) and to introduce a new
eof_type member that is guaranteed to be able to hold a value that is not a valid code unit value. This would require replacements for at least the following member functions:
What would be the consequences of such a change? Are other parts of the standard library impacted?
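As a rough illustration of the eof_type idea, the sketch below layers a wider type on top of the existing specialization. Everything in it is hypothetical: the struct name char16_traits_sketch, the members wide_eof, to_eof_type, and is_eof, and the choice of 0x10000 as the EOF value are assumptions for discussion, not proposed wording.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch only; none of these names exist in the standard.
struct char16_traits_sketch : std::char_traits<char16_t> {
    // Wide enough to hold all 65536 code unit values plus one extra value.
    using eof_type = std::uint_least32_t;

    // An EOF value outside the code unit range 0x0000..0xFFFF (assumed).
    static constexpr eof_type wide_eof() noexcept { return 0x10000u; }

    // Widening conversion; every code unit maps to a distinct non-EOF value.
    static constexpr eof_type to_eof_type(char_type c) noexcept {
        return static_cast<eof_type>(c);
    }

    static constexpr bool is_eof(eof_type v) noexcept { return v == wide_eof(); }
};
```

With a type like this, even the code unit 0xFFFF stays distinct from EOF. The cost is that every interface currently returning int_type would need a counterpart returning eof_type.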
I know that nobody is excited about the prospect of spending time addressing issues with
char_traits.
Please share your thoughts.
Tom.