Date: Sun, 31 Aug 2025 01:57:02 -0700
> On Aug 31, 2025, at 12:47 AM, Tiago Freire <tmiguelf_at_[hidden]> wrote:
>
> Too much to distill here.
>
There really isn’t: you have repeatedly asserted that utf16 is sufficiently widely used that it is what should become the default. I have given numerous examples of how your assertions are simply incorrect.
I have pointed out that this has not been true of new languages, new APIs, or new platforms for a couple of decades at this point.
I’ve also pointed out that C and C++ don’t support utf16 on multiple platforms.
> Let's start with something simple.
> Do we agree that the interfaces we are talking about are either:
> 1. file system
> 2. terminal interaction
> and nothing else?
>
No. Just no. You do not get to simply define this entire discussion by excluding things that you personally do not care about.
> “Do we agree that file system and terminal interaction are the only interfaces we care about?”
No, they were brought up because on numerous platforms, the utf8 nature of the entire platform is directly and immediately visible through those interfaces.
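To make that concrete: a minimal sketch, assuming a POSIX-like system (a typical Linux or macOS install) where the file system and terminal convention is utf8. The platform interfaces take plain byte strings, so utf8 passes straight through:

    // Minimal sketch, assuming a POSIX-like platform whose file system and
    // terminal convention is UTF-8 (e.g. a typical Linux or macOS system).
    #include <cstdio>
    #include <fcntl.h>
    #include <unistd.h>

    int main() {
        // To the kernel a file name is just a sequence of bytes; on a UTF-8
        // system those bytes are the UTF-8 encoding of "naïve.txt".
        const char* name = "na\xC3\xAFve.txt";
        int fd = ::open(name, O_CREAT | O_WRONLY, 0644);  // char*-based API
        if (fd >= 0)
            ::close(fd);

        // Terminal output is likewise a byte stream; a UTF-8 terminal renders it.
        std::printf("%s\n", name);
    }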
You have clearly stated your position: APIs exist that have ucs2 interfaces, and the fact that the world has moved to utf8 is irrelevant. This statement comes across as a clear intent to rule out any platform, system, API, or environment that is not in the set of things that you care about; that is the clear goal of the above “question”. It also clearly disregards multiple comments from multiple people about multiple platforms.
We are talking about the entire ecosystem of platforms and languages, and for a couple of decades now new platforms have overwhelmingly preferred utf8. Please stop making universal claims while ignoring every contrary point people present. I worked on web browsers for over a decade, and you don’t hear me proclaiming that C++ should support Shift-JIS or any of the other legacy encodings that are still in widespread use, despite being widely regarded as unsuitable for any new interface or API. I expect a similar recognition from you that past use of a specific encoding is not a de facto justification to continue using it, especially when it is abundantly clear that nothing new chooses those encodings unless specifically necessary.
The existing ucs2 and utf16 interfaces and APIs are made up entirely of APIs that date to the 90s, when UCS2 was considered the perfect solution, or of APIs that need to interoperate with those older APIs. The reason they were not changed is that changing them is not an option without breaking ABI, not that utf16 is _good_ or even desired. For example, Qt has a pile of UCS2 interfaces because it has to interact with the platforms it runs on, and because it wants to support Windows, where wchar_t is 16 bits and a wide string is notionally utf16.
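That interaction is visible in the conversion boilerplate every cross-platform library ends up carrying. A minimal sketch, assuming Windows and its documented MultiByteToWideChar API (the helper name is mine, purely illustrative):

    #ifdef _WIN32
    // Minimal sketch (Windows only): the platform's wide APIs take wchar_t*,
    // which is 16 bits there and notionally UTF-16, so UTF-8 data has to be
    // converted before calling any *W entry point such as CreateFileW.
    #include <windows.h>
    #include <string>

    std::wstring widen_utf8(const std::string& utf8) {  // illustrative helper name
        int n = ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                      static_cast<int>(utf8.size()), nullptr, 0);
        std::wstring wide(static_cast<std::wstring::size_type>(n), L'\0');
        ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                              static_cast<int>(utf8.size()), wide.data(), n);
        return wide;
    }
    #endif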
Making utf16 the default or the focus of C++, when literally no other environment sees that as a goal, is a choice on par with favoring EBCDIC over ASCII in the 90s.
Instead of accepting the feedback that not every platform is built around utf16, that most new languages and platforms are based on utf8, and that the entire internet is overwhelmingly utf8*, you have sought out things to point to as "look, utf16!" in every environment, no matter how separated from the platform APIs or how deep in the implementation they are, as if that has some bearing on the platform API, and as if (even if they were utf16) implementation details mattered more than the platform interfaces. By that logic, platforms like macOS and iOS, which operate on Unicode scalars in many cases - based on reading the Swift code, which is not hard even for me, and I have not used Swift in any meaningful way - would be built on utf32 rather than utf8.
You point to many languages and environments that support ucs2 as your reason for making utf16 a core part of C++, but all of your examples either date to the 90s or early 2000s, or are environments that interact with those interfaces out of necessity. I can point to just as many, if not more, libraries, platforms, and environments that are based on utf8.
Beyond that, your entire proposition also assumes utf16 is widely or universally supported in C and C++. C and C++ have char* and wchar_t* strings: char is a single byte (which, for absurd reasons, the standard does not require to be 8 bits), and wchar_t is an implementation-defined type, which is clearly utf32/scalars on Linux, macOS, and presumably most other similar platforms. That wchar_t is 16 bits on _some_ platforms does not mean utf16 is widely supported in C or C++. The options are utf8, or an implementation-defined selection of ucs2, utf16, or utf32.
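A minimal sketch of what that implementation-defined choice looks like in practice (the printed widths are what typical implementations report, not guarantees made by the standard):

    // Minimal sketch: the widths of char and wchar_t are implementation details.
    #include <climits>
    #include <cstdio>

    int main() {
        std::printf("bits in char:    %d\n", CHAR_BIT);  // >= 8, almost always 8
        std::printf("bits in wchar_t: %zu\n", sizeof(wchar_t) * CHAR_BIT);
        // Typical output: 8 and 32 on Linux/macOS, 8 and 16 on Windows.
    }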
Making C++ support utf16 would require adding new types and new interfaces: the existing wide-string interfaces are not utf16, and the existing wchar_t type is not a 16-bit code unit.
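The gap is easy to see with nothing but the current standard library; a minimal sketch, assuming only what ships today:

    // Minimal sketch: the standard wide-string machinery is keyed to wchar_t,
    // not to a guaranteed 16-bit code unit.
    #include <iostream>
    #include <string>

    int main() {
        std::wstring ws = L"wide";     // wchar_t: 32 bits on Linux/macOS, 16 on Windows
        std::u16string us = u"utf16";  // char16_t: always 16-bit code units
        std::wcout << ws << L'\n';     // a wide stream exists for wchar_t...
        // std::wcout << us;           // ...but neither the wide nor the narrow streams
        // std::cout  << us;           // accept a char16_t string, and there is no
                                       // std::u16cout; these lines would not compile.
        (void)us;                      // only here so the example compiles cleanly
    }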
I’m done with this thread at this point. You have dismissed, written off, or tried to find irrelevant gotchas in response to every opposing position, so there is no point in my saying anything more. You have made it exceptionally clear that you do not consider anything I say relevant, especially with this last attempt to cut the discussion into the things you care about and the things that can be dismissed.
Throughout this thread I have tried to assume you were engaging in this discussion in good faith, but this last email does not make me believe that is true. My only option, given the rule that we must operate in good faith and assume good faith on the part of others, is to no longer participate.
—Oliver
* every JS implementation, every TC39 member, and Brendan Eich - if we need to appeal to authority - regret the use of ucs2 in JS.
Received on 2025-08-31 08:57:15