Re: [std-proposals] TBAA and extended floating-point types

From: Oliver Hunt <oliver_at_[hidden]>
Date: Tue, 26 Aug 2025 18:04:33 -0700
> On Aug 26, 2025, at 5:12 PM, Thiago Macieira via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> On Tuesday, 26 August 2025 14:48:21 Pacific Daylight Time David Brown via Std-
> Proposals wrote:
>> The consensus of the modern world is UTF-8 for everything, except for
>> legacy API's that are difficult to change.
>
> And that is the big issue: all the legacy APIs. We're not talking about a
> green field scenario. In the real world, UTF-16 has a place and is in use for
> in-memory representation more frequently than UTF-8 or UTF-32. UTF-8 is used
> for external representation (network protocols and files).
>
>> AFAIUI, all these languages, libraries and OS's that had UCS2 character
>> encodings also now support UTF-8, and generally encourage UTF-8 as the
>> main choice of character type.
>
> Internally they still operate in UTF-16 and will need to perform conversion
> to/from it to operate on UTF-8. And that includes *the* library for Unicode
> support, ICU. If the Standard proposed an API for performing collation in
> Unicode, chances are it would be implemented using ucol_strcoll[1].

I don’t know about Rust, but with the exception of the NSString bridge (which I assume does something exciting, because bridging), Swift operates entirely in UTF-8. In addition, I know of many string implementations that are ostensibly UTF-16 or UCS-2 but internally use either ASCII or UTF-16, depending on the content. They use ASCII rather than UTF-8 generally because they are stuck with some char16* API, and the UTF-8 -> UTF-16 -> UTF-8 round trip is expensive.
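To make the "ASCII or UTF-16 depending on the content" idea concrete, here is a minimal sketch. DualString and its members are hypothetical names made up for illustration, not taken from any real implementation, and a real one would be considerably more involved. The point is only that the 8-bit form can be widened per code unit to serve a char16-style API without any transcoding step.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical type for illustration only: presents a UTF-16 code unit
// interface, but stores 8-bit units when every code unit is ASCII.
class DualString {
public:
    explicit DualString(std::u16string_view text) {
        bool ascii = true;
        for (char16_t c : text) {
            if (c > 0x7F) { ascii = false; break; }
        }
        if (ascii) {
            // Narrow each code unit; lossless because all are <= 0x7F.
            std::string narrow;
            narrow.reserve(text.size());
            for (char16_t c : text) narrow.push_back(static_cast<char>(c));
            storage_ = std::move(narrow);
        } else {
            storage_ = std::u16string(text);
        }
    }

    bool is_ascii() const {
        return std::holds_alternative<std::string>(storage_);
    }

    // Length in code units; identical for both representations.
    std::size_t length() const {
        return std::visit([](const auto& s) { return s.size(); }, storage_);
    }

    // Code unit at index i, widened to char16_t for char16-style callers.
    char16_t at(std::size_t i) const {
        assert(i < length());
        if (const auto* narrow = std::get_if<std::string>(&storage_))
            return static_cast<char16_t>(static_cast<unsigned char>((*narrow)[i]));
        return std::get<std::u16string>(storage_)[i];
    }

    // Bytes used by the character data (excluding any allocator overhead).
    std::size_t byte_size() const {
        return is_ascii() ? length() : length() * sizeof(char16_t);
    }

private:
    std::variant<std::string, std::u16string> storage_;
};
```

With this sketch, DualString(u"hello") takes the 8-bit path and uses one byte per code unit, while a string containing any non-ASCII code unit keeps the full UTF-16 storage. Indexing stays O(1) in both cases, which is what a UTF-8 backing store could not offer to a char16-based API.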

—Oliver

Received on 2025-08-27 01:04:45