std-proposals

Re: [std-proposals] TBAA and extended floating-point types

From: Oliver Hunt <oliver_at_[hidden]>
Date: Sat, 30 Aug 2025 13:46:59 -0700
(I just saw the subject change, but I'm not sure how mail threads will handle my changing this subject as well)

> On Aug 30, 2025, at 9:24 AM, Thiago Macieira via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> On Saturday, 30 August 2025 00:19:57 Pacific Daylight Time Oliver Hunt via Std-Proposals wrote:
>> It is entirely UTF-8: they normalize to NFC first and then perform a
>> lexical comparison on the normalized UTF-8 strings. The start of the slow
>> path (taken for non-ASCII strings, and possibly for strings containing
>> multi-scalar grapheme clusters, etc.; the ASCII check is obvious, but it
>> seems plausible that comparisons could be faster if every character in the
>> string is a single scalar[1]) is at (Apache License v2.0 with Runtime
>> Library Exception):
>> https://github.com/swiftlang/swift/blob/main/stdlib/public/core/StringComparison.swift#L127
>>
>> It is also important to be aware that the Character type in Swift represents
>> a complete extended grapheme cluster, not a byte, code unit or scalar;
>> e.g. “👨‍👩‍👧‍👦”.count is 1, and iterating across the string will only
>> see one character.
>
> After a lot of reading the source code and still not finding what I was looking
> for (hampered by having zero knowledge of the language in question), I think
> what Swift does in the code above is that in the slowest of code paths, either
> the grapheme clusters or the UnicodeScalars have a comparison value dictated
> by their collation order. The "string manifesto" did mention that a Swift
> string containing "é" would always sort before "i", for generic locales. I
> just can't find where a value is assigned to the comparisons.
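The comparison strategy described in the quoted text (an ASCII fast path, otherwise NFC normalization followed by a lexicographic comparison of the UTF-8 bytes) can be sketched in Python with only the standard `unicodedata` module. This is a hypothetical illustration, not Swift's implementation, and the function name `nfc_utf8_compare` is made up here. Because UTF-8 byte order matches code point order, the result is plain code-point ordering of the NFC forms, not locale-aware collation.

```python
import unicodedata


def nfc_utf8_compare(a: str, b: str) -> int:
    """Hypothetical sketch: return -1, 0 or 1 by comparing the NFC forms
    of a and b as UTF-8 byte strings (not Swift's actual code)."""
    if a.isascii() and b.isascii():
        # Fast path: ASCII text is already in NFC, so no normalization is needed.
        ka, kb = a.encode("ascii"), b.encode("ascii")
    else:
        # Slow path: canonically equivalent strings become identical under NFC.
        ka = unicodedata.normalize("NFC", a).encode("utf-8")
        kb = unicodedata.normalize("NFC", b).encode("utf-8")
    # Byte-wise lexicographic comparison; for UTF-8 this equals code point order.
    return (ka > kb) - (ka < kb)


precomposed = "\u00e9"   # "é" as a single scalar, U+00E9
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(nfc_utf8_compare(precomposed, decomposed))  # 0: canonically equivalent
print(nfc_utf8_compare(precomposed, "i"))         # 1: U+00E9 sorts after U+0069
print(nfc_utf8_compare("abc", "abd"))             # -1
```

Note that under this sketch “é” sorts after “i”, since nothing here applies a collation order; if Swift's default ordering really places “é” before “i”, some step beyond NFC plus byte comparison would be involved.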

I’ll prod folk again, but I’m not sure I understand why you seem so absolutely adamant that everyone does, or should, use UTF-16 internally when multiple people have said this is not true, and have correctly characterized every API you reference as “this API was introduced when UCS-2 was thought to be sufficient, and then had UTF-16 bolted on after the fact, at different rates”.

What you seem to be arguing is that because old, ABI-fixed APIs were extended to support UTF-16, we should build new APIs around UTF-16. That is despite the many problems of UTF-16 relative to UTF-8, despite the widespread adoption of UTF-8 everywhere other than places that are stuck with UTF-16 due to the aforementioned ABI constraints, and despite all new systems languages being built on UTF-8 strings. The result is that we would continue to be required to maintain an encoding that is (as the domain experts have told me) bad on every metric.

This would also mean imposing UTF-16 on all these new languages and APIs, including systems that do not support UTF-16 at all. From the documentation I can see, Swift has no significant UTF-16 support beyond the code-unit enumeration in its view classes. At least one part I saw said UTF-16 support depended on including Foundation, which to me sounds like “if you use UTF-16, we just forward to Objective-C’s Foundation library”. That presumably means Swift only supports those operations on macOS and iOS targets, and Linux, Windows, etc. only have UTF-8 support.
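To make the code-unit distinction concrete, the following Python sketch counts the family emoji from the quoted message in three different units. The string is spelled with escapes (MAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, BOY) on the assumption that this is the sequence in the original message. Swift reports `.count == 1` for it, as quoted earlier; this sketch does not attempt grapheme segmentation and only shows the scalar, UTF-8 and UTF-16 lengths, roughly what Swift's `unicodeScalars`, `utf8` and `utf16` views enumerate.

```python
# Family emoji as an explicit sequence: MAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, BOY.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"


def unit_counts(s: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 code units, UTF-16 code units) for s."""
    scalars = len(s)                               # Python str length counts code points
    utf8_units = len(s.encode("utf-8"))            # one code unit per byte
    utf16_units = len(s.encode("utf-16-le")) // 2  # 2 bytes per unit; "-le" emits no BOM
    return scalars, utf8_units, utf16_units


print(unit_counts(family))  # (7, 25, 11)
```

Each of the four emoji needs a surrogate pair in UTF-16 and four bytes in UTF-8, so neither encoding gives one unit per user-perceived character; only grapheme segmentation yields the count of 1.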


> Asking Google and insisting that I meant collation, not concatenation,
> indicates that you can pass a locale: option to String's compare() function.
> But I could not find the implementation for this to verify. The AI answer did
> also suggest that sorting of things that should be the same as elsewhere in
> the OS, like file names as in Finder, should convert to NSString and use the
> functions from there.

“AI” is just predictive text generation regurgitating existing content, so of course it will produce answers like the one above. Most of the posts it regurgitates on topics like this come from _decades_ of ObjC + Foundation. AI doesn’t magically know anything; it literally just regurgitates the work of others, periodically adding errors. There is no reason to use it in a technical forum.

—Oliver

>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Principal Engineer - Intel Platform & System Engineering
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2025-08-30 20:47:14