Date: Sat, 30 Aug 2025 09:24:21 -0700
On Saturday, 30 August 2025 00:19:57 Pacific Daylight Time Oliver Hunt via Std-
Proposals wrote:
> It is entirely utf8 - they normalize to NFC first and then perform a
> lexical comparison on the normalized utf8 strings - the start of the slow
> path (non-ascii, possibly? Possible grapheme clusters?, etc the ascii check
> is obvious but it would seem plausible that comparisons can be faster if
> everything in the string is a single scalar?[1]) is at (Apache License
> v2.0+library):
> https://github.com/swiftlang/swift/blob/main/stdlib/public/core/StringComparison.swift#L127
>
> It is also important to be aware that the Character type in Swift represents
> a complete (extended?) grapheme cluster, not a byte, code unit or scalar,
> e.g. "👨‍👩‍👧‍👦".count is 1, and iterating across the string will only
> see one character.
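
The quoted point about Character can be checked with a few lines of Swift. This is a minimal sketch, assuming a recent Swift 5 toolchain; the variable names are only illustrative, and the emoji is written with escapes so the zero-width joiners survive copy and paste.

```swift
// The family emoji from the quoted example: man, woman, girl, boy,
// glued together with U+200D ZERO WIDTH JOINER.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

// Four views of the same string, from coarsest to finest.
let characterCount = family.count               // extended grapheme clusters
let scalarCount = family.unicodeScalars.count   // Unicode scalar values
let utf16Count = family.utf16.count             // UTF-16 code units
let utf8Count = family.utf8.count               // UTF-8 code units (bytes)

print(characterCount, scalarCount, utf16Count, utf8Count)  // 1 7 11 25

// Iterating the String visits Characters, so the loop body runs once.
var visited = 0
for _ in family { visited += 1 }
print(visited)  // 1
```

The scalar, UTF-16 and UTF-8 counts follow from four emoji outside the BMP plus three joiners: 4 + 3 scalars, 4*2 + 3 UTF-16 units, 4*4 + 3*3 bytes.
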
After a lot of reading the source code and still not finding what I was looking
for (hampered by having zero knowledge of the language in question), I think
that in the slowest of the code paths above, Swift gives either the grapheme
clusters or the UnicodeScalars a comparison value dictated by their collation
order. The "string manifesto" did mention that a Swift string containing "é"
would always sort before "i", for generic locales. I just can't find where
that comparison value is assigned.
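
Short of finding the code, the behaviour can be probed empirically. The following is a minimal sketch, assuming a recent Swift 5 toolchain and using only the standard library operators `==` and `<` plus `sorted()`; the probe strings are illustrative. The ordering results are deliberately left unannotated, since they are exactly what is in question here.

```swift
let precomposed = "\u{E9}"    // "é" as the single scalar U+00E9
let decomposed = "e\u{301}"   // "e" followed by U+0301 COMBINING ACUTE ACCENT

// Canonically equivalent strings compare equal even though their
// scalar sequences differ.
let sameString = (precomposed == decomposed)
let sameScalars = precomposed.unicodeScalars.elementsEqual(decomposed.unicodeScalars)
print(sameString, sameScalars)  // true false

// Because the two spellings are equal, they must order the same way
// against any third string. Which way that is shows whether "é"
// sorts before "i" in the locale-independent comparison.
let orderPrecomposed = precomposed < "i"
let orderDecomposed = decomposed < "i"
print(orderPrecomposed, orderDecomposed)

// A few more probes: a collation-based order and a code-unit-based
// order would arrange these differently.
let words = ["i", precomposed, "e", "Z", "a"]
print(words.sorted())
```
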
Asking Google, and insisting that I meant collation, not concatenation,
indicates that you can pass a locale: option to String's compare() function,
but I could not find the implementation to verify this. The AI answer also
suggested that anything that should sort the same way as elsewhere in the OS,
such as file names in Finder, should be converted to NSString and sorted with
the functions from there.
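
For reference, the Foundation route might look like the sketch below. It assumes Foundation is available (it is on Apple platforms, and swift-corelibs-foundation provides it elsewhere), and it uses `compare(_:options:range:locale:)` and `localizedStandardCompare(_:)`, which Foundation adds to String through its NSString bridging. The file names and the "en_US" locale are illustrative choices, not taken from the discussion.

```swift
import Foundation

let files = ["file10.txt", "file2.txt", "File1.txt"]

// Standard library ordering: locale-independent, no numeric awareness.
let plain = files.sorted()
print(plain)  // ["File1.txt", "file10.txt", "file2.txt"]

// Foundation's Finder-like ordering, which treats digit runs as numbers.
let finderLike = files.sorted {
    $0.localizedStandardCompare($1) == .orderedAscending
}
print(finderLike)

// Comparison with an explicit locale, so the result can follow that
// locale's collation rules rather than the standard library's ordering.
let locale = Locale(identifier: "en_US")
let result = "\u{E9}".compare("i", options: [], range: nil, locale: locale)
print(result == .orderedAscending)
```

Sorting with a closure over `localizedStandardCompare` keeps the data as Swift Strings; an explicit conversion to NSString should not be needed for these calls.
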
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Platform & System Engineering
Received on 2025-08-30 16:24:32