Date: Sat, 30 Aug 2025 15:59:09 -0700
> On Aug 30, 2025, at 1:58 PM, Thiago Macieira <thiago_at_[hidden]> wrote:
>
> On Saturday, 30 August 2025 13:46:59 Pacific Daylight Time Oliver Hunt wrote:
>> I’ll prod folk again, but I’m not sure I understand why you seem so
>> absolutely adamant that everyone does or should use utf16 internally when
>> multiple people have said this is not true, and pointed to every API you
>> reference correctly as “this API was introduced when ucs2 was thought to be
>> sufficient, and then got utf16 bolted on after the fact, at different
>> rates”.
>
> I'm not adamant on this any more. I think based on what you said that Swift
> reimplemented the support for the Unicode Database. I just can't find it,
> because I don't know how to navigate the source code. I've found where it
> iterates over the UTF-8 string and returns UTF-32 code units/points, but not
> where it looks up the collation value such that U+00E9 is less than U+0069.
>
> The problem is of course that this means they've duplicated the access to the
> Unicode Database, instead of using the OS. Then again, if Swift is cross-
> platform to other OSes, it kind of has to if it doesn't want to depend on ICU.
On Apple platforms the Swift foundation libraries _are_ part of the OS.
That said, it would seem - though I don’t know the details of ICU, etc. - entirely
plausible for the Swift foundation libraries to include the ICU tables directly, or
to reference them by symbol. But again, I _really_ don’t know: I’m a C++ compiler
guy, not a Swift one.
>
>> What you seem to be arguing is old ABI fixed APIs that were extended to
>> support utf16, so despite the many problems of utf16 vs utf8, and the wide
>> spread adoption of utf8 everywhere other than places that are stuck with
>> utf16 due to aforementioned ABI constraints, all new systems languages
>> being built on utf8 strings, we should make new APIs built around utf16 so
>> we can continue to be required to maintain an encoding that is (what the
>> domain experts have told me) is bad on every metric.
>
> I'm arguing that because we have such a widespread use of UTF-16 in C and C++,
> we need first-class UTF-16 support in the C++ Standard. I don't care about
> other languages, because I'm not writing code for them. But the underlying
> infrastructure for UTF-16 for C and C++ seems to be there.
>
> So instead of talking about Rust or Swift, let's ask what libc++ would use to
> implement collation.
I’m saying that we don’t have widespread use of UTF-16 in C and C++. C and C++
do not have _any_ awareness of Unicode: strings are blobs, and code units are
treated as if they were characters. The only platform on which C/C++ has even
UCS-2 support seems to be Windows - on Linux, macOS, and I would guess the other
Unix-like systems, wchar_t is 32-bit, i.e. a Unicode scalar value, not a UCS-2 or
UTF-16 code unit.
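To make that concrete, here is a minimal sketch (mine, purely illustrative) of what
the core language actually guarantees about these types today - the widths are
implementation-defined, and comparison is by code unit, with no collation anywhere
in sight:

#include <string>

int main() {
    // wchar_t is 16 bits on Windows (a UTF-16 code unit at best) but
    // 32 bits on Linux/macOS; the standard only says it is some integer type.
    static_assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
                  "wchar_t width is implementation-defined");

    // char16_t strings are just sequences of 16-bit integers; operator<
    // compares code units, it does not collate.
    std::u16string e_acute = u"\u00E9"; // é, U+00E9
    std::u16string i       = u"i";      // U+0069
    bool sorts_before = e_acute < i;    // false: 0x00E9 > 0x0069,
    (void)sorts_before;                 // even though collation says é < i
}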
If C++ _were_ to add support for Unicode then, based on my understanding of the C++
spec process, it would not be a matter of pointing to a system library or libicu:
each C++ edition would have to reference a _specific_ Unicode release, whose data
would effectively need to be embedded in the standard library.
This used to be a problem with web standard specifications, as it put standards in
a position that would require the same text to render differently in a webpage
than in the rest of the OS. So browsers ignored it, and the specifications removed
those requirements. C++ has very different constraints, however, which means
the requirement to reference a specific version is perhaps more reasonable there.
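For reference, the closest thing portable C++ has today is std::collate, and it
shows exactly why: the result depends on whatever locale data the platform ships,
not on any Unicode version the standard could point to. A rough sketch (the locale
name is an assumption and may not exist on a given system):

#include <iostream>
#include <locale>
#include <string>

int main() {
    try {
        std::locale loc("en_US.UTF-8"); // platform-provided locale data
        const auto& coll = std::use_facet<std::collate<wchar_t>>(loc);
        std::wstring e_acute = L"\u00E9"; // é
        std::wstring i       = L"i";
        int r = coll.compare(e_acute.data(), e_acute.data() + e_acute.size(),
                             i.data(), i.data() + i.size());
        // Negative means é sorts before i; the actual answer is up to the OS.
        std::cout << "collate compare: " << r << '\n';
    } catch (const std::runtime_error&) {
        std::cout << "locale not available on this system\n";
    }
}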
>> “AI” is just predictive text generation regurgitating existing content, so
>> of course it will produce answers that are most like the above. The
>> majority of the posts it regurgitates written about stuff like this are
>> from _decades_ of objc + foundation. AI doesn’t magically know anything, it
>> literally just regurgitates the work of others, periodically adding errors.
>> There is no reason to use it in a technical forum.
>
> Which is why I almost always ignore the AI and go straight for the sources,
> because until it is 99% reliable or more, it's useless. But in this case,
> since I can't pass the judgement either on the accuracy of the sources, the AI
> answer suffices. It seemed plausible that, if you needed the exact same sorting
> as Finder, you'd use the same function that Finder uses, not one that may be
> slightly different due to a reimplementation, however correct it may be.
As above: Swift is part of the platform; the Swift standard libraries are system
libraries. Much of what you are thinking of as Swift libraries/implementation vs.
system implementation (or even “old” system functions) is part of the Swift
standard libraries, not something separate.
It is a fundamental misunderstanding to think that Swift behavior can diverge
from “platform” behavior on our platforms: it fundamentally is the platform,
and as such the Swift implementations of these operations do not, and cannot,
change the ABI or observable behavior. It is also incorrect to think “I am calling
the system version of this, not the Swift one” and to assume that those are
necessarily not the same implementation, or that the implementation of those
functions does not come from the Swift standard library.
I am unclear on why you seem so adamant about adding new UTF-16 APIs when
no one who works with Unicode believes that UTF-16 is a good answer to any
problem, most vendors of APIs that use UTF-16 regret those APIs, most language
standards that specified the use of UTF-16 as their string representation regret
it, and new languages and APIs are defined in terms of UTF-8, unless there are
specific platform reasons that _require_ un-abstracted UTF-16/char16_t interfaces.
This is especially true for C/C++ (as opposed to languages like Java, JavaScript,
etc.), where UTF-16 is not the standard non-8-bit character encoding.
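To be clear about what I mean by that last clause, the pattern new code tends to
follow is to keep UTF-8 internally and transcode only at an un-abstracted UTF-16
platform boundary. A hedged sketch (the Win32 calls are real; the surrounding
function is purely illustrative):

#include <string>

#ifdef _WIN32
#include <windows.h>

// Convert a UTF-8 string to UTF-16 only because a Win32 API demands it.
static std::wstring to_utf16_for_win32(const std::string& utf8) {
    if (utf8.empty()) return {};
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                utf8.data(), (int)utf8.size(), nullptr, 0);
    std::wstring out(n > 0 ? n : 0, L'\0');
    if (n > 0)
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8.data(), (int)utf8.size(), out.data(), n);
    return out;
}
#endif

void set_window_title(const std::string& utf8_title) {
#ifdef _WIN32
    // UTF-16 appears only here, at the system boundary.
    SetWindowTextW(GetActiveWindow(), to_utf16_for_win32(utf8_title).c_str());
#else
    (void)utf8_title; // POSIX-ish platforms take the UTF-8 char* string as-is
#endif
}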
—Oliver
Received on 2025-08-30 22:59:22