Date: Mon, 25 Aug 2025 10:38:16 -0700
On Monday, 25 August 2025 10:01:18 Pacific Daylight Time Oliver Hunt wrote:
> UTF8 is the preferred encoding by new code and languages - Foundation uses
> UTF16 for legacy API/ABI reasons, but Swift strings are UTF8 by default -
> the only exception is bridged NSStrings, because of Foundation’s UTF16 ABI
> requirements.
>
> Web content is overwhelmingly UTF-8, and so browser engines would prefer UTF8
> as well - the only reason for not doing so is that JS exposes a UCS2
> interface, so internally their strings are either 8 or 16 bit. This is
> because the JS exposure of UCS2 means that there is no efficient way to use
> a different multibyte encoding, and the UCS2 nature of JS means encoding to
> UTF-8 is not necessarily possible (hence the WTF8 encoding).
That confirms my argument that char16_t must be supported, for compatibility
with existing content.
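
To illustrate the point in the quoted text about JS strings not always being
encodable as UTF-8: a sequence of 16-bit code units may contain a surrogate
that is not part of a high+low pair, and strict UTF-8 has no representation
for that. A minimal sketch of the check on char16_t data follows. The function
name is_well_formed_utf16 is made up for this example and is not taken from
any proposal or library.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if every surrogate code unit in `s`
// belongs to a valid high+low pair, i.e. the sequence is well-formed UTF-16
// and can be transcoded to strict UTF-8 without loss.
constexpr bool is_well_formed_utf16(std::u16string_view s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 == s.size())
                return false;
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)
                return false;
            ++i; // skip the low surrogate we just validated
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            return false;
        }
    }
    return true;
}
```

Content for which this returns false is what WTF-8 exists to carry; code
that has to round-trip such data needs to keep it in 16-bit code units or in
an encoding that tolerates unpaired surrogates.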
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Platform & System Engineering
Received on 2025-08-25 17:38:25