Date: Thu, 06 Oct 2022 15:14:23 -0700
On Thursday, 6 October 2022 14:13:07 PDT Dimitrij Mijoski via SG16 wrote:
> The functions will fall in four categories:
>
> 1. *Functions that decode/encode only one code point.* ICU has this
> functionality in utf8.h
> <https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/utf8_8h.html>
> and utf16.h
> <https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/utf16_8h.html>,
> but it is implemented as macros that are not type-safe.
I don't see how this can safely be done because UTF-8 and UTF-16 might need
multiple code units to represent a single code point. Therefore, conversions
are inherently a string operation.

Unless there's a strong use case for these, I'd simply forgo them completely.
But since you've dedicated the majority of your email to them, you may feel
there is a need for them. Can you elaborate?
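To make the objection concrete: even a function that decodes "only one code point" has to accept a run of code units and report how many it consumed, because a UTF-8 code point occupies 1 to 4 bytes. The following is a minimal sketch under that assumption; `decode_one_utf8` and `decode_result` are hypothetical names, not ICU's macros (such as `U8_NEXT`) and not any proposed standard API.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical result type: the decoded code point plus the number of
// code units consumed. length == 0 signals an ill-formed or truncated input.
struct decode_result {
    char32_t code_point;
    std::size_t length;
};

// Sketch: decode the first code point of a UTF-8 sequence.
// Even "one code point" may need up to 4 code units, so the function
// must take a range and tell the caller how much of it was used.
decode_result decode_one_utf8(std::string_view in) {
    if (in.empty()) return {0, 0};
    const auto b0 = static_cast<unsigned char>(in[0]);
    if (b0 < 0x80) return {b0, 1};  // ASCII: a single code unit

    std::size_t len = 0;
    char32_t cp = 0;
    if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; cp = b0 & 0x1F; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; }
    else return {0, 0};  // stray continuation byte or invalid lead (C0, C1, F5..FF)

    if (in.size() < len) return {0, 0};  // sequence truncated by end of input
    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(in[i]);
        if ((b & 0xC0) != 0x80) return {0, 0};  // expected a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong 3- and 4-byte forms, surrogates, and values above U+10FFFF.
    // (Overlong 2-byte forms are already excluded by the C2 lower bound.)
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return {0, 0};
    return {cp, len};
}
```

For example, decoding the bytes E2 82 AC yields U+20AC with a length of 3, so a caller walking a buffer still has to advance by a variable amount, exactly as in a string conversion.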

Iterative conversion could probably be implemented with a stateful chunk API
(your #4) where the input chunks are of size 1, though output chunks should
probably never be smaller than 4 bytes (4 char8_t, 2 char16_t, or 1 char32_t).
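A minimal sketch of that iterative approach, assuming a hypothetical `utf8_to_utf16` converter (not an existing or proposed API) that is fed one UTF-8 code unit per call and writes into an output chunk of 2 `char16_t`, i.e. 4 bytes, which is the most a single code point can need in UTF-16. Error recovery is deliberately simplified.

```cpp
// Hypothetical stateful UTF-8 -> UTF-16 converter, fed one code unit at a time.
class utf8_to_utf16 {
public:
    // Feed one byte. Returns the number of char16_t written to `out`
    // (0 while a sequence is incomplete, 1 or 2 once it finishes),
    // or -1 if the input is ill-formed. `out` must have room for
    // 2 char16_t (4 bytes), enough for a surrogate pair.
    int feed(unsigned char b, char16_t out[2]) {
        if (pending_ == 0) {
            if (b < 0x80) { out[0] = b; return 1; }  // ASCII passes straight through
            if (b >= 0xC2 && b <= 0xDF) start(b & 0x1F, 1, 0x80);
            else if (b >= 0xE0 && b <= 0xEF) start(b & 0x0F, 2, 0x800);
            else if (b >= 0xF0 && b <= 0xF4) start(b & 0x07, 3, 0x10000);
            else return -1;  // stray continuation byte or invalid lead byte
            return 0;        // lead byte accepted, waiting for more input
        }
        if ((b & 0xC0) != 0x80) {
            // Expected a continuation byte. This sketch resets and drops the
            // offending byte; a real converter might reprocess it as a lead byte.
            pending_ = 0;
            return -1;
        }
        cp_ = (cp_ << 6) | (b & 0x3F);
        if (--pending_ != 0) return 0;  // sequence still incomplete

        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (cp_ < min_ || (cp_ >= 0xD800 && cp_ <= 0xDFFF) || cp_ > 0x10FFFF)
            return -1;
        if (cp_ < 0x10000) { out[0] = static_cast<char16_t>(cp_); return 1; }
        const char32_t v = cp_ - 0x10000;  // split into a surrogate pair
        out[0] = static_cast<char16_t>(0xD800 + (v >> 10));
        out[1] = static_cast<char16_t>(0xDC00 + (v & 0x3FF));
        return 2;
    }

    // True if a multi-byte sequence was started but not yet finished.
    bool incomplete() const { return pending_ != 0; }

private:
    void start(char32_t bits, int pending, char32_t min) {
        cp_ = bits;
        pending_ = pending;
        min_ = min;
    }
    char32_t cp_ = 0;   // code point accumulated so far
    char32_t min_ = 0;  // smallest value allowed for this sequence length
    int pending_ = 0;   // continuation bytes still expected
};
```

Feeding F0 9F 98 80 one byte at a time returns 0, 0, 0 and then 2, writing the surrogate pair D83D DE00 (U+1F600); an output chunk smaller than 2 char16_t could not hold that result.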
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Software Architect - Intel DCAI Cloud Engineering
Received on 2022-10-06 22:14:25