Date: Fri, 20 Dec 2024 10:53:16 +0100
I have implemented conversions between UTF-8 and UTF-32 (link below) that add no overhead for valid byte sequences. For invalid byte sequences it is possible, when needed, to recover both the partially computed value (for example, for overlong sequences) and the original bytes. The original 31-bit UTF-8 is extended to 32 bits, so every char32_t value can be converted to UTF-8 without exception handling. The intent is that the Unicode limit should be enforced at a higher level.
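To make the 32-bit extension concrete, here is a minimal sketch of an encoder. It is not taken from utf.hh: the names utf8_length and to_utf8 are hypothetical, and the 7-byte form with lead byte 0xFE for values above 0x7FFFFFFF is an assumption about how the original 1 to 6 byte scheme could be extended. The linked header may do it differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Number of bytes needed for c. Lengths 1 to 6 follow the original
// 31-bit UTF-8; length 7 is the assumed 32-bit extension.
inline int utf8_length(char32_t c) {
    if (c < 0x80u) return 1;
    if (c < 0x800u) return 2;
    if (c < 0x10000u) return 3;
    if (c < 0x200000u) return 4;
    if (c < 0x4000000u) return 5;
    if (c < 0x80000000u) return 6;
    return 7;  // assumption: lead byte 0xFE plus six continuation bytes
}

// Encode any char32_t value; there is no error path and nothing is thrown
// for out-of-range input, because every 32-bit value has an encoding.
inline std::string to_utf8(char32_t c) {
    const int n = utf8_length(c);
    if (n == 1) return std::string(1, static_cast<char>(c));
    assert(n >= 2 && n <= 7);

    std::string out(static_cast<std::string::size_type>(n), '\0');
    std::uint32_t v = static_cast<std::uint32_t>(c);

    // Fill continuation bytes from the end, six payload bits each.
    for (int i = n - 1; i > 0; --i) {
        out[i] = static_cast<char>(0x80u | (v & 0x3Fu));
        v >>= 6;
    }

    // Lead byte: n one-bits, a zero bit, then the remaining payload bits.
    const unsigned lead_mark = (0xFFu << (8 - n)) & 0xFFu;
    out[0] = static_cast<char>(lead_mark | v);
    return out;
}
```

In this sketch the check against the Unicode limit (0x10FFFF) and against surrogates is deliberately absent, matching the intent that such limits are handled at a higher level.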
The background is that the standard does not capture the fact that UTF-8 and UTF-32 are encodings of the same character set, which makes the corresponding types less useful than they could be. With this addition, code that uses std::string can instead use std::u32string internally and read into it from byte streams.
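As an illustration of reading bytes into a std::u32string while keeping invalid input recoverable, here is a hedged sketch. The struct decoded and the functions min_length, decode_one and decode_all are hypothetical names, not the interface of utf.hh. The 7-byte form with lead byte 0xFE and the choice to substitute U+FFFD in decode_all are assumptions made for this example only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

struct decoded {
    char32_t value;      // decoded value, or the partial value on error
    std::size_t length;  // bytes consumed from the input, always >= 1
    bool valid;          // false for stray, truncated or overlong input
};

// Shortest encoding length for v (7 is the assumed 32-bit extension).
inline std::size_t min_length(std::uint64_t v) {
    if (v < 0x80u) return 1;
    if (v < 0x800u) return 2;
    if (v < 0x10000u) return 3;
    if (v < 0x200000u) return 4;
    if (v < 0x4000000u) return 5;
    if (v < 0x80000000u) return 6;
    return 7;
}

// Decode one sequence starting at s[pos]; requires pos < s.size().
inline decoded decode_one(const std::string& s, std::size_t pos) {
    const unsigned char b0 = static_cast<unsigned char>(s[pos]);
    if (b0 < 0x80u) return {b0, 1, true};

    // The number of leading one-bits gives the sequence length.
    std::size_t n = 0;
    while (n < 8 && (b0 & (0x80u >> n))) ++n;
    if (n == 1 || n == 8) return {b0, 1, false};  // stray continuation or 0xFF

    std::uint64_t v = b0 & (0xFFu >> (n + 1));    // payload bits of lead byte
    for (std::size_t i = 1; i < n; ++i) {
        if (pos + i >= s.size())                  // truncated sequence
            return {static_cast<char32_t>(v), i, false};
        const unsigned char b = static_cast<unsigned char>(s[pos + i]);
        if ((b & 0xC0u) != 0x80u)                 // not a continuation byte
            return {static_cast<char32_t>(v), i, false};
        v = (v << 6) | (b & 0x3Fu);
    }
    // Reject values beyond 32 bits and overlong forms, but keep the value.
    const bool ok = v <= 0xFFFFFFFFu && min_length(v) == n;
    return {static_cast<char32_t>(v), n, ok};
}

// Convert a whole byte string; invalid sequences become U+FFFD here,
// which is only one possible policy for a caller to choose.
inline std::u32string decode_all(const std::string& s) {
    std::u32string out;
    for (std::size_t pos = 0; pos < s.size();) {
        const decoded d = decode_one(s, pos);
        out.push_back(d.valid ? d.value : U'\uFFFD');
        pos += d.length;
    }
    return out;
}
```

Because decode_one reports both the partial value and the number of bytes consumed, a caller that wants the original bytes of an invalid sequence can take them from the input at [pos, pos + length) instead of accepting the replacement character.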
https://git.savannah.gnu.org/cgit/metalogic-inference.git/tree/src/utf.hh
Received on 2024-12-20 09:53:31