On Sun, Apr 28, 2019 at 4:01 PM <keld@keldix.com> wrote:
  I believe there are a number of encodings in East Asia that will still be
developed for quite some time.

Major languages, toolkits, and operating systems are still character set independent.
Some people believe that Unicode has not won, and some people are not happy with
the Unicode Consortium. Why abandon a model that still delivers for all?


I think there's really only one thing that needs to be fixed, and that's the POSIX and C locales. Right now, they require the locale's character set to be a single-byte encoding of at most 256 characters (POSIX XBD, Chapter 6, Section 2, first sentence: http://pubs.opengroup.org/onlinepubs/9699919799/).

This restriction is what has been utterly and absolutely destroying the ability to behave properly, by default, with a large set of encodings deployed around the world, including Unicode. I am actually spending time and cycles now contacting people on the C Standards Committee and reaching out to find the POSIX individuals responsible for overseeing this standard. Requiring the locale to be a single-byte encoding is not "character set independent": it means that only a small fraction of encodings (ASCII, or similar) can possibly serve as the default C or POSIX locale. That Unicode (specifically, UTF-8) happens to work in C and C++ at all is because many implementations simply pass char/wchar_t/char16_t/char32_t through their interfaces and do not touch the data. But the moment anyone uses facets or locales in any meaningful manner, much of it falls over.

POSIX and C need to acknowledge that multibyte encodings are plausible defaults, not merely recommended extensions. Until then, no: the C standard does not deliver for all, and it actively harms the development and growth of international text processing on hardware systems large and small.