Date: Mon, 29 Apr 2019 19:37:24 +0200
On Sun, Apr 28, 2019 at 05:25:28PM -0400, JeanHeyd Meneide wrote:
> On Sun, Apr 28, 2019 at 4:01 PM <keld_at_[hidden]> wrote:
>
> > I believe there are a number of encodings in East Asia for which software
> > will still be developed for quite some time.
> >
> > Major languages, toolkits, and operating systems are still character-set
> > independent. Some people believe that Unicode has not won, and some people
> > are not happy with the Unicode Consortium. Why abandon a model that still
> > delivers for all?
> >
> > keld
> >
>
> I think there's really only one thing that needs to be fixed, and that's
> the POSIX and C locales. Right now, they force a by-requirement 256
> single-byte encoding. (Chapter 6, Section 2, first sentence:
> http://pubs.opengroup.org/onlinepubs/9699919799/).
The POSIX standard has had provisions for ISO 10646 since 1991, and most POSIX implementations
today support ISO 10646 and ISO 14651, with extensive collation and character-attribute support,
long before Unicode made something up.
>
> This restriction is what has been utterly and absolutely destroying the
> ability to behave properly with a large set of encodings deployed around
> the world, including Unicode, as a default. I am actually spending time and
> cycles now contacting people on the C Standards Committee and reaching out
> to people to find the POSIX individuals responsible for overseeing this
> standard: that the locale is a single-byte encoding is not "character set
> independent": it means that only a small fraction (ASCII, or similar) can
> possibly be the default C or POSIX locale. That Unicode (specifically,
> UTF8) happens to work in C and C++ is because the defaults for many of the
> implementations simply pass char/wchar_t/char16_t/char32_t through their
> interfaces and do not touch it. But, the moment anyone uses facets or
> locales in any meaningful manner, much of it falls over.
This is not true; quite the contrary.
Yes, POSIX has a standard POSIX locale, which is 7/8-bit and portable,
but ISO 10646 has been supported in POSIX since 1991, and work is underway on a POSIX ISO 10646 locale.
ISO 14652 contains a candidate for that locale, which is also the basis for many glibc national locales.
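A minimal C sketch of the two behaviours being argued about: a program starts in the portable "C"/POSIX locale, and it only decodes UTF-8 as multibyte characters after it explicitly selects a UTF-8 locale. The helper `count_chars` is hypothetical, and the locale name "C.UTF-8" is an assumption: it is available on many Linux systems (musl, recent glibc) but is not guaranteed to be installed. What `count_chars` returns for non-ASCII bytes in the plain "C" locale varies by implementation.

```c
#include <locale.h>   /* setlocale(), for callers choosing a locale */
#include <stdlib.h>   /* MB_CUR_MAX */
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters (not bytes) in a NUL-terminated
   string by decoding it with mbrtowc() under the current LC_CTYPE locale.
   Returns (size_t)-1 if the bytes are invalid or truncated in that locale. */
size_t count_chars(const char *s)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);      /* initial conversion state */

    size_t remaining = strlen(s);          /* bytes left, excluding the NUL */
    size_t count = 0;
    while (remaining > 0) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, s, remaining, &state);
        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;             /* invalid or incomplete sequence */
        if (used == 0)
            break;                         /* decoded a null wide character */
        s += used;
        remaining -= used;
        count++;
    }
    return count;
}
```

After a successful `setlocale(LC_CTYPE, "C.UTF-8")`, `MB_CUR_MAX` is greater than 1 and `count_chars("h\xc3\xa9")` (the UTF-8 bytes for "hé") returns 2 rather than the byte count 3.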
>
> POSIX/C need to acknowledge that multibyte encodings are reasonable
> defaults (not just recommended extensions, but plausible defaults). Until
> then, no: the C standard does not deliver for all and actively harms the
> development and growth of international text processing on large and small
> hardware systems.
I think you are not up to date. How can Linux, OS X, and other POSIX operating systems deliver
fully internationalized systems that support more languages than Microsoft Windows?
Linux supports more than 100 languages, mostly in UTF-8.
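To check which encoding a given system actually uses by default, a program can adopt the user's locale and ask for its codeset. A minimal sketch using the POSIX `setlocale()` and `nl_langinfo(CODESET)` calls; the helper name `environment_codeset` is hypothetical. The result depends entirely on the environment variables and installed locales, so a typical answer such as "UTF-8" on a Linux desktop is an expectation, not a guarantee.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: adopt the locale named by the user's environment
   (LC_ALL, LC_CTYPE, LANG) and return the name of its character encoding.
   Returns NULL if the environment names a locale that is not installed. */
const char *environment_codeset(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return NULL;                 /* requested locale unavailable */
    return nl_langinfo(CODESET);     /* e.g. "UTF-8" in a UTF-8 locale */
}
```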
keld
Received on 2019-04-29 19:37:25