sg16: Re: [SG16-Unicode] It's Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++

From: keld_at <keld_at_[hidden]>
Date: Mon, 29 Apr 2019 20:11:52 +0200

On Mon, Apr 29, 2019 at 02:00:23PM -0400, Steve Downey wrote:
> Not "some". I want the entire set of Unicode functionality as a first class
> citizen, although some of them have higher priority than others.

I am willing to consider support for all of Unicode's functionality.
Much, however, is not defined by Unicode, and not in a C/C++ style.
I think some of Unicode is not well designed, like ucs16.
(The stuff they copied from me, however, is OK :-)

> Keld, what capabilities not provided by Unicode algorithms and databases
> are you concerned about not being supported? I've been doing text
> processing a lot, and working with code points or scalar values has made my
> life easier, with fewer complaints from my customers. Well, except for a few
> who don't like CDATA in XML dumps, but were somehow OK with utterly broken
> XML.

I did a number of designs that they have not yet copied, e.g. in ISO 14652, and
POSIX has some non-Unicode stuff on the way.

keld

> On Mon, Apr 29, 2019 at 1:48 PM Steve Downey <sdowney_at_[hidden]> wrote:
>
> > The "POSIX" locale (a superset of the capabilities of the "C" locale, but
> > otherwise equivalent by definition) is the one you get if you do not make
> > a setlocale() call.
> > So, not _a_ posix locale, but _the_ POSIX locale.
> >
> > On Mon, Apr 29, 2019 at 1:37 PM <keld_at_[hidden]> wrote:
> >
> >> On Sun, Apr 28, 2019 at 05:25:28PM -0400, JeanHeyd Meneide wrote:
> >> > On Sun, Apr 28, 2019 at 4:01 PM <keld_at_[hidden]> wrote:
> >> >
> >> > > I believe there are a number of encodings in East Asia that will still
> >> > > be developed for, for quite some time.
> >> > >
> >> > > Major languages, toolkits, and operating systems are still character set
> >> > > independent.
> >> > > Some people believe that Unicode has not won, and some people are not
> >> > > happy with the Unicode Consortium. Why abandon a model that still
> >> > > delivers for all?
> >> > >
> >> > > keld
> >> > >
> >> >
> >> > I think there's really only one thing that needs to be fixed, and that's
> >> > the POSIX and C locales. Right now, they are required to use a single-byte
> >> > encoding of 256 characters. (Chapter 6, Section 2, first sentence:
> >> > http://pubs.opengroup.org/onlinepubs/9699919799/).
> >>
> >> The POSIX standard has had provisions for ISO 10646 since 1991, and most
> >> POSIX implementations today support ISO 10646 and ISO 14651, with a lot of
> >> collation and character attribute support, long before Unicode made
> >> something up.
> >>
> >> >
> >> > This restriction is what has been utterly and absolutely destroying the
> >> > ability to behave properly with a large set of encodings deployed around
> >> > the world, including Unicode, as a default. I am actually spending time
> >> > and cycles now contacting people on the C Standards Committee and
> >> > reaching out to people to find the POSIX individuals responsible for
> >> > overseeing this standard. That the locale is a single-byte encoding is
> >> > not "character set independent": it means that only a small fraction
> >> > (ASCII, or similar) can possibly be the default C or POSIX locale. That
> >> > Unicode (specifically, UTF-8) happens to work in C and C++ is because the
> >> > defaults for many of the implementations simply pass
> >> > char/wchar_t/char16_t/char32_t through their interfaces and do not touch
> >> > it. But the moment anyone uses facets or locales in any meaningful
> >> > manner, much of it falls over.
> >>
> >> This is not true, quite the contrary.
> >> Yes, POSIX has a standard POSIX locale which is 7/8 bit and portable,
> >> but 10646 has been supported in POSIX since 1991, and work is underway
> >> on a POSIX 10646 locale.
> >> ISO 14652 has a candidate for that, which is also the base for many glibc
> >> national locales.
> >>
> >>
> >> >
> >> > POSIX/C need to acknowledge that multibyte encodings are reasonable
> >> > defaults (not just recommended extensions, but plausible defaults).
> >> > Until then, no: the C standard does not deliver for all and actively
> >> > harms the development and growth of international text processing on
> >> > large and small hardware systems.
> >>
> >> I think you are not up to date. How can Linux and OS X and other POSIX
> >> OSes deliver fully internationalized systems with support for more
> >> languages than Microsoft Windows?
> >> Linux supports more than 100 languages, and mostly in UTF-8.
> >>
> >> keld
> >>
> >

Received on 2019-04-29 20:11:52