sg16: Re: [SG16-Unicode] It's Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++

From: keld_at <keld_at_[hidden]>
Date: Tue, 30 Apr 2019 14:19:34 +0200

On Mon, Apr 29, 2019 at 05:26:48PM -0400, Steve Downey wrote:
> Is there an open access version of 14652 ? It sounds like an extension of
> chapters 6 and 7 of the Posix spec? The ISO desc mentions APIs that will be
> developed? Also it looks like the spec is currently withdrawn, is there a
> replacement? https://www.iso.org/standard/37069.html

there are drafts of 14652 available
http://www.open-std.org/jtc1/SC22/WG20/docs/n972-14652ft.pdf

iso 30112 is a replacement.

they are backwards compatible with posix

keld

> On Mon, Apr 29, 2019 at 2:11 PM <keld_at_[hidden]> wrote:
>
> > On Mon, Apr 29, 2019 at 02:00:23PM -0400, Steve Downey wrote:
> > > Not "some". I want the entire set of Unicode functionality as a first class
> > > citizen, although some of them have higher priority than others.
> >
> >
> > I am willing to consider support for all of unicode functionality.
> > much is however not defined by unicode. and not in a c/c++ style
> > I think some unicode is not well designed, like ucs16.
> > (the stuff they copied from me, however, is ok:-)
> >
> >
> >
> >
> > > Keld, what capabilities not provided by Unicode algorithms and databases
> > > are you concerned about not being supported? I've been doing text
> > > processing a lot, and working with code points or scalar values has made my
> > > life easier, with fewer complaints from my customers. Well, except for a few
> > > who don't like CDATA in XML dumps, but were somehow OK with utterly broken
> > > XML.
> >
> > i did a number of designs that they have not yet copied, e.g. in 14652, and
> > posix has some non-unicode stuff on the way.
> >
> > keld
> >
> > > On Mon, Apr 29, 2019 at 1:48 PM Steve Downey <sdowney_at_[hidden]> wrote:
> > >
> > > > The "POSIX" and "C" locales, where the "POSIX" locale is the superset of
> > > > capabilities of the "C" locale, but otherwise by definition equivalent, is
> > > > the one you get if you do not make a setlocale() call.
> > > > So, not _a_ posix locale, but _the_ POSIX locale.
> > > >
> > > > On Mon, Apr 29, 2019 at 1:37 PM <keld_at_[hidden]> wrote:
> > > >
> > > >> On Sun, Apr 28, 2019 at 05:25:28PM -0400, JeanHeyd Meneide wrote:
> > > >> > On Sun, Apr 28, 2019 at 4:01 PM <keld_at_[hidden]> wrote:
> > > >> >
> > > >> > > I believe there are a number of encodings in East Asia that will
> > > >> > > still be developed for, for quite some time.
> > > >> > >
> > > >> > > major languages and toolkits and operating systems are still
> > > >> > > character set independent.
> > > >> > > some people believe that unicode has not won, and some people are not
> > > >> > > happy with the unicode consortium. why abandon a model that still
> > > >> > > delivers for all?
> > > >> > >
> > > >> > > keld
> > > >> > >
> > > >> >
> > > >> > I think there's really only one thing that needs to be fixed, and that's
> > > >> > the POSIX and C locales. Right now, they force a by-requirement 256
> > > >> > single-byte encoding. (Chapter 6, Section 2, first sentence:
> > > >> > http://pubs.opengroup.org/onlinepubs/9699919799/).
> > > >>
> > > >> the posix std has since 1991 had provisions for iso 10646 and most posix
> > > >> implementations today support iso 10646 and iso 14651 - with a lot of
> > > >> collation and character attribute support long before unicode made
> > > >> something up.
> > > >>
> > > >> >
> > > >> > This restriction is what has been utterly and absolutely destroying the
> > > >> > ability to behave properly with a large set of encodings deployed around
> > > >> > the world, including Unicode, as a default. I am actually spending time and
> > > >> > cycles now contacting people on the C Standards Committee and reaching out
> > > >> > to people to find the POSIX individuals responsible for overseeing this
> > > >> > standard: that the locale is a single-byte encoding is not "character set
> > > >> > independent": it means that only a small fraction (ASCII, or similar) can
> > > >> > possibly be the default C or POSIX locale. That Unicode (specifically,
> > > >> > UTF8) happens to work in C and C++ is because the defaults for many of the
> > > >> > implementations simply pass char/wchar_t/char16_t/char32_t through their
> > > >> > interfaces and do not touch it. But, the moment anyone uses facets or
> > > >> > locales in any meaningful manner, much of it falls over.
> > > >>
> > > >> this is not true, quite the contrary.
> > > >> yes posix has a standard posix locale which is 7/8 bit and portable,
> > > >> but 10646 has been supported since 1991 in posix. and work is underway
> > > >> for a posix 10646 locale,
> > > >> iso 14652 has a candidate for that which is also the base for many glibc
> > > >> national locales.
> > > >>
> > > >>
> > > >> >
> > > >> > POSIX/C need to acknowledge that multibyte encodings are reasonable
> > > >> > defaults (not just recommended extensions, but plausible defaults). Until
> > > >> > then, no: the C standard does not deliver for all and actively harms the
> > > >> > development and growth of international text processing on large and small
> > > >> > hardware systems.
> > > >>
> > > >> I think you are not up to date. how can Linux and osx and other posix
> > > >> os'es deliver fully internationalized systems with support for more
> > > >> languages than microsoft windows?
> > > >> linux supports more than 100 languages, and mostly in utf-8.
> > > >>
> > > >> keld
> > > >> _______________________________________________
> > > >> SG16 Unicode mailing list
> > > >> Unicode_at_[hidden]
> > > >> http://www.open-std.org/mailman/listinfo/unicode
> > > >>
> > > >
> >

Received on 2019-04-30 14:19:34