sg16: Re: [SG16-Unicode] It's Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++

From: Steve Downey <sdowney_at_[hidden]>
Date: Mon, 29 Apr 2019 17:26:48 -0400

Is there an open-access version of 14652? It sounds like an extension of
chapters 6 and 7 of the POSIX spec. The ISO description mentions APIs that
will be developed. Also, it looks like the spec is currently withdrawn; is
there a replacement? https://www.iso.org/standard/37069.html

On Mon, Apr 29, 2019 at 2:11 PM <keld_at_[hidden]> wrote:

> On Mon, Apr 29, 2019 at 02:00:23PM -0400, Steve Downey wrote:
> > Not "some". I want the entire set of Unicode functionality as a first-class
> > citizen, although some of them have higher priority than others.
>
>
> I am willing to consider support for all of unicode functionality.
> much is however not defined by unicode. and not in a c/c++ style
> I think some unicode is not well designed, like ucs16.
> (the stuff they copied from me, however, is ok:-)
>
> > Keld, what capabilities not provided by Unicode algorithms and databases
> > are you concerned about not being supported? I've been doing text
> > processing a lot, and working with code points or scalar values has made my
> > life easier, with fewer complaints from my customers. Well, except for a few
> > who don't like CDATA in XML dumps, but were somehow OK with utterly broken
> > XML.
>
> i did a number of designs that they have not yet copied, e.g. in 14652, and
> posix has some non-unicode stuff on the way.
>
> keld
>
> > On Mon, Apr 29, 2019 at 1:48 PM Steve Downey <sdowney_at_[hidden]> wrote:
> >
> > > The "POSIX" and "C" locales, where the "POSIX" locale is the superset of
> > > capabilities of the "C" locale, but otherwise by definition equivalent, is
> > > the one you get if you do not make a setlocale() call.
> > > So, not _a_ posix locale, but _the_ POSIX locale.
> > >
> > > On Mon, Apr 29, 2019 at 1:37 PM <keld_at_[hidden]> wrote:
> > >
> > >> On Sun, Apr 28, 2019 at 05:25:28PM -0400, JeanHeyd Meneide wrote:
> > >> > On Sun, Apr 28, 2019 at 4:01 PM <keld_at_[hidden]> wrote:
> > >> >
> > >> > > I believe there are a number of encodings in East Asia that will
> > >> > > still be developed for quite some time.
> > >> > >
> > >> > > major languages and toolkits and operating systems are still
> > >> > > character set independent.
> > >> > > some people believe that unicode has not won, and some people are not
> > >> > > happy with the unicode consortium. why abandon a model that still
> > >> > > delivers for all?
> > >> > >
> > >> > > keld
> > >> > >
> > >> >
> > >> > I think there's really only one thing that needs to be fixed, and that's
> > >> > the POSIX and C locales. Right now, they force a by-requirement 256
> > >> > single-byte encoding. (Chapter 6, Section 2, first sentence:
> > >> > http://pubs.opengroup.org/onlinepubs/9699919799/).
> > >>
> > >> the posix std has since 1991 had provisions for iso 10646 and most posix
> > >> implementations today support iso 10646 and iso 14651 - with a lot of
> > >> collation and character attribute support long before unicode made
> > >> something up.
> > >>
> > >> >
> > >> > This restriction is what has been utterly and absolutely destroying the
> > >> > ability to behave properly with a large set of encodings deployed around
> > >> > the world, including Unicode, as a default. I am actually spending time and
> > >> > cycles now contacting people on the C Standards Committee and reaching out
> > >> > to people to find the POSIX individuals responsible for overseeing this
> > >> > standard: that the locale is a single-byte encoding is not "character set
> > >> > independent": it means that only a small fraction (ASCII, or similar) can
> > >> > possibly be the default C or POSIX locale. That Unicode (specifically,
> > >> > UTF8) happens to work in C and C++ is because the defaults for many of the
> > >> > implementations simply pass char/wchar_t/char16_t/char32_t through their
> > >> > interfaces and do not touch it. But, the moment anyone uses facets or
> > >> > locales in any meaningful manner, much of it falls over.
> > >>
> > >> this is not true, quite the contrary.
> > >> yes posix has a standard posix locale which is 7/8 bit and portable,
> > >> but 10646 has been supported since 1991 in posix. and works are underway
> > >> for a posix 10646 locale,
> > >> iso 14652 has a candidate for that which is also the base for many glibc
> > >> national locales.
> > >>
> > >>
> > >> >
> > >> > POSIX/C need to acknowledge that multibyte encodings are reasonable
> > >> > defaults (not just recommended extensions, but plausible defaults). Until
> > >> > then, no: the C standard does not deliver for all and actively harms the
> > >> > development and growth of international text processing on large and small
> > >> > hardware systems.
> > >>
> > >> I think you are not up to date. how can Linux and osx and other posix
> > >> os'es deliver fully internationalized systems with support for more
> > >> languages than microsoft windows?
> > >> linux supports more than 100 languages, and mostly in utf-8.
> > >>
> > >> keld
> > >> _______________________________________________
> > >> SG16 Unicode mailing list
> > >> Unicode_at_[hidden]
> > >> http://www.open-std.org/mailman/listinfo/unicode
> > >>
> > >
>
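
For reference, the point quoted above about the locale you get without a
setlocale() call can be checked with a small C sketch. The helper names
current_locale_name() and current_mb_cur_max() are hypothetical and not
from the thread; only setlocale() and MB_CUR_MAX are standard C. The
exact strings and values reported are implementation-defined, so treat
this as an illustration rather than a statement of what every platform
returns.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy the name of the
   current LC_ALL locale into buf without changing the locale.
   Returns 0 on success, -1 if the name is unavailable or too long. */
int current_locale_name(char *buf, size_t size)
{
    /* A NULL second argument queries the locale instead of setting it. */
    const char *name = setlocale(LC_ALL, NULL);
    if (name == NULL || strlen(name) >= size)
        return -1;
    strcpy(buf, name);
    return 0;
}

/* Hypothetical helper: maximum number of bytes per multibyte character
   in the current locale, as reported by the standard MB_CUR_MAX macro. */
size_t current_mb_cur_max(void)
{
    return (size_t)MB_CUR_MAX;
}
```

Called before any setlocale() call, current_locale_name() should report
the startup locale, which the C standard defines as the "C" locale. The
value from current_mb_cur_max() shows whether that locale is treated as
a single-byte encoding on the implementation at hand.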

Received on 2019-04-29 23:27:02