SG16

Subject: Re: [SG16-Unicode] It's Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++
From: Steve Downey (sdowney_at_[hidden])
Date: 2019-04-29 13:00:23


Not "some". I want the entire set of Unicode functionality as a first class
citizen, although some of them have higher priority than others.

Keld, what capabilities not provided by Unicode algorithms and databases
are you concerned about not being supported? I've been doing text
processing a lot, and working with code points or scalar values has made my
life easier, with fewer complaints from my customers. Well, except for a few
who don't like CDATA in XML dumps, but were somehow OK with utterly broken
XML.

On Mon, Apr 29, 2019 at 1:48 PM Steve Downey <sdowney_at_[hidden]> wrote:

> The "POSIX" locale, which is a superset of the capabilities of the "C"
> locale but otherwise by definition equivalent to it, is the one you get if
> you do not make a setlocale() call.
> So, not _a_ posix locale, but _the_ POSIX locale.
>
> On Mon, Apr 29, 2019 at 1:37 PM <keld_at_[hidden]> wrote:
>
>> On Sun, Apr 28, 2019 at 05:25:28PM -0400, JeanHeyd Meneide wrote:
>> > On Sun, Apr 28, 2019 at 4:01 PM <keld_at_[hidden]> wrote:
>> >
>> > > I believe there are a number of encodings in East Asia that there
>> will
>> > > still be
>> > > developed for for quite some time.
>> > >
>> > > major languages and toolkits and operating systems are still
>> character set
>> > > independent.
>> > > some people believe that unicode has not won, and some people are not
>> > > happy with
>> > > the unicode consortium. why abandon a model that still delivers for
>> all?
>> > >
>> > > keld
>> > >
>> >
>> > I think there's really only one thing that needs to be fixed, and that's
>> > the POSIX and C locales. Right now, they force a by-requirement 256
>> > single-byte encoding. (Chapter 6, Section 2, first sentence:
>> > http://pubs.opengroup.org/onlinepubs/9699919799/).
>>
>> the posix std has since 1991 had provisions for iso 10646 and most posix
>> implementations
>> today support iso 10646 and iso 14651 - with a lot of collation and
>> character attribute support
>> long before unicode made something up.
>>
>> >
>> > This restriction is what has been utterly and absolutely destroying the
>> > ability to behave properly with a large set of encodings deployed around
>> > the world, including Unicode, as a default. I am actually spending time
>> and
>> > cycles now contacting people on the C Standards Committee and reaching
>> out
>> > to people to find the POSIX individuals responsible for overseeing this
>> > standard: that the locale is a single-byte encoding is not "character
>> set
>> > independent": it means that only a small fraction (ASCII, or similar)
>> can
>> > possibly be the default C or POSIX locale. That Unicode (specifically,
>> > UTF8) happens to work in C and C++ is because the defaults for many of
>> the
>> > implementations simply pass char/wchar_t/char16_t/char32_t through their
>> > interfaces and do not touch it. But, the moment anyone uses facets or
>> > locales in any meaningful manner, much of it falls over.
>>
>> this is not true, quite the contrary.
>> yes posix has a standard posix locale which is 7/8 bit and portable,
>> but 10646 has been supported since 1991 in posix. and work is underway
>> for a posix 10646 locale,
>> iso 14652 has a candidate for that which is also the base for many glibc
>> national locales.
>>
>>
>> >
>> > POSIX/C need to acknowledge that multibyte encodings are reasonable
>> > defaults (not just recommended extensions, but plausible defaults).
>> Until
>> > then, no: the C standard does not deliver for all and actively harms the
>> > development and growth of international text processing on large and
>> small
>> > hardware systems.
>>
>> I think you are not up to date. how can Linux and osx and other posix
>> os'es deliver
>> fully internationalized systems with support for more languages than
>> microsoft windows?
>> linux supports more than 100 languages, and mostly in utf-8.
>>
>> keld
>> _______________________________________________
>> SG16 Unicode mailing list
>> Unicode_at_[hidden]
>> http://www.open-std.org/mailman/listinfo/unicode
>>
>
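
Two points from the thread can be illustrated with a small sketch: a program
that never calls setlocale() starts in the "C" locale, and UTF-8 text stored
in char passes through std::string as opaque bytes, so its size is a byte
count rather than a code point count. The helper names current_locale_name
and count_code_points are hypothetical and chosen only for illustration, and
the counter assumes well-formed UTF-8 input and does no validation.

```cpp
#include <clocale>
#include <cstddef>
#include <string>

// Hypothetical helper: name of the locale currently in effect for all
// categories. Passing nullptr queries the locale without changing it.
inline std::string current_locale_name() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Hypothetical helper: count code points in a UTF-8 byte string by counting
// every byte that is not a continuation byte (bit pattern 10xxxxxx).
// Assumption: the input is well-formed UTF-8; nothing is validated here.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

Called at the top of main(), before any setlocale() call,
current_locale_name() reports the startup locale. For the byte string
"h\xC3\xA9llo" (the word "héllo" in UTF-8), size() counts six bytes while
count_code_points() counts five code points. None of this involves facets or
locale-dependent conversion, which is where the quoted disagreement lies.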



SG16 list run by sg16-owner@lists.isocpp.org