SG16

Subject: Re: Locales, Encodings and Unicode
From: Jens Maurer (Jens.Maurer_at_[hidden])
Date: 2020-01-25 03:27:56


On 24/01/2020 21.57, Corentin Jabot wrote:
> Locales do have encoding requirements.

Well, to be more precise, they have requirements on the
character set ("must contain French accented characters"),
but not on the actual encoding.

If you have Unicode as the character set, a French locale
is happy regardless of whether the encoding is UTF-8 or
UTF-16 or UTF-32 or whatever.

> If you want to format a date, February is février in France which cannot be encoded in an en_US.ASCII locale.

Your last word here is what I would really like to see
eradicated from the discussion except when prefixed with
POSIX or so. The (abstract) locale is "en_US", and that
locale would never produce "février".

> That is why historically these things are related.
> It is also why character classification is related to locale despite these things being orthogonal.

I thought we wanted to fix historic accidents instead of
trying to preserve them? <cctype> should be left alone
by SG16; if you want Unicode-style character classification,
use a facility designed for that.

Jens


SG16 list run by sg16-owner@lists.isocpp.org