Re: [SG16] Locales, Encodings and Unicode

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Sat, 25 Jan 2020 10:27:56 +0100

On 24/01/2020 21.57, Corentin Jabot wrote:
> Locales do have encoding requirements.

Well, to be more precise, they have requirements on the
character set ("must contain French accented characters"),
but not on the actual encoding.

If you have Unicode as the character set, a French locale
is happy regardless of whether the encoding is UTF-8 or
UTF-16 or UTF-32 or whatever.
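
As a rough illustration of that point (a sketch only; the helper
name code_units is made up here and is not part of any library),
the word "février" can be spelled with Unicode string literals in
three encodings. The number of code units differs, but the
characters, and hence the character set, are the same:

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// "février" with U+00E9 written as an escape, so the result does not
// depend on the encoding of this source file.
constexpr std::size_t utf8_units  = code_units(u8"f\u00e9vrier"); // é takes two code units
constexpr std::size_t utf16_units = code_units(u"f\u00e9vrier");  // é takes one code unit
constexpr std::size_t utf32_units = code_units(U"f\u00e9vrier");  // é takes one code unit
```

Nothing in this sketch involves a locale; the encoding is chosen
purely by the literal prefix.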

> If you want to format a date, February is février in France which cannot be encoded in an en_US.ASCII locale.

Your last word here is what I would really like to see
eradicated from the discussion except when prefixed with
POSIX or so. The (abstract) locale is "en_US", and that
locale would never produce "février".

> That is why historically these things are related.
> It is also why character classification is related to locale despite these things being orthogonal.

I thought we wanted to fix historic accidents instead of
trying to preserve them? <cctype> should be left alone
by SG16; if you want Unicode-style character classification,
use a facility designed for that.
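
A minimal sketch of why <cctype> is a poor fit for Unicode text,
assuming the program has not called setlocale (so the "C" locale is
in effect) and that the text is UTF-8. The helper names below are
hypothetical. std::isalpha looks at one byte at a time, so neither
byte of the UTF-8 encoding of é is classified as alphabetic:

```cpp
#include <cctype>

// Hypothetical helper: classify a single byte with <cctype>.
// Taking unsigned char avoids undefined behaviour for negative char values.
inline bool is_alpha_byte(unsigned char byte) {
    return std::isalpha(byte) != 0;
}

// UTF-8 encodes U+00E9 (é) as the two bytes 0xC3 0xA9.
// Counts how many of those bytes std::isalpha accepts.
inline int alpha_bytes_in_e_acute() {
    const unsigned char e_acute[] = {0xC3, 0xA9};
    int count = 0;
    for (unsigned char byte : e_acute) {
        if (is_alpha_byte(byte)) {
            ++count;
        }
    }
    return count;
}
```

In the "C" locale only the 52 unaccented Latin letters are
alphabetic, so the count here is zero even though é is a letter.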

Jens

Received on 2020-01-25 03:30:35