Subject: Re: Locales, Encodings and Unicode
From: Jens Maurer (Jens.Maurer_at_[hidden])
Date: 2020-01-10 15:07:57
On 08/01/2020 20.15, Corentin Jabot via SG16 wrote:
> Here is a paper attempting to describe some of the issues with the <locale> facilities.
> I offer a few solutions to explore, but there is no denying it will be an uphill battle to remedy some of these issues.
> My goal was mostly to have a document we can refer people to and have a basis of conversation for ourselves.
I think a key observation here is that locale and encoding
need to get a divorce. And that probably means std::locale
needs to die (in its present shape and form).
To me, it seems the feature set of the current C or C++
localization facilities is so sub-par that essentially
nobody uses them for anything serious. So, there
is little motivation to keep them except as a deprecated
I've heard that ICU is quite comprehensive in feature coverage,
so any future design should take that into account.
Regarding encoding, here's a situation I'm not sure how to
Suppose I have an xterm on my desktop configured for UTF-8,
and another xterm configured for (say) ISO 8859-1. I'm now
running the same binary in both xterms. What should happen?
It seems inefficient and possibly burdensome to support
one of several runtime-chosen encodings at every step of my
program, so the recommendation probably is to have a
(statically chosen) program-internal encoding (likely UTF-8
or UTF-32) plus conversion facilities that can convert to
the environment's encoding.
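
A rough sketch of that split, assuming UTF-8 as the program-internal
encoding and conversion only at the output boundary. The names
EnvEncoding, utf8_to_latin1 and to_environment are hypothetical and
not part of any existing or proposed library; how the environment's
encoding is detected is deliberately left out, and the UTF-8 decoder
is minimal rather than a full validator.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical: the encodings the environment (e.g. the terminal) may use.
enum class EnvEncoding { utf8, latin1 };

// Convert UTF-8 text to ISO 8859-1. Code points above U+00FF and
// malformed sequences are replaced with '?'.
std::string utf8_to_latin1(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const auto b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) {  // ASCII maps to itself
            out.push_back(static_cast<char>(b0));
            ++i;
            continue;
        }
        // Derive the sequence length and initial bits from the lead byte.
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else { out.push_back('?'); ++i; continue; }  // invalid lead byte

        // Accumulate continuation bytes; stop on truncation or a bad byte.
        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const auto b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok) { out.push_back('?'); ++i; continue; }

        // Only two-byte sequences can legitimately encode U+0080..U+00FF.
        const bool representable = len == 2 && cp >= 0x80 && cp <= 0xFF;
        out.push_back(representable ? static_cast<char>(cp) : '?');
        i += len;
    }
    return out;
}

// The program works on UTF-8 throughout and calls this only when
// text leaves the program.
std::string to_environment(std::string_view utf8, EnvEncoding env) {
    switch (env) {
        case EnvEncoding::latin1: return utf8_to_latin1(utf8);
        case EnvEncoding::utf8:   break;
    }
    return std::string(utf8);  // already in the internal encoding
}
```

With this shape, the same binary running in the two xterms would differ
only in the EnvEncoding value passed at the output boundary; everything
else in the program sees UTF-8.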
Whatever we do here, the programmer should have the ability
to opt out of any locale support (beyond "C") and any
encoding conversion to keep the program footprint small for
situations where advanced locale/encoding fun is not needed.