sg16: Re: [SG16] Locales, Encodings and Unicode

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Fri, 10 Jan 2020 22:07:57 +0100

On 08/01/2020 20.15, Corentin Jabot via SG16 wrote:
> Hello
> Here is a paper attempting to describe some of the issues with the <locale> facilities.
> I offer a few solutions to explore but there is no denying it will be an uphill battle to remedy some of these issues.
>
> My goal was mostly to have a document we can refer people to and have a basis of conversation for ourselves.
>
> https://github.com/cor3ntin/CPPProposals/raw/master/P2020/P2020.pdf

I think a key observation here is that locale and encoding
need to get a divorce. And that probably means std::locale
needs to die (in its present shape and form).

To me, it seems the feature set of the current C or C++
localization facilities is so sub-par that essentially
nobody uses them for anything serious. So, there is little
motivation to keep them except as a deprecated thing.

I've heard that ICU is quite comprehensive in feature coverage,
so any future design should take that into account.

Regarding encoding, here's a situation I'm not sure how to
handle:

Suppose I have an xterm on my desktop configured for UTF-8,
and another xterm configured for (say) ISO 8859-1. I'm now
running the same binary in both xterms. What should happen?
It seems inefficient and possibly burdensome to support
one of several runtime-chosen encodings at every step of my
program, so the recommendation probably is to have a
(statically chosen) program-internal encoding (likely UTF-8
or UTF-32) plus conversion facilities that can convert to
the environment's encoding.
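
A minimal sketch of that recommendation, assuming the program keeps all
text in UTF-8 internally and converts only at the output boundary. The
names EnvEncoding, decode_utf8 and to_environment are hypothetical and
not part of any standard or proposed API; how the environment's encoding
is detected is left out, and the decoder is deliberately simplified (it
does not reject overlong forms or surrogates). A real program would more
likely delegate to a conversion library.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical: the encodings the environment (e.g. the terminal) may use.
enum class EnvEncoding { utf8, latin1 };

// Decode one code point from UTF-8 starting at s[i] and advance i.
// Returns U+FFFD for malformed or truncated input.
// Simplified: overlong forms and surrogates are not rejected.
char32_t decode_utf8(std::string_view s, std::size_t& i) {
    const auto b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80) return b0;  // ASCII, one byte
    int extra = 0;
    char32_t cp = 0;
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else return 0xFFFD;           // stray continuation or invalid lead byte
    for (; extra > 0; --extra) {
        if (i >= s.size()) return 0xFFFD;
        const auto b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return 0xFFFD;
        cp = (cp << 6) | (b & 0x3F);
        ++i;
    }
    return cp;
}

// Convert program-internal UTF-8 text to the environment's encoding.
// Code points that ISO 8859-1 cannot represent are replaced with '?'.
std::string to_environment(std::string_view utf8, EnvEncoding env) {
    if (env == EnvEncoding::utf8) return std::string(utf8);  // pass through
    std::string out;
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size();) {
        const char32_t cp = decode_utf8(utf8, i);
        // ISO 8859-1 maps one-to-one onto U+0000..U+00FF.
        out.push_back(cp <= 0xFF
                          ? static_cast<char>(static_cast<unsigned char>(cp))
                          : '?');
    }
    return out;
}
```

With this shape, the same binary would call to_environment once per
output operation: in the UTF-8 xterm the text passes through unchanged,
and in the ISO 8859-1 xterm it is transcoded, with unrepresentable
characters degraded to '?'. The rest of the program never needs to know
which terminal it is running in.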

Whatever we do here, the programmer should have the ability
to opt-out of any locale support (beyond "C") and any
encoding conversion to keep the program footprint small for
situations where advanced locale/encoding fun is not needed.

Jens

Received on 2020-01-10 15:10:31