Date: Fri, 10 Jan 2020 22:07:57 +0100
On 08/01/2020 20.15, Corentin Jabot via SG16 wrote:
> Hello
> Here is a paper attempting to describe some of the issues with the <locale> facilities.
> I offer a few solutions to explore, but there is no denying it will be an uphill battle to remedy some of these issues.
>
> My goal was mostly to have a document we can refer people to and a basis for conversation among ourselves.
>
> https://github.com/cor3ntin/CPPProposals/raw/master/P2020/P2020.pdf
I think a key observation here is that locale and encoding
need to get a divorce. And that probably means std::locale
needs to die (in its present shape and form).
To me, it seems the feature set of the current C and C++
localization facilities is so sub-par that essentially
nobody uses them for anything serious. So there is little
motivation to keep them except as a deprecated thing.
I've heard that ICU is quite comprehensive in feature coverage,
so any future design should take that into account.
Regarding encoding, here's a situation I'm not sure how to
handle:
Suppose I have an xterm on my desktop configured for UTF-8,
and another xterm configured for (say) ISO 8859-1. I'm now
running the same binary in both xterms. What should happen?
It seems inefficient and possibly burdensome to support
one of several runtime-chosen encodings at every step of my
program, so the recommendation probably is to have a
(statically chosen) program-internal encoding (likely UTF-8
or UTF-32) plus conversion facilities that can convert to
the environment's encoding.
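
A minimal sketch of that recommendation, assuming the program keeps all text in UTF-8 internally and converts only at the output boundary. Every name here (OutputEncoding, utf8_to_latin1, encode_for_output) is hypothetical and not part of any existing or proposed standard interface, and the choice to replace unrepresentable characters with '?' is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical: the encodings the environment (e.g. the xterm) may expect.
enum class OutputEncoding { utf8, latin1 };

// Convert program-internal UTF-8 to ISO 8859-1.
// Code points above U+00FF and malformed sequences become '?' (an assumption).
std::string utf8_to_latin1(std::string_view utf8) {
    std::string out;
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 < 0x80) {
            // ASCII maps to itself.
            out.push_back(static_cast<char>(b0));
            ++i;
        } else if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < utf8.size() &&
                   (static_cast<unsigned char>(utf8[i + 1]) & 0xC0) == 0x80) {
            // Two-byte sequence encoding U+0080..U+00FF.
            const unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            out.push_back(static_cast<char>(((b0 & 0x03) << 6) | (b1 & 0x3F)));
            i += 2;
        } else {
            // Not representable or malformed: emit one '?' and skip the
            // lead byte plus any continuation bytes that follow it.
            out.push_back('?');
            ++i;
            while (i < utf8.size() &&
                   (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return out;
}

// Single conversion point: the rest of the program never sees the
// environment's encoding.
std::string encode_for_output(std::string_view utf8, OutputEncoding target) {
    switch (target) {
        case OutputEncoding::utf8:   return std::string(utf8);
        case OutputEncoding::latin1: return utf8_to_latin1(utf8);
    }
    assert(false && "unhandled OutputEncoding");
    return std::string(utf8);
}
```

With this shape, the same binary would call encode_for_output(text, OutputEncoding::utf8) in the first xterm and encode_for_output(text, OutputEncoding::latin1) in the second. How the target is determined is left open here; on POSIX systems it could plausibly be derived from nl_langinfo(CODESET) after setlocale(LC_CTYPE, "").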
Whatever we do here, the programmer should have the ability
to opt out of any locale support (beyond "C") and any
encoding conversion, to keep the program footprint small for
situations where advanced locale/encoding fun is not needed.
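
As a small illustration of what opting out can already look like, the sketch below formats an integer with std::to_chars from <charconv> (C++17), which is specified to be locale-independent. The helper name format_int is hypothetical, and the buffer size assumes int is at most 32 bits wide.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: locale-independent integer formatting.
// No std::locale, no iostreams; output is always plain "C"-style digits.
std::string format_int(int value) {
    char buf[16];  // enough for a 32-bit int including the sign (assumption)
    const auto result = std::to_chars(buf, buf + sizeof buf, value);
    if (result.ec != std::errc{}) {
        return {};  // buffer too small; not expected for 32-bit int
    }
    return std::string(buf, result.ptr);
}
```

A program built only on facilities of this kind does not need to pull in the <locale> machinery for such conversions.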
Jens
Received on 2020-01-10 15:10:31