
SG16

Subject: Re: Locales, Encodings and Unicode
From: Thiago Macieira (thiago_at_[hidden])
Date: 2020-01-25 12:04:31


On Saturday, 25 January 2020 02:31:00 PST Corentin Jabot via SG16 wrote:
> Changing the global locale from en_US to fr_FR requires changing the
> encoding too, if the encoding is not assumed to be an Unicode encoding.

No, it doesn't. The encoding should be left alone since it's what other
applications (including the terminal) use to match bytes to glyphs on screen.
There are plenty of encodings that can represent all of French and English,
not just UTF-8 (notably Latin-1 and Latin-9, i.e. ISO-8859-1 and ISO-8859-15).

If the encoding can't represent "février", you're going to get garbage on
screen: you might see "f?vrier", or the same word with some other replacement
character where the "é" should be. So it behooves the operating system to
choose an encoding that can represent text in every locale that the user may
choose.
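
To make the "f?vrier" case concrete, here is a minimal sketch, not anything
from the standard library. The helper names (encode_single_byte, to_ascii,
to_latin1) are made up for illustration. It relies on the fact that ASCII and
Latin-1 (ISO-8859-1) map the first 128 and 256 Unicode code points,
respectively, to bytes of the same value, and it substitutes '?' for anything
outside that range.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: encode code points into a single-byte encoding whose
// byte values equal the code point values for everything below `limit`.
// Code points at or above `limit` are not representable and become '?'.
std::string encode_single_byte(std::u32string_view text, char32_t limit)
{
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        if (cp < limit)
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        else
            out.push_back('?');   // replacement for unrepresentable input
    }
    return out;
}

// ASCII covers U+0000..U+007F; Latin-1 covers U+0000..U+00FF.
std::string to_ascii(std::u32string_view text)  { return encode_single_byte(text, 0x80); }
std::string to_latin1(std::u32string_view text) { return encode_single_byte(text, 0x100); }

// Small self-check: 'é' is U+00E9, which Latin-1 can hold but ASCII cannot.
void check_fevrier()
{
    assert(to_latin1(U"f\u00E9vrier") == "f\xE9vrier");
    assert(to_ascii(U"f\u00E9vrier") == "f?vrier");
}
```

The French month name survives in Latin-1 without any change of encoding,
while a plain ASCII environment can only show the replacement character.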

> But then again, you don't get to choose the encoding when doing i/o.
> In the case of a console, the environment detect what encoding should be
> used.
>
> Which leaves us with a few choices:
>
> - Use UTF-8
> - Don't use locales that are not the system locales or are not
> representable in the environment encoding

Those choices are beyond our control.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

SG16 list run by sg16-owner@lists.isocpp.org