SG16

Subject: Re: [SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?
From: Thiago Macieira (thiago_at_[hidden])
Date: 2019-08-14 14:41:31


On Wednesday, 14 August 2019 00:54:28 PDT Peter Dimov wrote:
> - what file names use, per filesystem, there can be more than one (*)

There's some work in Linux to create a per-directory setting that configures
the character set and case sensitivity (and I'm going to guess locale too,
as soon as Turkish users are involved). I don't think this is ready yet.

> - what file contents use

Here we can make an easy distinction: text files and binary files. Text files
are always encoded in the locale-provided runtime execution encoding, whereas
everything else is binary. If you want to interpret those bytes, you need to
use some library to convert from bytes to text.

Some libraries can provide an extension to fopen() that automatically does
this for you. glibc does:
        fopen(name, "r,ccs=latin1")
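
Where no such extension is available, the byte-to-text step has to be done
explicitly. A minimal sketch, assuming the file content is Latin-1 and the
program wants UTF-8; latin1_to_utf8 is a hypothetical helper written for this
illustration, not a function from any library:

```c
#include <stddef.h>

/* Hypothetical helper: convert `len` Latin-1 bytes from `in` to UTF-8.
 * `out` must have room for 2 * len + 1 bytes.
 * Returns the number of bytes written, not counting the terminating NUL. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII range: identical in Latin-1 and UTF-8 */
            out[n++] = (char)c;
        } else {
            /* U+0080..U+00FF: two-byte UTF-8 sequence */
            out[n++] = (char)(0xC0 | (c >> 6));
            out[n++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

Latin-1 is the easy case because each byte maps directly to the code point
with the same value; other encodings need a table or a library such as iconv.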

> - what the console/the terminal uses

That should also be the locale runtime encoding, under any sane configuration.
You can have a misconfigured terminal application -- this used to happen in
2004 quite often. But that's a mistake, not the expected behaviour.
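
To see what the locale runtime encoding actually is on a POSIX system, a
program can ask the C library. A minimal sketch, assuming a POSIX environment
that provides <langinfo.h>; runtime_codeset is a hypothetical wrapper name
used only for this illustration:

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: return the codeset name of the locale selected
 * by the environment (LANG, LC_ALL, LC_CTYPE). */
const char *runtime_codeset(void)
{
    /* Adopt the user's character-type locale; if this fails, the
     * program stays in the default "C" locale. */
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

On a system configured for UTF-8 this typically returns "UTF-8"; a terminal
whose own setting disagrees with that name is the misconfiguration described
above.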

The terminal may be capable of showing more than the locale expects, but
that's an implementation-defined extension. For example, Unix terminals have
been capable of switching to UTF-8 mode with an escape sequence, but no one
uses that nowadays; the Windows console technically receives the data via the
wide-char API.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products
