SG16

Subject: Re: [SG16-Unicode] [wg14/wg21 liaison] [isocpp-core] Source file encoding (was: What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?)
From: Joseph Myers (joseph_at_[hidden])
Date: 2019-08-14 12:01:19


On Wed, 14 Aug 2019, Niall Douglas via Liaison wrote:

> Just make the entire lot UTF-8! And let individual files opt-out if they
> want, or whole TUs if the user asks the compiler to do so, with the
> standard making it very clear that anything other than UTF-8 =
> implementation defined behaviour for C++ 23 onwards.

The present implementation-defined interpretation of the byte sequence in
source files allows a default of "UTF-8 in strings, comments can use
arbitrary bytes" (which thus allows existing source files in a range of
ASCII-compatible 8-bit character sets if the non-ASCII characters only
appear in comments, without needing to tell the compiler which character
set is being used). That approach (which is what GCC does by default)
seems more friendly to users with existing source files using various
character sets in comments than strictly requiring everything to be UTF-8
(even in comments) unless the compiler is explicitly told otherwise.
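
To make the "UTF-8 in strings, comments can use arbitrary bytes" policy concrete, here is a minimal sketch of a checker that accepts any bytes inside comments but requires well-formed UTF-8 everywhere else. It is an illustration only, not GCC's implementation: the names utf8_len and acceptable_under_lenient_policy are hypothetical, and the scanner deliberately ignores raw string literals, digit separators, and backslash line continuations.

```cpp
#include <cstddef>
#include <string>

// Length (1-4) of a well-formed UTF-8 sequence starting at s[i], or 0 if the
// bytes there are not well-formed UTF-8. (Hypothetical helper.)
static std::size_t utf8_len(const std::string& s, std::size_t i) {
    // Byte at index k, or an out-of-range sentinel past the end of input.
    auto b = [&](std::size_t k) -> unsigned {
        return k < s.size() ? static_cast<unsigned char>(s[k]) : 0x100u;
    };
    auto cont = [&](std::size_t k) { return (b(k) & 0xC0u) == 0x80u; };

    const unsigned c = b(i);
    if (c < 0x80) return 1;                               // ASCII
    if (c >= 0xC2 && c <= 0xDF) return cont(i + 1) ? 2 : 0;
    if (c >= 0xE0 && c <= 0xEF) {
        if (!cont(i + 1) || !cont(i + 2)) return 0;
        if (c == 0xE0 && b(i + 1) < 0xA0) return 0;       // overlong form
        if (c == 0xED && b(i + 1) > 0x9F) return 0;       // surrogate range
        return 3;
    }
    if (c >= 0xF0 && c <= 0xF4) {
        if (!cont(i + 1) || !cont(i + 2) || !cont(i + 3)) return 0;
        if (c == 0xF0 && b(i + 1) < 0x90) return 0;       // overlong form
        if (c == 0xF4 && b(i + 1) > 0x8F) return 0;       // above U+10FFFF
        return 4;
    }
    return 0;  // stray continuation byte, 0xC0/0xC1, 0xF5-0xFF, or end of input
}

// True if every byte outside comments is part of well-formed UTF-8.
// Bytes inside // and /* */ comments are never inspected. (Hypothetical name.)
bool acceptable_under_lenient_policy(const std::string& src) {
    enum class State { Code, LineComment, BlockComment, String, Char };
    State st = State::Code;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        const char next = i + 1 < src.size() ? src[i + 1] : '\0';

        if (st == State::LineComment) {            // arbitrary bytes allowed
            if (c == '\n') st = State::Code;
            ++i;
            continue;
        }
        if (st == State::BlockComment) {           // arbitrary bytes allowed
            if (c == '*' && next == '/') { st = State::Code; i += 2; }
            else ++i;
            continue;
        }
        if (st == State::Code) {
            if (c == '/' && next == '/') { st = State::LineComment; i += 2; continue; }
            if (c == '/' && next == '*') { st = State::BlockComment; i += 2; continue; }
            if (c == '"')  { st = State::String; ++i; continue; }
            if (c == '\'') { st = State::Char;   ++i; continue; }
        } else {                                   // inside a string or char literal
            // Skip escaped quotes and backslashes so they do not end the literal.
            if (c == '\\' && (next == '"' || next == '\'' || next == '\\')) { i += 2; continue; }
            if (c == '"'  && st == State::String) { st = State::Code; ++i; continue; }
            if (c == '\'' && st == State::Char)   { st = State::Code; ++i; continue; }
        }

        const std::size_t n = utf8_len(src, i);    // code and literals: must be UTF-8
        if (n == 0) return false;
        i += n;
    }
    return true;
}
```

Under this sketch, a file whose only non-ASCII bytes are Latin-1 text in a comment (for example the byte 0xE9 after `//`) is accepted without naming its character set, while the same byte inside a string literal is rejected. A strict "everything is UTF-8" policy would reject both.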

-- 
Joseph S. Myers
joseph_at_[hidden]

SG16 list run by sg16-owner@lists.isocpp.org