sg16: Re: [SG16-Unicode] [wg14/wg21 liaison] [isocpp-core] Source file encoding (was: What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?)

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Wed, 14 Aug 2019 18:36:00 +0100

> The present implementation-defined interpretation of the byte sequence in
> source files allows a default of "UTF-8 in strings, comments can use
> arbitrary bytes" (which thus allows existing source files in a range of
> ASCII-compatible 8-bit character sets if the non-ASCII characters only
> appear in comments, without needing to tell the compiler which character
> set is being used). That approach (which is what GCC does by default)
> seems more friendly to users with existing source files using various
> character sets in comments than strictly requiring everything to be UTF-8
> (even in comments) unless the compiler is explicitly told otherwise.

I would find that choice unhelpful for tooling which processes C++
source code, e.g. tools written in Python, which insists that text you
feed it is either correctly encoded, or not text at all. And that's not
unreasonable: either text is encoded correctly, or it is not.
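To make that concrete, here is a minimal sketch, assuming a hypothetical
source fragment whose string literal is UTF-8 but whose comment was saved
in Latin-1 (the byte 0xE9 for 'é'). Python's strict UTF-8 decoding, the
default for bytes.decode(), rejects the whole file even though the
compiler would only ever look at the string literal:

```python
# Hypothetical C++ fragment: UTF-8 in the string literal ("caf\xc3\xa9"),
# but a Latin-1 'é' (lone byte 0xE9) inside the comment.
source_bytes = b'// caf\xe9 comment in Latin-1\nconst char* s = "caf\xc3\xa9";\n'

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8, False otherwise."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False

print(decodes_as_utf8(source_bytes))  # False: 0xE9 is followed by a space, not a continuation byte
```

A tool could fall back to errors="replace" or errors="surrogateescape",
but then it is guessing at what the comment bytes meant.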

What do you think of my "all 7-bit clean ASCII" proposal? Source files
would be 7-bit clean ASCII by default, with #pragma encoding (if
supported by your C compiler) as the way to opt out.
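For illustration only, a checker for the 7-bit clean default could be as
simple as the sketch below. The function name first_non_ascii is
hypothetical, and the #pragma encoding opt-out is not modelled; it only
reports the first byte at or above 0x80 so a tool can point at the
offending line and column:

```python
from typing import Optional, Tuple

def first_non_ascii(data: bytes) -> Optional[Tuple[int, int]]:
    """Return the 1-based (line, column) of the first byte >= 0x80,
    or None if the data is 7-bit clean ASCII."""
    line, col = 1, 1
    for b in data:
        if b >= 0x80:
            return (line, col)
        if b == 0x0A:  # '\n' starts a new line
            line, col = line + 1, 1
        else:
            col += 1
    return None

print(first_non_ascii(b'int x;\n// caf\xe9\n'))  # (2, 7)
```

If only a yes/no answer is needed, bytes.isascii() (Python 3.7+) does
the same test without locating the byte.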

Niall

Received on 2019-08-14 19:36:04