sg16: Re: [SG16-Unicode] [wg14/wg21 liaison] [isocpp-core] Source file encoding (was: What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?)

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Wed, 14 Aug 2019 23:18:31 +0100

On 14/08/2019 19:24, Billy O'Neal (VC LIBS) wrote:
> >Far more importantly, if the committee can assume unicode-clean
> > source code going forth, that makes far more tractable lots of other
> > problems such as how char string literals ought to be interpreted.
>
> I don't think this actually matters for implementations. The standard
> can describe what happens for Unicode and let implementations figure out
> what that means for the legacy encodings they target. An implementation
> on an EBCDIC machine, for example, can do an 'as if' notional conversion
> into UTF-8 for the purposes of following the standard's rules.

Just to be clear, I'm not referring to anything about implementation
quality nor correctness wrt source files here. That all pretty much
"just works" for each compiler, or rather, each compiler can be poked
and prodded to just work eventually.

I *am* speaking about the user experience. If the standard insisted
on ASCII-only-unless-otherwise-specified, then typing umlauts into the
source code would yield a useful compiler error saying "Please add a
#pragma encoding to tell me what encoding this source file is", as
with Python 2.

Then, because we would always know the source file encoding, we could make
other end-user experience improvements. Most problems with encoding stem,
of course, from the fact that it isn't specified. This would fix that for
one situation: C and C++ source code.

That's my pitch. I pitch nothing regarding runtime encoding, which is a
viper's nest and will remain so for decades to come.

Niall

Received on 2019-08-15 00:18:34