Re: [wg14/wg21 liaison] [SG16-Unicode] [isocpp-core] Source file encoding

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 14 Aug 2019 23:38:56 -0400
On 8/14/19 6:18 PM, Niall Douglas wrote:
> On 14/08/2019 19:24, Billy O'Neal (VC LIBS) wrote:
>>> Far more importantly, if the committee can assume unicode-clean
>>> source code going forth, that makes far more tractable lots of other
>>> problems such as how char string literals ought to be interpreted.
>>
>> I don't think this actually matters for implementations. The standard
>> can describe what happens for Unicode and let implementations figure out
>> what that means for the legacy encodings they target. An implementation
>> on an EBCDIC machine, for example, can do an 'as if' notional conversion
>> into UTF-8 for the purposes of following the standard's rules.
> Just to be clear, I'm not referring to anything about implementation
> quality nor correctness wrt source files here. That all pretty much
> "just works" for each compiler, or rather, each compiler can be poked
> and prodded to just work eventually.
>
> I *am* speaking about the user experience, where if the standard insists
> on ASCII-only-if-otherwise-not-specified, then typing umlauts into the
> source code will yield a useful compiler error saying "Please add a
> #pragma encoding to tell me what encoding this source file is". Like
> with Python 2.

The standard already permits implementations to do this. The fact that
none do (by default) should be considered informative. This doesn't
sound like something we need the standard telling implementors to do,
though adding an encoding pragma may help enable implementations to do
something like what is suggested here.

Tom.

>
> Then because we always know the source file encoding, we can make other
> end user experience improvements. Most of the problems with encoding
> are, of course, the fact it isn't specified. This would fix that for one
> situation, which is C and C++ source code.
>
> That's my pitch. I pitch nothing regarding runtime encoding, which is a
> viper's nest, and will remain so for decades to come.
>
> Niall

Received on 2019-08-14 22:40:59