SG16

Subject: Re: [SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?
From: Ville Voutilainen (ville.voutilainen_at_[hidden])
Date: 2019-08-13 11:08:57


On Tue, 13 Aug 2019 at 19:03, Niall Douglas <s_sourceforge_at_[hidden]> wrote:
>
> On 13/08/2019 15:27, Herring, Davis via Core wrote:
> >> Is it politically feasible for C++ 23 and C 2x to require
> >> implementations to default to interpreting source files as either (i) 7
> >> bit ASCII or (ii) UTF-8? To be specific, char literals would thus be
> >> either 7 bit ASCII or UTF-8.
> >
> > We could specify the source file directly as a sequence of ISO 10646 abstract characters, or even as a sequence of UTF-8 code units, but the implementation could choose to interpret the disk file to contain KOI-7 N1 with some sort of escape sequences for other characters. You might say "That's not UTF-8 on disk!", to which the implementation replies "That's how my operating system natively stores UTF-8." and the standard replies "What's a disk?".
>
> I think that's an unproductive way of looking at the situation.

I think you're missing Davis's point. His point is that we can ban
source code that has an encoding that we don't like,
but that doesn't necessarily have any noticeable impact on anything,
because implementations may choose differently,
and they especially may choose a different default.

> I'd prefer to look at it this way:
> 1. How much existing code gets broken if, when recompiled as C++ 23, the
> default is now to assume UTF-8 input unless input is obviously not that?

I don't know what that "default" means.

