Re: [SG16] Is the concept of basic execution character sets useful?

From: Corentin <corentin.jabot_at_[hidden]>
Date: Wed, 3 Feb 2021 00:09:02 +0100
On Tue, Feb 2, 2021 at 11:57 PM Victor Zverovich <victor.zverovich_at_[hidden]>
wrote:

> > For the core language, I think we should
> > simply replace "execution character set" with "literal encoding" (narrow
> > and wide),
> > because we never actually care about character sets, just about encoding
>
> I would be very much in favor of this change. "Literal encoding" is
> exactly what this is and "execution character set" is just confusing. I
> also agree that it shouldn't be tied to locales in any way.
>
>
I'd love feedback on the draft I posted earlier in this thread which does
that, whenever you have time before the next deadline :)
A slightly more recent draft is here
https://isocpp.org/files/papers/D2297R0.pdf




> - Victor
>
>
> On Mon, Feb 1, 2021 at 1:22 AM Peter Brett via SG16 <sg16_at_[hidden]>
> wrote:
>
>> > -----Original Message-----
>> > From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Jens Maurer via SG16
>> > Sent: 30 January 2021 19:26
>> > To: sg16_at_[hidden]; Hubert Tong <hubert.reinterpretcast_at_[hidden]>
>> > Cc: Jens Maurer <Jens.Maurer_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
>> > Subject: Re: [SG16] Is the concept of basic execution character sets useful?
>> >
>> > > Unfortunately, when that's the case (and I agree that's the case more
>> > > often than we'd like, another good example is shift-jis/win-1251), string
>> > > literals cannot be interpreted properly by "locale specific" runtime
>> > > functions.
>> > > Such runtime function expects an encoding that is not the same as the
>> > > string literal, it cannot interpret it correctly, which can lead to
>> > > mojibake, etc.
>> >
>> > From a core language perspective, we have a compile-time encoding for
>> > literals (i.e. mapping of character sequences inside literals to code unit
>> > sequences).
>> >
>> > The actual execution environment of the program (possibly conveyed via
>> > locale) might not be compatible with that. For the core language, I think
>> > we should simply replace "execution character set" with "literal encoding"
>> > (narrow and wide), because we never actually care about character sets,
>> > just about encoding, i.e. a sequence of code units with which to initialize
>> > a string literal object.
>> >
>> > Maybe locale-dependent library functions just need to get a divorce from
>> > that.
>>
>> Hi all,
>>
>> I agree with Jens.
>>
>> Although in principle a C++ interpreter could somehow make literals
>> appear in a locale-specific encoding, all C++ implementations I'm aware of
>> permanently fix the encoding of string literals at compilation time and
>> before any knowledge of the run-time locale is available.
>>
>> Furthermore, we want C++ compilers processing a particular corpus of
>> source code to produce the same executable no matter whether the compiler
>> is being run in France, Germany, China or the USA. Locale can -- should --
>> obviously affect compiler diagnostics, etc., but these are already
>> implementation-defined and have no impact on the *effect* of processing the
>> program.
>>
>> I think that it is best to keep all knowledge of locale-dependence in the
>> library. I like the idea of replacing "execution character set" with
>> "literal encoding" everywhere in the core language.
>>
>> Best regards,
>>
>> Peter
>>
>
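
The point made in the quoted messages, that the code units of a string literal
are fixed at compile time and do not follow the run-time locale, can be
illustrated with a small sketch. The helper names code_units and
literal_unaffected_by_locale are hypothetical, chosen only for this
illustration, and the example assumes the implementation's narrow literal
encoding can represent U+00E9.

```cpp
#include <clocale>
#include <cstring>
#include <vector>

// Hypothetical helper: copy the code units of a narrow string into a vector.
std::vector<unsigned char> code_units(const char* s) {
    return std::vector<unsigned char>(s, s + std::strlen(s));
}

// Hypothetical check: the bytes of a literal are the same before and after
// the program switches locale, because they were chosen at compile time.
bool literal_unaffected_by_locale() {
    // Assumes the narrow literal encoding can represent U+00E9.
    const char* lit = "caf\u00e9";
    const std::vector<unsigned char> before = code_units(lit);

    // Switch to the user's preferred locale. The call may fail and return
    // a null pointer; that does not matter for this comparison.
    std::setlocale(LC_ALL, "");
    const std::vector<unsigned char> after = code_units(lit);

    // Restore the default "C" locale.
    std::setlocale(LC_ALL, "C");
    return before == after;
}
```

Calling literal_unaffected_by_locale() is expected to return true on any
implementation that fixes literal bytes at translation time. Which bytes
appear for U+00E9 depends on the literal encoding chosen by the compiler, and
a locale-dependent library function may interpret those bytes differently if
the run-time locale uses another encoding.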

Received on 2021-02-02 17:09:16