Re: [SG16] Is the concept of basic execution character sets useful?

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Date: Tue, 2 Feb 2021 14:57:48 -0800
> For the core language, I think we should simply replace "execution
> character set" with "literal encoding" (narrow and wide), because we never
> actually care about character sets, just about encoding

I would be very much in favor of this change. "Literal encoding" is exactly
what this is and "execution character set" is just confusing. I also agree
that it shouldn't be tied to locales in any way.

- Victor


On Mon, Feb 1, 2021 at 1:22 AM Peter Brett via SG16 <sg16_at_[hidden]> wrote:

> > -----Original Message-----
> > From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Jens Maurer via SG16
> > Sent: 30 January 2021 19:26
> > To: sg16_at_[hidden]; Hubert Tong <hubert.reinterpretcast_at_[hidden]>
> > Cc: Jens Maurer <Jens.Maurer_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
> > Subject: Re: [SG16] Is the concept of basic execution character sets useful?
> >
> > > Unfortunately, when that's the case (and I agree that's the case more
> > > often than we'd like; another good example is Shift-JIS/Windows-1251),
> > > string literals cannot be interpreted properly by "locale-specific"
> > > runtime functions.
> > > If such a runtime function expects an encoding that differs from that of
> > > the string literal, it cannot interpret the literal correctly, which can
> > > lead to mojibake, etc.
> >
> > From a core language perspective, we have a compile-time encoding for
> > literals (i.e. the mapping of character sequences inside literals to code
> > unit sequences).
> >
> > The actual execution environment of the program (possibly conveyed via
> > locale) might not be compatible with that. For the core language, I think
> > we should simply replace "execution character set" with "literal encoding"
> > (narrow and wide), because we never actually care about character sets,
> > just about encoding, i.e. a sequence of code units with which to
> > initialize a string literal object.
> >
> > Maybe locale-dependent library functions just need to get a divorce from
> > that.
>
> Hi all,
>
> I agree with Jens.
>
> Although in principle a C++ interpreter could somehow make literals appear
> in a locale-specific encoding, all C++ implementations I'm aware of
> permanently fix the encoding of string literals at compilation time and
> before any knowledge of the run-time locale is available.
>
> Furthermore, we want C++ compilers processing a particular corpus of
> source code to produce the same executable no matter whether the compiler
> is being run in France, Germany, China or the USA. Locale can -- should --
> obviously affect compiler diagnostics, etc., but these are already
> implementation-defined and have no impact on the *effect* of processing the
> program.
>
> I think that it is best to keep all knowledge of locale-dependence in the
> library. I like the idea of replacing "execution character set" with
> "literal encoding" everywhere in the core language.
>
> Best regards,
>
> Peter
>

Received on 2021-02-02 16:58:03