Re: [SG16] Is the concept of basic execution character sets useful?

From: Corentin <corentin.jabot_at_[hidden]>
Date: Wed, 3 Feb 2021 19:22:56 +0100
On Wed, Feb 3, 2021 at 6:41 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 03/02/2021 00.09, Corentin wrote:
> >
> > On Tue, Feb 2, 2021 at 11:57 PM Victor Zverovich <victor.zverovich_at_[hidden]> wrote:
> >
> > > For the core language, I think we should
> > > simply replace "execution character set" with "literal encoding" (narrow and wide),
> > > because we never actually care about character sets, just about encoding
> >
> > I would be very much in favor of this change. "Literal encoding" is
> > exactly what this is and "execution character set" is just confusing. I
> > also agree that it shouldn't be tied to locales in any way.
> >
> >
> > I'd love feedback on the draft I posted earlier in this thread, which
> > does that, whenever you have time before the next deadline :)
> > A slightly more recent draft is here:
> > https://isocpp.org/files/papers/D2297R0.pdf
>
> My paper is doing the same updates:
>
> https://wiki.edg.com/pub/Wg21telecons2021/SG16/charset.html
>
> I'd suggest using the terms "ordinary literal encoding" (for char) and
> "wide literal encoding" (for wchar_t),
> so that "literal encoding" remains available to refer to both, or to
> the general concept.
>
> The paper has funny italics (only italicize when defining a term, not when
> just mentioning it).
> It seems to lose the definition of "execution (wide) character set".
>
> I thought we had discussed that the standard library has certain
> facilities with locale-dependent character sets.
> I haven't found a mention of "execution character set" in the library
> wording, so I'm interested in learning how these locale-dependent
> character sets are described / referenced.
>

There is a whole new paragraph in the library introduction (page 10).

>
> Jens
>
>
>
> >
> >
> >
> > - Victor
> >
> >
> > On Mon, Feb 1, 2021 at 1:22 AM Peter Brett via SG16 <sg16_at_[hidden]> wrote:
> >
> > > -----Original Message-----
> > > From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Jens Maurer via SG16
> > > Sent: 30 January 2021 19:26
> > > To: sg16_at_[hidden]; Hubert Tong <hubert.reinterpretcast_at_[hidden]>
> > > Cc: Jens Maurer <Jens.Maurer_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
> > > Subject: Re: [SG16] Is the concept of basic execution character sets useful?
> > >
> > > > Unfortunately, when that's the case (and I agree that's the case more
> > > > often than we'd like; another good example is shift-jis/win-1251), string
> > > > literals cannot be interpreted properly by "locale specific" runtime
> > > > functions.
> > > > Such a runtime function expects an encoding that is not the same as that
> > > > of the string literal, so it cannot interpret the literal correctly,
> > > > which can lead to mojibake, etc.
> > >
> > > From a core language perspective, we have a compile-time encoding for
> > > literals (i.e. mapping of character sequences inside literals to code unit
> > > sequences).
> > >
> > > The actual execution environment of the program (possibly conveyed via
> > > locale) might not be compatible with that. For the core language, I think
> > > we should simply replace "execution character set" with "literal encoding"
> > > (narrow and wide), because we never actually care about character sets,
> > > just about encoding, i.e. a sequence of code units with which to
> > > initialize a string literal object.
> > >
> > > Maybe locale-dependent library functions just need to get a divorce from
> > > that.
> >
> > Hi all,
> >
> > I agree with Jens.
> >
> > Although in principle a C++ interpreter could somehow make literals
> > appear in a locale-specific encoding, all C++ implementations I'm aware of
> > permanently fix the encoding of string literals at compilation time and
> > before any knowledge of the run-time locale is available.
> >
> > Furthermore, we want C++ compilers processing a particular corpus of
> > source code to produce the same executable no matter whether the compiler
> > is being run in France, Germany, China or the USA. Locale can (and
> > should) obviously affect compiler diagnostics, etc., but these are already
> > implementation-defined and have no impact on the *effect* of processing
> > the program.
> >
> > I think that it is best to keep all knowledge of locale-dependence in the
> > library. I like the idea of replacing "execution character set" with
> > "literal encoding" everywhere in the core language.
> >
> > Best regards,
> >
> > Peter
> >
>
>
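
As an illustration of the point made in the quoted messages (the code units of a string literal are fixed at compile time, independently of the run-time locale), here is a minimal sketch. It is not taken from either paper; the helper names code_units and literal_unchanged_by_locale are hypothetical, and it assumes a hosted implementation where std::setlocale is available.

```cpp
#include <cassert>
#include <clocale>
#include <vector>

// Hypothetical helper: collect the code units of a narrow string as
// unsigned values, exactly as the compiler stored them.
std::vector<unsigned> code_units(const char* s) {
    assert(s != nullptr);
    std::vector<unsigned> out;
    for (; *s != '\0'; ++s) {
        out.push_back(static_cast<unsigned char>(*s));
    }
    return out;
}

// Hypothetical helper: snapshot a literal's code units, ask the C library
// to switch to the user's preferred locale, then snapshot again.
// The literal encoding was chosen during translation, so the two
// snapshots are expected to be identical.
bool literal_unchanged_by_locale() {
    const char* literal = "abc";
    const std::vector<unsigned> before = code_units(literal);
    // The call may fail and return a null pointer; that does not matter here.
    std::setlocale(LC_ALL, "");
    const std::vector<unsigned> after = code_units(literal);
    return before == after;
}
```

What does change with the locale is how locale-dependent library functions interpret those bytes, which is why a mismatch between the two can produce mojibake.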

Received on 2021-02-03 12:23:09