On Tue, Feb 2, 2021 at 11:57 PM Victor Zverovich <victor.zverovich@gmail.com> wrote:
> For the core language, I think we should
> simply replace "execution character set" with "literal encoding" (narrow and wide),
> because we never actually care about character sets, just about encoding

I would be very much in favor of this change. "Literal encoding" is exactly what this is and "execution character set" is just confusing. I also agree that it shouldn't be tied to locales in any way.


I'd love feedback on the draft I posted earlier in this thread which does that, whenever you have time before the next deadline :)
A slightly more recent draft is here: https://isocpp.org/files/papers/D2297R0.pdf

- Victor


On Mon, Feb 1, 2021 at 1:22 AM Peter Brett via SG16 <sg16@lists.isocpp.org> wrote:
> -----Original Message-----
> From: SG16 <sg16-bounces@lists.isocpp.org> On Behalf Of Jens Maurer via SG16
> Sent: 30 January 2021 19:26
> To: sg16@lists.isocpp.org; Hubert Tong <hubert.reinterpretcast@gmail.com>
> Cc: Jens Maurer <Jens.Maurer@gmx.net>; Corentin <corentin.jabot@gmail.com>
> Subject: Re: [SG16] Is the concept of basic execution character sets useful?
>
> > Unfortunately, when that's the case (and I agree that's the case more
> > often than we'd like, another good example is shift-jis/win-1251), string
> > literals cannot be interpreted properly by "locale specific" runtime
> > functions.
> > Such runtime function expects an encoding that is not the same as the
> > string literal, it cannot interpret it correctly, which can lead to
> > mojibake, etc.
>
> From a core language perspective, we have a compile-time encoding for
> literals (i.e. mapping of character sequences inside literals to code unit
> sequences).
>
> The actual execution environment of the program (possibly conveyed via
> locale) might not be compatible with that.  For the core language, I think
> we should simply replace "execution character set" with "literal encoding"
> (narrow and wide), because we never actually care about character sets,
> just about encoding, i.e. a sequence of code units with which to initialize
> a string literal object.
>
> Maybe locale-dependent library functions just need to get a divorce from
> that.

Hi all,

I agree with Jens.

Although in principle a C++ interpreter could somehow make literals appear in a locale-specific encoding, all C++ implementations I'm aware of permanently fix the encoding of string literals at compilation time and before any knowledge of the run-time locale is available.
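To make that concrete, here is a minimal sketch (the function name is made up for this example, and it assumes an implementation whose narrow literal encoding can represent U+00E9, e.g. UTF-8). It reads the code units of a narrow string literal, switches the run-time locale to whatever the environment requests, and reads them again. On the implementations I have in mind the two reads should compare equal, because the bytes were chosen by the compiler:

```cpp
#include <cassert>
#include <clocale>
#include <cstring>
#include <string>

// Hypothetical helper for illustration only: returns true if the code units
// of a narrow string literal are the same before and after the run-time
// locale is changed.
inline bool literal_survives_locale_change() {
    // The compiler picks the code units for this literal at translation time.
    // Which bytes it picks is implementation-defined; this sketch does not
    // assume any particular value, only that they do not change afterwards.
    const char* literal = "caf\u00E9";

    const std::string before(literal, std::strlen(literal));

    // Adopt the user's environment locale. The call may fail and return a
    // null pointer; either way the literal's storage is not touched.
    std::setlocale(LC_ALL, "");

    const std::string after(literal, std::strlen(literal));

    // Restore the default "C" locale so callers are not affected.
    std::setlocale(LC_ALL, "C");

    return before == after;
}
```

Calling `literal_survives_locale_change()` under different `LANG`/`LC_ALL` settings is one way to check this on a given toolchain.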

Furthermore, we want C++ compilers processing a particular corpus of source code to produce the same executable no matter whether the compiler is being run in France, Germany, China or the USA.  Locale can -- should -- obviously affect compiler diagnostics, etc., but these are already implementation-defined and have no impact on the *effect* of processing the program.

I think that it is best to keep all knowledge of locale-dependence in the library.  I like the idea of replacing "execution character set" with "literal encoding" everywhere in the core language.
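As a small illustration of "literal encoding" meaning nothing more than the code units used to initialize the literal object, here is a sketch (again with a made-up function name) that lists the code units of a UTF-8 literal. I use a `u8` literal only because its encoding is fixed as UTF-8; for an ordinary narrow literal the code units would be whatever the implementation's narrow literal encoding produces:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only: returns the code units the
// compiler stored for the UTF-8 literal u8"\u00E9", without the terminator.
inline std::vector<unsigned> code_units_of_e_acute() {
    // Element type is char in C++17 and char8_t in C++20; binding by
    // reference keeps the array type so sizeof gives the element count
    // (each element is one byte), including the terminating null.
    const auto& lit = u8"\u00E9";

    std::vector<unsigned> units;
    for (std::size_t i = 0; i + 1 < sizeof(lit); ++i) {
        units.push_back(static_cast<unsigned char>(lit[i]));
    }
    return units;
}
```

U+00E9 is encoded in UTF-8 as the two code units 0xC3 0xA9, so that is what the function should return, independent of any locale.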

Best regards,

                         Peter