Subject: Re: Is the concept of basic execution character sets useful?
From: Hubert Tong (hubert.reinterpretcast_at_[hidden])
Date: 2021-01-30 14:28:47
On Sat, Jan 30, 2021 at 2:26 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> On 30/01/2021 20.16, Corentin via SG16 wrote:
> basic execution character set shall be represented in each locale-specific
> > I think we want to say (to match existing practice) that the
> execution environment has an encoding / character set that is either the
> same or a super set of the execution character set (same values but may
> have extra members).
> > It is unclear that "locale-specific" currently says that.
> > I don't think the encoding interpretation of the above (which I
> think was the intended interpretation) actually matches existing practice
> (except perhaps for the "C" locale). That different locales present in
> runtime environments may encode characters within the basic execution
> character set differently is a practical reality (web search for "PPCS
> variant characters").
> > Unfortunately, when that's the case (and I agree that's the case more
> often than we'd like; another good example is shift-jis/win-1251), string
> literals cannot be interpreted properly by "locale specific" runtime
> > If such a runtime function expects an encoding that is not the same as
> that of the string literal, it cannot interpret it correctly, which can
> lead to mojibake, etc.
> From a core language perspective, we have a compile-time encoding for
> (i.e. mapping of character sequences inside literals to code unit
> The actual execution environment of the program (possibly conveyed via
> might not be compatible with that. For the core language, I think we
> simply replace "execution character set" with "literal encoding" (narrow
> and wide),
> because we never actually care about character sets, just about encoding,
> i.e. a sequence of code units with which to initialize a string literal
> Maybe locale-dependent library functions just need to get a divorce from
Sure. We're dealing with the common requirements between encodings that
span across the core language and the library though. So we would still
want something to say that the locale-specific narrow/wide encodings
fulfill the requirements for being literal narrow/wide encodings.
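
The requirement described above can be illustrated with a minimal sketch. It is not taken from the thread. It assumes each sampled character occupies a single code unit in the narrow literal encoding, and the names locale_agrees_with_literals, kNarrowSample, kWideSample and kSampleLength are hypothetical. The helper decodes each code unit of a narrow literal using the current locale's multibyte encoding and compares the result with the code unit at the same position in the matching wide literal.

```cpp
#include <clocale>
#include <cstddef>
#include <cwchar>

// Hypothetical helper (not from the thread): returns true when every
// character of a narrow literal, decoded with the current locale's
// multibyte encoding, yields the code unit found at the same position
// in the corresponding wide literal.
// Assumption: each sampled character is one code unit in the narrow
// literal encoding.
bool locale_agrees_with_literals(const char* narrow, const wchar_t* wide,
                                 std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        std::mbstate_t state{};  // fresh conversion state per character
        wchar_t decoded = L'\0';
        // Decode exactly one byte under the current LC_CTYPE setting.
        std::size_t used = std::mbrtowc(&decoded, narrow + i, 1, &state);
        if (used != 1 || decoded != wide[i]) {
            return false;  // the locale disagrees with the literal encodings
        }
    }
    return true;
}

// A sample of basic character set members, spelled once as a narrow
// literal and once as a wide literal. The two spellings must stay identical.
constexpr char kNarrowSample[] = "ABCxyz019 {}[]#\\^~|";
constexpr wchar_t kWideSample[] = L"ABCxyz019 {}[]#\\^~|";
constexpr std::size_t kSampleLength = sizeof(kNarrowSample) - 1;
```

A caller would select a locale with std::setlocale(LC_ALL, ...) and then call locale_agrees_with_literals(kNarrowSample, kWideSample, kSampleLength). A false result suggests that locale-dependent library functions would misinterpret the program's string literals under that locale; which locales fail depends on the platform.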