Date: Sat, 30 Jan 2021 22:08:42 +0100
On Sat, Jan 30, 2021 at 8:26 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> On 30/01/2021 20.16, Corentin via SG16 wrote:
> basic execution character set shall be represented in each locale-specific
> encoding.
> >
> >
> > I think we want to say (to match existing practice) that the
> > execution environment has an encoding / character set that is either the
> > same as or a superset of the execution character set (same values, but it
> > may have extra members).
> > It is unclear that "locale-specific" currently says that.
> >
> > I don't think the encoding interpretation of the above (which I
> > think was the intended interpretation) actually matches existing practice
> > (except perhaps for the "C" locale). That different locales present in
> > runtime environments may encode characters within the basic execution
> > character set differently is a practical reality (web search for "PPCS
> > variant characters").
> >
> >
> > Unfortunately, when that's the case (and I agree that's the case more
> > often than we'd like; another good example is shift-jis/win-1251), string
> > literals cannot be interpreted properly by "locale specific" runtime
> > functions.
> > Such a runtime function expects an encoding different from that of the
> > string literal and so cannot interpret it correctly, which can lead to
> > mojibake, etc.
>
> From a core language perspective, we have a compile-time encoding for
> literals (i.e. mapping of character sequences inside literals to code unit
> sequences).
>
> The actual execution environment of the program (possibly conveyed via
> locale) might not be compatible with that. For the core language, I think
> we should simply replace "execution character set" with "literal encoding"
> (narrow and wide), because we never actually care about character sets,
> just about encoding, i.e. a sequence of code units with which to initialize
> a string literal object.
> Maybe locale-dependent library functions just need to get a divorce from
> that.
>
+1 :)
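
To make the mismatch concrete, here is a rough sketch in Python rather than
C++, so that it does not depend on any particular compiler or installed
locale. It assumes a narrow literal encoding of Windows-1252 and a runtime
encoding of Windows-1251; the names LITERAL_CODE_UNITS and interpret are made
up for illustration, and the bytes object only stands in for the code units a
compiler would store for a string literal.

```python
# Code units as a compiler might store them for the literal "café", under the
# ASSUMPTION that the narrow literal encoding is Windows-1252.
LITERAL_CODE_UNITS = "caf\u00e9".encode("cp1252")


def interpret(code_units: bytes, runtime_encoding: str) -> str:
    """Decode fixed code units the way a locale-dependent function might.

    Hypothetical helper: runtime_encoding plays the role of the encoding
    implied by the current locale.
    """
    return code_units.decode(runtime_encoding)


# Runtime encoding matches the literal encoding: the text round-trips.
same = interpret(LITERAL_CODE_UNITS, "cp1252")

# Runtime encoding differs: the same code units name a different character.
other = interpret(LITERAL_CODE_UNITS, "cp1251")

print(same)
print(other)
```

The code units never change after "compilation"; only the interpretation
does, which is why describing literals purely in terms of an encoding seems
sufficient for the core language.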
>
> Jens
>
Received on 2021-01-30 15:08:54