sg16: Re: [SG16] Is the concept of basic execution character sets useful?

From: Corentin <corentin.jabot_at_[hidden]>
Date: Wed, 3 Feb 2021 22:55:36 +0100

On Wed, Feb 3, 2021 at 10:03 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 03/02/2021 21.45, Corentin wrote:
> >
> >
> > On Wed, Feb 3, 2021 at 9:22 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> >
> > On 03/02/2021 19.22, Corentin wrote:
> >
> > > I thought we had discussed that the standard library has
> certain
> > > facilities with locale-dependent character set.
> > > I haven't found a mention of "execution character set" in the
> library
> > > wording, so I'm interested in learning how these
> locale-dependent
> > > character sets are described / referenced.
> > >
> > >
> > > There is a whole new paragraph in the library introduction (page
> 10).
> >
> > That paragraph doesn't define the term "execution character set",
> > for example.
> >
> >
> > That paragraph is (supposed to be) the definition. these terms are not
> mentioned before and are introduced in this paragraph which (attempts to)
> describe them
>
> That paragraph fails in doing that.
>
> > And I have trouble parsing the sentences here. In particular, I
> > don't understand to what
> > "with the same value in the execution character set"
> > refers to ("the same" relative to what?)
> >
> >
> > Same code point value.
> > Say your literal encoding is ASCII, the code point value for 'A' is 65,
> then the execution encoding is such that the code point value of A is also
> 65.
>
> And that means std::isalpha, for example, will return true?
> Are any other functions affected by that constraint?
> Where did we have that constraint previously?
> Where is the C++20 normative statement for the edited
> footnote in [multibyte.string]?
>

I think the footnote only says that NTBS are NTMBS.

> And does that mean I can't compile a program with an EBCDIC
> compiler (producing EBCDIC literal encoding) and then
> running it in an ASCII environment? Or does that just
> mean certain functions won't work on literals as
> expected, e.g. std::isalpha('a') might not return true?
>

Certain functions will be UB. They already are: in your scenario,
isalpha('a') violates the precondition that 'a' is a character in the
encoding of the current locale.

std::string(runtime_string).find('a') will also return nonsense.

That constraint is currently not specified, but during execution the
program does not distinguish literals from runtime data, or ordinary
literal encoding from execution encoding.
There are just strings assumed to be in the execution encoding, and if
they aren't, they violate the preconditions of all these functions.
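
To make that concrete, here is a minimal sketch, not wording or a real
API. It assumes an ASCII literal encoding and simulates runtime data
arriving in EBCDIC (code page 037, where 'a' is the byte 0x81). The
helper names ebcdic_runtime_a and contains_literal_a are made up for
illustration.

```cpp
#include <string>

// Hypothetical helper: the single byte a runtime source would deliver for
// the text "a" if that source were encoded in EBCDIC code page 037.
inline std::string ebcdic_runtime_a() {
    return std::string(1, static_cast<char>(0x81));
}

// Hypothetical helper: searches runtime data for the literal 'a'.
// find() compares raw code unit values; it knows nothing about encodings.
inline bool contains_literal_a(const std::string& runtime_data) {
    return runtime_data.find('a') != std::string::npos;
}
```

With an ASCII literal encoding, contains_literal_a(ebcdic_runtime_a())
is false even though both sides "mean" the letter a, whereas
contains_literal_a("a") is true because both sides come from literals.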

>
> > I struggled a bit with the formulation.
> > I'm trying to say that both the execution character set and encoding are
> ""super sets"" of the literal ones, but "super set" of encoding does not
> seem like a good formulation.
>
> Where do we say that in the C++20 wording?
>

We don't. We should (unless we are happy with isalpha('a') returning
false, puts("a") not displaying a, and string("a").find('a') returning npos!).
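
As a rough sketch of the kind of expectation I mean (the function name
literal_a_is_alpha is hypothetical, and this only checks whichever C
locale is installed at the time of the call):

```cpp
#include <cctype>

// Hypothetical check: does the ordinary literal 'a' classify as alphabetic
// under the currently installed C locale? The cast to unsigned char keeps
// the argument within the range std::isalpha accepts.
inline bool literal_a_is_alpha() {
    return std::isalpha(static_cast<unsigned char>('a')) != 0;
}
```

In the default "C" locale this is expected to hold. The question is what
the standard promises once the locale's encoding and the literal
encoding are allowed to differ.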

But I also don't see where the standard currently admits that the
execution encoding as defined in [lex] can ever be different from the one
used throughout the library.
I think that, for the standard, they are currently one and the same, and if
we want to split execution encoding from literal encoding, there should be a
description of how they relate to one another.

>
> Jens
>

Received on 2021-02-03 15:55:50