sg16: Re: [SG16] Comment on P1885R0: Naming Text Encodings to Demystify Them

From: Steve Downey <sdowney_at_[hidden]>
Date: Thu, 23 Jan 2020 18:43:17 -0500

Execution character set isn't the right term. The basic execution
character set is a superset of the source character set, adding a few
control characters. The values of the characters, which are what we are
interested in, are not the same, and are locale-specific. Saying "locale",
however, is too general. We need to distinguish the fixed translation of
literals from whatever encoding the current locale specifies, including the
default "C" locale.
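
To make that distinction concrete, here is a minimal sketch, not taken
from the paper: the bytes of a literal are fixed when the program is
translated, while the locale is a piece of run-time state that can change
underneath it. The helper names (literal_bytes, current_locale_name,
literal_survives_locale_change) are hypothetical and exist only for this
illustration.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Bytes of a narrow string literal. The compiler fixed these during translation.
inline std::string literal_bytes() {
    static const char text[] = "abc";
    return std::string(text, sizeof text - 1);
}

// Name of the global C locale in effect right now ("C" at program startup).
inline std::string current_locale_name() {
    const char* name = std::setlocale(LC_ALL, nullptr);  // query only, no change
    assert(name != nullptr);
    return name;
}

// Switches the global locale and reports whether the literal's bytes stayed the same.
inline bool literal_survives_locale_change() {
    const std::string before = literal_bytes();
    std::setlocale(LC_ALL, "");   // adopt the environment's locale, if one is set
    const std::string after = literal_bytes();
    std::setlocale(LC_ALL, "C");  // restore the startup default
    return before == after;
}
```

The sketch only shows that the two things vary independently; it does not
report which encoding the compiler or the locale actually uses.
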
Footnote 9 in [lex.charset] also mentions that the intent is to identify
characters from the ASCII subset of ISO 10646. It's also clear that we're
not specifying that the execution character set is Unicode, although,
given the rules regarding universal character names, the internal
encoding pretty much has to be a Unicode transformation format.
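
As a small illustration of why universal character names pull toward
Unicode, the sketch below writes the same ISO 10646 code point as a UCN
in each of the Unicode literal forms and counts the resulting code units.
The compiler's internal encoding is not observable from a program, so
this only shows that a UCN names a code point independently of how the
source file is encoded. The code_units helper and the constant names are
hypothetical, and the sketch assumes char16_t and char32_t literals are
UTF-16 and UTF-32, as they are on current implementations.

```cpp
#include <cassert>
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00E9 written as a universal character name in each Unicode literal form.
constexpr std::size_t utf8_units  = code_units(u8"\u00E9");  // bytes 0xC3 0xA9
constexpr std::size_t utf16_units = code_units(u"\u00E9");
constexpr std::size_t utf32_units = code_units(U"\u00E9");

// A code point outside the BMP needs a surrogate pair in UTF-16.
constexpr std::size_t utf16_astral_units = code_units(u"\U0001F600");

// First UTF-8 code unit of U+00E9, widened so it compares cleanly.
constexpr unsigned first_utf8_byte() {
    return static_cast<unsigned char>(u8"\u00E9"[0]);
}
```
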
The standard's language around encodings really needs an update. I've got
a paper in the works toward that.

On Thu, Jan 23, 2020, 17:32 Jens Maurer via SG16 <sg16_at_[hidden]>
wrote:

> On 23/01/2020 23.19, Corentin Jabot wrote:
> >
> >
> > On Thu, Jan 23, 2020, 21:57 Jens Maurer via SG16 <sg16_at_[hidden]
> > <mailto:sg16_at_[hidden]>> wrote:
> >
> > Hi,
> >
> > We talked quite a bit about this paper in the teleconference.
> >
> > I have another concern: The core language defines the
> > terms "execution character set" and "execution wide-character set"
> > in [lex.charset].
> >
> > The wording in the paper should use exactly these phrases, with
> > an appropriate cross-reference.
> >
> > Given these definitions, I'm a bit concerned about the name of
> > the member function "literal". If it wants to talk about the
> > execution character set, it should state so in its name.
> >
> >
> > While we can bikeshed the particulars, the paper does explain the names
> > chosen.
>
> That's one part of my concern; the other is the expression
> of the specification. If the core language specifies a term
> that has the right semantics, the library wording should use it.
>
> > The core wording is not necessarily intuitive for users.
>
> Mission accomplished.
>
> > The core wording also assumes (it doesn't really have a choice) that the
> > execution encoding is a subset of the encoding associated with the current
> > locale.
>
> I don't understand that sentence.
> I thought locales and encoding should (conceptually) get
> a divorce.
>
> Or is your concern that "execution character set" sounds like a
> compile-time constant, whereas the environment's character set
> might actually be runtime-defined (e.g. xterm for UTF-8 vs. Latin-1)?
>
> If so, do you suggest changes to the definition of
> "execution character set"? Put differently, do you anticipate that
> literal() might return a text_encoding that is different from the
> execution character set? Or is there some haziness between
> "character set" and "encoding" in the core language? After all,
> when translating literals to the execution character set, the
> compiler actually has to pick an encoding, because it has
> to put string literals down to program memory.
>
> Jens
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2020-01-23 17:46:02