Re: [SG16] Comment on P1885R0: Naming Text Encodings to Demystify Them

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Fri, 24 Jan 2020 00:41:18 +0100
On Thu, 23 Jan 2020 at 23:32, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 23/01/2020 23.19, Corentin Jabot wrote:
> >
> >
> > On Thu, Jan 23, 2020, 21:57 Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
> >
> > Hi,
> >
> > We talked quite a bit about this paper in the teleconference.
> >
> > I have another concern: The core language defines the
> > terms "execution character set" and "execution wide-character set"
> > in [lex.charset].
> >
> > The wording in the paper should use exactly these phrases, with
> > an appropriate cross-reference.
> >
> > Given these definitions, I'm a bit concerned about the name of
> > the member function "literal". If it wants to talk about the
> > execution character set, it should state so in its name.
> >
> >
> > While we can bikeshed the particulars, the paper does explain the names
> > chosen.
> That's one part of my concern; the other is the expression
> of the specification. If the core language specifies a term
> that has the right semantics, the library wording should use it.

That's a good point.
I'll make sure the wording uses the right term.

> > The core wording is not necessarily intuitive for users.
> Mission accomplished.
> > The core wording also assumes (it doesn't really have a choice) that the
> > execution encoding is a subset of the encoding associated with the current
> > locale.
> I don't understand that sentence.
> I thought locales and encoding should (conceptually) get
> a divorce.

Yep, but separate paper!
Right now we can only speak about the locale-associated encoding in the wording.

> Or is your concern that "execution character set" sounds like a
> compile-time constant, whereas the environment's character set
> might actually be runtime-defined (e.g. xterm for UTF-8 vs. Latin-1)?
> If so, do you suggest changes to the definition of
> "execution character set"? Put differently, do you anticipate that
> literal() might return a text_encoding that is different from the
> execution character set? Or is there some haziness between
> "character set" and "encoding" in the core language? After all,
> when translating literals to the execution character set, the
> compiler actually has to pick an encoding, because it has
> to put string literals down to program memory.

No, the wording is fine.
The underlying issue is that (to the best of my understanding; this is your
area of expertise, not mine!) the standard considers both compilation/constant
evaluation and runtime as "execution".
In practice, of course, if the "execution" encoding is set to UTF-8 during
compilation but the program is later run on an EBCDIC machine, attempting to do
any kind of text I/O will result in mojibake, because the information about what
the execution encoding was is lost (hence this proposal) and no conversion is
implicitly performed.
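
To make that concrete, here is a minimal sketch. It is not from the paper, and the
helper names (literal_bytes, seen_by_latin1, self_check) are made up for
illustration. It uses Latin-1 rather than EBCDIC as the mismatched runtime
environment, only because that mapping is trivial to write down. The literal is
spelled with byte escapes so the sketch does not depend on any compiler flag:
these are the bytes a compiler would store for "é" if its literal encoding were
UTF-8.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a string literal compiled with a UTF-8 literal
// encoding: U+00E9 (e with acute accent) is stored as the two bytes C3 A9.
std::string literal_bytes() { return "\xC3\xA9"; }

// Hypothetical helper: reinterpret each byte as one Latin-1 character, which
// is what a Latin-1 environment does with bytes it receives unconverted.
// The result is re-encoded as UTF-8 so it can be inspected anywhere.
std::string seen_by_latin1(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out += static_cast<char>(b);  // ASCII range is identical
        } else {
            // Latin-1 byte b is code point U+00b: two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}

// Sanity checks: ASCII survives, the non-ASCII literal does not.
void self_check() {
    assert(seen_by_latin1("abc") == "abc");
    assert(seen_by_latin1(literal_bytes()) != literal_bytes());
}
```

Nothing in the program records which encoding produced the literal's bytes, so
the runtime side has no way to know that a conversion would be needed; one
character comes out as two unrelated ones.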

In that context, I am afraid that "execution" will be understood as
"runtime" by many people.
But again, you are right that I should have used "execution encoding" in my
wording, independently of the user-facing method name.

It's something that Steve is also looking into.

> Jens

Received on 2020-01-23 17:44:04