Re: [SG16] Comment on P1885R0: Naming Text Encodings to Demystify Them

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Fri, 24 Jan 2020 00:41:18 +0100
On Thu, 23 Jan 2020 at 23:32, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 23/01/2020 23.19, Corentin Jabot wrote:
> >
> >
> > On Thu, Jan 23, 2020, 21:57 Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
> >
> > Hi,
> >
> > We talked quite a bit about this paper in the teleconference.
> >
> > I have another concern: The core language defines the
> > terms "execution character set" and "execution wide-character set"
> > in [lex.charset].
> >
> > The wording in the paper should use exactly these phrases, with
> > an appropriate cross-reference.
> >
> > Given these definitions, I'm a bit concerned about the name of
> > the member function "literal". If it wants to talk about the
> > execution character set, it should state so in its name.
> >
> >
> > While we can bikeshed the particulars, the paper does explain the names
> > chosen.
>
> That's one part of my concern; the other is the expression
> of the specification. If the core language specifies a term
> that has the right semantics, the library wording should use it.
>

That's a good point.
I'll make sure the wording uses the right term.


>
> > The core wording is not necessarily intuitive for users.
>
> Mission accomplished.
>
> > The core wording also assumes (it doesn't really have a choice) that the
> > execution encoding is a subset of the encoding associated with the current
> > locale.
>
> I don't understand that sentence.
> I thought locales and encoding should (conceptually) get
> a divorce.
>

Yep, but separate paper!
Right now we can only speak about locale associated encoding in the wording.


>
> Or is your concern that "execution character set" sounds like a
> compile-time constant, whereas the environment's character set
> might actually be runtime-defined (e.g. xterm for UTF-8 vs. Latin-1)?
>
> If so, do you suggest changes to the definition of
> "execution character set"? Put differently, do you anticipate that
> literal() might return a text_encoding that is different from the
> execution character set? Or is there some haziness between
> "character set" and "encoding" in the core language? After all,
> when translating literals to the execution character set, the
> compiler actually has to pick an encoding, because it has
> to put string literals down to program memory.
>

No, the wording is fine.
The underlying issue is that (to the best of my understanding; this is your
area of expertise, not mine!) the standard considers both compilation/
constant evaluation and runtime to be "execution".
In practice, of course, if the "execution" encoding is set to UTF-8 during
compilation but the program is later run on an EBCDIC machine, attempting any
kind of text I/O will result in mojibake, because the information about what the
execution encoding was is lost (hence this proposal) and no conversion is
implicitly performed.
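
To make the failure mode concrete, here is a minimal sketch. It is an
illustration only and not part of P1885: it uses Latin-1 instead of EBCDIC
because Latin-1 maps each byte directly to a code point, and
latin1_view_as_utf8 is a hypothetical helper name. It shows what a consumer
that assumes Latin-1 would display when handed the UTF-8 bytes of a literal.

```cpp
#include <string>

// Treat every input byte as one Latin-1 code point (U+0000..U+00FF) and
// re-encode that code point as UTF-8. The result is the text a Latin-1
// consumer would "see" in the given bytes.
std::string latin1_view_as_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            // ASCII range: identical in Latin-1 and UTF-8.
            out.push_back(static_cast<char>(b));
        } else {
            // U+0080..U+00FF needs two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}
```

For example, the UTF-8 bytes of U+00E9 ("\xC3\xA9") come back as the two
characters U+00C3 U+00A9, i.e. the bytes C3 83 C2 A9. Nothing in the bytes
themselves records which encoding produced them.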

In that context, I am afraid that "execution" will be understood as
"runtime" by many people.
But again, you are right that I should have used "execution encoding" in my
wording, independently of the user-facing method name.

It's something that Steve is also looking into
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1859r0.html


>
> Jens
>

Received on 2020-01-23 17:44:04