Re: [SG16] Comment on P1885R0: Naming Text Encodings to Demystify Them

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Fri, 24 Jan 2020 01:22:48 +0100
On 24/01/2020 00.41, Corentin Jabot wrote:
>
>
> On Thu, 23 Jan 2020 at 23:32, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> On 23/01/2020 23.19, Corentin Jabot wrote:
> >
> >
> > On Thu, Jan 23, 2020, 21:57 Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
> >
> > Hi,
> >
> > We talked quite a bit about this paper in the teleconference.
> >
> > I have another concern: The core language defines the
> > terms "execution character set" and "execution wide-character set"
> > in [lex.charset].
> >
> > The wording in the paper should use exactly these phrases, with
> > an appropriate cross-reference.
> >
> > Given these definitions, I'm a bit concerned about the name of
> > the member function "literal". If it wants to talk about the
> > execution character set, it should state so in its name.
> >
> >
> > While we can bikeshed the particulars, the paper does explain the names chosen.
>
> That's one part of my concern; the other is the expression
> of the specification. If the core language specifies a term
> that has the right semantics, the library wording should use it.
>
>
> That's a good point.
> I'll make sure the wording uses the right term.
>
>
>
> > The core wording is not necessarily intuitive for users.
>
> Mission accomplished.
>
> > The core wording also assumes (it doesn't really have a choice) that the execution encoding is a subset of the encoding associated with the current locale.
>
> I don't understand that sentence.
> I thought locales and encoding should (conceptually) get
> a divorce.
>
>
> Yep, but separate paper!
> Right now we can only speak about locale associated encoding in the wording.
>
>
>
> Or is your concern that "execution character set" sounds like a
> compile-time constant, whereas the environment's character set
> might actually be runtime-defined (e.g. xterm for UTF-8 vs. Latin-1)?
>
> If so, do you suggest changes to the definition of
> "execution character set"? Put differently, do you anticipate that
> literal() might return a text_encoding that is different from the
> execution character set? Or is there some haziness between
> "character set" and "encoding" in the core language? After all,
> when translating literals to the execution character set, the
> compiler actually has to pick an encoding, because it has
> to put string literals down to program memory.
>
>
> No, the wording is fine.
> The underlying issue is that (to the best of my understanding; this is your area of expertise, not mine!) the standard considers both compilation/constant evaluation and runtime as "execution".

Not really.

[lex.phases] is pretty clear that there is a conceptual distinction
between "translation" and execution in the execution environment.
(That doesn't exclude interpreters under the as-if rule, of course.)

My understanding of [lex.charset] is that the standard assumes the
existence of a single execution character set (with a certain
encoding), so that it can statically translate char and string
literals (phase 5), i.e. determine the values needed to initialize
the character array representing a string literal (for instance).
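A minimal sketch of that static translation, for illustration only: the helper name dump_code_units is hypothetical and not taken from P1885 or the standard. It prints the values the implementation chose at translation time for the array behind a narrow string literal. Which values appear for an ordinary character depends on the execution character set the compiler was configured with, so the built-in check uses hex escapes, whose values do not depend on any encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the code unit values of a narrow string
// literal, in hex, exactly as the implementation stored them in the array.
template <std::size_t N>
std::string dump_code_units(const char (&literal)[N]) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i + 1 < N; ++i) {  // skip the terminating NUL
        const unsigned char c = static_cast<unsigned char>(literal[i]);
        if (!out.empty()) out += ' ';
        out += hex[(c >> 4) & 0x0f];
        out += hex[c & 0x0f];
    }
    return out;
}

// Hex escapes specify the element values directly, so this check holds
// regardless of the execution character set.
inline void check_dump_code_units() {
    assert(dump_code_units("\x41\x42") == "41 42");
}
```

For an ordinary literal such as "A", the result is 41 where the execution character set is ASCII-compatible and c1 under an EBCDIC code page; either way the value is fixed when the program is translated, not when it runs.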

> In practice, of course, if the "execution" encoding is set to UTF-8 during compilation but the program is later executed on an EBCDIC machine, attempting any kind of text I/O will result in mojibake, as the information about what the execution encoding was is lost (hence this proposal) and no conversion is performed implicitly.

Agreed.

I'm wondering whether the SG16 efforts should leave old-style literals
alone and focus on well-defined u8 literals, and eventually necessary
encoding transformations to match the (runtime) environment.
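As a rough sketch of why u8 literals are the well-defined case: their code units are UTF-8 no matter which execution character set the compiler uses for ordinary literals. The helper has_code_units and the constants below are hypothetical names introduced only for this illustration. The template accepts both char (C++17 and earlier) and char8_t (C++20) arrays, and assumes at least C++14 for the loop in a constexpr function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: compares the code units of a literal (without its
// terminating NUL) against the expected byte values.
template <typename CharT, std::size_t N, std::size_t M>
constexpr bool has_code_units(const CharT (&literal)[N],
                              const unsigned char (&expected)[M]) {
    if (N != M + 1) return false;
    for (std::size_t i = 0; i < M; ++i) {
        if (static_cast<unsigned char>(literal[i]) != expected[i]) return false;
    }
    return true;
}

// U+00E9 (LATIN SMALL LETTER E WITH ACUTE) is the two bytes C3 A9 in UTF-8.
constexpr unsigned char e_acute_utf8[] = {0xC3, 0xA9};

// Checked at translation time, independently of the execution character set.
static_assert(has_code_units(u8"\u00e9", e_acute_utf8),
              "u8 literals are encoded as UTF-8");

// The same check at run time, here for U+0041, which is the byte 41 in UTF-8.
constexpr unsigned char capital_a_utf8[] = {0x41};
inline void check_u8_literal() {
    assert(has_code_units(u8"A", capital_a_utf8));
}
```

Converting such UTF-8 data to whatever the runtime environment expects would be a separate, explicit step that this sketch does not attempt.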

Jens


> In that context, I am afraid that "execution" will be understood as "runtime" by many people.
> But again, you are right that I should have used "execution encoding" in my wording, independently of the user-facing method name.
>
> It's something that Steve is also looking into http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1859r0.html
>
>
>
> Jens
>

Received on 2020-01-23 18:25:29