Re: [SG16] Comment on P1885R0: Naming Text Encodings to Demystify Them

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Thu, 23 Jan 2020 23:32:20 +0100
On 23/01/2020 23.19, Corentin Jabot wrote:
>
>
> On Thu, Jan 23, 2020, 21:57 Jens Maurer via SG16 <sg16_at_[hidden]> wrote:
>
> Hi,
>
> We talked quite a bit about this paper in the teleconference.
>
> I have another concern: The core language defines the
> terms "execution character set" and "execution wide-character set"
> in [lex.charset].
>
> The wording in the paper should use exactly these phrases, with
> an appropriate cross-reference.
>
> Given these definitions, I'm a bit concerned about the name of
> the member function "literal". If it wants to talk about the
> execution character set, it should state so in its name.
>
>
> While we can bikeshed the particulars, the paper does explain the names chosen.

That's one part of my concern; the other is the expression
of the specification. If the core language specifies a term
that has the right semantics, the library wording should use it.

> The core wording is not necessarily intuitive for users.

Mission accomplished.

> The core wording also assumes (it doesn't really have a choice) that the execution encoding is a subset of the encoding associated with the current locale.

I don't understand that sentence.
I thought locales and encoding should (conceptually) get
a divorce.

Or is your concern that "execution character set" sounds like a
compile-time constant, whereas the environment's character set
might actually be runtime-defined (e.g. xterm for UTF-8 vs. Latin-1)?

If so, do you suggest changes to the definition of
"execution character set"? Put differently, do you anticipate that
literal() might return a text_encoding that is different from the
execution character set? Or is there some haziness between
"character set" and "encoding" in the core language? After all,
when translating literals to the execution character set, the
compiler actually has to pick an encoding, because it has
to put string literals down to program memory.
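
To illustrate that last point, here is a minimal sketch that dumps the
bytes a compiler actually stored for a narrow literal. The helper names
(hex_bytes, narrow_literal_bytes, utf8_literal_bytes) are hypothetical
and not taken from P1885. The character is written as a
universal-character-name so that the source file encoding does not
enter the picture.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render n bytes starting at data as "XX XX ...".
std::string hex_bytes(const void* data, std::size_t n) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < n; ++i) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(p[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Bytes of U+00E9 in an ordinary (narrow) literal. The result depends on
// the encoding the compiler picked for narrow literals.
std::string narrow_literal_bytes() {
    static const char s[] = "\u00e9";
    return hex_bytes(s, sizeof s - 1);  // drop the terminating null
}

// Bytes of U+00E9 in a u8 literal, which is always UTF-8. The element
// type is char before C++20 and char8_t from C++20 on; auto& covers both.
std::string utf8_literal_bytes() {
    static const auto& s = u8"\u00e9";
    return hex_bytes(s, sizeof s - 1);  // drop the terminating null
}
```

The u8 literal is UTF-8 by definition, so utf8_literal_bytes() yields
"C3 A9". The narrow literal is where the compiler's choice shows up: a
compiler using UTF-8 for narrow literals stores the same two bytes,
while one using Latin-1 would store the single byte E9.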

Jens

Received on 2020-01-23 16:34:57