Re: [SG16] [isocpp-core] New draft revision: D2029R2 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals)

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 2 Jul 2020 12:43:55 -0400
On 7/2/20 3:15 AM, Corentin via Core wrote:
> On Thu, 2 Jul 2020 at 09:04, Jens Maurer via Core <core_at_[hidden]> wrote:
> On 02/07/2020 07.39, Tom Honermann wrote:
> > On 7/1/20 3:28 AM, Jens Maurer wrote:
> >> Maybe replace "associated character encoding" -> "associated literal encoding"
> >> globally to avoid the mention of "character" here.
> > Despite the use of the "C" word, "character encoding" is more consistent
> > with Unicode terminology. Though if we really want to be consistent, we
> > should use "character encoding form" (which ISO/IEC 10646 then calls
> > simply "encoding form"). This is something we could discuss at the SG16
> > meeting next week.
> The paper is in CWG's court; involving SG16 is not helpful at this stage
> absent more severe concerns that would involve sending back the paper
> as a CWG action. That said, everybody (including members of SG16) is
> invited to CWG telecons to offer their opinion.
> off-topic remarks: Since we'd be using a new term such as
> "literal encoding" here, I don't think Unicode will get in our
> way. I'd like to point out that "character encoding" (also in the
> Unicode meaning) sounds like a character-at-a-time encoding, which
> we expressly don't want to require. So, choosing a different term
> than one that has Unicode semantic connotations seems wise.
> "Literal encoding" is a less ambiguous term either way.
> We need terminology that lets us distinguish the encoding of
> literals from that of runtime strings; "literal (associated) encoding"
> achieves that.

Ah, I think we may be crossing wires here. I agree that we should have
an abstract name that indicates the encoding used for literals. We lack
a term for that today (which is why the paper uses the phrase "encoding
of the execution character set"). But that is different from what is
intended by "associated character encoding", which is meant to name an
encoding (possibly indirectly, hence "encoding of the ...") that might
be registered with IANA. (IANA uses the term "character set" to mean
"character encoding"; it retains the former for legacy reasons.)


Received on 2020-07-02 11:48:57