SG16

Subject: Re: [isocpp-core] New draft revision: D2029R2 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals)
From: Tom Honermann (tom_at_[hidden])
Date: 2020-07-02 11:43:55


On 7/2/20 3:15 AM, Corentin via Core wrote:
>
>
> On Thu, 2 Jul 2020 at 09:04, Jens Maurer via Core
> <core_at_[hidden]> wrote:
>
> On 02/07/2020 07.39, Tom Honermann wrote:
> > On 7/1/20 3:28 AM, Jens Maurer wrote:
>
>
> >> Maybe replace "associated character encoding" -> "associated literal encoding"
> >> globally to avoid the mention of "character" here.
> > Despite the use of the "C" word, "character encoding" is more consistent
> > with Unicode terminology.  Though if we really want to be consistent, we
> > should use "character encoding form" (which ISO/IEC 10646 then calls
> > simply "encoding form").  This is something we could discuss at the SG16
> > meeting next week.
>
> The paper is in CWG's court; involving SG16 is not helpful at this stage
> absent more severe concerns that would involve sending back the paper
> as a CWG action.  That said, everybody (including members of SG16) is
> invited to CWG telecons to offer their opinion.
>
> off-topic remarks: Since we'd be using a new term such as
> "literal encoding" here, I don't think Unicode will get in our
> way.  I'd like to point out that "character encoding" (also in the
> Unicode meaning) sounds like a character-at-a-time encoding, which
> we expressly don't want to require.  So, choosing a different term
> than one that has Unicode semantic connotations seems wise.
>
>
> "Literal encoding" is a less ambiguous term either way.
> We need terminology that lets us distinguish the encoding of
> literals from that of runtime strings; "literal (associated) encoding"
> achieves that.

Ah, I think we may be crossing wires here.  I agree that we should have
an abstract name that indicates the encoding used for literals.  We lack
a term for that today (which is why the paper uses the phrase "encoding
of the execution character set").  But that is different from what is
intended by "associated character encoding"; this is intended to name an
encoding (possibly indirectly, hence "encoding of the ...") that might
be registered with IANA
<https://www.iana.org/assignments/character-sets/character-sets.xhtml>
(where the term "character set" is used to mean "character encoding",
but the former is used for legacy reasons).

Tom.


