Re: [SG16] [isocpp-core] New draft revision: D2029R2 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals)

From: Tom Honermann <tom_at_[hidden]>
Date: Fri, 3 Jul 2020 19:37:19 -0400
> On Jul 3, 2020, at 3:11 AM, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
>> On 02/07/2020 18.43, Tom Honermann via SG16 wrote:
>> On 7/2/20 3:15 AM, Corentin via Core wrote:
>
>>> literal encoding is a less ambiguous term either way.
>>> We need a terminology such that we can distinguish the encoding of literals from that of runtime strings, literal (associated) encoding achieves that.
>>
>> Ah, I think we may be crossing hairs here. I agree that we should have an abstract name that indicates the encoding used for literals. We lack a term for that today (which is why the paper uses the phrase "encoding of the execution character set"). But that is different from what is intended by "associated character encoding"; this is intended to name an encoding (possibly indirectly, hence "encoding of the ...") that might be registered with IANA <https://www.iana.org/assignments/character-sets/character-sets.xhtml> (where the term "character set" is used to mean "character encoding", but the former is used for legacy reasons).
>
> You lost me there.
>
> What does IANA's list of character sets/encodings have to do with
> how a compiler chooses to encode C++ literals?
>
> Maybe the compiler has invented an encoding of its own, just
> for the fun of it.
>
> If you intend some stronger relationship to IANA here, this
> needs to be a lot more explicit.

I explained that poorly. My point was only that the “associated character encoding” names a concrete encoding, not an abstraction. The IANA registry served merely as an example of a set of concrete character encodings.

Another way to put the distinction: if we were to introduce a new “literal encoding” term meaning “the encoding of the execution character set”, then the “associated character encoding” of ordinary string literals would be “the literal encoding”, whereas the “associated character encoding” of UTF-8 string literals would, of course, be “UTF-8”.
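
As a rough illustration of that distinction, here is a minimal C++ sketch. The helper name code_units is hypothetical and chosen only for this example. It returns the code units of a string literal so that an ordinary literal can be compared with a UTF-8 literal. The sketch assumes the implementation can encode U+00E9 in its ordinary literals; what bytes it produces there is implementation-defined.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration: collects the code units of a string
// literal (excluding the terminating null) as unsigned values.
// Works for both char literals and char8_t literals (C++20).
template <typename CharT, std::size_t N>
std::vector<unsigned> code_units(const CharT (&lit)[N]) {
    std::vector<unsigned> out;
    for (std::size_t i = 0; i + 1 < N; ++i) {
        // Go through unsigned char so that a signed plain char does not
        // sign-extend into a large unsigned value.
        out.push_back(static_cast<unsigned char>(lit[i]));
    }
    return out;
}

// UTF-8 string literal: its associated character encoding is UTF-8,
// so U+00E9 is encoded as the two code units 0xC3 0xA9.
inline std::vector<unsigned> utf8_units() {
    return code_units(u8"\u00e9");
}

// Ordinary string literal: its associated character encoding is whatever the
// implementation uses for the execution character set. The result may be
// 0xC3 0xA9 (UTF-8), a single 0xE9 (e.g. ISO-8859-1), or something else.
inline std::vector<unsigned> ordinary_units() {
    return code_units("\u00e9");
}
```

Calling utf8_units() should yield the same two code units on any conforming implementation, while the result of ordinary_units() depends on the compiler and its options (for example, an execution character set selected on the command line).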

Tom.

>
> Jens

Received on 2020-07-03 18:40:36