Re: [SG16] Reminder: SG16 telecon tomorrow (Wednesday, 2020-05-27)

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 27 May 2020 13:05:27 -0400
I was unclear, unfortunately. The standard term "basic source character
set" is the only source character set. Other things representing characters
that will be translated are escape sequences and universal character names,
which are sequences of basic source characters. There are a few places in
the standard that mention "source character set" without the basic
qualification, but it's pretty clear that the same set is meant.

The execution character set, on the other hand, has basic, corresponding to
the basic source character set, and the extended superset, which is often

Also, "source character set" is not the same as what's actually in a source
file, leading to further confusion. That mapping is phase 1. If we were to
switch to a code point basis, this would be a decode from source file
encoding to code points. Plus some 'implementation defined' handwaving to
preserve wetware implementations.
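
To make the phase 1 mapping concrete, here is a rough sketch of the model as I read it in C++20: source text that has already been decoded to code points is rewritten so that anything outside the basic source character set becomes a universal-character-name. The function names in_basic_source_set and to_phase1_form are made up for illustration, the decode from the source file encoding is assumed to have happened already, and real compilers do not literally do this (raw string literals, for one, have to undo it).

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: is cp one of the 96 characters of the C++20
// basic source character set? That is printable ASCII without
// '$', '@' and '`', plus tab, vertical tab, form feed and new-line.
bool in_basic_source_set(char32_t cp) {
    if (cp == U'\t' || cp == U'\v' || cp == U'\f' || cp == U'\n') {
        return true;
    }
    if (cp < 0x20 || cp > 0x7E) {
        return false;
    }
    return cp != U'$' && cp != U'@' && cp != U'`';
}

// Hypothetical sketch of the conceptual phase 1 result: basic source
// characters pass through, everything else is spelled as \uXXXX or
// \UXXXXXXXX, which themselves use only basic source characters.
std::string to_phase1_form(std::u32string_view code_points) {
    std::string out;
    for (char32_t cp : code_points) {
        if (in_basic_source_set(cp)) {
            out.push_back(static_cast<char>(cp));
        } else {
            char buf[11];  // "\U" + 8 hex digits + terminating null
            if (cp <= 0xFFFF) {
                std::snprintf(buf, sizeof buf, "\\u%04X",
                              static_cast<unsigned>(cp));
            } else {
                std::snprintf(buf, sizeof buf, "\\U%08X",
                              static_cast<unsigned>(cp));
            }
            out += buf;
        }
    }
    return out;
}
```

So to_phase1_form(U"caf\u00E9") gives the ten characters caf\u00E9, all of them basic source characters, while plain "int x;" comes back unchanged.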

And 26 because I think it will take more than a couple years to understand
all the implications of switching from "source characters" and "universal
character names" to "code points". At least not without just starting over,
which is equally scary.

On Wed, May 27, 2020 at 1:36 AM Tom Honermann <tom_at_[hidden]> wrote:

> On 5/27/20 12:08 AM, Steve Downey via SG16 wrote:
> With respect to P1859:
> -Basic source character set
> -: The abstract characters that must be representable in the _character
> set_ used for source code
> +Source character set
> +: The abstract characters that must be representable in the internal
> _character set_ used after phase 1 of translation. All characters not in
> the source character set are converted to universal-character-names, which
> are made up of characters from the basic character set. The abstract parser
> only sees characters in the source character set.
> There is no "Basic" source character set. There is the character set the
> lexer and parser use, which is available after the implementation-defined
> conversion from whatever was presented as source.
> I don't think anyone understands that, outside CWG.
> I'm not sure I'm following. The standard does define *basic source
> character set* in [lex.charset]p1 <http://eel.is/c++draft/lex.charset#1>.
> Are you proposing that it be renamed to just *source character set*?
> I think we will need a term for the encoding of source files. We could
> use *source file encoding* for that, but I'm a little concerned about
> these two terms being confused.
> Having more precision around the values emitted into narrow, wide, and uN
> literals from the execution character set, and what happens when that fails
> I still believe would be useful.
> I think P2029 may address that.
> Perhaps for 26 we could rewrite entirely in terms of processing code
> points and occasionally "original spelling". It would be nice if the logical
> model was closer to what the physical model is.
> Why wait for 26? :)
> Tom.
> On Tue, May 26, 2020 at 1:37 PM Tom Honermann via SG16 <
> sg16_at_[hidden]> wrote:
>> This is your friendly reminder that an SG16 telecon will be held
>> tomorrow, Wednesday May 27th, at 19:30 UTC (timezone conversion
>> <https://www.timeanddate.com/worldclock/converter.html?iso=20200527T193000&p1=1440>).
>> To attend, visit https://bluejeans.com/140274541 at the start of the
>> meeting.
>> Steve will circulate a draft revision of P1949 on the SG16 mailing list
>> today.
>> The agenda for the meeting is:
>> - D1949R4: C++ Identifier Syntax using Unicode Standard Annex 31
>> - Review updates since the April 22nd review.
>> - Discuss terminology updates to strive for in C++23
>> - P1859R0: Standard terminology character sets and encodings
>> <https://wg21.link/p1859>
>> - Establish priorities for terms to address.
>> - Establish a methodology for drafting wording updates.
>> Anticipated decisions to be made at this meeting include:
>> - Whether to forward the new draft revision of P1949 to EWG.
>> Prior to tomorrow's meeting, please:
>> - review Steve's draft revision.
>> - review P1859R0, particularly the proposed terminology.
>> - think of other terminology changes to be considered.
>> - think of how we can divide up the work for making terminology
>> updates.
>> Tom.
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2020-05-27 12:08:44