On 4/27/22 12:05 AM, Steve Downey wrote:


On Tue, Apr 26, 2022 at 6:18 PM Corentin Jabot <corentinjabot@gmail.com> wrote:


On Tue, Apr 26, 2022 at 10:39 PM Steve Downey via SG16 <sg16@lists.isocpp.org> wrote:


On Tue, Apr 26, 2022 at 4:20 PM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
On 4/26/22 4:12 PM, Jens Maurer via SG16 wrote:
> On 26/04/2022 22.06, Tom Honermann via SG16 wrote:
>> The summary for the SG16 meeting held April 13th, 2022 is now available.  For those that attended, please review and suggest corrections.
>>
>>    * https://github.com/sg16-unicode/sg16-meetings#april-13th-2022
>>
>> No decisions were made at this meeting.
>>
>> I again apologize for being so delinquent getting the summary published.
>>
>> Jens, I fear I misunderstood or incorrectly captured some of your comments. Please see the editor's note starting with "This behavior doesn't seem related to the proposed change since ...". If you recall the discussion being different than I wrote, I'll update it to reflect your recollection.
> I think you should just strike all of this:
>
> Jens stated that this makes such intended use in identifiers ill-formed since, after this change, such a character would appear as a lone preprocessing-token.
> [ Editor's note: This behavior doesn't seem related to the proposed change since, previously, a UCN naming one of these characters would also appear as a lone preprocessing-token. The editor is concerned that this portion of the discussion was not captured accurately. ]

Done, thank you!

>
> I think there was some development during the discussion
> about the current and future state with these new
> characters.  Having an updated paper clearly stating
> the current and with-paper situations would be helpful.

Agreed, I suspect Steve intends to provide that.

Tom.


That's what I'm planning. The complicated bit is the implications for the "C" locale; it's not an issue for the "POSIX" locale. That said, I don't think it's a real-world concern these days that the default encoded character set lacks what POSIX calls the portable character set. Tracing the requirements is tedious because C++ defers to C, which in turn defers to the ISO version of the POSIX specification for much of the locale machinery.

Hey Steve, 
Can you please explain in the paper what this buys us?

Aside from C compatibility, I think there are two benefits:


* It doesn't change the set of identifiers (nor should it)
* It makes \N{DOLLAR SIGN} ill-formed. Is that desirable?
Only outside of literals, if I have understood correctly. You could use it in a literal, but not in an identifier.

* It makes a hypothetical implementation that does not support $ in the literal encoding non-conforming, even if no such character is present in any source file. Is that desirable?
C and POSIX already require this; POSIX does so in plain normative text. You can't even write a POSIX charmap for a locale without specifying $.
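For illustration only (this is not from the paper): a minimal C++ sketch of the literal case. It spells U+0024 with `\u0024` rather than `\N{DOLLAR SIGN}`, since named escapes are a C++23 feature; the variable names are hypothetical.

```cpp
// A universal-character-name naming U+0024 DOLLAR SIGN inside a string
// literal is accepted. Both literals below go through the same literal
// encoding, so the UCN and the plain $ produce the same code unit.
constexpr char via_ucn[] = "\u0024";
constexpr char via_char[] = "$";

static_assert(via_ucn[0] == via_char[0], "UCN and $ encode identically");
static_assert(sizeof(via_ucn) == sizeof(via_char), "same length");

// By contrast, a UCN naming $ outside a literal, e.g. in an identifier
// such as `int \u0024count;`, is not accepted by standard C++.
```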

For reference, the POSIX portable character set is defined here.

IBM documents the invariant subset of EBCDIC here (for IBM i), along with at least some of the EBCDIC code pages that do not align with it. I find the presentation on Wikipedia easier to read. There are a few interesting things to note:

  1. The following current members of the basic character set are not in the invariant EBCDIC set:
    |, !, #, ~, ^, [, ], {, }, \
  2. Alternative tokens are defined for all of these with one exception:
    \
  3. @ and $ are both members of the portable EBCDIC set, but ` is not. This is odd, as this set is apparently intended to align with the POSIX portable character set, yet it includes U+00B4 (ACUTE ACCENT) and not U+0060 (GRAVE ACCENT). I suppose that could be a documentation bug.
  4. If we were to start using the new characters outside of literals, adding new alternative tokens would be appropriate (but perhaps unnecessary).
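As a rough illustration of items 1 and 2, here is a small C++ sketch (the function names are hypothetical) that avoids |, !, #, ~, ^, [, ], {, and } in its code by using the alternative tokens and digraphs. Nothing comparable exists for \, so escape sequences cannot be written this way.

```cpp
%:include <cstddef>  // %: is the alternative token for #

// The code below (not the comments) avoids |, !, #, ~, ^, [, ], {, and }.

constexpr unsigned set_bits(unsigned v, unsigned m) <%   // <% %> for { }
    return v bitor m;                                    // bitor for |
%>

constexpr unsigned toggle_bits(unsigned v, unsigned m) <%
    return v xor m;                                      // xor for ^
%>

constexpr unsigned clear_bits(unsigned v, unsigned m) <%
    return v & compl m;                                  // compl for ~
%>

constexpr bool outside(int v, int lo, int hi) <%
    return v < lo or v > hi;                             // or for ||
%>

constexpr bool inside(int v, int lo, int hi) <%
    return not outside(v, lo, hi);                       // not for !
%>

constexpr bool differs(int a, int b) <%
    return a not_eq b;                                   // not_eq for !=
%>

constexpr int sum_of_three() <%
    int values<:3:> = <% 1, 2, 3 %>;                     // <: :> for [ ]
    int total = 0;
    for (std::size_t i = 0; i < 3; ++i)
        total += values<:i:>;
    return total;
%>
```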

Steve, perhaps this would be useful information to add to the paper?

Tom.