ISOCPP sg16 List: Re: Suggested wording change for non-Unicode cases in P2286R7: Formatting Ranges

From: Barry Revzin <barry.revzin_at_[hidden]>
Date: Sun, 8 May 2022 15:04:23 -0500

On Sun, May 8, 2022 at 9:22 AM Victor Zverovich <victor.zverovich_at_[hidden]>
wrote:

> > One thing I noticed is that the wording about Grapheme_Extend is gone. I
> > didn't know what this meant before, so I don't know now if this is a good
> > removal or a bad removal.
>
> I don't recall any requests for removing it and think that it should be
> reintroduced.
>
> - Victor
>
> On Wed, May 4, 2022 at 10:44 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
>> On 05/05/2022 04.08, Barry Revzin wrote:
>> > I think I have applied this. Here's the rendered version:
>> https://brevzin.github.io/cpp_proposals/2286_fmt_ranges/p2286r8.html#pnum_12
>>
>> > How does this look?
>>
>> p2.2
>>
>> For each code sequence X in S that either encodes a single character or
>> encoding state transition or that is a sequence of ill-formed code units is
>> processed in order as follows:
>>
>> That feels like bad English grammar to me.
>>
>> Why "encoding", yet there is an "encodes" before that?
>> Why "either" and there are three things that don't
>> exactly correspond grammatically?
>>
>> Maybe make a bulleted sub-list with the three items
>> so that the structure is clear.
>>
>> "If C is one of the UCS scalar values the table below,"
>>
>> add "in"
>>
>> better clarify: "the two characters shown as the
>> corresponding escape sequence are appended to E"
>>
>>
>> after p2.3.4, p2.5
>>
>> "simple-hexadecimal-digit-sequence"
>>
>> I would not re-use lexing grammar for a local placeholder,
>> just say \u{/hex-digit-sequence/} or so.
>>
>>
>> p2.5
>>
>> "Otherwise, X is a sequence of ill-formed code units. Each"
>>
>> -> "Otherwise (X is a sequence of ill-formed code units), each code unit
>> ..."
>>
>>
>> "U+0027 APOSTROPHE is escaped as \' while U+0022 QUOTATION MARK is left
>> unchanged."
>>
>> Can we rephrase that to avoid "is escaped as"? We were on such a good
>> track to just append characters and avoid any judgment calls.
>>
>> suggestion "
>> - for each character U+0027 APOSTROPHE in S, the two characters \' are
>> appended to E
>> - U+0022 QUOTATION MARK is left unchanged"
>>
>>
>> Jens
>>
>
Thanks Jens and Victor! I did my best to apply the suggested changes:

   - Updated rendered wording:
   https://brevzin.github.io/cpp_proposals/2286_fmt_ranges/p2286r8.html#pnum_12
   - New diff:
   https://github.com/brevzin/cpp_proposals/commit/3d93043f5c296810d7e18b11d5b7083143554309

Hopefully, this gradient is slowly descending to the correct solution :-)
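
For anyone following along without the paper open, here is a rough sketch of
the "append to E" procedure discussed above, for the character case where
U+0027 APOSTROPHE gets a backslash and U+0022 QUOTATION MARK is copied through.
It is not the proposed wording or any library's implementation. The name
escape_sketch is made up, the set of two-character escapes is my assumption
about the table, lowercase hex is assumed, and the input is treated as plain
ASCII: any byte >= 0x80 is simply treated as an ill-formed code unit instead of
being decoded.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper (not from P2286): builds the escaped string E from S
// for the character case discussed in this thread.
// Simplifying assumption: S is ASCII, so each byte is its own code sequence X,
// and every byte >= 0x80 is treated as an ill-formed code unit.
std::string escape_sketch(std::string_view s) {
    std::string e;
    char buf[16];
    for (unsigned char c : s) {
        switch (c) {
        // Two-character escapes (assumed table contents).
        case '\t': e += "\\t";  break;
        case '\n': e += "\\n";  break;
        case '\r': e += "\\r";  break;
        case '\\': e += "\\\\"; break;
        // U+0027 APOSTROPHE: the two characters \' are appended to E.
        case '\'': e += "\\'";  break;
        // U+0022 QUOTATION MARK: left unchanged.
        case '"':  e += '"';    break;
        default:
            if (c >= 0x80) {
                // Ill-formed code unit: \x{hex-digit-sequence}
                std::snprintf(buf, sizeof buf, "\\x{%x}", static_cast<unsigned>(c));
                e += buf;
            } else if (c < 0x20 || c == 0x7f) {
                // Non-printable ASCII control: \u{hex-digit-sequence}
                std::snprintf(buf, sizeof buf, "\\u{%x}", static_cast<unsigned>(c));
                e += buf;
            } else {
                // Everything else is appended as-is.
                e += static_cast<char>(c);
            }
        }
    }
    return e;
}
```

With this sketch, the input it's becomes it\'s, a tab becomes the two
characters \t, and a lone 0xff byte becomes \x{ff}. Real Unicode handling
(decoding, Grapheme_Extend, separators) is deliberately left out.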

Barry

Received on 2022-05-08 20:04:33