sg16

Re: Suggested wording change for non-Unicode cases in P2286R7: Formatting Ranges

From: Barry Revzin <barry.revzin_at_[hidden]>
Date: Mon, 16 May 2022 13:42:32 -0500

On Mon, May 16, 2022 at 12:11 PM Tom Honermann <tom_at_[hidden]> wrote:

> On 5/14/22 9:11 PM, Hubert Tong wrote:
>
> On Sat, May 14, 2022 at 6:08 PM Tom Honermann <tom_at_[hidden]> wrote:
>
>> On 5/14/22 8:17 AM, Corentin Jabot wrote:
>>
>> Hey.
>> Thanks for the work Barry.
>>
>> I'm still concerned: how long are we going to keep using the term
>> "character" incorrectly and in contexts in which its meaning is ambiguous?
>>
>> Chair hat on: We did discuss this usage during the last telecon
>> <https://github.com/sg16-unicode/sg16-meetings#may-11th-2022> and
>> consensus was for this direction though I have no doubt that stronger
>> consensus could be found with adoption of new terms.
>>
>> Chair hat off ...
>>
>> I don't agree that this wording uses "character" incorrectly, but I do
>> agree that the use here is as ambiguous as usage elsewhere throughout the
>> standard.
>>
>> If we want to clean up our use of "character" (and I think we would all
>> like us to), then I think we need a paper that analyzes how it is currently
>> used and how many terms are needed to replace it. We could then identify
>> terms to fit to those uses. Unfortunately, such terms will likely have to
>> be distinct from what ISO/IEC 10646 provides since many of those terms are
>> defined in Unicode specific terms.
>>
>> Do we have precedent for the use of the term "state-transition"? (It's not
>> an industry term to the best of my knowledge.)
>>
>> I'm not aware of any other uses of this term in the standard. I'll defer
>> to Hubert whether "state-transition" is an acceptable term of art or
>> whether there is another term that would be preferred.
>>
>
> The preferred term of art would be "shift sequence"; however, instead of
> saying "encodes a shift sequence", we should probably say "is a shift
> sequence".
>
> Ok, thanks, Hubert. Here are the changes I think are then desired (based
> on https://brevzin.github.io/cpp_proposals/2286_fmt_ranges/p2286r8.html
> which I think is still the most recent revision).
>
> In [format.string.escaped]p2.2:
>
> For each code unit sequence *X* in *S* that either encodes a single
> character, ~~encodes a state transition~~ *is a shift sequence*, or is a
> sequence of ill-formed code units, processing is in order as follows:
>
> In [format.string.escaped]p2.4:
>
> Otherwise, if *X* ~~encodes a state transition~~ *is a shift sequence*, the
> effect on *E* and further decoding of *S* is unspecified.
>
> *Recommended Practice*: a ~~state transition~~ *shift sequence* should be
> represented in *E* such that the original code unit sequence of *S* can
> be reconstructed.
>
> Barry, I know I had said we were done, but ... are you ok making these
> changes? The LWG chairs should of course be made aware of the additional
> changes so they can decide if they want LWG to re-re-review again.
>
> Tom.
>

Done. Can you check?


   - https://github.com/brevzin/cpp_proposals/commit/3b91a09d023acbed63e6d53ba652e0914cbf8e84
   - https://isocpp.org/files/papers/P2286R8.html#pnum_14
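
For anyone following the thread, here is a minimal sketch of how the three
cases in the quoted wording (single character, shift sequence, ill-formed code
units) might be handled in order. It is only an illustration under loud
assumptions, not the paper's wording and not any library's implementation:
`escape_sketch` and `append_hex` are hypothetical names, the encoding is a
made-up one in which ESC (0x1B) plus the next two bytes form a shift sequence,
and every byte at or above 0x80 is treated as ill-formed. Emitting the shift
sequence as `\x{...}` escapes is just one choice that keeps the original code
units recoverable; the wording leaves the actual effect unspecified.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

// Append "\x{h}" or "\u{h}" for one byte, in lowercase hex without padding.
static void append_hex(std::string& out, char kind, unsigned char byte) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "\\%c{%x}", kind, static_cast<unsigned>(byte));
    out += buf;
}

// Hypothetical helper: escapes `s` under the toy encoding described above.
std::string escape_sketch(std::string_view s) {
    std::string e = "\"";
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char c = static_cast<unsigned char>(s[i]);

        // Assumed shift sequence: ESC followed by two more bytes.
        // Each code unit is written as \x{...} so the input can be rebuilt.
        if (c == 0x1B && i + 2 < s.size()) {
            for (std::size_t k = 0; k < 3; ++k)
                append_hex(e, 'x', static_cast<unsigned char>(s[i + k]));
            i += 3;
            continue;
        }

        switch (c) {
            case '\t': e += "\\t";  break;
            case '\n': e += "\\n";  break;
            case '\r': e += "\\r";  break;
            case '"':  e += "\\\""; break;
            case '\\': e += "\\\\"; break;
            default:
                if (c >= 0x80)
                    append_hex(e, 'x', c);        // treated as ill-formed here
                else if (c < 0x20 || c == 0x7F)
                    append_hex(e, 'u', c);        // non-printable control character
                else
                    e += static_cast<char>(c);    // printable ASCII, copied as-is
        }
        ++i;
    }
    e += '"';
    return e;
}
```

With these assumptions, the three bytes ESC `$` `B` come out as
`\x{1b}\x{24}\x{42}` inside the quotes, so the original bytes can be read back
from the escaped form.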


Barry

Received on 2022-05-16 18:42:44