Re: Suggested wording change for non-Unicode cases in P2286R7: Formatting Ranges

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 16 May 2022 15:13:00 -0400
On 5/16/22 2:42 PM, Barry Revzin wrote:
> On Mon, May 16, 2022 at 12:11 PM Tom Honermann <tom_at_[hidden]> wrote:
> On 5/14/22 9:11 PM, Hubert Tong wrote:
>> On Sat, May 14, 2022 at 6:08 PM Tom Honermann <tom_at_[hidden]> wrote:
>> On 5/14/22 8:17 AM, Corentin Jabot wrote:
>>> Hey.
>>> Thanks for the work Barry.
>>> I'm still concerned: how long are we going to keep using the
>>> term "character" incorrectly, and in contexts in which its
>>> meaning is ambiguous?
>> Chair hat on: We did discuss this usage during the last
>> telecon
>> <https://github.com/sg16-unicode/sg16-meetings#may-11th-2022>
>> and consensus was for this direction though I have no doubt
>> that stronger consensus could be found with adoption of new
>> terms.
>> Chair hat off ...
>> I don't agree that this wording uses "character" incorrectly,
>> but I do agree that the use here is as ambiguous as usage
>> elsewhere throughout the standard.
>> If we want to clean up our use of "character" (and I think we
>> would all like us to), then I think we need a paper that
>> analyzes how it is currently used and how many terms are
>> needed to replace it. We could then identify terms to fit to
>> those uses. Unfortunately, such terms will likely have to be
>> distinct from what ISO/IEC 10646 provides since many of those
>> terms are defined in Unicode specific terms.
>>> Do we have precedent for the use of the term
>>> "state-transition"? (It's not an industry term, to the best of
>>> my knowledge.)
>> I'm not aware of any other uses of this term in the standard.
>> I'll defer to Hubert whether "state-transition" is an
>> acceptable term of art or whether there is another term that
>> would be preferred.
>> The preferred term of art would be "shift sequence"; however,
>> instead of saying "encodes a shift sequence", we should probably
>> say "is a shift sequence".
> Ok, thanks, Hubert. Here are the changes I think are then desired
> (based on
> https://brevzin.github.io/cpp_proposals/2286_fmt_ranges/p2286r8.html
> which I think is still the most recent revision).
> In [format.string.escaped]p2.2:
> For each code unit sequence /X/ in /S/ that either encodes a
> single character, [-encodes a state transition-]{+is a shift
> sequence+}, or is a sequence of ill-formed code units,
> processing is in order as follows:
> In [format.string.escaped]p2.4:
> Otherwise, if /X/ [-encodes a state transition-]{+is a shift
> sequence+}, the effect on /E/ and further decoding of /S/ is
> unspecified.
> /Recommended Practice/: a [-state transition-]{+shift sequence+}
> should be represented in /E/ such that the original code unit
> sequence of /S/ can be reconstructed.
> Barry, I know I had said we were done, but ... are you ok making
> these changes? The LWG chairs should of course be made aware of
> the additional changes so they can decide if they want LWG to
> re-re-review again.
> Tom.
> Done. Can you check?
> * https://github.com/brevzin/cpp_proposals/commit/3b91a09d023acbed63e6d53ba652e0914cbf8e84
> * https://isocpp.org/files/papers/P2286R8.html#pnum_14
That looks exactly right to me! Thanks, Barry!


> Barry

Received on 2022-05-16 19:13:02