On Mon, May 16, 2022 at 12:11 PM Tom Honermann <tom@honermann.net> wrote:
On 5/14/22 9:11 PM, Hubert Tong wrote:
On Sat, May 14, 2022 at 6:08 PM Tom Honermann <tom@honermann.net> wrote:
On 5/14/22 8:17 AM, Corentin Jabot wrote:
Hey. Thanks for the work, Barry.
I'm still concerned: how long are we going to keep using the term "character" incorrectly and in contexts in which its meaning is ambiguous?
Chair hat on: We did discuss this usage during the last telecon, and consensus was for this direction, though I have no doubt that stronger consensus could be found with the adoption of new terms.
Chair hat off ...
I don't agree that this wording uses "character" incorrectly, but I do agree that the use here is as ambiguous as usage elsewhere throughout the standard.
If we want to clean up our use of "character" (and I think we would all like to), then I think we need a paper that analyzes how it is currently used and how many terms are needed to replace it. We could then identify terms to fit those uses. Unfortunately, such terms will likely have to be distinct from what ISO/IEC 10646 provides, since many of those terms are defined in Unicode-specific terms.
Do we have precedent for the use of the term "state-transition"? (It's not an industry term, to the best of my knowledge.)
I'm not aware of any other uses of this term in the standard. I'll defer to Hubert on whether "state-transition" is an acceptable term of art or whether there is another term that would be preferred.
The preferred term of art would be "shift sequence"; however, instead of saying "encodes a shift sequence", we should probably say "is a shift sequence".
Ok, thanks, Hubert. Here are the changes I think are then desired (based on https://brevzin.github.io/cpp_proposals/2286_fmt_ranges/p2286r8.html which I think is still the most recent revision).
In [format.string.escaped]p2.2:
For each code unit sequence X in S that either encodes a single character, [-encodes a state transition-]{+is a shift sequence+}, or is a sequence of ill-formed code units, processing is in order as follows:
In [format.string.escaped]p2.4:
Otherwise, if X [-encodes a state transition-]{+is a shift sequence+}, the effect on E and further decoding of S is unspecified.
Recommended Practice: a [-state transition-]{+shift sequence+} should be represented in E such that the original code unit sequence of S can be reconstructed.
Barry, I know I had said we were done, but ... are you ok making these changes? The LWG chairs should of course be made aware of the additional changes so they can decide if they want LWG to re-re-review again.
Tom.
Done. Can you check?
That looks exactly right to me! Thanks, Barry!
Tom.
Barry