ISOCPP sg16 List: Re: Width estimation

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 14 Dec 2022 13:58:03 -0500

On 12/14/22 1:39 PM, Corentin wrote:
> Tom,
>
> That analysis has been performed, many times over. Literally tens if
> not hundreds of hours at this point.
I believe that. But it isn't reflected well in the paper.
> I do not know how to convey that in other ways than what I already tried.
Replace the screenshots with an analysis like I provided. Not for all
the terminals of course, but for a select subset of them.
>
> All the terminals tested match the behavior of what is proposed in the
> paper, modulo tofu and other rendering bugs.
>
> I looked at the code of all these terminals too; it's also in the paper.
> The codepoints in the screenshot are the codepoints that sg-16
> specifically asked about at the last meeting.

It is virtually impossible to correlate the characters displayed in the
screenshots with anything:

  * They don't show consistent sets of characters.
  * They don't identify which characters are being displayed.
  * Most of them present alignment that appears to be contrary to the
    paper with no analysis or explanation.

Tom.

>
> I am glad to see that your observations concur.
>
>
> On Wed, Dec 14, 2022, 19:30 Tom Honermann <tom_at_[hidden]> wrote:
>
> On 11/30/22 5:26 PM, Corentin via SG16 wrote:
>> Hello folks.
>> Here is a list of all the codepoints that change
>> https://gist.githubusercontent.com/cor3ntin/b7f4f52893b0b54890e970f7bbec6118/raw/720a910585d78c9ceb4e0458dcef87af2a436121/width.md
>
> Just a note: the gist has 8570 characters and that count matches
> the ranges specified in the D2675R1
> <https://isocpp.org/files/papers/D2675R1.pdf> annex.
>
>>
>> Simply cat that file in the terminal.
>> The screenshot below is a render on iTerm2.
>> You will notice the tofu for reserved codepoints is considered narrow
>> but doesn't quite fit, so it overlaps with the next cell; same for
>> the numbers in squares.
>>
>>
>> Screenshot 2022-11-30 at 23.19.47.png
>
> As discussed previously, a single screen shot that only shows a
> small subset of the relevant characters is not sufficient to
> demonstrate that the conclusions of the paper are consistent with
> existing behavior. I continue to have reservations about the
> screen shots in the paper for this reason; I don't see how they
> provide useful information at all. I think they are actively
> misleading since they do not appear to show behavior that is
> consistent with the intent of the paper.
>
> I spent some time analyzing the behavior of all 8570 characters in
> the terminal I use (Konsole 12.12.3 with the Hack 10pt font). Here
> is what I found:
>
>   * For the characters that the paper changes from width 1 to
>     width 2 (based on the listings in the annex), the following
>     are displayed with a width other than 2:
>       o Width 0:
>           + U+016FE4 (KHITAN SMALL SCRIPT FILLER)
>       o Width 1: (These were all displayed as tofu; some are
>         probably unassigned characters, others are probably
>         unknown by the font)
>           + U+01AFF0 .. U+01AFFE
>           + U+01B11F ..
>           + U+01F6DC .. U+01F6DF
>           + U+01F7F0
>           + U+01FA75 .. U+01FA77
>           + U+01FA7B .. U+01FA7C
>           + U+01FA87 .. U+01FA88
>           + U+01FAA9 .. U+01FAAF
>           + U+01FAB7 .. U+01FABF
>           + U+01FAC3 .. U+01FACF
>           + U+01FAD7 .. U+01FAF8
>   * For the characters that the paper changes from width 2 to
>     width 1 (based on the listings in the annex), the following
>     are displayed with a width other than 1:
>       o Width 2:
>           + U+003248 .. U+00324F (CIRCLED NUMBER TEN ON BLACK
>             SQUARE .. CIRCLED NUMBER EIGHTY ON BLACK SQUARE)
>
> These results strongly match the intent of the paper and suggest
> that the open question regarding the last group of characters should
> be answered such that they do not change width.
>
> This is the kind of analysis I would like to see performed for
> other terminals so that we can qualitatively compare behavior
> between them. I attached C++ source code I used to display the
> characters.
>
> Tom.
>
>>
>>
>> FYI iTerm2 also uses Unicode UAX 44
>> https://github.com/gnachman/iTerm2/blob/master/sources/NSCharacterSet+iTerm.m#L464
>>
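
For anyone who wants to repeat the per-character analysis described in the
quoted message on another terminal, here is a minimal sketch of one way to
display code points so that their rendered width can be read off by eye. It
is not the source code attached to the original message; the names to_utf8,
probe_line and print_probes are hypothetical, and it assumes the terminal
expects UTF-8. Each code point is printed between two '|' bars, so the
position of the closing bar relative to the reference lines shows whether
the terminal advanced 0, 1 or 2 cells.

```cpp
#include <cassert>
#include <cstdio>
#include <ostream>
#include <string>

// Encode one Unicode scalar value as UTF-8.
// Surrogates and values above U+10FFFF are rejected by the assertion.
std::string to_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: one probe line of the form "U+003248 |<char>|".
// The label uses the same six-digit notation as the lists in this thread.
std::string probe_line(char32_t cp) {
    char label[16];
    std::snprintf(label, sizeof label, "U+%06X", static_cast<unsigned>(cp));
    return std::string(label) + " |" + to_utf8(cp) + "|";
}

// Hypothetical helper: print two ASCII reference lines, then one probe
// line for every code point in [first, last]. The range must not contain
// surrogate code points.
void print_probes(std::ostream& os, char32_t first, char32_t last) {
    os << "reference |x|   (closing bar position for width 1)\n";
    os << "reference |xx|  (closing bar position for width 2)\n";
    for (char32_t cp = first; cp <= last; ++cp) {
        os << probe_line(cp) << '\n';
    }
}
```

Calling print_probes(std::cout, 0x3248, 0x324F) from main (with <iostream>
included) would print the eight CIRCLED NUMBER ... ON BLACK SQUARE
characters discussed above. The result still has to be judged visually, so
the terminal version and font should be recorded alongside the observations.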

Received on 2022-12-14 18:58:04