sg16

Re: Width estimation

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 14 Dec 2022 13:30:00 -0500
On 11/30/22 5:26 PM, Corentin via SG16 wrote:
> Hello folks.
> Here is a list of all the codepoints that change width:
> https://gist.githubusercontent.com/cor3ntin/b7f4f52893b0b54890e970f7bbec6118/raw/720a910585d78c9ceb4e0458dcef87af2a436121/width.md

Just a note: the gist has 8570 characters and that count matches the
ranges specified in the D2675R1
<https://isocpp.org/files/papers/D2675R1.pdf> annex.
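
Checking a character count against the annex amounts to summing the sizes of inclusive ranges, where both endpoints are counted. A minimal sketch, assuming the ranges are available as non-overlapping (first, last) pairs like the tables in the attached program; `count_code_points` is a hypothetical helper, not something from the paper:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sums the sizes of non-overlapping, inclusive
// [first, last] code point ranges.
std::size_t count_code_points(const std::vector<std::pair<int, int>>& ranges) {
  std::size_t total = 0;
  for (const auto& [first, last] : ranges) {
    // Both ends are inclusive, hence the + 1.
    total += static_cast<std::size_t>(last - first + 1);
  }
  return total;
}
```

For example, the single range { 0x17000, 0x187F7 } contributes 6136 code points, not 6135.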

>
> Simply cat that file in the terminal.
> The screenshot below is a render in iTerm2.
> You will notice the tofu for reserved codepoints is considered narrow
> but doesn't quite fit, so it overlaps with the next cell; the same goes
> for the numbers in squares.
>
>
> Screenshot 2022-11-30 at 23.19.47.png

As discussed previously, a single screenshot that shows only a small
subset of the relevant characters is not sufficient to demonstrate that
the conclusions of the paper are consistent with existing behavior. For
this reason, I continue to have reservations about the screenshots in
the paper; I don't see how they provide useful information at all. I
think they are actively misleading, since they do not appear to show
behavior that is consistent with the intent of the paper.

I spent some time analyzing the behavior of all 8570 characters in the
terminal I use (Konsole 12.12.3 with the Hack 10pt font). Here is what I
found:

  * For the characters that the paper changes from width 1 to width 2
    (based on the listings in the annex), the following are displayed
    with a width other than 2:
      o Width 0:
          + U+016FE4 (KHITAN SMALL SCRIPT FILLER)
      o Width 1: (These were all displayed as tofu; some are probably
        unassigned code points, others are probably not covered by the
        font.)
          + U+01AFF0 .. U+01AFFE
          + U+01B11F .. U+01B132
          + U+01B155
          + U+01F6DC .. U+01F6DF
          + U+01F7F0
          + U+01FA75 .. U+01FA77
          + U+01FA7B .. U+01FA7C
          + U+01FA87 .. U+01FA88
          + U+01FAA9 .. U+01FAAF
          + U+01FAB7 .. U+01FABF
          + U+01FAC3 .. U+01FACF
          + U+01FAD7 .. U+01FAF8
  * For the characters that the paper changes from width 2 to width 1
    (based on the listings in the annex), the following are displayed
    with a width other than 1:
      o Width 2:
          + U+003248 .. U+00324F (CIRCLED NUMBER TEN ON BLACK SQUARE ..
            CIRCLED NUMBER EIGHTY ON BLACK SQUARE)
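
The listing above reports runs of code points rather than individual ones. One way to produce such a summary, assuming the per-character observations are already available as a list of code points seen at an unexpected width, is sketched below. The names `to_ranges` and `format_range` are hypothetical and are not taken from the attached program:

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collapses individual code points into sorted,
// inclusive [first, last] runs. Duplicate code points are dropped.
std::vector<std::pair<int, int>> to_ranges(std::vector<int> cps) {
  std::sort(cps.begin(), cps.end());
  cps.erase(std::unique(cps.begin(), cps.end()), cps.end());
  std::vector<std::pair<int, int>> ranges;
  for (int cp : cps) {
    if (!ranges.empty() && ranges.back().second + 1 == cp) {
      ranges.back().second = cp;  // extends the current run
    } else {
      ranges.emplace_back(cp, cp);  // starts a new run
    }
  }
  return ranges;
}

// Hypothetical helper: formats a run as "U+XXXXXX .. U+YYYYYY", or as a
// single "U+XXXXXX" when the run contains one code point.
std::string format_range(const std::pair<int, int>& r) {
  char buf[32];
  if (r.first == r.second) {
    std::snprintf(buf, sizeof buf, "U+%06X", static_cast<unsigned>(r.first));
  } else {
    std::snprintf(buf, sizeof buf, "U+%06X .. U+%06X",
                  static_cast<unsigned>(r.first),
                  static_cast<unsigned>(r.second));
  }
  return buf;
}
```

Sorting first means the observations can be recorded in any order.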

These results strongly match the intent of the paper and suggest that
the open question regarding the last group of characters should be
answered such that they do not change width.

This is the kind of analysis I would like to see performed for other
terminals so that we can qualitatively compare their behavior. I have
attached the C++ source code I used to display the characters.
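
The attached program prints each code point between > and < for visual inspection. One possible way to repeat a measurement like this in other terminals, offered here as an assumption and not as the method used above, is to print each character at the start of a line and then send the DSR sequence ESC [ 6 n. Terminals that support it reply with ESC [ row ; col R. The sketch is POSIX-only (termios), the function names are hypothetical, and it assumes the terminal answers the query. The reported column reflects how far the terminal advanced the cursor, which is not necessarily how wide the glyph is drawn (compare the overlapping tofu described in the quoted message).

```cpp
#include <cstdio>
#include <string>
#include <termios.h>
#include <unistd.h>

// Hypothetical helper: parses a cursor position report "ESC [ row ; col R"
// and returns the 1-based column, or -1 if the reply is malformed.
int parse_cpr_column(const std::string& reply) {
  int row = 0, col = 0, n = -1;
  // %n is only stored if the trailing 'R' matched.
  if (std::sscanf(reply.c_str(), "\x1b[%d;%dR%n", &row, &col, &n) != 2 ||
      n != static_cast<int>(reply.size())) {
    return -1;
  }
  return col;
}

// Hypothetical helper: prints `utf8` at column 1 and asks the terminal where
// the cursor ended up. Returns the number of cells advanced, or -1 if stdin
// or stdout is not a terminal or no well-formed reply arrived.
int measure_width(const std::string& utf8) {
  if (!isatty(STDIN_FILENO) || !isatty(STDOUT_FILENO)) return -1;
  termios saved{};
  if (tcgetattr(STDIN_FILENO, &saved) != 0) return -1;
  termios raw = saved;
  raw.c_lflag &= ~static_cast<tcflag_t>(ICANON | ECHO);  // byte-wise, no echo
  raw.c_cc[VMIN] = 0;    // read() may return without data...
  raw.c_cc[VTIME] = 10;  // ...after about one second of silence
  if (tcsetattr(STDIN_FILENO, TCSANOW, &raw) != 0) return -1;

  // "\r" moves to column 1; ESC [ 6 n requests a cursor position report.
  const std::string query = "\r" + utf8 + "\x1b[6n";
  std::fflush(stdout);  // keep pending stdio output ahead of the raw write
  std::string reply;
  if (write(STDOUT_FILENO, query.data(), query.size()) ==
      static_cast<ssize_t>(query.size())) {
    char c;
    while (read(STDIN_FILENO, &c, 1) == 1) {
      reply += c;
      if (c == 'R') break;
    }
  }
  tcsetattr(STDIN_FILENO, TCSANOW, &saved);  // restore the original mode
  const int col = parse_cpr_column(reply);
  return col < 0 ? -1 : col - 1;  // column 1 means no cells were advanced
}
```

For example, measure_width("\xe2\x8c\x9a") prints U+231A WATCH, one of the characters the paper changes from width 1 to width 2, so a terminal matching the paper's intent would be expected to report 2. Looping over the tables in the attached program (with its encode_utf8) would give a per-terminal listing that can be compared directly.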

Tom.

>
>
> FYI iTerm2 also uses Unicode UAX 44
> https://github.com/gnachman/iTerm2/blob/master/sources/NSCharacterSet+iTerm.m#L464
>

#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Code point ranges (inclusive) whose width D2675R1 changes from 1 to 2.
std::vector<std::pair<int, int>> from_1_to_2 = {
  { 0x231A, 0x231B }, { 0x23E9, 0x23EC }, { 0x23F0, 0x23F0 }, { 0x23F3, 0x23F3 },
  { 0x25FD, 0x25FE }, { 0x2614, 0x2615 }, { 0x2648, 0x2653 }, { 0x267F, 0x267F },
  { 0x2693, 0x2693 }, { 0x26A1, 0x26A1 }, { 0x26AA, 0x26AB }, { 0x26BD, 0x26BE },
  { 0x26C4, 0x26C5 }, { 0x26CE, 0x26CE }, { 0x26D4, 0x26D4 }, { 0x26EA, 0x26EA },
  { 0x26F2, 0x26F3 }, { 0x26F5, 0x26F5 }, { 0x26FA, 0x26FA }, { 0x26FD, 0x26FD },
  { 0x2705, 0x2705 }, { 0x270A, 0x270B }, { 0x2728, 0x2728 }, { 0x274C, 0x274C },
  { 0x274E, 0x274E }, { 0x2753, 0x2755 }, { 0x2757, 0x2757 }, { 0x2795, 0x2797 },
  { 0x27B0, 0x27B0 }, { 0x27BF, 0x27BF }, { 0x2B1B, 0x2B1C }, { 0x2B50, 0x2B50 },
  { 0x2B55, 0x2B55 }, { 0xA960, 0xA97C }, { 0x16FE0, 0x16FE4 }, { 0x16FF0, 0x16FF1 },
  { 0x17000, 0x187F7 }, { 0x18800, 0x18CD5 }, { 0x18D00, 0x18D08 }, { 0x1AFF0, 0x1AFF3 },
  { 0x1AFF5, 0x1AFFB }, { 0x1AFFD, 0x1AFFE }, { 0x1B000, 0x1B122 }, { 0x1B132, 0x1B132 },
  { 0x1B150, 0x1B152 }, { 0x1B155, 0x1B155 }, { 0x1B164, 0x1B167 }, { 0x1B170, 0x1B2FB },
  { 0x1F004, 0x1F004 }, { 0x1F0CF, 0x1F0CF }, { 0x1F18E, 0x1F18E }, { 0x1F191, 0x1F19A },
  { 0x1F200, 0x1F202 }, { 0x1F210, 0x1F23B }, { 0x1F240, 0x1F248 }, { 0x1F250, 0x1F251 },
  { 0x1F260, 0x1F265 }, { 0x1F680, 0x1F6C5 }, { 0x1F6CC, 0x1F6CC }, { 0x1F6D0, 0x1F6D2 },
  { 0x1F6D5, 0x1F6D7 }, { 0x1F6DC, 0x1F6DF }, { 0x1F6EB, 0x1F6EC }, { 0x1F6F4, 0x1F6FC },
  { 0x1F7E0, 0x1F7EB }, { 0x1F7F0, 0x1F7F0 }, { 0x1FA70, 0x1FA7C }, { 0x1FA80, 0x1FA88 },
  { 0x1FA90, 0x1FABD }, { 0x1FABF, 0x1FAC5 }, { 0x1FACE, 0x1FADB }, { 0x1FAE0, 0x1FAE8 },
  { 0x1FAF0, 0x1FAF8 },
};

// Code point ranges (inclusive) whose width D2675R1 changes from 2 to 1.
std::vector<std::pair<int, int>> from_2_to_1 = {
  { 0x2E9A, 0x2E9A }, { 0x2EF4, 0x2EFF }, { 0x2FD6, 0x2FEF }, { 0x2FFC, 0x2FFF },
  { 0x3040, 0x3040 }, { 0x3097, 0x3098 }, { 0x3100, 0x3104 }, { 0x3130, 0x3130 },
  { 0x318F, 0x318F }, { 0x31E4, 0x31EF }, { 0x321F, 0x321F }, { 0xA48D, 0xA48F },
  { 0xA4C7, 0xA4CF }, { 0xFE53, 0xFE53 }, { 0xFE67, 0xFE67 }, { 0xFE6C, 0xFE6F },
  { 0xFF00, 0xFF00 },
  { 0x3248, 0x324F },  // CIRCLED NUMBER TEN .. EIGHTY ON BLACK SQUARE
};

// Encodes a single code point as UTF-8.
std::string encode_utf8(int cp) {
  std::string s;
  if (cp <= 0x0000007F) {
    s += (unsigned char)(cp);
  } else if (cp <= 0x000007FF) {
    s += (unsigned char)(0xC0 + ((cp >> 6) & 0x1F));
    s += (unsigned char)(0x80 + (cp & 0x3F));
  } else if (cp <= 0x0000D7FF) {
    s += (unsigned char)(0xE0 + ((cp >> 12) & 0x0F));
    s += (unsigned char)(0x80 + ((cp >> 6) & 0x3F));
    s += (unsigned char)(0x80 + (cp & 0x3F));
  } else if (cp <= 0x0000DFFF) {
    assert(0);  // surrogates are not encodable
  } else if (cp <= 0x0000FFFF) {
    s += (unsigned char)(0xE0 + ((cp >> 12) & 0x0F));
    s += (unsigned char)(0x80 + ((cp >> 6) & 0x3F));
    s += (unsigned char)(0x80 + (cp & 0x3F));
  } else if (cp <= 0x0010FFFF) {
    s += (unsigned char)(0xF0 + ((cp >> 18) & 0x07));
    s += (unsigned char)(0x80 + ((cp >> 12) & 0x3F));
    s += (unsigned char)(0x80 + ((cp >> 6) & 0x3F));
    s += (unsigned char)(0x80 + (cp & 0x3F));
  } else {
    assert(0);  // beyond U+10FFFF
  }
  return s;
}

// Prints each character between '>' and '<' so its displayed width can be
// inspected visually.
void print_characters() {
  std::printf("from_1_to_2:\n");
  for (const auto &p : from_1_to_2) {
    for (int cp = p.first; cp <= p.second; ++cp) {
      std::printf("  U+%06X: >%s<\n", cp, encode_utf8(cp).c_str());
    }
  }
  std::printf("from_2_to_1:\n");
  for (const auto &p : from_2_to_1) {
    for (int cp = p.first; cp <= p.second; ++cp) {
      std::printf("  U+%06X: >%s<\n", cp, encode_utf8(cp).c_str());
    }
  }
}

int main() {
  print_characters();
}

Received on 2022-12-14 18:30:08