On Wed, Sep 14, 2022 at 7:41 AM Corentin <corentin.jabot@gmail.com> wrote:

On Wed, Sep 14, 2022, 16:15 Victor Zverovich <victor.zverovich@gmail.com> wrote:

It is based on the wcswidth implementation that you linked to.

> Given that we have a floating reference to UAX #44, I think a better specification would be to say that code points that have the Unicode property "Emoji_Presentation" or East_Asian_Width="W" have a width of 2.

Not all emoji have a width of 2, and I'm not sure East_Asian_Width is a reliable indicator either. So if anyone is interested in writing a paper to improve width estimation (I'm not), I'd at the very least recommend checking how several popular terminals actually render these characters.

The wcwidth implementation does use East Asian width. No magic there.

*Adds it to the pile of NB comments*

Cheers,

Victor

On Wed, Sep 14, 2022 at 2:28 AM Corentin <corentin.jabot@gmail.com> wrote:

Hey folks.

How was the width table in [format] derived?

http://eel.is/c++draft/format#string.std-12.sentence-3

We have two issues here: the lack of explanation in the standard makes it hard to evolve that table, and the table needs maintenance as the Unicode standard evolves.

Reading the intent of https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1868r2.html, we want:

To treat zero-width code points as having width 1.

To treat emoji as having width 2.

To treat full-width East Asian characters as having width 2.

https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

Given that we have a floating reference to UAX #44, I think a better specification would be to say that code points that have the Unicode property "Emoji_Presentation" or East_Asian_Width="W" have a width of 2.

This ensures implementations remain coherent as Unicode evolves.

Thanks,

Corentin