sg16

Re: Width estimation

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Date: Wed, 14 Sep 2022 07:15:25 -0700
It is based on the wcswidth implementation that you linked to.

> I think a better specification would be given that we have a floating
> reference to UAX44,
> to say that codepoints that have the Unicode property "Emoji_Presentation"
> or
> East_Asian_Width="W" have a width of 2.

Not all emoji have a width of 2, and I'm not sure East_Asian_Width is a
reliable indicator either. If anyone is interested in writing a paper to
improve width estimation (I'm not), at the very least I'd recommend checking
how several popular terminals actually present these characters.

Cheers,
Victor

On Wed, Sep 14, 2022 at 2:28 AM Corentin <corentin.jabot_at_[hidden]> wrote:

> Hey folks.
>
> How was the table of width in [format] derived?
> http://eel.is/c++draft/format#string.std-12.sentence-3
>
> We have 2 issues here: Lack of explanation in the standard makes it hard
> to evolve that table,
> and it does require maintenance as the Unicode standard evolves.
>
> Reading the intent of
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1868r2.html,
>
> We do want:
>
> - To treat 0-width codepoints as 1
> - To treat emoji as 2
> - To treat fullwidth East Asian characters as 2.
>
> https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
>
> I think a better specification would be given that we have a floating
> reference to UAX44,
> to say that codepoints that have the Unicode property "Emoji_Presentation"
> or
> East_Asian_Width="W" have a width of 2.
>
> This ensures implementation remains coherent as Unicode evolves.
>
> Thanks,
> Corentin

Received on 2022-09-14 14:15:37
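
The three intents listed in the thread (zero-width code points count as 1, emoji as 2, wide East Asian characters as 2) can be sketched as a range lookup. This is only an illustration, not the table in [format] and not an implementation of the proposal: the names `wide_ranges`, `code_point_width` and `string_width` are hypothetical, the four ranges are hand-picked examples rather than data generated from the Unicode property files, and the sketch sums per code point instead of per grapheme cluster.

```cpp
#include <cstddef>
#include <string_view>

// Inclusive range of code points.
struct range {
    char32_t first;
    char32_t last;
};

// Hypothetical, hand-picked ranges for illustration only. A real table would
// be generated from Unicode data (East_Asian_Width, Emoji_Presentation) and
// regenerated whenever the Unicode Standard is updated.
constexpr range wide_ranges[] = {
    {0x1100, 0x115F},    // Hangul Jamo initial consonants
    {0x4E00, 0x9FFF},    // CJK Unified Ideographs
    {0xAC00, 0xD7A3},    // Hangul Syllables
    {0x1F600, 0x1F64F},  // Emoticons
};

// Returns 2 for code points inside one of the ranges above and 1 for
// everything else, including zero-width code points.
constexpr int code_point_width(char32_t cp) {
    for (const range& r : wide_ranges) {
        if (cp >= r.first && cp <= r.last) return 2;
    }
    return 1;
}

// Simplification: adds up the width of every code point in the string,
// without any grapheme cluster segmentation.
constexpr std::size_t string_width(std::u32string_view text) {
    std::size_t total = 0;
    for (char32_t cp : text) {
        total += static_cast<std::size_t>(code_point_width(cp));
    }
    return total;
}
```

With these ranges, `code_point_width(U'a')` is 1 and `code_point_width(U'\u4E2D')` is 2. As Victor notes, what a terminal actually displays may differ from any such table, so results like these are estimates only.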