> The wcwidth implementation does use East Asian width. No magic there.

Sure.

> *Adds it to the pile of NB comments*

I don't think it can be an NB comment because width estimation was added in C++20.

Cheers,
Victor

On Wed, Sep 14, 2022 at 7:41 AM Corentin <corentin.jabot@gmail.com> wrote:


On Wed, Sep 14, 2022, 16:15 Victor Zverovich <victor.zverovich@gmail.com> wrote:
It is based on the wcswidth implementation that you linked to.

> I think a better specification would be, given that we have a floating reference to UAX44,
> to say that codepoints that have the Unicode property "Emoji_Presentation" or
> East_Asian_Width="W" have a width of 2.

Not all emoji have a width of 2, and I'm not sure East_Asian_Width is a reliable indicator either. So if anyone is interested in writing a paper to improve width estimation (I'm not), at the very least I'd recommend checking how several popular terminals present them.

The wcwidth implementation does use East Asian width. No magic there.

*Adds it to the pile of NB comments*


Cheers,
Victor

On Wed, Sep 14, 2022 at 2:28 AM Corentin <corentin.jabot@gmail.com> wrote:
Hey folks.

How was the table of width in [format] derived? 

We have 2 issues here: Lack of explanation in the standard makes it hard to evolve that table,
and it does require maintenance as the Unicode standard evolves.


We do want: 
  • To treat 0-width codepoints as 1
  • To treat emojis as 2
  • To treat full-width East Asian as 2.

I think a better specification would be, given that we have a floating reference to UAX44,
to say that codepoints that have the Unicode property "Emoji_Presentation" or
East_Asian_Width="W" have a width of 2.

This ensures implementation remains coherent as Unicode evolves.
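
A minimal sketch of what that rule could look like in code, for illustration only. All names here (code_point_range, estimated_width, estimated_text_width) are hypothetical, and the two tables are a tiny hand-picked excerpt rather than the real property data; an actual implementation would presumably generate them from the Unicode Character Database for whichever Unicode version it targets.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper type: an inclusive range of codepoints.
struct code_point_range {
    char32_t first;
    char32_t last;
};

// Illustrative excerpt of codepoints with East_Asian_Width="W".
// NOT the full property; a real table would be generated from the UCD.
constexpr code_point_range east_asian_wide[] = {
    {0x4E00, 0x9FFF},  // CJK Unified Ideographs
    {0xAC00, 0xD7A3},  // Hangul Syllables
};

// Illustrative excerpt of codepoints with Emoji_Presentation=Yes.
constexpr code_point_range emoji_presentation[] = {
    {0x1F600, 0x1F64F},  // Emoticons block
};

template <std::size_t N>
constexpr bool in_ranges(const code_point_range (&ranges)[N], char32_t cp) {
    for (const auto& r : ranges) {
        if (cp >= r.first && cp <= r.last) return true;
    }
    return false;
}

// Width 2 if either property applies, otherwise 1.
// Zero-width codepoints therefore also count as 1.
constexpr int estimated_width(char32_t cp) {
    return (in_ranges(east_asian_wide, cp) || in_ranges(emoji_presentation, cp))
               ? 2
               : 1;
}

// Sum of per-codepoint widths over already-decoded UTF-32 text.
constexpr std::size_t estimated_text_width(std::u32string_view text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += static_cast<std::size_t>(estimated_width(cp));
    return total;
}
```

The point of the sketch is that the function body never changes: only the generated tables would need to be refreshed as Unicode evolves.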

Thanks, 
Corentin