> The wcwidth implementation does use East Asian width. No magic there.
Sure.
> *Adds it to the pile of NB comments*
I don't think it can be an NB comment because width estimation was added in C++20.
I believe NB comments may cover any aspect of the standard
regardless of when the relevant text was introduced (and doing so
is appropriate because times change; e.g., updates needed for
newer Unicode versions).
Tom.
Cheers,
Victor
On Wed, Sep 14, 2022 at 7:41 AM Corentin <corentin.jabot@gmail.com> wrote:
On Wed, Sep 14, 2022, 16:15 Victor Zverovich <victor.zverovich@gmail.com> wrote:
It is based on the wcswidth implementation that you linked to.
> I think a better specification would be given that we have a floating reference to UAX44,
> to say that codepoints that have the Unicode property "Emoji_Presentation" or
> East_Asian_Width="W" have a width of 2.
Not all emoji have a width of 2, and I'm not sure East_Asian_Width is a reliable indicator either. If anyone is interested in writing a paper to improve width estimation (I'm not), at the very least I'd recommend checking the presentation in several popular terminals.
The wcwidth implementation does use East Asian width. No magic there.
*Adds it to the pile of NB comments*
Cheers,
Victor
On Wed, Sep 14, 2022 at 2:28 AM Corentin <corentin.jabot@gmail.com> wrote:
Hey folks.
How was the table of width in [format] derived?
We have two issues here: the lack of explanation in the standard makes it hard to evolve that table, and it does require maintenance as the Unicode standard evolves.
Reading the intent of https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1868r2.html, we do want:
- To treat 0-width codepoints as 1
- To treat emoji as 2
- To treat full-width East Asian codepoints as 2
I think a better specification would be given that we have a floating reference to UAX44, to say that codepoints that have the Unicode property "Emoji_Presentation" or East_Asian_Width="W" have a width of 2.
This ensures implementations remain coherent as Unicode evolves.
Thanks,
Corentin
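
For illustration only, the property-based rule proposed above (width 2 for codepoints with Emoji_Presentation or East_Asian_Width="W", width 1 otherwise, including zero-width codepoints) could be sketched roughly as follows. The names `CodePointRange`, `kEastAsianWide`, `kEmojiPresentation`, `in_table`, and `estimated_width` are hypothetical, and the two tables are tiny hand-picked excerpts, not the real property data. An actual implementation would presumably generate complete tables from the Unicode Character Database files.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Inclusive range of code points.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// ASSUMPTION: illustrative excerpt only, not the full East_Asian_Width="W" set.
constexpr std::array<CodePointRange, 3> kEastAsianWide{{
    {0x1100, 0x115F},  // Hangul Jamo initial consonants
    {0x4E00, 0x9FFF},  // CJK Unified Ideographs
    {0xAC00, 0xD7A3},  // Hangul Syllables
}};

// ASSUMPTION: illustrative excerpt only, not the full Emoji_Presentation set.
constexpr std::array<CodePointRange, 1> kEmojiPresentation{{
    {0x1F600, 0x1F64F},  // Emoticons block
}};

// Linear scan is enough for a sketch; a full table would want binary search.
template <std::size_t N>
constexpr bool in_table(const std::array<CodePointRange, N>& table, char32_t cp) {
    for (std::size_t i = 0; i < N; ++i) {
        if (cp >= table[i].first && cp <= table[i].last) return true;
    }
    return false;
}

// Width 2 if either property applies; everything else, including
// zero-width code points, is estimated as 1.
constexpr int estimated_width(char32_t cp) {
    return (in_table(kEastAsianWide, cp) || in_table(kEmojiPresentation, cp)) ? 2 : 1;
}

static_assert(estimated_width(U'A') == 1, "ASCII letter");
static_assert(estimated_width(0x4E00) == 2, "CJK ideograph");
static_assert(estimated_width(0x1F600) == 2, "emoji with Emoji_Presentation");
static_assert(estimated_width(0x200B) == 1, "zero-width space is treated as 1");
```

Note that under this rule U+2764 (HEAVY BLACK HEART) gets width 1, since it is an emoji that defaults to text presentation and is not East Asian Wide. That is one concrete case of "not all emoji have a width of 2", and whether terminals agree with the estimate is a separate question.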