On Wed, Sep 14, 2022, 16:15 Victor Zverovich <victor.zverovich@gmail.com> wrote:

It is based on the wcswidth implementation that you linked to.

> I think a better specification would be given that we have a floating reference to UAX44,
> to say that codepoints that have the Unicode property "Emoji_Presentation" or
> East_Asian_Width="W" have a width of 2.

Not all emoji have a width of 2 and I'm not sure about East_Asian_Width being a reliable indicator either so if anyone is interested in writing a paper to improve width estimation (I'm not) at the very least I'd recommend checking presentations on several popular terminals.

The wcwidth implementation does use East Asian width. No magic there.

*Adds it to the pile of NB comments*

Cheers,
Victor

On Wed, Sep 14, 2022 at 2:28 AM Corentin <corentin.jabot@gmail.com> wrote:

Hey folks.

How was the table of width in [format] derived?

We have 2 issues here: Lack of explanation in the standard makes it hard to evolve that table, and it does require maintenance as the Unicode standard evolves.

Reading the intent of https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1868r2.html, we do want:
- To treat 0-width code points as 1
- To treat emoji as 2
- To treat full-width East Asian characters as 2.

I think a better specification would be given that we have a floating reference to UAX44, to say that codepoints that have the Unicode property "Emoji_Presentation" or East_Asian_Width="W" have a width of 2.

This ensures implementation remains coherent as Unicode evolves.

Thanks,
Corentin
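
For illustration, the property-based rule proposed above could be sketched roughly as follows. This is only a sketch under stated assumptions: the names CodePointRange, wide_ranges and estimated_width are hypothetical, and the table is a small hand-picked excerpt, not the complete set of code points with East_Asian_Width="W" or Emoji_Presentation. An actual implementation would presumably generate the table from the Unicode Character Database, and, as noted in the reply, terminals may not render every such code point two columns wide.

```cpp
// Hypothetical helper type: an inclusive range of code points.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// Illustrative, incomplete excerpt of ranges whose code points have
// East_Asian_Width="W" or Emoji_Presentation=Yes. Not the full data set.
constexpr CodePointRange wide_ranges[] = {
    {0x1100, 0x115F},   // Hangul Jamo leading consonants (East_Asian_Width=W)
    {0x3041, 0x3096},   // Hiragana letters (East_Asian_Width=W)
    {0x4E00, 0x9FFF},   // CJK Unified Ideographs (East_Asian_Width=W)
    {0xAC00, 0xD7A3},   // Hangul Syllables (East_Asian_Width=W)
    {0x1F600, 0x1F64F}, // Emoticons (Emoji_Presentation=Yes)
};

// Returns 2 for code points found in the table and 1 for everything else,
// including zero-width code points, matching the three bullets above.
constexpr int estimated_width(char32_t cp) {
    for (const auto& r : wide_ranges) {
        if (cp >= r.first && cp <= r.last) return 2;
    }
    return 1;
}

// Compile-time checks of the sketch (requires C++17).
static_assert(estimated_width(U'a') == 1);          // ASCII letter
static_assert(estimated_width(U'\u4E2D') == 2);     // CJK ideograph
static_assert(estimated_width(U'\U0001F600') == 2); // emoji
static_assert(estimated_width(U'\u200B') == 1);     // zero-width space
```

Because the rule is stated in terms of properties rather than literal ranges, only the generated table would need to change when a new Unicode version is adopted.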