sg16

Re: Width estimation

From: Corentin <corentin.jabot_at_[hidden]>
Date: Wed, 14 Sep 2022 16:41:44 +0200
On Wed, Sep 14, 2022, 16:15 Victor Zverovich <victor.zverovich_at_[hidden]>
wrote:

> It is based on the wcswidth implementation that you linked to.
>
> > I think a better specification, given that we have a floating
> reference to UAX44,
> > would be to say that code points that have the Unicode property
> "Emoji_Presentation" or
> > East_Asian_Width="W" have a width of 2.
>
> Not all emoji have a width of 2, and I'm not sure East_Asian_Width is a
> reliable indicator either. If anyone is interested in writing a paper to
> improve width estimation (I'm not), at the very least I'd recommend
> checking how several popular terminals present these characters.
>

The wcwidth implementation does use East Asian width. No magic there.
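
To make the rule concrete, here is a minimal sketch of a range-based estimate
in the same spirit. The names (`range`, `wide_ranges`, `code_point_width`,
`string_width`) are hypothetical, and the ranges are a small hand-picked
excerpt of code points with East_Asian_Width="W" (the last one also has
Emoji_Presentation), not the table in [format]. A real implementation would
generate the table from the Unicode data files and would probably count per
grapheme cluster rather than per code point.

```cpp
#include <array>
#include <string_view>

// Inclusive range of code points (hypothetical helper type).
struct range {
    char32_t first;
    char32_t last;
};

// Illustrative excerpt only: a few ranges whose code points have
// East_Asian_Width="W". This is NOT the complete table.
constexpr std::array<range, 4> wide_ranges{{
    {0x1100, 0x115F},    // Hangul Jamo initial consonants
    {0x4E00, 0x9FFF},    // CJK Unified Ideographs
    {0xAC00, 0xD7A3},    // Hangul Syllables
    {0x1F600, 0x1F64F},  // Emoticons (also Emoji_Presentation)
}};

// Width 2 for code points in the table, 1 for everything else
// (including zero-width code points, per the stated intent).
constexpr int code_point_width(char32_t cp) {
    for (const range& r : wide_ranges) {
        if (cp >= r.first && cp <= r.last) {
            return 2;
        }
    }
    return 1;
}

// Simplification: sums per code point, ignoring grapheme clusters.
constexpr int string_width(std::u32string_view s) {
    int width = 0;
    for (char32_t cp : s) {
        width += code_point_width(cp);
    }
    return width;
}
```

With a table like this, keeping up with a new Unicode version means
regenerating `wide_ranges` from the property data rather than editing the
ranges by hand.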

*Adds it to the pile of NB comments*


> Cheers,
> Victor
>
> On Wed, Sep 14, 2022 at 2:28 AM Corentin <corentin.jabot_at_[hidden]> wrote:
>
>> Hey folks.
>>
>> How was the table of width in [format] derived?
>> http://eel.is/c++draft/format#string.std-12.sentence-3
>>
>> We have 2 issues here: Lack of explanation in the standard makes it hard
>> to evolve that table,
>> and it does require maintenance as the Unicode standard evolves.
>>
>> Reading the intent of
>> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1868r2.html,
>>
>> We do want:
>>
>> - To treat zero-width code points as 1
>> - To treat emoji as 2
>> - To treat full-width East Asian characters as 2
>>
>> https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
>>
>> I think a better specification, given that we have a floating
>> reference to UAX44,
>> would be to say that code points that have the Unicode property
>> "Emoji_Presentation" or
>> East_Asian_Width="W" have a width of 2.
>>
>> This ensures implementations remain consistent as Unicode evolves.
>>
>> Thanks,
>> Corentin
>>

Received on 2022-09-14 14:41:56