sg16

Re: Width estimation

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Date: Wed, 14 Sep 2022 07:43:55 -0700
> The wcwidth implementation does use East Asian width. No magic there.

Sure.

> *Adds it to the pile of NB comments*

I don't think it can be an NB comment because width estimation was added in
C++20.

Cheers,
Victor

On Wed, Sep 14, 2022 at 7:41 AM Corentin <corentin.jabot_at_[hidden]> wrote:

>
>
> On Wed, Sep 14, 2022, 16:15 Victor Zverovich <victor.zverovich_at_[hidden]>
> wrote:
>
>> It is based on the wcswidth implementation that you linked to.
>>
>> > I think a better specification would be, given that we have a floating
>> > reference to UAX #44, to say that code points that have the Unicode
>> > property "Emoji_Presentation" or East_Asian_Width="W" have a width of 2.
>>
>> Not all emoji have a width of 2, and I'm not sure about East_Asian_Width
>> being a reliable indicator either. So if anyone is interested in writing a
>> paper to improve width estimation (I'm not), at the very least I'd recommend
>> checking how these characters render on several popular terminals.
>>
>
> The wcwidth implementation does use East Asian width. No magic there.
>
> *Adds it to the pile of NB comments*
>
>
>> Cheers,
>> Victor
>>
>> On Wed, Sep 14, 2022 at 2:28 AM Corentin <corentin.jabot_at_[hidden]>
>> wrote:
>>
>>> Hey folks.
>>>
>>> How was the table of widths in [format] derived?
>>> http://eel.is/c++draft/format#string.std-12.sentence-3
>>>
>>> We have two issues here: the lack of explanation in the standard makes it
>>> hard to evolve that table, and the table requires maintenance as the
>>> Unicode standard evolves.
>>>
>>> Reading the intent of
>>> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1868r2.html,
>>>
>>> we want:
>>>
>>> - to treat zero-width code points as width 1;
>>> - to treat emoji as width 2;
>>> - to treat full-width East Asian characters as width 2.
>>>
>>> https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
>>>
>>> I think a better specification would be, given that we have a floating
>>> reference to UAX #44, to say that code points that have the Unicode
>>> property "Emoji_Presentation" or East_Asian_Width="W" have a width of 2.
>>>
>>> This ensures implementations remain coherent as Unicode evolves.
>>>
>>> Thanks,
>>> Corentin
>>>
>>>
>>>
>>>

Received on 2022-09-14 14:44:07