
sg16


Re: Width estimation

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 14 Sep 2022 14:07:34 -0400
On 9/14/22 10:43 AM, Victor Zverovich via SG16 wrote:
> > The wcwidth implementation does use East Asian width. No magic there.
>
> Sure.
>
> > *Adds it to the pile of NB comments*
>
> I don't think it can be an NB comment because width estimation was
> added in C++20.

I believe NB comments may cover any aspect of the standard regardless of
when the relevant text was introduced (and doing so is appropriate
because times change; e.g., updates needed for newer Unicode versions).

Tom.

>
> Cheers,
> Victor
>
> On Wed, Sep 14, 2022 at 7:41 AM Corentin <corentin.jabot_at_[hidden]> wrote:
>
>
>
> On Wed, Sep 14, 2022, 16:15 Victor Zverovich
> <victor.zverovich_at_[hidden]> wrote:
>
> It is based on the wcswidth implementation that you linked to.
>
> > I think a better specification, given that we have a
> floating reference to UAX44,
> > would be to say that code points that have the Unicode property
> "Emoji_Presentation" or
> > East_Asian_Width="W" have a width of 2.
>
> Not all emoji have a width of 2, and I'm not sure
> East_Asian_Width is a reliable indicator either, so if
> anyone is interested in writing a paper to improve width
> estimation (I'm not), at the very least I'd recommend checking
> how several popular terminals actually render these characters.
>
>
> The wcwidth implementation does use East Asian width. No magic there.
>
> *Adds it to the pile of NB comments*
>
>
> Cheers,
> Victor
>
> On Wed, Sep 14, 2022 at 2:28 AM Corentin
> <corentin.jabot_at_[hidden]> wrote:
>
> Hey folks.
>
> How was the table of width in [format] derived?
> http://eel.is/c++draft/format#string.std-12.sentence-3
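For reference, a minimal C++17 sketch of the table at the linked paragraph, expressed as a range lookup. The ranges are transcribed from the draft wording at the time of this thread and should be checked against the linked text. The names `cp_range`, `wide_ranges`, `estimated_width` and `estimated_string_width` are hypothetical. The string overload is a simplification: it sums every code point, whereas the standard sums only the first code point of each extended grapheme cluster, which needs segmentation this sketch does not attempt.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// One inclusive range of code points whose estimated width is 2.
struct cp_range {
    char32_t first;
    char32_t last;
};

// Ranges transcribed from [format.string.std] (hypothetical name for the table).
inline constexpr std::array<cp_range, 14> wide_ranges{{
    {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
    {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
    {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
    {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
    {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
}};

// Estimated width of one code point: 2 inside a listed range, 1 otherwise.
constexpr int estimated_width(char32_t cp) {
    for (const cp_range& r : wide_ranges) {
        if (cp >= r.first && cp <= r.last) return 2;
    }
    return 1;
}

// Simplified string width: sums over every code point (no grapheme
// cluster segmentation, unlike the standard's wording).
constexpr std::size_t estimated_string_width(std::u32string_view s) {
    std::size_t total = 0;
    for (char32_t cp : s) total += static_cast<std::size_t>(estimated_width(cp));
    return total;
}
```

For example, U+3042 (HIRAGANA LETTER A) falls in U+3040..U+A4CF and estimates as 2, while U+303F estimates as 1 because it sits in the gap between two ranges. Every such range has to be revisited by hand when Unicode adds characters, which is the maintenance problem raised below.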
>
> We have two issues here: the lack of explanation in the standard
> makes it hard to evolve that table,
> and the table requires maintenance as the Unicode standard
> evolves.
>
> Reading the intent of
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1868r2.html,
>
> we want:
>
> * To treat 0-width code points as 1
> * To treat emoji as 2
> * To treat full-width East Asian characters as 2
>
> https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
>
> I think a better specification, given that we have
> a floating reference to UAX44,
> would be to say that code points that have the Unicode property
> "Emoji_Presentation" or
> East_Asian_Width="W" have a width of 2.
>
> This ensures implementations remain coherent as Unicode
> evolves.
>
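A minimal C++17 sketch of the property-based rule proposed above: width 2 if a code point has Emoji_Presentation=Yes or East_Asian_Width=W, otherwise 1. The property tables here are a tiny illustrative excerpt, not real data; an actual implementation would generate them from the Unicode data files (emoji-data.txt and EastAsianWidth.txt). The names `cp_range`, `emoji_presentation_excerpt`, `east_asian_wide_excerpt`, `in_ranges` and `proposed_width` are hypothetical.

```cpp
#include <array>
#include <cstddef>

// One inclusive range of code points sharing a property value.
struct cp_range {
    char32_t first;
    char32_t last;
};

// Illustrative excerpt only (hypothetical): a few ranges with Emoji_Presentation=Yes.
inline constexpr std::array<cp_range, 2> emoji_presentation_excerpt{{
    {0x231A, 0x231B},   // WATCH, HOURGLASS
    {0x1F600, 0x1F64F}, // Emoticons block
}};

// Illustrative excerpt only (hypothetical): a few ranges with East_Asian_Width=W.
inline constexpr std::array<cp_range, 3> east_asian_wide_excerpt{{
    {0x1100, 0x115F},   // Hangul Jamo leading consonants
    {0x3041, 0x3096},   // Hiragana letters
    {0xAC00, 0xD7A3},   // Hangul syllables
}};

// True if cp lies in any range of the table.
template <std::size_t N>
constexpr bool in_ranges(const std::array<cp_range, N>& table, char32_t cp) {
    for (const cp_range& r : table) {
        if (cp >= r.first && cp <= r.last) return true;
    }
    return false;
}

// Proposed rule: Emoji_Presentation or East_Asian_Width=W gives 2, else 1.
constexpr int proposed_width(char32_t cp) {
    return (in_ranges(emoji_presentation_excerpt, cp) ||
            in_ranges(east_asian_wide_excerpt, cp)) ? 2 : 1;
}
```

Because the rule is stated in terms of properties, regenerating the tables from a newer Unicode release updates the behavior without rewording the standard. For example, U+231A WATCH is not covered by the current table in [format] (which lists only U+2329..U+232A in that area), so it estimates as 1 there but as 2 under this rule. U+2600 (BLACK SUN WITH RAYS) is an emoji without Emoji_Presentation and stays at 1, which is one reason Victor's suggestion to compare against real terminals matters.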
> Thanks,
> Corentin

Received on 2022-09-14 18:07:37