ISOCPP sg16 List: Re: Width estimation

From: Corentin <corentin.jabot_at_[hidden]>
Date: Fri, 16 Sep 2022 00:17:19 +0200

After that initial message I proposed a fix
https://lists.isocpp.org/sg16/att-3402/attachment
I would be interested in your opinion.
Note that I'm not trying to handle ambiguous widths, corner cases, or even
0-width characters; I'm just trying to make the spec forward compatible by
replacing the hard-coded East Asian Wide ranges with a direct reference to
the property.
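
To make the idea concrete, here is a minimal sketch of a width estimate
driven by a property lookup rather than by ranges written into the spec.
Everything named here (CodePointRange, kWideRanges, is_wide,
estimated_width) is hypothetical, and the table is only a tiny hand-picked
excerpt of East_Asian_Width W/F ranges; a real implementation would generate
the full table from the Unicode Character Database. It also sums per code
point and ignores grapheme clusters and 0-width characters.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical excerpt of code point ranges whose East_Asian_Width is W or F.
// Assumption: a real table would be generated from the Unicode Character
// Database for whichever Unicode version the implementation targets.
struct CodePointRange {
    char32_t first;
    char32_t last;  // inclusive
};

// Must stay sorted by `first` and non-overlapping for the binary search below.
constexpr std::array<CodePointRange, 5> kWideRanges{{
    {0x1100, 0x115F},    // Hangul Jamo initial consonants (W)
    {0x4E00, 0x9FFF},    // CJK Unified Ideographs (W)
    {0xAC00, 0xD7A3},    // Hangul Syllables (W)
    {0xFF01, 0xFF60},    // Fullwidth ASCII variants (F)
    {0x1F600, 0x1F64F},  // Emoticons (W)
}};

// Returns true if `cp` falls inside one of the ranges in the excerpt.
inline bool is_wide(char32_t cp) {
    // Find the first range that starts after `cp`; the candidate is the one before it.
    auto it = std::upper_bound(
        kWideRanges.begin(), kWideRanges.end(), cp,
        [](char32_t value, const CodePointRange& r) { return value < r.first; });
    return it != kWideRanges.begin() && cp <= std::prev(it)->last;
}

// Simplified estimate: 2 columns for wide code points, 1 for everything else.
inline std::size_t estimated_width(std::u32string_view text) {
    std::size_t width = 0;
    for (char32_t cp : text) {
        width += is_wide(cp) ? 2 : 1;
    }
    return width;
}
```

With this shape, tracking a new Unicode version only means regenerating the
table; the wording and the lookup logic stay the same.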

Thanks!

On Fri, Sep 16, 2022 at 12:07 AM Corentin <corentin.jabot_at_[hidden]> wrote:

> Thanks a lot for your reply.
>
> To clarify, I meant the C++ standard,
> i.e. this list: http://eel.is/c++draft/format#string.std-12.sentence-3
>
> My understanding is that it was derived from what happened to be East
> Asian Width in some older Unicode version (5.0) and then modified.
> I'm pushing C++ to treat all East_Asian_Width W and F code points as
> having width 2 for the purpose of terminal width estimation, which is
> standard practice.
> Of course, any help from Unicode on how to treat other code points would
> be much appreciated. Especially a list of 0-width code points!
>
>
>
> On Thu, Sep 15, 2022, 23:58 Steven R. Loomis <srl295_at_[hidden]> wrote:
>
>> Hi. Briefly, we’ve discussed this issue a little bit at UTC, and I’ve
>> tried to engage terminal emulator vendors, who probably need to be
>> part of the discussion.
>>
>> I’m not sure about "Lack of explanation in the standard"; I think wording
>> was added to make these updates out of scope.
>>
>> I can try to dig up previous discussion if needed.
>>
>> -s
>>
>> --
>> Steven R. Loomis
>> Code Hive Tx, LLC
>> https://codehivetx.us
>>
>>
>>
>> On Sep 14, 2022, at 4:28 AM, Corentin via SG16 <sg16_at_[hidden]>
>> wrote:
>>
>> Hey folks.
>>
>> How was the table of width in [format] derived?
>> http://eel.is/c++draft/format#string.std-12.sentence-3
>>
>> We have 2 issues here: Lack of explanation in the standard makes it hard
>> to evolve that table,
>> and the table requires maintenance as the Unicode standard evolves.
>>
>> Reading the intent of
>> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1868r2.html,
>>
>> we do want:
>>
>> - To treat 0-width code points as 1
>> - To treat emojis as 2
>> - To treat full-width East Asian characters as 2.
>>
>> https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
>>
>> I think a better specification, given that we have a floating
>> reference to UAX44, would be
>> to say that code points that have the Unicode property
>> "Emoji_Presentation" or
>> East_Asian_Width="W" have a width of 2.
>>
>> This ensures implementations remain consistent as Unicode evolves.
>>
>> Thanks,
>> Corentin
>>
>>
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>>
>>

Received on 2022-09-15 22:17:32