After that initial message, I proposed a fix: https://lists.isocpp.org/sg16/att-3402/attachment
I would be interested in your opinion.
Note that I'm not trying to handle ambiguous-width characters, corner cases, or even zero-width characters. I'm just trying to make the spec forward-compatible by replacing the ranges of East Asian Wide characters with a direct reference to the property.

Thanks! 

On Fri, Sep 16, 2022 at 12:07 AM Corentin <corentin.jabot@gmail.com> wrote:
Thanks a lot for your reply.

To clarify, I meant the C++ standard,
i.e., this list: http://eel.is/c++draft/format#string.std-12.sentence-3
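For reference, the list in question maps a fixed set of code point ranges to width 2 and everything else to width 1. Below is a minimal C++ sketch of that rule. The ranges are transcribed from the [format.string.std] wording in the working draft at the time of this thread and are worth double-checking against the link above. Names such as `estimated_width` are illustrative, not from the standard.

```cpp
#include <array>

// One inclusive range of code points.
struct range { char32_t first, last; };

// Ranges the current wording estimates as width 2 (hard-coded).
constexpr std::array<range, 14> wide_ranges{{
    {0x1100, 0x115F},   {0x2329, 0x232A},   {0x2E80, 0x303E},
    {0x3040, 0xA4CF},   {0xAC00, 0xD7A3},   {0xF900, 0xFAFF},
    {0xFE10, 0xFE19},   {0xFE30, 0xFE6F},   {0xFF00, 0xFF60},
    {0xFFE0, 0xFFE6},   {0x1F300, 0x1F64F}, {0x1F900, 0x1F9FF},
    {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
}};

// Width 2 inside any listed range, 1 for every other code point.
constexpr int estimated_width(char32_t cp) {
    for (const range& r : wide_ranges)
        if (cp >= r.first && cp <= r.last) return 2;
    return 1;
}
```

Because the ranges are frozen in the text, emoji from later blocks such as Symbols and Pictographs Extended-A (U+1FA70..U+1FAFF) fall through to width 1 until someone edits the table by hand.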

My understanding is that it was derived from the East_Asian_Width values of an older Unicode version (5.0) and then modified.
I'm pushing for C++ to treat all codepoints with East_Asian_Width W or F as having a width of 2 for the purpose of terminal width estimation, which is standard practice.
Of course, any help from Unicode on how to treat other codepoints would be much appreciated, especially a list of zero-width codepoints!
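Here is a rough sketch of the property-based rule. It assumes the implementation reads data in the UCD's EastAsianWidth.txt format (`first..last;Value # comment`) instead of hard-coding ranges, so moving to a new Unicode version only means updating the data. The helper names and the abridged `sample_data` are hypothetical. The sketch treats every unlisted code point as "N" and ignores the file's special defaults for unassigned code points. A real implementation would more likely generate a sorted table at build time than rescan the text on each lookup.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Parses a hexadecimal code point such as "1F300"; nullopt on bad input.
constexpr std::optional<char32_t> parse_hex(std::string_view s) {
    if (s.empty() || s.size() > 6) return std::nullopt;
    char32_t value = 0;
    for (char c : s) {
        value <<= 4;
        if (c >= '0' && c <= '9') value |= static_cast<char32_t>(c - '0');
        else if (c >= 'A' && c <= 'F') value |= static_cast<char32_t>(c - 'A' + 10);
        else if (c >= 'a' && c <= 'f') value |= static_cast<char32_t>(c - 'a' + 10);
        else return std::nullopt;
    }
    return value;
}

// Strips spaces, tabs, and a trailing '\r' from both ends.
constexpr std::string_view trim(std::string_view s) {
    while (!s.empty() && (s.front() == ' ' || s.front() == '\t')) s.remove_prefix(1);
    while (!s.empty() && (s.back() == ' ' || s.back() == '\t' || s.back() == '\r'))
        s.remove_suffix(1);
    return s;
}

// Returns the East_Asian_Width value listed for cp ("W", "F", "Na", ...),
// or "N" when cp is not listed in the data.
constexpr std::string_view east_asian_width(std::string_view data, char32_t cp) {
    while (!data.empty()) {
        std::size_t eol = data.find('\n');
        std::string_view line = data.substr(0, eol);
        data = (eol == std::string_view::npos) ? std::string_view{} : data.substr(eol + 1);

        line = trim(line.substr(0, line.find('#')));  // drop the comment part
        std::size_t semi = line.find(';');
        if (semi == std::string_view::npos) continue;  // blank or comment-only line

        std::string_view cps = trim(line.substr(0, semi));
        std::string_view value = trim(line.substr(semi + 1));
        std::size_t dots = cps.find("..");
        auto first = parse_hex(cps.substr(0, dots));
        auto last = dots == std::string_view::npos ? first : parse_hex(cps.substr(dots + 2));
        if (first && last && cp >= *first && cp <= *last) return value;
    }
    return "N";
}

// Proposed rule: East_Asian_Width W or F counts as 2, everything else as 1.
constexpr int estimated_width(std::string_view eaw_data, char32_t cp) {
    std::string_view v = east_asian_width(eaw_data, cp);
    return (v == "W" || v == "F") ? 2 : 1;
}

// Abridged, hand-written lines in the EastAsianWidth.txt format.
constexpr std::string_view sample_data =
    "# comment lines are ignored\n"
    "0041..005A;Na  # Latin capital letters\n"
    "1100..115F;W   # Hangul Jamo leading consonants\n"
    "3000;F         # IDEOGRAPHIC SPACE\n"
    "4E00..9FFF;W   # CJK Unified Ideographs\n"
    "FF01..FF60 ; F # whitespace around ';' is tolerated\n";
```

With this approach, the wording only has to name the property. The data file carries the ranges for whichever Unicode version the implementation targets.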



On Thu, Sep 15, 2022, 23:58 Steven R. Loomis <srl295@gmail.com> wrote:
Hi. Briefly, we’ve discussed this issue a little at UTC, and I’ve tried to engage terminal emulator vendors, who probably need to be part of the discussion.

I’m not sure about "Lack of explanation in the standard"; I think wording was added to make these updates out of scope.

I can try to dig up previous discussion if needed. 

-s

--
Steven R. Loomis
Code Hive Tx, LLC



On Sep 14, 2022, at 4:28 AM, Corentin via SG16 <sg16@lists.isocpp.org> wrote:

Hey folks.

How was the table of width in [format] derived? 

We have two issues here: the lack of explanation in the standard makes it hard to evolve that table,
and the table does require maintenance as the Unicode standard evolves.


We do want:
  • To treat 0-width codepoints as 1
  • To treat emojis as 2
  • To treat full-width East Asian characters as 2

Given that we have a floating reference to UAX44, I think a better specification would be
to say that codepoints that have the Unicode property "Emoji_Presentation" or
East_Asian_Width="W" have a width of 2.

This ensures implementations remain coherent as Unicode evolves.
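Here is a small sketch of this rule. It assumes two lookup tables generated from emoji-data.txt (Emoji_Presentation) and EastAsianWidth.txt (W). The tables below are hypothetical, abridged excerpts, not complete data. `string_width` also simplifies the [format] wording, which sums the widths of the first code point of each extended grapheme cluster, by treating every code point as its own cluster.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

struct cp_range { char32_t first, last; };

// Abridged excerpts for illustration only; a real build would generate
// complete tables from the UCD files.
constexpr std::array<cp_range, 3> emoji_presentation_excerpt{{
    {0x231A, 0x231B},    // WATCH, HOURGLASS
    {0x1F1E6, 0x1F1FF},  // regional indicator symbols
    {0x1F600, 0x1F64F},  // Emoticons block
}};

constexpr std::array<cp_range, 3> east_asian_wide_excerpt{{
    {0x1100, 0x115F},    // Hangul Jamo leading consonants
    {0x4E00, 0x9FFF},    // CJK Unified Ideographs
    {0xAC00, 0xD7A3},    // Hangul Syllables
}};

template <std::size_t N>
constexpr bool in_ranges(const std::array<cp_range, N>& table, char32_t cp) {
    for (const cp_range& r : table)
        if (cp >= r.first && cp <= r.last) return true;
    return false;
}

// Width 2 if Emoji_Presentation or East_Asian_Width=W, otherwise 1
// (zero-width code points included).
constexpr int code_point_width(char32_t cp) {
    return (in_ranges(emoji_presentation_excerpt, cp) ||
            in_ranges(east_asian_wide_excerpt, cp)) ? 2 : 1;
}

// Simplification: every code point is treated as its own grapheme cluster.
constexpr int string_width(std::u32string_view s) {
    int total = 0;
    for (char32_t cp : s) total += code_point_width(cp);
    return total;
}
```

Everything not covered by either table, including zero-width code points such as U+200B, ends up with width 1, matching the first bullet above.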

Thanks, 
Corentin



--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16