Re: Agenda for the 2022-11-30 SG16 telecon; questions for P2675R0

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 30 Nov 2022 10:29:45 -0500
On 11/29/22 6:31 PM, Corentin Jabot wrote:
>
>
> On Wed, Nov 30, 2022 at 12:09 AM Tom Honermann <tom_at_[hidden]> wrote:
>
> On 11/29/22 6:04 PM, Corentin Jabot wrote:
>>
>>
>> On Tue, Nov 29, 2022 at 11:47 PM Tom Honermann
>> <tom_at_[hidden]> wrote:
>>
>> On 11/29/22 5:32 PM, Corentin Jabot wrote:
>>>
>>>
>>> On Tue, Nov 29, 2022 at 11:13 PM Tom Honermann via SG16
>>> <sg16_at_[hidden]> wrote:
>>>
>>> Corentin, I'm having a hard time understanding what the
>>> screenshots included in P2675R0 are showing. The paper
>>> doesn't explain (or I missed it) what was done to
>>> produce those screenshots. I can't tell what code points
>>> the displayed glyphs correspond to. The set of
>>> characters displayed seems to differ; the number of rows
>>> displayed is not consistent. It doesn't seem to be
>>> possible to dive into the data to determine which
>>> characters are being rendered differently. Basically,
>>> I'm unable to evaluate whether the screenshots support
>>> the proposal.
>>>
>>> It would also be helpful if the screenshots included
>>> information about the terminal encoding (almost always
>>> UTF-8 I would guess/hope) and the font(s) used. I
>>> recognize gathering such data could be challenging.
>>>
>>>
>>> Well, given the first collection took many days and involved
>>> a large number of people, that may prove challenging.
>>> I did intend to have the characters separated by |, but by
>>> the time I realized that it was too late.
>>> Knowing the fonts used would be even more challenging because,
>>> when a font lacks a glyph for a codepoint, renderers will try to
>>> fall back to some other similar font.
>>> East Asian characters will be aligned no matter the font;
>>> emoji may or may not be rendered on the grid.
>>>
>>> The paper contains a script that was used to generate the
>>> list of codepoints rendered.
>>> I intended to include that list; however, it proved difficult
>>> for Google Docs to handle, and I apparently
>>> subsequently forgot to add a reference:
>>> https://gist.githubusercontent.com/cor3ntin/e5731f77574b146d806e39283e8c7cb7/raw/01d760e7fbb6a56c637a3ce34688e1da286df287/full_width.hpp
>>> Feel free to try on your system.
>>>
>>> The TL;DR is that on a conforming system with sufficient
>>> fonts - which is not all systems, in part because Unicode 15
>>> is recent and not all terminals display wide tofu properly -
>>> what is proposed is coherent with existing terminal
>>> behaviors. And we should not try to cater to the temporary
>>> deficiencies of one specific terminal or another.
>>
>> Thanks, could you please add that kind of description to the
>> next revision of the paper? Not for tomorrow obviously.
>>
>> Having taken a look at the gist linked above, I have to
>> conclude that the screenshots currently in the paper are not
>> beneficial. Some screenshots show rows from the top of the
>> gist and others from the bottom, but I can only tell that by
>> guessing based on glyph shape and being locally able to look
>> at the entire gist.
>>
>> Having a large number of screenshots is not important to me.
>> I would be happy to have examples for just the following. But
>> I would like screenshots that provide an apples-to-apples
>> comparison and that show the entire gist.
>>
>> * Linux with Konsole.
>> * Linux with gnome-terminal.
>> * Linux with non-graphical terminal.
>> * Windows with cmd.exe.
>> * Windows with Windows Terminal.
>> * Windows with Putty.
>> * macOS with its default terminal.
>>
>>
>> All of these are provided in the paper.
> They are present, but they are incomplete and it is not possible
> for me to make sense of them.
>> You are free to collect your own of course, not everyone is able
>> to show every codepoint at once given screen size, etc.
> I understand that taking multiple screenshots would likely be
> required to show a complete presentation.
>> And it's beside the point.
>> My point is that the standard currently has arbitrary rules that
>> do not follow any existing practice.
> Victor obtained the list by looking at existing practice.
>
>
> Let's continue that tomorrow.
> That is explained in the paper: Victor looked at existing
> practice long before Unicode 15 (his work was based on Unicode 13),
> and worked from an implementation of wcwidth, which I also reference
> and explain.
>
> You can also look at other existing practice:
>
> https://github.com/microsoft/terminal/blob/main/src/types/CodepointWidthDetector.cpp
> https://github.com/KDE/konsole/blob/b8325d2f1842e8f5b4999e7e4510093895818dee/tools/uni2characterwidth/uni2characterwidth.cpp#L892
> https://github.com/GNOME/vte/blob/master/doc/ambiguous.txt

Thank you for those links! Please add them to a revision of the paper,
preferably with some compare/contrast analysis.

Tom.
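
For readers following the thread, the rule discussed here (treat codepoints whose East_Asian_Width property is W or F as occupying two columns, and everything else as one) can be sketched roughly as below. This is only an illustration under stated assumptions: the names CodePointRange, sample_wide_ranges, and estimated_width are hypothetical, and the table is a small hand-picked sample of W/F ranges. It is neither the list in [format.string.std] nor the wording proposed in P2675R0. A real implementation would generate the table from EastAsianWidth.txt in the Unicode Character Database.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helper type: an inclusive range of codepoints.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// Illustrative SAMPLE of ranges whose East_Asian_Width is W or F.
// Deliberately incomplete; not the standard's list or the paper's.
constexpr std::array<CodePointRange, 5> sample_wide_ranges{{
    {0x1100, 0x115F},   // Hangul Jamo leading consonants (W)
    {0x4E00, 0x9FFF},   // CJK Unified Ideographs (W)
    {0xAC00, 0xD7A3},   // Hangul Syllables (W)
    {0xFF01, 0xFF60},   // Fullwidth forms (F)
    {0x1F600, 0x1F64F}  // Emoticons (W)
}};

// Returns 2 for a codepoint inside one of the sample ranges, else 1.
constexpr int estimated_width(char32_t cp) {
    for (const CodePointRange& r : sample_wide_ranges) {
        if (cp >= r.first && cp <= r.last) {
            return 2;
        }
    }
    return 1;
}

// Sums the per-codepoint estimates over a UTF-32 string.
constexpr std::size_t estimated_width(std::u32string_view text) {
    std::size_t total = 0;
    for (char32_t cp : text) {
        total += static_cast<std::size_t>(estimated_width(cp));
    }
    return total;
}

// Compile-time usage checks.
static_assert(estimated_width(U'a') == 1);
static_assert(estimated_width(U'\u5B57') == 2);  // a CJK ideograph
static_assert(estimated_width(std::u32string_view(U"a\u5B57b")) == 4);
```

The sketch ignores combining characters, grapheme clusters, and ambiguous-width codepoints, all of which the terminals linked in the quoted text handle in their own ways.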

>
> (Note that terminals are much smarter than the standard but they all
> consider W and F codepoints as double width)
>
>> Can you justify the current list, as it is in the standard?
>> Can you justify https://godbolt.org/z/McoG64n1v ?
>
> I'm trying to evaluate your proposal against the status quo.
> Unfortunately, the paper is not proving all that helpful in doing so.
>
> Given the list of codepoints I sent you (the gist), would you argue
> any of them should have a narrow rendering?
>
> Tom.
>
>> Tom.
>>
>>>
>>> Tom.
>>>
>>> On 11/29/22 3:40 PM, Tom Honermann via SG16 wrote:
>>>>
>>>> SG16 will hold a telecon on Wednesday, November 30th,
>>>> at 19:30 UTC (timezone conversion
>>>> <https://www.timeanddate.com/worldclock/converter.html?iso=20221130T193000&p1=1440&p2=tz_pst&p3=tz_mst&p4=tz_cst&p5=tz_est&p6=tz_cet>).
>>>>
>>>> *This message will also serve as your friendly reminder
>>>> that this meeting is taking place tomorrow. I'm sorry
>>>> for publishing an agenda so very late.*
>>>>
>>>> *For participants in the USA, please note that daylight
>>>> saving time ended 2022-11-06, so this telecon will
>>>> start one hour earlier than our last telecon.*
>>>>
>>>> The agenda follows. We won't get through all of these.
>>>> These are all of the NB comments we have left to
>>>> address. Whatever we don't get to in this meeting will
>>>> be scheduled for the December 14th meeting.
>>>>
>>>> * P2713R0: Escaping improvements in std::format
>>>>   <https://wg21.link/p2713r0>
>>>>     o US 38-098 22.14.6.4p1 [format.string.escaped]
>>>>       Escaping for debugging and logging
>>>>       <https://github.com/cplusplus/nbballot/issues/515>
>>>>     o FR 005-134 22.14.6.4 [format.string.escaped]
>>>>       Aggressive escaping
>>>>       <https://github.com/cplusplus/nbballot/issues/408>
>>>> * P2693R0: Formatting thread::id and stacktrace
>>>>   <https://wg21.link/p2693r0>
>>>>     o FR-008-011 22.14 [format] Support formatting of
>>>>       thread::id
>>>>       <https://github.com/cplusplus/nbballot/issues/410>
>>>> * FR-010-133 [Bibliography] Unify references to
>>>>   Unicode
>>>>   <https://github.com/cplusplus/nbballot/issues/412> and
>>>>   FR-021-013 5.3p5.2 [lex.charset] Codepoint names in
>>>>   identifiers
>>>>   <https://github.com/cplusplus/nbballot/issues/423>
>>>> * P2675R0: LWG3780: The Paper (format's width
>>>>   estimation is too approximate and not forward
>>>>   compatible) <https://wg21.link/p2675r0>
>>>>     o LWG #3780: format's width estimation is too
>>>>       approximate and not forward compatible
>>>>       <https://cplusplus.github.io/LWG/issue3780>
>>>>     o FR-007-012 22.14.2.2 [format.string.std]
>>>>       codepoints with width 2
>>>>       <https://github.com/cplusplus/nbballot/issues/409>
>>>> * FR-020-014 5.3 [lex.charset] Replace "translation
>>>>   character set" by "Unicode"
>>>>   <https://github.com/cplusplus/nbballot/issues/422>
>>>>
>>>> P2713R0 <https://wg21.link/p2713r0> (Escaping
>>>> improvements in std::format) implements the SG16
>>>> proposed resolutions for US 38-098
>>>> <https://github.com/cplusplus/nbballot/issues/515> (see
>>>> the 2022-10-19 SG16 meeting summary
>>>> <https://github.com/sg16-unicode/sg16-meetings#october-19th-2022>)
>>>> and FR 005-134
>>>> <https://github.com/cplusplus/nbballot/issues/408> (see
>>>> the 2022-11-02 SG16 meeting summary
>>>> <https://github.com/sg16-unicode/sg16-meetings#november-2nd-2022>).
>>>> We'll review the wording and then poll forwarding to
>>>> LEWG as the resolution of the two NB comments.
>>>>
>>>> Candidate Poll 1: P2713R0: Forward to LEWG as the
>>>> recommended resolution of US 38-098 and FR 005-134
>>>> [amended to ...].
>>>>
>>>> P2693R0 <https://wg21.link/p2693r0> (Formatting
>>>> thread::id and stacktrace) is intended to resolve
>>>> FR-008-011
>>>> <https://github.com/cplusplus/nbballot/issues/410>. I
>>>> did not initially tag this NB comment as needing SG16
>>>> review, but Bryce requested that SG16 take a look,
>>>> specifically with regard to narrow vs wide formatting.
>>>> Bryce has indicated this paper will need to be approved
>>>> soon in order for it to appear in the electronic
>>>> polling that will be conducted in January.
>>>>
>>>> Candidate Poll 2: P2693R0: Forward to LEWG as the
>>>> recommended resolution of FR-008-011 [amended to ...].
>>>>
>>>> FR-010-133
>>>> <https://github.com/cplusplus/nbballot/issues/412> and
>>>> FR-021-013
>>>> <https://github.com/cplusplus/nbballot/issues/423> were
>>>> discussed during the 2022-11-02 SG16 meeting
>>>> <https://github.com/sg16-unicode/sg16-meetings#november-2nd-2022>
>>>> and concluded with a recommendation to discuss with the
>>>> project editor the possibility of preferring the
>>>> Unicode Standard over ISO/IEC 10646 within the C++
>>>> standard. The project editor approved this direction
>>>> and we can now move forward with drafting wording
>>>> changes. This will require a paper produced in short
>>>> order if it is to be accepted for C++23.
>>>>
>>>> P2675R0 <https://wg21.link/p2675r0> (LWG3780: The Paper
>>>> (format's width estimation is too approximate and not
>>>> forward compatible)) is intended to resolve LWG #3780
>>>> <https://cplusplus.github.io/LWG/issue3780> and
>>>> FR-007-012
>>>> <https://github.com/cplusplus/nbballot/issues/409>. It
>>>> seeks to replace the explicit list of code point ranges
>>>> in [format.string.std]p12
>>>> <https://eel.is/c++draft/format.string.std#12> with
>>>> wording that derives substantially the same set of code
>>>> points using Unicode database properties.
>>>>
>>>> Candidate Poll 3.1: P2675R0: Forward to LEWG as the
>>>> recommended resolution of FR-007-012 [amended to ...].
>>>> Candidate Poll 3.2: P2675R0: Forward to LEWG for
>>>> C++26 [amended to ...].
>>>> Candidate Poll 3.3: Recommend to LEWG that
>>>> FR-007-012 be rejected.
>>>>
>>>> FR-020-014
>>>> <https://github.com/cplusplus/nbballot/issues/422>
>>>> raises concerns that were discussed as part of the
>>>> reviews of P2314 <https://wg21.link/p2314> and P2297
>>>> <https://wg21.link/p2297> during the 2021-03-24 SG16
>>>> meeting
>>>> <https://github.com/sg16-unicode/sg16-meetings/blob/master/README-2021.md#march-24th-2021>.
>>>> The comment does not appear to present new information.
>>>> If we choose to accept, a paper will need to be quickly
>>>> produced.
>>>>
>>>> Candidate Poll 4.1: Recommend to CWG that
>>>> FR-020-014 be accepted.
>>>> Candidate Poll 4.2: Recommend to CWG that
>>>> FR-020-014 be rejected.
>>>>
>>>> Tom.
>>>>
>>>>
>>> --
>>> SG16 mailing list
>>> SG16_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>

Received on 2022-11-30 15:29:47