Re: [SG16-Unicode] [isocpp-direction] DG answer to the Unicode Direction paper (P1238R0)

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 9 Jan 2019 23:14:32 -0500

Thank you, Howard! A few inline comments below...

On 1/9/19 2:34 PM, Howard Hinnant wrote:
> Below is the Directions Group’s response to P1238R0 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1238r0.html).
>
> We welcome this effort. For now, we prefer to comment on general principles and direction only, rather than on technical details:

Understood. Do you have any feedback regarding the constraints listed
in the paper? Any desire to challenge one or more of them?

The constraint I'd most like feedback on is 1.1 (The ordinary and wide
execution encodings are implementation defined). If Microsoft were to
support use of UTF-8 as the execution encoding (something they are
taking steps towards), we could conceivably standardize the execution
encoding as UTF-8 and have that actually reflect existing practice
(implementations would presumably continue to offer support for legacy
encodings as an extension). However, this would leave some platforms
behind, z/OS being the primary example. z/OS continues to maintain a
significant presence in the industry (though, as I understand it, good
numbers are hard to find), but IBM has not been keeping up with C++
standards. Some guidance regarding how to think about platforms that
are not keeping up with the standard would be appreciated.
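
To make the constraint concrete, here is a minimal sketch of what
"implementation defined" means for literals. The helper name
`code_units` is made up for illustration and is not from the paper.
The same character takes a fixed number of code units in a u8 literal,
while the count for an ordinary literal depends on the execution
encoding the compiler was told to use.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null. Works for char and char8_t arrays.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// A u8 literal is always UTF-8, so U+00E9 occupies two code units
// regardless of platform.
constexpr std::size_t utf8_units = code_units(u8"\u00E9");

// An ordinary literal uses the execution encoding. Assumption: the
// encoding in use can represent U+00E9. The result is 2 for UTF-8 and
// 1 for single-byte encodings such as Windows-1252 or an EBCDIC code page.
constexpr std::size_t ordinary_units = code_units("\u00E9");

static_assert(utf8_units == 2, "u8 literals are UTF-8 everywhere");
```

Only the first count is portable today; the second is exactly what
standardizing the execution encoding as UTF-8 would pin down.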

>
> • The list of authors is suitably long. The task of formally bringing Unicode into C++ requires a breadth of experience. Someone must look out for the interests of the various platforms (Linux, Windows, embedded, HPC, etc.) and the various groups of developers (OS, foundation library, end-user, etc.). We recommend trying to keep constant contact with people with current practical experience in all of those fields. Also, Bob Steagall has done some work in this area based on his CppCon talk; and IBM directions should be obtained from IBM representatives in the committee. Could you recruit them for this?
I have reached out to such representatives. Hubert follows along and
chimes in from time to time. Bob Steagall has joined us at times, and he
and I recently coordinated scheduling to better enable him to join our
meetings. I'd love to have more representation from platform vendors
and probably do need to spend some more time recruiting again.
>
> • §3. Direction: We feel that the scope and end goal of this work are not crystal clear: what is the goal of this SG? What are its deliverables?
> • Is it trying to unify the many wide character sets into ISO C++ or trying to add more of the varying wide character sets into ISO C++, or even something else?
> • Or maybe your goal is to provide feedback and small tweaks to all these different wide character standards and see how they can best fit in ISO C++?
> • Maybe §3 could be clarified?

We're still working to define our deliverables. As noted in the paper,
our short-term focus has been on small features for C++20. Now that
we're wrapping C++20 up, we'll need to get more focused on the big
picture for C++23 and beyond. While in San Diego we identified a set of
priorities for further work. Those can be found near the bottom of the
page at http://wiki.edg.com/bin/view/Wg21sandiego2018/P1238R0 (higher
numbers are higher priority). This list probably won't make much sense
in isolation though; we'll need to incorporate it into future direction
papers.

I'm not quite sure what is meant by "wide character sets" in the
questions above. Perhaps this is referring to legacy encodings like
Shift-JIS? Adding additional support for specific legacy encodings is
not something that we plan to work on. However, to the extent that we
interoperate with the implementation-defined execution encoding (which
could be Shift-JIS or any other legacy encoding), the interfaces we
design may have to accommodate such encodings. Our focus is primarily
on providing feature support for Unicode encodings, the Unicode
character set, and Unicode algorithms.

>
> • §2. Guidelines: Beware of adjectives: “Avoid excessive inventiveness” and “avoid gratuitous departure from C”. These are good and necessary guidelines, but those adjectives can be awfully slippery. In particular, there is a danger of lowering the level of interfaces to the C level, causing verbosity and creating error and security hazards. Note: for about a decade, checking function arguments in C++ was widely condemned as a gratuitous incompatibility with C.
Agreed, thanks for pointing this out.
>
> • §4.1. We like the idea of std::text and std::text_view with more suitable interfaces than the (bloated) std::string one. We wonder how encodings will be presented to/in the type system.
How encodings are presented in the type system is TBD. P0244
(text_view) proposes encoding classes that also serve as encoding tags.
However, we also have proponents for std::text and std::text_view being
UTF-8 only. More work is needed to drive consensus here.
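
As a rough illustration of the two options, here is a sketch in which
an empty class acts as an encoding tag and becomes part of the view's
type. All names (`utf8_tag`, `utf16_tag`, `basic_text_view_sketch`) are
hypothetical and are not the interfaces proposed in P0244.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical encoding tags: each names the code unit type it uses.
struct utf8_tag  { using code_unit = char; };
struct utf16_tag { using code_unit = char16_t; };

// Hypothetical view whose encoding is a template parameter, so views
// over differently encoded data are distinct, non-interchangeable types.
template <typename Encoding>
class basic_text_view_sketch {
public:
    using encoding_type = Encoding;
    using code_unit = typename Encoding::code_unit;

    constexpr explicit basic_text_view_sketch(
        std::basic_string_view<code_unit> units) : units_(units) {}

    constexpr std::size_t size_in_code_units() const { return units_.size(); }

private:
    std::basic_string_view<code_unit> units_;
};

// The "UTF-8 only" position corresponds to fixing the parameter.
using text_view_sketch = basic_text_view_sketch<utf8_tag>;

static_assert(!std::is_same<basic_text_view_sketch<utf8_tag>,
                            basic_text_view_sketch<utf16_tag>>::value,
              "the encoding is part of the type");
```

The trade-off is the usual one: the tagged form keeps mismatched
encodings from being mixed by accident, while the fixed form gives
users a single vocabulary type.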
>
> • §4.1. Without saying how, we suggest that std::text and std::text_view should be usable as ranges (as in the Ranges TS, now moved into the WP). Wherever possible, technical issues should be resolved in favor of simple, elegant, and correct use, rather than consistency with older STL rules crafted for fixed-sized elements.
Strongly agreed.
>
> We hope you find these brief comments constructive.

Yes, thank you!

Tom.

>
> DG



Received on 2019-01-10 05:21:48