Re: [SG16-Unicode] [isocpp-direction] DG answer to the Unicode Direction paper (P1238R0)

From: Tom Honermann <tom_at_[hidden]>
Date: Fri, 11 Jan 2019 23:48:29 -0500
On 1/10/19 12:14 AM, Tony V E wrote:
> If we think that in 5 or 10 or 15 years the world (i.e., platforms we
> care about) will finally realize UTF-8 is the right answer, maybe we
> should just support that, and just leave enough space that makes other
> encodings possible, but not required.

I think that is pretty much the status quo. The question is more, when
a platform is starting to lag the standard but isn't actually dead, how
long do we give it before giving up on it? Are there some guidelines we
can adopt?


> On Wed, Jan 9, 2019 at 11:21 PM Tom Honermann <tom_at_[hidden]
> <mailto:tom_at_[hidden]>> wrote:
> Thank you, Howard! A few inline comments below...
> On 1/9/19 2:34 PM, Howard Hinnant wrote:
>> Below is the Directions Group’s response to P1238R0 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1238r0.html).
>> We welcome this effort. For now, we prefer to comment on general principles and direction only, rather than on technical details:
> Understood. Do you have any feedback regarding the constraints
> listed in the paper? Any desire to challenge one or more of them?
> The constraint I'd most like feedback on is 1.1 (The ordinary and
> wide execution encodings are implementation defined). If
> Microsoft were to support use of UTF-8 as the execution encoding
> (something they are making steps towards), it may be conceivable
> that we could standardize the execution encoding as UTF-8 and have
> that actually reflect existing practice (implementations would
> presumably continue to offer support for legacy encodings as an
> extension). However, this would leave some platforms behind; z/OS
> being the primary example. z/OS continues to maintain a
> significant presence in the industry (as I understand it, good
> numbers are hard to find), but IBM has not been keeping up with
> C++ standards. Some guidance regarding how to think about
> platforms that are not keeping up with the standard would be
> appreciated.
>> • The list of authors is suitably long. The task of formally bringing Unicode into C++ requires a breadth of experience. Someone must look out for the interests of the various platforms (Linux, Windows, embedded, HPC, etc.) and the various groups of developers (OS, foundation library, end-user, etc.). We recommend trying to keep constant contact with people with current practical experience in all of those fields. Also, Bob Steagall has done some work in this area based on his CPPCON talk; and IBM directions should be obtained from IBM representatives in the committee. Could you recruit them for this?
> I have reached out to such representatives. Hubert follows along
> and chimes in from time-to-time. Bob Steagall has joined us at
> times and he and I recently coordinated scheduling to better
> enable him to join our meetings. I'd love to have more
> representation from platform vendors and probably do need to spend
> some more time recruiting again.
>> • §3. Direction: We feel that the scope and end goal of this work are not crystal clear: what is the goal of this SG? What are its deliverables?
>> • Is it trying to unify the many wide character sets into ISO C++ or trying to add more of the varying wide character sets into ISO C++, or even something else?
>> • And maybe your goal is to give feedback on, and make small tweaks to, all these different wide character standards and see how they can best fit in ISO C++.
>> • Maybe §3 could be clarified?
> We're still working to define our deliverables. As noted in the
> paper, our short-term focus has been on small features for C++20.
> Now that we're wrapping C++20 up, we'll need to get more focused
> on the big picture for C++23 and beyond. While in San Diego we
> identified a set of priorities for further work. Those can be
> found near the bottom of the page at
> http://wiki.edg.com/bin/view/Wg21sandiego2018/P1238R0 (higher
> numbers are higher priority). This list probably won't make much
> sense in isolation though; we'll need to incorporate it into
> future direction papers.
> I'm not quite sure what is meant by "wide character sets" in the
> questions above. Perhaps this is referring to legacy encodings
> like Shift-JIS? Adding additional support for specific legacy
> encodings is not something that we plan to work on. However, to
> the extent that we interoperate with the implementation defined
> execution encoding (which could be Shift-JIS or any other legacy
> encoding), then the interfaces we design may have to accommodate
> such encodings to some extent. Our focus is primarily providing
> feature support for Unicode encodings, the Unicode character set,
> and Unicode algorithms.
>> • §2. Guidelines: Beware of adjectives: “Avoid excessive inventiveness” and “avoid gratuitous departure from C”. These are good and necessary guidelines, but those adjectives can be awfully slippery. In particular, there is a danger of lowering the level of interfaces to the C level, causing verbosity and creating error- and security-hazards. Note: for about a decade, checking function arguments in C++ was widely condemned as a gratuitous incompatibility with C.
> Agreed, thanks for pointing this out.
>> • §4.1. We like the idea of std::text and std::text_view with more suitable interfaces than the (bloated) std::string one. We wonder how encodings will be presented to/in the type system.
> How encodings are presented in the type system is TBD. P0244
> (text_view) proposes encoding classes that also serve as encoding
> tags. However, we also have proponents for std::text and
> std::text_view being UTF-8 only. More work is needed to drive
> consensus here.
>> • §4.1. Without saying how, we suggest that std::text and std::text_view should be usable as ranges (as in the Ranges TS, now moved to the WP). Wherever possible, technical issues should be resolved in favor of simple, elegant, and correct use, rather than consistency with older STL rules crafted for fixed-sized elements.
> Strongly agreed.
>> We hope you find these brief comments constructive.
> Yes, thank you!
> Tom.
>> DG
>> _______________________________________________
>> Direction mailing list
>> Direction_at_[hidden] <mailto:Direction_at_[hidden]>
>> Subscription: http://lists.isocpp.org/mailman/listinfo.cgi/direction
>> Searchable archives: http://lists.isocpp.org/direction/2019/01/index.php
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden] <mailto:Unicode_at_[hidden]>
> http://www.open-std.org/mailman/listinfo/unicode
> --
> Be seeing you,
> Tony

Received on 2019-01-12 05:56:48