Date: Tue, 2 Oct 2018 23:27:53 -0400
Study Group 16 std::text Technical Direction
Table of Contents
- 1. Abstract
- 2. Design Space and current decisions around std::text
- 2.1. Areas of broad agreement
- 2.2. Areas of Discussion
- 3. Near Term Plans
- 4. 2019
1 Abstract
SG16 intends to produce a proposal for a vocabulary type to handle Unicode
text and associated Unicode algorithms, tentatively named std::text, for
C++23. No significant text handling facility is targeted for C++20.
2 Design Space and current decisions around std::text
2.1 Areas of broad agreement
- There needs to be a type that maintains the invariants of well-formed
Unicode and allows for text manipulation, an analogue of std::string.
- There needs to be an associated view type, std::text_view, an analogue
of std::string_view.
- The type char8_t, although useful for distinguishing Unicode string
literals, does not guarantee well-formed UTF-8.
- The type std::text will not have the fat interface std::string does.
- Execution character encoding and compile-time character encoding are
not changing; however, std::text will be independent of both.
- Current locale support is insufficient for implementing Unicode
algorithms, such as tailoring, and is probably not worth attempting to
extend to support Unicode algorithm needs.
- The default view of text as a sequence is NOT code units.
- The type std::text will support allocators.
- It will not be incorrect to use std::text instead of std::string;
however, there may be performance penalties.
- The code unit sequence for std::text can be null terminated cheaply,
and this may be useful for OS APIs.
- UTF-8 is a safe choice for transporting Unicode across naive C and C++
code.
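The points about char8_t and code units can be made concrete with a small
check. What follows is a minimal sketch, assuming C++17. The name
count_code_points is a hypothetical helper introduced here for illustration
only; it is not part of any proposed std::text interface. It shows that a
byte sequence is not well-formed UTF-8 merely because of its element type,
and that the number of code units and the number of code points differ.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (an assumption, not from any SG16 paper): returns the
// number of code points in `bytes` if it is well-formed UTF-8, and
// std::nullopt otherwise. The accepted byte ranges reject overlong forms,
// surrogates, and values above U+10FFFF.
std::optional<std::size_t> count_code_points(std::string_view bytes) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        unsigned lo = 0x80, hi = 0xBF;  // allowed range for the second byte
        if (lead <= 0x7F) {
            len = 1;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            len = 2;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            len = 3;
            if (lead == 0xE0) lo = 0xA0;  // excludes overlong encodings
            if (lead == 0xED) hi = 0x9F;  // excludes surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            len = 4;
            if (lead == 0xF0) lo = 0x90;  // excludes overlong encodings
            if (lead == 0xF4) hi = 0x8F;  // excludes values above U+10FFFF
        } else {
            return std::nullopt;  // stray continuation byte or invalid lead
        }
        if (bytes.size() - i < len) return std::nullopt;  // truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned b = static_cast<unsigned char>(bytes[i + k]);
            const unsigned min = (k == 1) ? lo : 0x80u;
            const unsigned max = (k == 1) ? hi : 0xBFu;
            if (b < min || b > max) return std::nullopt;
        }
        i += len;
        ++count;
    }
    return count;
}

// Usage: "e" followed by U+0301 is three code units but two code points.
void usage_example() {
    const std::string_view decomposed = "e\xCC\x81";
    assert(decomposed.size() == 3);
    assert(count_code_points(decomposed) == 2u);
    assert(!count_code_points("\xC0\x80").has_value());  // overlong form
}
```

A type that maintains the well-formedness invariant would presumably perform
a check of this kind at its boundaries, so that later operations do not have
to repeat it.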
2.2 Areas of Discussion
- Is the internal representation of std::text a type parameter? That is,
is it configurable for UTF-8, 16, 32, LE and BE, or is there a single
internal encoding which is an implementation detail?
- Is text kept in normalized form, or is normalization done on demand?
- Is there a default sequence view, and if so, is it code points or
grapheme clusters? Or is each to be requested explicitly?
- Does std::text_view meet the requirements to be a view? Many
operations may not be O(1) or even amortized O(1), although they may be
asymptotic O(1).
- Does std::text implement operator<=>(), only operator==, or are
comparisons available only through specific named algorithms? There is a
trade-off between usability and surprising run-time costs.
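The normalization and comparison questions interact, which a small example
may help show. This is a sketch under stated assumptions, in C++17:
toy_compose is a hypothetical stand-in for real normalization that knows
exactly one composition (U+0065 U+0301 to U+00E9), and code_unit_equal and
toy_canonical_equal are illustrative names, not proposed interfaces. Real
normalization requires the Unicode data tables.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Toy stand-in for NFC (an assumption for illustration): composes only the
// pair U+0065 U+0301 into U+00E9 and copies every other byte unchanged.
std::string toy_compose(std::string_view utf8) {
    constexpr std::string_view decomposed = "e\xCC\x81";  // U+0065 U+0301
    constexpr std::string_view precomposed = "\xC3\xA9";  // U+00E9
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        if (utf8.substr(i, decomposed.size()) == decomposed) {
            out += precomposed;
            i += decomposed.size();
        } else {
            out += utf8[i];
            ++i;
        }
    }
    return out;
}

// What an operator== over the stored code units would compute: cheap, but
// canonically equivalent strings can compare unequal.
bool code_unit_equal(std::string_view a, std::string_view b) {
    return a == b;
}

// What a named comparison algorithm might compute when text is not stored
// normalized: both sides are normalized on demand, at a run-time cost.
bool toy_canonical_equal(std::string_view a, std::string_view b) {
    return toy_compose(a) == toy_compose(b);
}

// Usage: the same user-perceived text in two canonically equivalent forms.
void comparison_example() {
    const std::string_view a = "caf\xC3\xA9";
    const std::string_view b = "cafe\xCC\x81";
    assert(!code_unit_equal(a, b));
    assert(toy_canonical_equal(a, b));
}
```

If text were always kept in one normalized form, the two functions would
agree and a code unit comparison would suffice; if normalization is done on
demand, an operator== either gives the surprising answer or carries the
surprising cost.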
3 Near Term Plans
Zach Laine is near code complete on his text implementation and plans to
submit it for Boost review soon.
Continue syndicating the idea that text is more complicated than
programmers generally believe.
4 2019
Engage with LEWG and LWG for a paper to land directly into the DIS post
C++2a. SG16 is concerned about having the bandwidth to do more than one core
wording paper. If there is a good publicly available reasonably licensed
implementation, the value of having a std::experimental::text seems low.
-----
I believe this accurately captures the current consensus of the group. I
would like, with permission of the group, to send something very much like
this in to the mailing, in order to apprise the community of where we are,
and in particular that C++20 std::text isn't happening.
Formatting is awful, paper is short enough that the TOC isn't needed, other
comments and criticism welcome.
Received on 2018-10-03 05:28:07