Date: Mon, 23 Jul 2018 22:39:47 -0400
SG16 is seeking input from Swift and WebKit representatives to help
inform our work towards enhancing support for Unicode in the C++
standard. In particular, we recognize the significant amount of effort
that went into the design of the Swift String type and would like to
better understand the motivations that contributed to its current design
and any pressures that might encourage further evolution or refinement,
especially any concerns that would be deemed significant enough to
warrant backward-incompatible changes.
Though most of these questions specifically mention Swift, that is an
artifact of our being more familiar with Swift than the internal
workings of WebKit. Many of these questions would be applicable to any
string type designed to support Unicode. We are therefore also
interested in hearing about the string types used by WebKit, the
motivations that guided their design, and the trade-offs that have been
made. Of particular interest would be the results of design decisions
that contrast with the design of Swift's String type.
Thank you in advance for any time and expertise you are willing and able
to share with us.
1. The Swift string manifesto is about 1 1/2 years old. What have you
learned since writing it? What would you change? What have you
changed?
2. Swift strings are extended grapheme cluster (EGC) based. What have
been the best and worst consequences of this choice?
3. When porting code unit or code point based code to Swift strings
(e.g., when rewriting Objective-C code, or rewriting Swift code to
use String instead of NSString), has profiling revealed performance
regressions due to the switch to EGC based processing? If so, what
action was taken to correct it?
4. Swift strings do not enforce storage in any particular Unicode
normalization form. Was consideration given to forcing storage in a
particular form such as FCC or NFC?
5. Swift strings support comparison via normalization. Has use of
canonical string equality been a performance issue or a source of
surprise to programmers?
6. Swift strings are not locale sensitive. Was any consideration given
to creation of a distinct locale sensitive string type?
7. Swift strings provide a count property as required to satisfy the
Collection protocol. How often do programmers use count (the number
of EGCs in the string) inappropriately?
8. Swift strings support several memory unsafe initializers and
methods. How frequently are these used incorrectly?
9. The Swift manifesto discussed three approaches to handling
substrings and Swift 4 changed from "same type, shared storage" to
"different type, shared storage". Any regrets?
10. How often do you find programmers doing work at the EGC level that
would be better performed at the code unit or code point level?
11. Likewise, how often do you find programmers working with
unicodeScalars, utf8, or utf16 views to do work better performed at
the EGC level? For what reasons does this occur? Perhaps to work
around differences in EGC boundaries across Unicode versions or the
underlying version of ICU in use?
12. Has consideration been given to exposing Unicode character database
properties? CharacterSet exposes some of these properties, but have
more been requested?
13. How firmly is the Swift string implementation tied to ICU? If the
C++ standard library were to add suitable Unicode support, what
would motivate reimplementing Swift strings on top of it?
14. Do Swift programmers tend to prefer string interpolation or string
formatting functions?
15. What enhancements would you most like to see in C++ to improve
Unicode support?
These questions were culled from various internal SG16 discussions.
Special thanks to JeanHeyd Meneide, Mark Zeren, and Thiago Macieira for
their contributions to crafting this list.
Tom.
Received on 2018-07-24 04:46:43