sg16: [SG16-Unicode] SG16 Unicode related questions for Swift and WebKit representatives

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 23 Jul 2018 22:39:47 -0400

SG16 is seeking input from Swift and WebKit representatives to help
inform our work towards enhancing support for Unicode in the C++
standard. In particular, we recognize the significant amount of effort
that went into the design of the Swift String type and would like to
better understand the motivations that contributed to its current design
and any pressures that might encourage further evolution or refinement,
especially any concerns deemed significant enough to warrant backward
incompatible changes.

Though most of these questions specifically mention Swift, that is an
artifact of our being more familiar with Swift than the internal
workings of WebKit. Many of these questions would be applicable to any
string type designed to support Unicode. We are therefore also
interested in hearing about the string types used by WebKit, the
motivations that guided their design, and the trade-offs that have been
made. Of particular interest would be the results of design decisions
that contrast with the design of Swift's String type.

Thank you in advance for any time and expertise you are willing and able
to share with us.

1. The Swift string manifesto is about 1 1/2 years old. What have you
    learned since writing it? What would you change? What have you
    changed?
2. Swift strings are extended grapheme cluster (EGC) based. What have
    been the best and worst consequences of this choice?
3. When porting code unit or code point based code to Swift strings
    (e.g., when rewriting Objective-C code, or rewriting Swift code to
    use String instead of NSString), has profiling revealed performance
    regressions due to the switch to EGC based processing? If so, what
    action was taken to correct it?
4. Swift strings do not enforce storage in any particular Unicode
    normalization form. Was consideration given to forcing storage in a
    particular form such as FCC or NFC?
5. Swift strings support comparison via normalization. Has use of
    canonical string equality been a performance issue or a source of
    surprise to programmers?
6. Swift strings are not locale sensitive. Was any consideration given
    to creation of a distinct locale sensitive string type?
7. Swift strings provide a count property as required to satisfy the
    Collection protocol. How often do programmers use count (the number
    of EGCs in the string) inappropriately?
8. Swift strings support several memory unsafe initializers and
    methods. How frequently are these used incorrectly?
9. The Swift manifesto discussed three approaches to handling
    substrings and Swift 4 changed from "same type, shared storage" to
    "different type, shared storage". Any regrets?
10. How often do you find programmers doing work at the EGC level that
    would be better performed at the code unit or code point level?
11. Likewise, how often do you find programmers working with
    unicodeScalars, utf8, or utf16 views to do work better performed at
    the EGC level? For what reasons does this occur? Perhaps to work
    around differences in EGC boundaries across Unicode versions or the
    underlying version of ICU in use?
12. Has consideration been given to exposing Unicode character database
    properties? CharacterSet exposes some of these properties, but have
    more been requested?
13. How firmly is the Swift string implementation tied to ICU? If the
    C++ standard library were to add suitable Unicode support, what
    would motivate reimplementing Swift strings on top of it?
14. Do Swift programmers tend to prefer string interpolation or string
    formatting functions?
15. What enhancements would you most like to see in C++ to improve
    Unicode support?
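
For readers less familiar with Swift, the behavior behind questions 5, 7,
and 11 can be sketched roughly as follows. This is only a minimal
illustration using standard String APIs; the variable names are ours and
the example is not taken from the manifesto.

```swift
// Two canonically equivalent spellings of "é".
let precomposed = "\u{E9}"     // U+00E9 LATIN SMALL LETTER E WITH ACUTE
let decomposed  = "e\u{301}"   // U+0065 followed by U+0301 COMBINING ACUTE ACCENT

// Question 5: == compares by canonical equivalence, not by stored code units.
let areEqual = (precomposed == decomposed)

// Questions 7 and 11: the same string has a different length in each view.
let egcCount    = decomposed.count                 // extended grapheme clusters
let scalarCount = decomposed.unicodeScalars.count  // code points (Unicode scalars)
let utf16Count  = decomposed.utf16.count           // UTF-16 code units
let utf8Count   = decomposed.utf8.count            // UTF-8 code units

print(areEqual, egcCount, scalarCount, utf16Count, utf8Count)
// prints "true 1 2 2 3"
```

The example deliberately avoids emoji sequences, since their EGC
boundaries can differ across Unicode versions, which is the concern
raised in question 11.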

These questions were culled from various internal SG16 discussions.
Special thanks to JeanHeyd Meneide, Mark Zeren, and Thiago Macieira for
their contributions to crafting this list.

Tom.

Received on 2018-07-24 04:46:43