Date: Sat, 9 Jul 2022 18:25:29 +0200
Hey Tom.
If you are *really* bored, I guess you could take a gander at
https://isocpp.org/files/papers/P2620R0.pdf
https://isocpp.org/files/papers/P2621R0.pdf
I didn't tag them SG16 because... well, they are trying to tie up loose
ends in lexing. Not very Unicode-y.
I do plan to bring forth a paper sometime before the end of the year for a
number of (non-tailored) Unicode algorithms and related facilities, and
more generally on how to design these things. The initial work would cover
grapheme cluster segmentation, word segmentation, casing, and
normalization; once we agree on the shape of these things we can add more
depending on needs and resources.
I would however not feel comfortable presenting anything not backed by an
implementation, and it is, as you can imagine, a lot of work.
But the good news is, I do have a design in my head and I think it will
work, and be realistically standardizable - if and only if I get a lot of
help with the specification, once we agree on direction.
I hope you'll like it :)
There are things I do consider out of scope of my current effort, but that
need to get done if we want to land on a good story in the end.
* uN support in fmt - this is a critical bit of infrastructure, but it has
some implementation and specification challenges: namely, it's not
implementable on platforms that do not support Unicode in some form.
* An *explicit* cast char*<->char8_t*. This is outside of my area of
expertise, but a zero-cost, contiguous UTF view over a char* known to be
UTF-8 would be nice. Without it, the best we can do is a random-access view
that is cheap but not free.
* Generally improve the library support for u8. We won't see either love
or adoption until we can offer usability.
* I'm not touching transcoding with a 10-foot pole; I trust that this will
happen with JeanHeyd's work.
* Thinking about tailoring, Unicode/CLDR locales, and localized number and
date formatting. Personally I think this is out of scope for 26 and I'm
kinda hoping ICU4X matures.
But there are questions worth asking. Namely: can C++ mandate a dependency
on a lib like ICU/ICU4X, can all vendors share a common implementation, or
are we not concerned about the implementation burden of that? Because I am.
I think a locale object compliant with Unicode is necessary, but if
implementers can't take dependencies, an implementation of the CLDR
seems... like asking too much.
I'm already not really comfortable with the implementation burden for
non-tailored things (and at the same time unwilling to make the design
amenable to ICU).
* If you folks think text processing would need some rope-like structure,
I'm not sure it's text related but... I guess you could talk about that!
* Some Unicode algorithms are unbounded, and may require allocation. A
small_vector would help specification, but again that doesn't seem to be
in scope for this group.
* We need *some* Unicode properties; I'm not sure which, to be honest. My
current intent is to only provide things that would be generally useful
outside of the algorithms that are otherwise provided.
You could look at this Swift proposal:
https://github.com/apple/swift-evolution/blob/main/proposals/0211-unicode-scalar-properties.md
Other random things on my mind:
* char_traits specialization deprecation?
* What the heck do we do with ctypes and std::locale?
* Can we mandate, in the meantime, that on UTF-8 platforms the locale at
the start of main is "C.UTF-8"?
* What the heck do we do with regex?
* Is there interest in pursuing the work of trying to get rid of the term
"character" in the core language to designate code points and code units,
or does that only annoy me? Can we refer to Unicode directly?
With some optimism, a bit of luck and a lot of tears we might get to have
some usable Unicode in 26?
I hope this inspires you.
Corentin.
On Sat, Jul 9, 2022 at 5:41 PM Tom Honermann via SG16 <sg16_at_[hidden]>
wrote:
> There are currently no papers in a ready state
> <https://github.com/cplusplus/papers/issues?q=is%3Aissue+is%3Aopen+label%3Asg16+-label%3Aneeds-revision>
> for SG16 review, nor do I have other topics in a state ready for discussion
> at this time. The SG16 telecon scheduled for Wednesday, July 13th, at 19:30
> UTC (timezone conversion
> <https://www.timeanddate.com/worldclock/converter.html?iso=20220622T193000&p1=1440&p2=tz_pdt&p3=tz_mdt&p4=tz_cdt&p5=tz_edt&p6=tz_cest>)
> is therefore canceled.
>
> However, I am still going to hold the Zoom session with the intent that
> anyone that has the time and inclination to attend spend that time working
> on new SG16 related papers or discussing SG16 related topics or concerns;
> the goal being to spur generation of new papers. Attendance and meeting
> minutes will not be taken.
>
> Inspiration for new papers to initiate may be obtained by reviewing the SG16
> open issues list <https://github.com/sg16-unicode/sg16/issues> (some of
> which are due to be closed following the upcoming plenary; I need to
> review). If you are aware of other ideas that are not currently being
> worked on and are not captured in the issues list, please feel free to
> reply with details or submit new issues.
>
> Tom.
>
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2022-07-09 16:25:42