On 3/17/21 4:18 PM, Corentin Jabot via SG16 wrote:


On Wed, Mar 17, 2021, 21:09 Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
On 3/17/21 10:45 AM, Steve Downey via SG16 wrote:


On Wed, Mar 17, 2021 at 9:22 AM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
On 3/17/21 5:23 AM, Corentin Jabot wrote:


On Tue, Mar 16, 2021 at 3:59 PM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
  • P1628: Unicode character properties
As the author, I do not expect to do further work on this in the C++23 cycle.
That matches my expectations, thanks for confirming.
 
Not that we have a paper, or that I'm asking anyone to do work, but without access to the UCD, all the other algorithms are off the table. In theory it's an implementation detail; in practice, changing the APIs for property lookup would be an ABI break, and having two or more mechanisms in a std library implementation seems like a bad idea.
Expectation setting is important. It's not clear to everyone how the Unicode layers work, even if the explanations are straightforward.

We don't have proposals for any algorithms yet.  I would feel some concern about trying to standardize access to the UCD without algorithms to go with it; at least reference implementations showing what is possible.

Perhaps something that might be helpful is an audit of the algorithms and which UCD properties they each depend on.  That might help to prioritize which UCD properties to expose first, which ones we can guarantee stability for, etc.
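To make the suggested audit a little more concrete, here is a minimal C++ sketch. The list is deliberately partial (a few algorithms and some of the properties each one consults, taken from the corresponding Unicode annexes) and the names `AlgorithmDeps`, `partial_audit`, and `property_usage_counts` are hypothetical illustrations, not proposed APIs. Counting how many algorithms depend on each property gives a crude priority order.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, deliberately partial audit: a few Unicode algorithms and some
// of the UCD properties each one consults. Not exhaustive.
struct AlgorithmDeps {
    std::string algorithm;
    std::vector<std::string> properties;
};

const std::vector<AlgorithmDeps>& partial_audit() {
    static const std::vector<AlgorithmDeps> audit = {
        {"Normalization (UAX #15)",
         {"Canonical_Combining_Class", "Decomposition_Mapping",
          "Full_Composition_Exclusion"}},
        {"Grapheme clusters (UAX #29)",
         {"Grapheme_Cluster_Break", "Extended_Pictographic"}},
        {"Word boundaries (UAX #29)", {"Word_Break", "Extended_Pictographic"}},
        {"Identifiers (UAX #31)", {"XID_Start", "XID_Continue"}},
        {"Bidi (UAX #9)", {"Bidi_Class", "Bidi_Paired_Bracket_Type"}},
    };
    return audit;
}

// Count how many of the listed algorithms depend on each property; properties
// shared by several algorithms are candidates to expose first.
std::map<std::string, int> property_usage_counts() {
    std::map<std::string, int> counts;
    for (const auto& entry : partial_audit())
        for (const auto& prop : entry.properties)
            ++counts[prop];
    return counts;
}
```

With this excerpt, Extended_Pictographic is the only property needed by more than one of the listed algorithms; a full audit would of course give a different picture.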


Properties and algorithms are related in that some algorithms depend on some properties, but that doesn't inform the design of the properties paper.
At all.
I only mentioned algorithms as a means for determining priority.  The paper details choices made in terms of which properties are exposed and how they are named.  It will likely require a fair amount of discussion to get consensus on those choices.  We may be able to make progress more quickly by prioritizing subsets of properties.

For example, the XID_ properties are useful for writing a compiler, independently of any Unicode algorithm.
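As an illustration of that use case, a minimal C++ sketch of an identifier check in the style of the UAX #31 default identifier syntax (XID_Start or '_' first, XID_Continue afterwards). The lookups `is_xid_start` and `is_xid_continue` are hypothetical stand-ins for a standard property API, backed here by a tiny excerpt of the real ranges; an actual implementation would generate complete tables from DerivedCoreProperties.txt.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a UCD lookup: a tiny excerpt of the
// XID_Start / XID_Continue ranges, NOT the complete properties.
struct CodePointRange { char32_t first, last; };

constexpr CodePointRange xid_start_excerpt[] = {
    {U'A', U'Z'}, {U'a', U'z'},
    {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x00FF},  // Latin-1 letters
    {0x0391, 0x03A1}, {0x03A3, 0x03A9}, {0x03B1, 0x03C9},  // Greek letters
};

// Code points in XID_Continue but not XID_Start (excerpt only).
constexpr CodePointRange xid_continue_extra_excerpt[] = {
    {U'0', U'9'}, {U'_', U'_'},
    {0x0300, 0x036F},  // Combining Diacritical Marks
};

template <std::size_t N>
bool in_ranges(char32_t c, const CodePointRange (&table)[N]) {
    for (const CodePointRange& r : table)
        if (r.first <= c && c <= r.last) return true;
    return false;
}

bool is_xid_start(char32_t c) { return in_ranges(c, xid_start_excerpt); }

// XID_Continue is a superset of XID_Start.
bool is_xid_continue(char32_t c) {
    return is_xid_start(c) || in_ranges(c, xid_continue_extra_excerpt);
}

// First code point: XID_Start or '_'; the rest: XID_Continue.
bool is_identifier(std::u32string_view s) {
    if (s.empty()) return false;
    if (!(is_xid_start(s[0]) || s[0] == U'_')) return false;
    for (char32_t c : s.substr(1))
        if (!is_xid_continue(c)) return false;
    return true;
}
```

The lexer only needs the two boolean properties here; no Unicode algorithm is involved.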
Ignoring diacritics in search is something browsers and other applications implement by looking at properties, without it being a Unicode-defined mechanism.
Etc. 
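A minimal C++ sketch of the diacritic-insensitive search case, assuming two lookups the standard does not currently provide: canonical decomposition and a General_Category test for nonspacing marks (Mn). Both (`decompose_excerpt`, `is_nonspacing_mark`) are hypothetical and backed by tiny hand-written excerpts of the UCD data; case folding is not handled.

```cpp
#include <string>
#include <string_view>

// Hypothetical excerpt of canonical decompositions (the real data lives in
// the decomposition mappings of UnicodeData.txt).
struct Decomposition { char32_t composed, base, mark; };

constexpr Decomposition decomposition_excerpt[] = {
    {0x00E0, U'a', 0x0300},  // à
    {0x00E8, U'e', 0x0300},  // è
    {0x00E9, U'e', 0x0301},  // é
    {0x00C9, U'E', 0x0301},  // É
    {0x00F1, U'n', 0x0303},  // ñ
    {0x00FC, U'u', 0x0308},  // ü
};

// Hypothetical General_Category == Mn test, limited to the
// Combining Diacritical Marks block.
bool is_nonspacing_mark(char32_t c) { return c >= 0x0300 && c <= 0x036F; }

// Replace each composed character found in the excerpt by base + mark.
std::u32string decompose_excerpt(std::u32string_view s) {
    std::u32string out;
    for (char32_t c : s) {
        bool found = false;
        for (const Decomposition& d : decomposition_excerpt) {
            if (d.composed == c) {
                out.push_back(d.base);
                out.push_back(d.mark);
                found = true;
                break;
            }
        }
        if (!found) out.push_back(c);
    }
    return out;
}

// Decompose, then drop nonspacing marks; this also handles text that was
// already in decomposed form.
std::u32string strip_diacritics(std::u32string_view s) {
    std::u32string out;
    for (char32_t c : decompose_excerpt(s))
        if (!is_nonspacing_mark(c)) out.push_back(c);
    return out;
}

bool contains_ignoring_diacritics(std::u32string_view haystack,
                                  std::u32string_view needle) {
    return strip_diacritics(haystack).find(strip_diacritics(needle)) !=
           std::u32string::npos;
}
```

Both steps are plain property lookups; the matching policy itself (which marks to ignore, whether to fold case) is left to the application.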
I can definitely offer more use cases in the paper.

Some algorithms (notably regexes) may need the data presented in slightly different form.

We should have Unicode properties if we believe Unicode algorithms do not cover the set of use cases for Unicode, not because we want to implement Unicode algorithms.

I think both perspectives make sense.  In general it is useful to provide the underlying capabilities to implement something outside the standard even if it is offered by the standard.

Tom.