
sg16


Re: Title casing & word breaking

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Fri, 24 Feb 2023 20:48:52 +0100
On 24/02/2023 20.16, Fraser Gordon via SG16 wrote:
> IIRC, the TR29 word boundary rules are provided by the "title" break iterator in ICU4C, while the "word" break iterator implements language-specific dictionaries to better match user expectations of what a word is. This means that if/when we standardise word breaking, we will need to be clear which meaning is intended. My personal opinion would be to follow the ICU naming to reduce user & implementer confusion.

I'm wondering whether algorithms requiring more than a Unicode character
properties database should be out of scope for any near-future C++ endeavor.
This includes locale-specific tailoring and also algorithms based on language
dictionaries.
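To make the distinction concrete: the default (untailored) word-boundary rules of UAX #29 (the "TR29" rules mentioned above) need only a per-character Word_Break property lookup plus a little neighbouring context. The following is a toy sketch, assuming ASCII-only input and covering only the rules labeled in the comments (it ignores the CR/LF, Extend/Format and Katakana rules). The enum and function names are hypothetical and are neither ICU's API nor a proposed C++ interface. Dictionary-based segmentation, as used for scripts written without spaces such as Thai, cannot be expressed this way, because it needs word lists rather than character properties.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical ASCII-only stand-in for the UAX #29 Word_Break property.
enum class WB { ALetter, Numeric, MidLetter, MidNum, MidNumLetQ,
                ExtendNumLet, WSegSpace, Other };

// The only "database" needed: a per-character property lookup.
WB classify(char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return WB::ALetter;
    if (c >= '0' && c <= '9') return WB::Numeric;
    switch (c) {
        case ':': return WB::MidLetter;
        case ',': case ';': return WB::MidNum;
        case '.': case '\'': return WB::MidNumLetQ;  // MidNumLet or Single_Quote
        case '_': return WB::ExtendNumLet;
        case ' ': return WB::WSegSpace;
        default: return WB::Other;
    }
}

bool is_mid_letter(WB c) { return c == WB::MidLetter || c == WB::MidNumLetQ; }
bool is_mid_num(WB c) { return c == WB::MidNum || c == WB::MidNumLetQ; }

// True if there is a word boundary between positions i-1 and i (0 < i < size).
// All rules implemented here are "no break" rules, so their order does not matter.
bool is_boundary(const std::vector<WB>& c, std::size_t i) {
    WB prev2 = i >= 2 ? c[i - 2] : WB::Other;  // Other stands in for start of text
    WB prev = c[i - 1];
    WB next = c[i];
    WB next2 = i + 1 < c.size() ? c[i + 1] : WB::Other;  // and for end of text
    if (prev == WB::WSegSpace && next == WB::WSegSpace) return false;               // WB3d
    if (prev == WB::ALetter && next == WB::ALetter) return false;                   // WB5
    if (prev == WB::ALetter && is_mid_letter(next) && next2 == WB::ALetter) return false;  // WB6
    if (prev2 == WB::ALetter && is_mid_letter(prev) && next == WB::ALetter) return false;  // WB7
    if (prev == WB::Numeric && next == WB::Numeric) return false;                   // WB8
    if (prev == WB::ALetter && next == WB::Numeric) return false;                   // WB9
    if (prev == WB::Numeric && next == WB::ALetter) return false;                   // WB10
    if (prev2 == WB::Numeric && is_mid_num(prev) && next == WB::Numeric) return false;     // WB11
    if (prev == WB::Numeric && is_mid_num(next) && next2 == WB::Numeric) return false;     // WB12
    bool word_like = prev == WB::ALetter || prev == WB::Numeric || prev == WB::ExtendNumLet;
    if (word_like && next == WB::ExtendNumLet) return false;                        // WB13a
    if (prev == WB::ExtendNumLet && (next == WB::ALetter || next == WB::Numeric)) return false;  // WB13b
    return true;                                                                    // WB999
}

// Splits s into segments at every boundary (WB1/WB2: always break at both ends).
std::vector<std::string> word_segments(std::string_view s) {
    std::vector<WB> c;
    c.reserve(s.size());
    for (char ch : s) c.push_back(classify(ch));
    std::vector<std::string> out;
    std::size_t start = 0;
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (is_boundary(c, i)) {
            out.emplace_back(s.substr(start, i - start));
            start = i;
        }
    }
    if (!s.empty()) out.emplace_back(s.substr(start));
    return out;
}
```

For example, "can't stop 3.14, ok" segments into "can't", " ", "stop", " ", "3.14", ",", " ", "ok". Nothing in this sketch depends on a locale or a word list.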

Once the near future has passed, I think I'd like to see a paper
comparing the ICU interfaces with hypothetical ergonomic
C++ ones. Maybe the returns on a C++ interface diminish
rapidly here.

Jens

Received on 2023-02-24 19:48:57