I think so too, but would this be a useful exercise? I see it as one way to gauge progress. Failing tests and "not yet implemented" errors would give an indication of what work remains. I think we are quite far away from being able to pull that off.
Yes, understood. I tried to call that out by indicating we don't yet have a paper for case conversion algorithms. It could probably be achieved with Boost.Text. Note that, in particular, the Unicode database is a very small part of the casing equation.
Agreed. We may be years from having a solution for locale (no one is actually working on it).
I do, however, think that API design should be usage and experience driven, so it might be useful as a sandbox/proving ground more than as a way to showcase what we already have.
Good. The sandbox/proving ground was part of what I intended.
Another part is that this provides an easy-to-explain direction for (some of) what we intend to enable within the standard library.
Tom.
On Thu, Feb 6, 2020, 16:27 Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
SG16 has a number of papers and designs in flight targeting C++23, and I've been thinking about ways that we can:
- Demonstrate, in practical terms, how some of the parts fit together.
- Identify gaps that will need to be filled.
One idea I had was to create a textconv utility, similar to iconv and ICU's uconv, implemented using only features that are in the standard or that we are targeting for inclusion in the standard. Here is what the command line usage might look like:
Usage: textconv [-h|--help] [--list] [-f <encoding>] [-t <encoding>] [-x <transformation>] [-n <normalization>] [-l <locale>] [-o <file>] [file ...]
-h, --help Display help
--list Display supported encodings
-f <encoding> Specify the encoding of the source text
-t <encoding> Specify the encoding of the target text
-x <transformation> Apply a transformation to the text
caseconvert: Perform locale-sensitive case conversions
transliterate: Substitute transliterations where applicable
e.g., substitute "ae" for "æ".
-n <normalization> For UTF encodings, apply a normalization form
nfc: Canonical composition
nfkc: Compatibility composition
nfd: Canonical decomposition
nfkd: Compatibility decomposition
fcc: Fast C Contiguous
-l <locale> Specify a locale to use for locale-sensitive transformations
-o <file> Specify an output file (stdout by default)
Encoding conversions would exercise P1629 (Standard Text Encoding).
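Until a P1629-style facility exists, the conversion core could be prototyped on top of POSIX iconv. A minimal sketch, assuming a POSIX platform; the function name convert_encoding is mine, not from any paper, and shift-state flushing for stateful encodings is omitted for brevity:

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert `input` from encoding `from` to encoding `to` using POSIX
// iconv. Stands in for the kind of interface P1629 would provide.
std::string convert_encoding(const std::string& input,
                             const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("unsupported conversion");
    std::string output;
    char* in = const_cast<char*>(input.data());  // iconv's API is non-const
    std::size_t in_left = input.size();
    char buf[256];
    while (in_left > 0) {
        char* out = buf;
        std::size_t out_left = sizeof buf;
        std::size_t r = iconv(cd, &in, &in_left, &out, &out_left);
        if (r == (std::size_t)-1 && errno != E2BIG) {  // E2BIG: refill buf
            iconv_close(cd);
            throw std::runtime_error("conversion failed");
        }
        output.append(buf, static_cast<std::size_t>(out - buf));
    }
    iconv_close(cd);
    return output;
}
```

For example, ISO-8859-1 byte 0xE9 ("é") should come out as the UTF-8 pair 0xC3 0xA9.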
Selection of encoding would exercise P1885 (Naming Text Encodings to Demystify Them). The default source and target encodings would depend on environment settings (e.g., POSIX locale settings or the Windows ACP).
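On POSIX, deriving the default encoding means pulling the codeset out of a locale name like "en_US.UTF-8". A small sketch of that parsing step (the function name codeset_of is my own):

```cpp
#include <string>

// Extract the codeset from a POSIX locale name. The grammar is
// language[_territory][.codeset][@modifier]; the codeset, if present,
// sits between '.' and an optional '@'.
std::string codeset_of(const std::string& locale_name) {
    std::size_t dot = locale_name.find('.');
    if (dot == std::string::npos)
        return "";  // no explicit codeset; caller picks a platform default
    std::size_t at = locale_name.find('@', dot + 1);
    return locale_name.substr(
        dot + 1,
        at == std::string::npos ? std::string::npos : at - dot - 1);
}
```

So "de_DE.ISO-8859-15@euro" yields "ISO-8859-15", and a bare "C" yields no codeset. P1885 would then map such names onto its registry of encodings.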
Case conversion could exercise properties exposed via P1628 (Unicode character properties).
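Per-code-point properties alone won't suffice, though: full case conversion can change the length of a string. A deliberately toy illustration, with only U+00DF (ß) and ASCII hard-coded; a real implementation would derive its mappings from the Unicode database:

```cpp
#include <string>

// Toy uppercasing: demonstrates that a char32_t -> char32_t property
// lookup is not enough, because U+00DF LATIN SMALL LETTER SHARP S
// uppercases to the two-code-point sequence "SS".
std::u32string to_upper_toy(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) {
        if (c == U'\u00DF')
            out += U"SS";                              // one-to-many mapping
        else if (c >= U'a' && c <= U'z')
            out += static_cast<char32_t>(c - 0x20);    // ASCII range
        else
            out += c;                                  // pass through
    }
    return out;
}
```

"straße" uppercases to "STRASSE", one code point longer than the input.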
Locale support would exercise items under discussion for P2020 (Locales, Encodings and Unicode). The implementation of the suggested utility must not call setlocale(..., ""), but instead determine the locale from the environment (e.g., POSIX locale settings); this implies not using existing standard interfaces that depend on the global locale object.
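For the POSIX case, that environment lookup follows a fixed precedence: LC_ALL overrides the per-category variable (LC_CTYPE here), which overrides LANG, with "C" as the last resort. A sketch, written over an injected lookup function rather than std::getenv directly so the logic is testable; the function name is my own:

```cpp
#include <functional>
#include <string>

// Determine the effective LC_CTYPE locale name from the environment,
// mirroring what setlocale(LC_CTYPE, "") would consult, but without
// touching the global locale. `getenv_fn` abstracts std::getenv.
std::string ctype_locale_from_env(
    const std::function<const char*(const char*)>& getenv_fn) {
    for (const char* var : {"LC_ALL", "LC_CTYPE", "LANG"}) {
        const char* v = getenv_fn(var);
        if (v && *v)
            return v;  // first non-empty variable wins
    }
    return "C";        // POSIX default when nothing is set
}
```

In production one would pass std::getenv; other categories (LC_COLLATE, LC_MESSAGES, ...) follow the same pattern with their own variable.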
We don't have papers yet that would address normalization, transliteration, case conversion algorithms, and locale names and properties.
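To make concrete what a normalization paper would have to cover: canonically equivalent text can be spelled with different code point sequences. "é" as precomposed U+00E9 (its NFC form) versus "e" plus U+0301 COMBINING ACUTE ACCENT (its NFD form) renders identically but compares unequal as raw code units:

```cpp
#include <string>

// Two canonically equivalent spellings of "é". They display the same
// and mean the same, yet differ as code unit sequences; a normalization
// facility exists precisely to bridge this gap.
const std::u32string e_nfc = U"\u00E9";   // 1 code point (precomposed)
const std::u32string e_nfd = U"e\u0301";  // 2 code points (decomposed)
```

Any equality, searching, or collation done without normalizing first will treat these as different strings.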
Thoughts? Anyone interested in working on this?
Tom.
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16