SG16 has a number of papers and designs in flight targeting C++23 and I've been thinking about ways that we can:

- Demonstrate, in practical terms, how some of the parts fit together.
- Identify gaps that will need to be filled.

One idea I had was to create a textconv utility, similar to iconv and ICU's uconv, implemented using only features that are in the standard or that we are targeting for inclusion in it. Here is what the command line usage might look like:

Usage: textconv [-h|--help] [--list] [-f <encoding>] [-t <encoding>]
                [-x <transformation>] [-n <normalization>] [-l <locale>]
                [-o <file>] [file ...]
  -h, --help            Display help
  --list                Display supported encodings
  -f <encoding>         Specify the encoding of the source text
  -t <encoding>         Specify the encoding of the target text
  -x <transformation>   Apply a transformation to the text
                          caseconvert:   Perform locale sensitive case conversions
                          transliterate: Substitute transliterations where applicable
                                         e.g., substitute "ae" for "æ".
  -n <normalization>    For UTF encodings, apply a normalization form
                          nfc:  Canonical composition
                          nfkc: Compatibility composition
                          nfd:  Canonical decomposition
                          nfkd: Compatibility decomposition
                          fcc:  Fast C Contiguous
  -l <locale>           Specify a locale to use for locale sensitive transformations
  -o <file>             Specify an output file (stdout by default)
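To make the usage text a bit more concrete, here is a rough sketch of how the options might be collected before any conversion work starts. The names textconv_options and parse_options are hypothetical, invented for this illustration; the sketch only recognizes the flags listed above, does not validate encoding or normalization names, and treats anything that does not start with '-' (or a lone "-") as an input file.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical container for the options in the usage text.
struct textconv_options {
    bool help = false;
    bool list = false;
    std::optional<std::string> from;            // -f
    std::optional<std::string> to;              // -t
    std::optional<std::string> transformation;  // -x
    std::optional<std::string> normalization;   // -n
    std::optional<std::string> locale;          // -l
    std::optional<std::string> output;          // -o
    std::vector<std::string> files;             // remaining operands
};

// args holds the arguments after the program name.
// Returns std::nullopt for an unknown option or a missing option argument.
std::optional<textconv_options> parse_options(const std::vector<std::string>& args) {
    textconv_options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        std::string_view a = args[i];
        if (a == "-h" || a == "--help") { opts.help = true; continue; }
        if (a == "--list") { opts.list = true; continue; }

        // Options that take one argument.
        std::optional<std::string>* target = nullptr;
        if (a == "-f") target = &opts.from;
        else if (a == "-t") target = &opts.to;
        else if (a == "-x") target = &opts.transformation;
        else if (a == "-n") target = &opts.normalization;
        else if (a == "-l") target = &opts.locale;
        else if (a == "-o") target = &opts.output;

        if (target) {
            if (i + 1 >= args.size()) return std::nullopt;  // missing argument
            *target = args[++i];
            continue;
        }
        if (a.size() > 1 && a[0] == '-') return std::nullopt;  // unknown option
        opts.files.push_back(args[i]);
    }
    return opts;
}
```

A real main() would build the vector from argv + 1 and print the usage text when parse_options returns std::nullopt or when help is set.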

Encoding conversions would exercise P1629 (Standard Text Encoding).
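For a sense of what the simplest possible conversion step looks like, here is a hand-written ISO-8859-1 to UTF-8 transcoder. This is not P1629's interface (nothing below comes from that paper); latin1_to_utf8 is a hypothetical helper that only illustrates the kind of work -f and -t would delegate to the encoding facilities. It relies on the fact that each ISO-8859-1 byte maps directly to the Unicode code point with the same value.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: transcode ISO-8859-1 bytes to UTF-8.
// Every input byte is a code point in U+0000..U+00FF, so the output
// needs at most two UTF-8 code units per input byte.
std::string latin1_to_utf8(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        const auto cp = static_cast<unsigned char>(ch);
        if (cp < 0x80) {
            // ASCII range: one code unit, unchanged.
            out.push_back(static_cast<char>(cp));
        } else {
            // U+0080..U+00FF: two code units, 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

For example, the single byte 0xE6 ("æ" in ISO-8859-1) becomes the two bytes 0xC3 0xA6. A general utility would need a decode/encode pair per supported encoding plus error handling for ill-formed input, which is exactly the part I would hope to get from the library.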

Selection of encoding would exercise P1885 (Naming Text Encodings to Demystify Them). The default source and target encodings would depend on environment settings (e.g., POSIX locale settings or the Windows ACP).
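On POSIX systems, one way to pick a default is to read the codeset out of the locale name, assuming the conventional language[_territory][.codeset][@modifier] form. The sketch below is only that: codeset_of is a hypothetical helper, it does not use P1885's interface, and it does not map the extracted string to a registered encoding name. A locale name without a codeset (such as "C") yields no answer, and the caller would have to choose a fallback.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: extract the codeset from a locale name of the
// conventional form language[_territory][.codeset][@modifier].
std::optional<std::string> codeset_of(std::string_view locale_name) {
    // Drop a trailing "@modifier", if present.
    const auto at = locale_name.find('@');
    if (at != std::string_view::npos) {
        locale_name = locale_name.substr(0, at);
    }
    // The codeset is whatever follows the first '.'.
    const auto dot = locale_name.find('.');
    if (dot == std::string_view::npos || dot + 1 == locale_name.size()) {
        return std::nullopt;  // no codeset in the name
    }
    return std::string(locale_name.substr(dot + 1));
}
```

With this, "en_US.UTF-8" yields "UTF-8" and "de_DE.ISO-8859-15@euro" yields "ISO-8859-15". On Windows the equivalent step would consult the active code page instead.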

Case conversion could exercise properties exposed via P1628 (Unicode character properties).

Locale support would exercise items under discussion for P2020 (Locales, Encodings and Unicode). The implementation of the suggested utility must not call setlocale(..., ""), but should instead determine the locale from the environment itself (e.g., from the POSIX locale settings); this implies not using existing standard interfaces that depend on the global locale object.
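As a sketch of what determining the locale without setlocale could look like on POSIX, the function below applies the usual precedence among the environment variables: LC_ALL first, then the category-specific variable, then LANG, skipping any that are unset or empty. The names env_lookup, process_env, and locale_name_from_env are hypothetical; the lookup is injectable only so the logic can be exercised without touching the real environment, and the "C" fallback is my assumption for the case where nothing is set (POSIX leaves that default implementation-defined).

```cpp
#include <cstdlib>
#include <functional>
#include <initializer_list>
#include <string>

// Hypothetical lookup hook: returns the variable's value or nullptr.
using env_lookup = std::function<const char*(const char*)>;

// Default lookup: read the process environment.
inline const char* process_env(const char* name) {
    return std::getenv(name);
}

// Hypothetical helper: pick the locale name for one category
// (e.g., "LC_CTYPE") without calling setlocale and without touching
// the global locale object.
std::string locale_name_from_env(const char* category,
                                 const env_lookup& lookup = process_env) {
    // Precedence: LC_ALL, then the category variable, then LANG.
    for (const char* var : {"LC_ALL", category, "LANG"}) {
        const char* value = lookup(var);
        if (value != nullptr && *value != '\0') {
            return value;
        }
    }
    return "C";  // assumed fallback when nothing is set
}
```

A call such as locale_name_from_env("LC_CTYPE") would then supply the name used for the default encoding and for locale sensitive transformations when -l is not given.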

We don't have papers yet that would address normalization, transliteration, case conversion algorithms, and locale names and properties.

Thoughts? Anyone interested in working on this?

Tom.