SG16 has a number of papers and designs in flight targeting C++23 and I've been thinking about ways that we can:

  1. Demonstrate, in practical terms, how some of the parts fit together.
  2. Identify gaps that will need to be filled.

One idea I had was to create a textconv utility, similar to iconv and ICU's uconv, implemented using only features that are in the standard or that we are targeting for inclusion in the standard.  Here is what the command line usage might look like:

Usage: textconv [-h|--help] [--list] [-f <encoding>] [-t <encoding>] [-x <transformation>] [-n <normalization>] [-l <locale>] [-o <file>] [file ...]
-h, --help          Display help
--list              Display supported encodings
-f <encoding>       Specify the encoding of the source text
-t <encoding>       Specify the encoding of the target text
-x <transformation> Apply a transformation to the text
                      caseconvert:   Perform locale sensitive case conversions
                      transliterate: Substitute transliterations where applicable
                                     e.g., substitute "ae" for "æ".
-n <normalization>  For UTF encodings, apply a normalization form
                      nfc:  Canonical composition
                      nfkc: Compatibility composition
                      nfd:  Canonical decomposition
                      nfkd: Compatibility decomposition
                      fcc:  Fast C Contiguous
-l <locale>         Specify a locale to use for locale sensitive transformations
-o <file>           Specify an output file (stdout by default)
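
To make the usage above a little more concrete, here is a minimal sketch of how the options might be parsed using only C++17 library facilities. The Options struct, its field names, and parse_args are hypothetical names chosen for illustration; treating a lone "-" as a file name is also an assumption, since the usage text does not say how stdin would be spelled.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical holder for the parsed command line; the field names are
// illustrative and not taken from any paper.
struct Options {
    bool help = false;
    bool list = false;
    std::optional<std::string> from;            // -f <encoding>
    std::optional<std::string> to;              // -t <encoding>
    std::optional<std::string> transformation;  // -x <transformation>
    std::optional<std::string> normalization;   // -n <normalization>
    std::optional<std::string> locale;          // -l <locale>
    std::optional<std::string> output;          // -o <file>
    std::vector<std::string> files;             // remaining operands
};

// Parses the arguments that follow argv[0]. Returns std::nullopt for an
// unknown option or for an option that is missing its value.
std::optional<Options> parse_args(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string_view arg = args[i];
        if (arg == "-h" || arg == "--help") { opts.help = true; continue; }
        if (arg == "--list") { opts.list = true; continue; }

        // Options that take a value store it through this pointer.
        std::optional<std::string>* target = nullptr;
        if (arg == "-f") target = &opts.from;
        else if (arg == "-t") target = &opts.to;
        else if (arg == "-x") target = &opts.transformation;
        else if (arg == "-n") target = &opts.normalization;
        else if (arg == "-l") target = &opts.locale;
        else if (arg == "-o") target = &opts.output;

        if (target) {
            if (i + 1 == args.size()) return std::nullopt;  // value missing
            *target = args[++i];
            continue;
        }
        // Anything else that starts with '-' (other than a lone "-") is
        // treated as an unknown option.
        if (arg.size() > 1 && arg.front() == '-') return std::nullopt;
        opts.files.push_back(args[i]);
    }
    return opts;
}
```

A main function would build the vector from argv + 1 through argv + argc and print the usage text whenever parse_args returns std::nullopt or the help flag is set.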

Encoding conversions would exercise P1629 (Standard Text Encoding).

Selection of encoding would exercise P1885 (Naming Text Encodings to Demystify Them).  The default source and target encodings would depend on environment settings (e.g., POSIX locale settings or the Windows ACP).
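
Since -f and -t take encoding names from the user, the utility would need some form of loose name matching. The sketch below is only my approximation of the kind of alias comparison described in Unicode UTS #22 (ignore case and punctuation, drop leading zeros in digit runs); it is not taken from P1885, and the function names are hypothetical.

```cpp
#include <string>
#include <string_view>

// Reduces an encoding name to a comparison key: ASCII letters are
// lowercased, every character other than an ASCII letter or digit is
// dropped, and a '0' is dropped unless the previously kept character is
// a digit. This is an assumed matching rule for illustration only.
std::string normalize_encoding_name(std::string_view name) {
    std::string out;
    for (char c : name) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
        const bool is_digit = c >= '0' && c <= '9';
        const bool is_lower = c >= 'a' && c <= 'z';
        if (!is_digit && !is_lower) continue;  // skip '-', '_', ' ', etc.
        const bool after_digit =
            !out.empty() && out.back() >= '0' && out.back() <= '9';
        if (c == '0' && !after_digit) continue;  // leading zero of a number
        out.push_back(c);
    }
    return out;
}

// Two names match when their comparison keys are identical.
bool same_encoding_name(std::string_view a, std::string_view b) {
    return normalize_encoding_name(a) == normalize_encoding_name(b);
}
```

With this rule, "UTF-8", "utf8", and "utf_08" all compare equal, while "ISO-8859-1" and "ISO-8859-10" remain distinct because the trailing zero follows a digit.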

Case conversion could exercise properties exposed via P1628 (Unicode character properties).

Locale support would exercise items under discussion for P2020 (Locales, Encodings and Unicode).  The implementation of the suggested utility must not call setlocale(..., ""), but must instead determine the locale from the environment (e.g., POSIX locale settings); this implies not using existing standard interfaces that are dependent on the global locale object.
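
As a rough illustration of determining the locale without setlocale, the following sketch reads the POSIX environment variables directly, using the usual precedence of LC_ALL, then the category variable, then LANG, with empty values treated as unset. The function names, the EnvLookup alias, and the fallback to "C" are assumptions made for the example; Windows would need a different approach that is not shown here.

```cpp
#include <cstdlib>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical lookup hook so the logic can be exercised without touching
// the real environment; production code could pass a wrapper of std::getenv.
using EnvLookup = std::function<const char*(const char*)>;

// Returns the locale name for a category such as "LC_CTYPE" by checking
// LC_ALL, then the category variable, then LANG. Falls back to "C"
// (an assumed default) when none of them is set to a non-empty value.
std::string locale_name_from_env(const char* category, const EnvLookup& lookup) {
    const char* const vars[] = {"LC_ALL", category, "LANG"};
    for (const char* var : vars) {
        const char* value = lookup(var);
        if (value && *value) return value;
    }
    return "C";
}

// Extracts the codeset from a name of the form
// language[_territory][.codeset][@modifier]; returns an empty string
// when the name carries no codeset.
std::string codeset_of(std::string_view locale_name) {
    const auto dot = locale_name.find('.');
    if (dot == std::string_view::npos) return {};
    const std::string_view rest = locale_name.substr(dot + 1);
    return std::string(rest.substr(0, rest.find('@')));
}

// Convenience wrapper that consults the process environment.
std::string ctype_locale_from_process_env() {
    return locale_name_from_env(
        "LC_CTYPE", [](const char* name) -> const char* { return std::getenv(name); });
}
```

The codeset extracted this way could serve as the default source and target encoding when -f or -t is omitted, and neither function reads or modifies the global locale.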

We don't have papers yet that would address normalization, transliteration, case conversion algorithms, and locale names and properties.

Thoughts?  Anyone interested in working on this?

Tom.