sg16: [SG16] Proposal for showcasing SG16 work in progress

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 6 Feb 2020 10:27:47 -0500

SG16 has a number of papers and designs in flight targeting C++23, and
I've been thinking about ways that we can:

1. Demonstrate, in practical terms, how some of the parts fit together.
2. Identify gaps that will need to be filled.

One idea I had was to create a textconv utility similar to iconv and
ICU's uconv implemented using only features in the standard or that we
are targeting for inclusion in the standard. Here is what the command
line usage might look like:

    Usage: textconv [-h|--help] [--list] [-f <encoding>] [-t <encoding>]
                    [-x <transformation>] [-n <normalization>]
                    [-l <locale>] [-o <file>] [file ...]
      -h, --help           Display help
      --list               Display supported encodings
      -f <encoding>        Specify the encoding of the source text
      -t <encoding>        Specify the encoding of the target text
      -x <transformation>  Apply a transformation to the text
                             caseconvert:   Perform locale sensitive case
                                            conversions
                             transliterate: Substitute transliterations
                                            where applicable, e.g.,
                                            substitute "ae" for "æ".
      -n <normalization>   For UTF encodings, apply a normalization form
                             nfc:  Canonical composition
                             nfkc: Compatibility composition
                             nfd:  Canonical decomposition
                             nfkd: Compatibility decomposition
                             fcc:  Fast C Contiguous
      -l <locale>          Specify a locale to use for locale sensitive
                           transformations
      -o <file>            Specify an output file (stdout by default)
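To make the usage above a bit more concrete, here is a rough sketch of how
the options might be collected before any conversion work starts. The
Options struct and parse_args function are hypothetical names invented for
this illustration; nothing here comes from an existing paper or library,
and option values are stored as plain strings without validation.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the parsed textconv command line.
struct Options {
    bool help = false;
    bool list = false;
    std::optional<std::string> from;            // -f <encoding>
    std::optional<std::string> to;              // -t <encoding>
    std::optional<std::string> transformation;  // -x <transformation>
    std::optional<std::string> normalization;   // -n <normalization>
    std::optional<std::string> locale;          // -l <locale>
    std::optional<std::string> output;          // -o <file>
    std::vector<std::string> files;             // [file ...]
};

// Parses the arguments that follow the program name.
// Throws std::invalid_argument for an unknown option or a missing value.
Options parse_args(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        // Consumes and returns the argument that follows the current option.
        auto value = [&]() -> const std::string& {
            if (i + 1 >= args.size()) {
                throw std::invalid_argument("missing value for " + arg);
            }
            return args[++i];
        };
        if (arg == "-h" || arg == "--help") opts.help = true;
        else if (arg == "--list") opts.list = true;
        else if (arg == "-f") opts.from = value();
        else if (arg == "-t") opts.to = value();
        else if (arg == "-x") opts.transformation = value();
        else if (arg == "-n") opts.normalization = value();
        else if (arg == "-l") opts.locale = value();
        else if (arg == "-o") opts.output = value();
        else if (arg.size() > 1 && arg[0] == '-') {
            throw std::invalid_argument("unknown option " + arg);
        }
        else opts.files.push_back(arg);  // anything else is an input file
    }
    return opts;
}
```

A real main() would presumably build the vector from argv + 1 through
argv + argc. Options left unset stay empty so that later stages can fall
back to environment-derived defaults.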

Encoding conversions would exercise P1629 (Standard Text Encoding)
<https://wg21.link/p1629>.

Selection of encoding would exercise P1885 (Naming Text Encodings to
Demystify Them) <https://wg21.link/p1885>. The default source and
target encodings would depend on environment settings (e.g., POSIX
locale settings or the Windows ACP).

Case conversion could exercise properties exposed via P1628 (Unicode
character properties) <https://wg21.link/p1628>.

Locale support would exercise items under discussion for P2020 (Locales,
Encodings and Unicode) <https://wg21.link/p2020>. The implementation of
the suggested utility must *not* call setlocale(..., ""), but instead
determine the locale from the environment (e.g., POSIX locale settings);
this implies not using existing standard interfaces that are dependent
on the global locale object.
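As a rough illustration of determining the locale from the environment
without touching the global locale, the sketch below applies the POSIX
precedence order (LC_ALL, then the category variable, then LANG) by
reading environment variables directly. The function name
locale_name_from_env is hypothetical, the "C" fallback is an assumption
(POSIX leaves the default implementation-defined), and Windows is not
covered.

```cpp
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical helper: returns the locale name that POSIX precedence rules
// select for `category` (e.g. "LC_CTYPE") without calling setlocale().
// `lookup` defaults to std::getenv and can be replaced for testing.
std::string locale_name_from_env(
    const std::string& category,
    const std::function<const char*(const char*)>& lookup =
        [](const char* name) -> const char* { return std::getenv(name); })
{
    // POSIX precedence: LC_ALL, then the category variable, then LANG.
    const std::string names[] = {"LC_ALL", category, "LANG"};
    for (const std::string& name : names) {
        const char* value = lookup(name.c_str());
        // A variable that is set but empty is treated as unset.
        if (value != nullptr && *value != '\0') {
            return value;
        }
    }
    return "C";  // assumption: fall back to the "C" locale
}
```

The returned name (for example "de_DE.UTF-8") would still need to be
split into language, territory, and codeset, which is part of what the
locale discussions would have to address.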

We don't have papers yet that would address normalization,
transliteration, case conversion algorithms, and locale names and
properties.

Thoughts? Anyone interested in working on this?

Tom.

Received on 2020-02-06 09:30:25