Date: Thu, 6 Feb 2020 10:27:47 -0500
SG16 has a number of papers and designs in flight targeting C++23 and
I've been thinking about ways that we can:
1. Demonstrate, in practical terms, how some of the parts fit together.
2. Identify gaps that will need to be filled.
One idea I had was to create a textconv utility similar to iconv and
ICU's uconv implemented using only features in the standard or that we
are targeting for inclusion in the standard. Here is what the command
line usage might look like:
Usage: textconv [-h|--help] [--list] [-f <encoding>] [-t <encoding>]
                [-x <transformation>] [-n <normalization>] [-l <locale>]
                [-o <file>] [file ...]
  -h, --help           Display help
  --list               Display supported encodings
  -f <encoding>        Specify the encoding of the source text
  -t <encoding>        Specify the encoding of the target text
  -x <transformation>  Apply a transformation to the text
                         caseconvert:   Perform locale sensitive case
                                        conversions
                         transliterate: Substitute transliterations
                                        where applicable
                                        e.g., substitute "ae" for "æ".
  -n <normalization>   For UTF encodings, apply a normalization form
                         nfc:  Canonical composition
                         nfkc: Compatibility composition
                         nfd:  Canonical decomposition
                         nfkd: Compatibility decomposition
                         fcc:  Fast C Contiguous
  -l <locale>          Specify a locale to use for locale sensitive
                       transformations
  -o <file>            Specify an output file (stdout by default)
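As a rough sketch of how the usage above could map onto code, the following parses the proposed options into a plain struct. The names textconv_options and parse_options are hypothetical and are not taken from any paper; an empty string is assumed to mean "option not given", and the sketch deliberately uses no locale-dependent facilities.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the options in the usage text.
// An empty string means the option was not given on the command line.
struct textconv_options {
    bool help = false;                 // -h, --help
    bool list = false;                 // --list
    std::string source_encoding;       // -f <encoding>
    std::string target_encoding;       // -t <encoding>
    std::string transformation;        // -x <transformation>
    std::string normalization;         // -n <normalization>
    std::string locale;                // -l <locale>
    std::string output_file;           // -o <file>
    std::vector<std::string> files;    // [file ...]
};

// Parses the arguments that follow the program name.
// Throws std::invalid_argument for an unknown option or a missing value.
textconv_options parse_options(const std::vector<std::string>& args)
{
    textconv_options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        // Consumes and returns the argument that follows the current option.
        auto value = [&]() -> const std::string& {
            if (i + 1 >= args.size())
                throw std::invalid_argument("missing value for " + arg);
            return args[++i];
        };
        if (arg == "-h" || arg == "--help") opts.help = true;
        else if (arg == "--list") opts.list = true;
        else if (arg == "-f") opts.source_encoding = value();
        else if (arg == "-t") opts.target_encoding = value();
        else if (arg == "-x") opts.transformation = value();
        else if (arg == "-n") opts.normalization = value();
        else if (arg == "-l") opts.locale = value();
        else if (arg == "-o") opts.output_file = value();
        else if (arg.size() > 1 && arg[0] == '-')
            throw std::invalid_argument("unknown option " + arg);
        else
            opts.files.push_back(arg);  // anything else is an input file
    }
    return opts;
}
```

A caller would build the vector from argv + 1 through argv + argc and then fall back to environment-derived defaults for any encoding or locale left empty.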
Encoding conversions would exercise P1629 (Standard Text Encoding)
<https://wg21.link/p1629>.
Selection of encoding would exercise P1885 (Naming Text Encodings to
Demystify Them) <https://wg21.link/p1885>. The default source and
target encodings would depend on environment settings (e.g., POSIX
locale settings or the Windows ACP).
Case conversion could exercise properties exposed via P1628 (Unicode
character properties) <https://wg21.link/p1628>.
Locale support would exercise items under discussion for P2020 (Locales,
Encodings and Unicode) <https://wg21.link/p2020>. The implementation of
the suggested utility must *not* call setlocale(..., ""), but instead
determine the locale from the environment (e.g., POSIX locale settings);
this implies not using existing standard interfaces that are dependent
on the global locale object.
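To illustrate what determining the locale from the environment might look like without touching the global locale, here is a minimal sketch for the POSIX case. It follows the POSIX precedence of LC_ALL, then the category variable, then LANG, and treats an empty value as unset. The helper names (locale_name_from_env, codeset_of, getenv_lookup) are hypothetical, the "C" fallback is an assumption, and the codeset extraction assumes the conventional language[_territory][.codeset][@modifier] name form.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>

// Lookup function type, so that tests can substitute a fake environment.
using env_lookup = std::function<const char*(const char*)>;

// Default lookup: reads the real process environment.
inline const char* getenv_lookup(const char* name)
{
    return std::getenv(name);
}

// Hypothetical helper: returns the locale name in effect for a category
// such as "LC_CTYPE" without calling setlocale(). Precedence follows POSIX:
// LC_ALL, then the category variable, then LANG. Unset or empty variables
// are skipped. Falling back to "C" is an assumption of this sketch.
std::string locale_name_from_env(const char* category,
                                 const env_lookup& lookup = getenv_lookup)
{
    assert(category != nullptr);
    const char* const variables[] = {"LC_ALL", category, "LANG"};
    for (const char* variable : variables) {
        const char* value = lookup(variable);
        if (value != nullptr && *value != '\0')
            return value;
    }
    return "C";
}

// Hypothetical helper: extracts the codeset part of a locale name of the
// form language[_territory][.codeset][@modifier]. Returns an empty string
// when the name carries no codeset.
std::string codeset_of(const std::string& locale_name)
{
    const std::string::size_type dot = locale_name.find('.');
    if (dot == std::string::npos)
        return std::string();
    const std::string::size_type at = locale_name.find('@', dot);
    const std::string::size_type end =
        (at == std::string::npos) ? locale_name.size() : at;
    return locale_name.substr(dot + 1, end - dot - 1);
}
```

The codeset string obtained this way is only a name; mapping it to an actual encoding is the kind of question P1885 is meant to answer, and the Windows ACP case would need a separate code path.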
We don't have papers yet that would address normalization,
transliteration, case conversion algorithms, and locale names and
properties.
Thoughts? Anyone interested in working on this?
Tom.
Received on 2020-02-06 09:30:25