Re: [SG16] Proposal for showcasing SG16 work in progress

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Thu, 6 Feb 2020 16:44:46 +0100
I think we are quite far away from being able to pull that off.
It could probably be achieved with Boost.Text.
Note that in particular the Unicode database is a very small part of the
casing equation.
We may be years from having a solution to locale (no one is actually
working on it).

I do, however, think that API design should be driven by usage and
experience, so it might be more useful as a sandbox/proving ground than as a
way to showcase what we already have.



On Thu, Feb 6, 2020, 16:27 Tom Honermann via SG16 <sg16_at_[hidden]>
wrote:

> SG16 has a number of papers and designs in flight targeting C++23 and I've
> been thinking about ways that we can:
>
> 1. Demonstrate, in practical terms, how some of the parts fit together.
> 2. Identify gaps that will need to be filled.
>
> One idea I had was to create a textconv utility similar to iconv and ICU's
> uconv implemented using only features in the standard or that we are
> targeting for inclusion in the standard. Here is what the command line
> usage might look like:
>
> Usage: textconv [-h|--help] [--list] [-f <encoding>] [-t <encoding>]
>                 [-x <transformation>] [-n <normalization>] [-l <locale>]
>                 [-o <file>] [file ...]
>   -h, --help            Display help
>   --list                Display supported encodings
>   -f <encoding>         Specify the encoding of the source text
>   -t <encoding>         Specify the encoding of the target text
>   -x <transformation>   Apply a transformation to the text
>                           caseconvert:   Perform locale sensitive case
>                                          conversions
>                           transliterate: Substitute transliterations where
>                                          applicable
>                                          e.g., substitute "ae" for "æ".
>   -n <normalization>    For UTF encodings, apply a normalization form
>                           nfc:  Canonical composition
>                           nfkc: Compatibility composition
>                           nfd:  Canonical decomposition
>                           nfkd: Compatibility decomposition
>                           fcc:  Fast C Contiguous
>   -l <locale>           Specify a locale to use for locale sensitive
>                         transformations
>   -o <file>             Specify an output file (stdout by default)
>
> Encoding conversions would exercise P1629 (Standard Text Encoding)
> <https://wg21.link/p1629>.
>
> Selection of encoding would exercise P1885 (Naming Text Encodings to
> Demystify Them) <https://wg21.link/p1885>. The default source and target
> encodings would depend on environment settings (e.g., POSIX locale settings
> or the Windows ACP).
>
> Case conversion could exercise properties exposed via P1628 (Unicode
> character properties) <https://wg21.link/p1628>.
>
> Locale support would exercise items under discussion for P2020 (Locales,
> Encodings and Unicode) <https://wg21.link/p2020>. The implementation of
> the suggested utility must *not* call setlocale(..., ""), but instead
> determine the locale from the environment (e.g., POSIX locale settings);
> this implies not using existing standard interfaces that are dependent on
> the global locale object.
>
> We don't have papers yet that would address normalization,
> transliteration, case conversion algorithms, and locale names and
> properties.
>
> Thoughts? Anyone interested in working on this?
>
> Tom.
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
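
To make the point about not calling setlocale(..., "") concrete, here is a
minimal sketch of how a utility might read the locale name directly from the
POSIX environment variables without touching the global locale. The names
locale_name_from_env, getenv_lookup, and EnvLookup are hypothetical and do not
come from any SG16 paper. The lookup order (LC_ALL, then the category
variable, then LANG) follows POSIX. The fallback to "C" is an assumption,
since POSIX leaves the default implementation-defined. The code requires
C++17 for std::optional.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <optional>
#include <string>

// Hypothetical abstraction over the environment so the logic can be tested
// without modifying the real process environment.
using EnvLookup = std::function<std::optional<std::string>(const std::string&)>;

// Reads a variable from the real process environment via std::getenv.
std::optional<std::string> getenv_lookup(const std::string& name) {
    const char* value = std::getenv(name.c_str());
    if (value == nullptr) return std::nullopt;
    return std::string(value);
}

// Returns the locale name the environment selects for `category`
// (e.g. "LC_CTYPE") without calling setlocale() or using std::locale.
// POSIX precedence: LC_ALL, then the category variable, then LANG.
// Unset and empty variables are both skipped.
std::string locale_name_from_env(const std::string& category,
                                 const EnvLookup& lookup = getenv_lookup) {
    assert(!category.empty());
    const std::string names[] = {"LC_ALL", category, "LANG"};
    for (const std::string& name : names) {
        const std::optional<std::string> value = lookup(name);
        if (value && !value->empty()) return *value;
    }
    return "C";  // Assumption: POSIX leaves this default implementation-defined.
}
```

A caller would write locale_name_from_env("LC_CTYPE") to obtain the name that
governs character classification and encoding. Parsing that name into
language, territory, and codeset is a separate step not shown here.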

Received on 2020-02-06 09:47:35