Date: Thu, 6 Feb 2020 10:52:01 -0500
On 2/6/20 10:44 AM, Corentin Jabot wrote:
> I think we are quite far away from being able to pull that off.
I think so too, but would this be a useful exercise? I see it as one
way to gauge progress. Failing tests and "not yet implemented" errors
would give an indication of what work remains.
> It could probably be achieved with Boost.Text.
> Note that in particular the Unicode database is a very small part of
> the casing equation.
Yes, understood. I tried to call that out by indicating we don't yet
have a paper for case conversion algorithms.
> We may be years from having a solution to locale (no one is actually
> working on it).
Agreed.
>
> I do, however, think that API design should be usage and experience
> driven, so it might be useful as a sandbox/proving ground more than as
> a way to showcase what we already have.
Good. The sandbox/proving ground was part of what I intended. Another
part is that this provides an easy-to-explain direction for (some of)
what we intend to enable within the standard library.
Tom.
>
>
>
> On Thu, Feb 6, 2020, 16:27 Tom Honermann via SG16
> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
>
> SG16 has a number of papers and designs in flight targeting C++23
> and I've been thinking about ways that we can:
>
> 1. Demonstrate, in practical terms, how some of the parts fit
> together.
> 2. Identify gaps that will need to be filled.
>
> One idea I had was to create a textconv utility, similar to iconv and
> ICU's uconv, implemented using only features that are in the standard
> or that we are targeting for inclusion in the standard. Here is what
> the command-line usage might look like (a rough driver skeleton
> follows the usage summary):
>
> Usage: textconv [-h|--help] [--list] [-f <encoding>] [-t <encoding>]
>                 [-x <transformation>] [-n <normalization>] [-l <locale>]
>                 [-o <file>] [file ...]
>   -h, --help           Display help
>   --list               Display supported encodings
>   -f <encoding>        Specify the encoding of the source text
>   -t <encoding>        Specify the encoding of the target text
>   -x <transformation>  Apply a transformation to the text
>                          caseconvert:   Perform locale-sensitive case
>                                         conversions
>                          transliterate: Substitute transliterations where
>                                         applicable, e.g., substitute "ae"
>                                         for "æ"
>   -n <normalization>   For UTF encodings, apply a normalization form
>                          nfc:  Canonical composition
>                          nfkc: Compatibility composition
>                          nfd:  Canonical decomposition
>                          nfkd: Compatibility decomposition
>                          fcc:  Fast C Contiguous
>   -l <locale>          Specify a locale to use for locale-sensitive
>                        transformations
>   -o <file>            Specify an output file (stdout by default)
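>
> As a very rough sketch of how the driver skeleton might hang together
> (purely hypothetical code of mine: none of the conversion machinery
> exists yet, and the Options struct and overall structure are my
> assumptions, not anything a paper proposes):
>
>     #include <cstdlib>
>     #include <iostream>
>     #include <optional>
>     #include <string>
>     #include <vector>
>
>     // Collected command-line settings for the hypothetical textconv tool.
>     struct Options {
>         std::optional<std::string> from, to;       // -f, -t
>         std::optional<std::string> transformation; // -x
>         std::optional<std::string> normalization;  // -n
>         std::optional<std::string> locale;         // -l
>         std::optional<std::string> output;         // -o
>         std::vector<std::string> inputs;           // positional files
>     };
>
>     int main(int argc, char* argv[]) {
>         Options opts;
>         for (int i = 1; i < argc; ++i) {
>             std::string arg = argv[i];
>             // Fetch the value that must follow an option taking an argument.
>             auto value = [&]() -> std::string {
>                 if (i + 1 >= argc) {
>                     std::cerr << "missing argument for " << arg << '\n';
>                     std::exit(EXIT_FAILURE);
>                 }
>                 return argv[++i];
>             };
>             if (arg == "-h" || arg == "--help") {
>                 std::cout << "Usage: textconv ...\n";  // elided
>                 return EXIT_SUCCESS;
>             } else if (arg == "--list") {
>                 // Would enumerate the encodings known to the implementation.
>             } else if (arg == "-f") { opts.from = value(); }
>             else if (arg == "-t")   { opts.to = value(); }
>             else if (arg == "-x")   { opts.transformation = value(); }
>             else if (arg == "-n")   { opts.normalization = value(); }
>             else if (arg == "-l")   { opts.locale = value(); }
>             else if (arg == "-o")   { opts.output = value(); }
>             else                    { opts.inputs.push_back(arg); }
>         }
>         // Decoding, transformation, and encoding steps would go here,
>         // driven by opts and by the papers referenced below.
>         return EXIT_SUCCESS;
>     }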
>
> Encoding conversions would exercise P1629 (Standard Text Encoding)
> <https://wg21.link/p1629>.
>
> Selection of encoding would exercise P1885 (Naming Text Encodings
> to Demystify Them) <https://wg21.link/p1885>. The default source
> and target encodings would depend on environment settings (e.g.,
> POSIX locale settings or the Windows ACP).
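>
> As a hint of how the -f/-t defaults might be computed, here is a
> minimal sketch against a P1885-style interface. The header name and
> the environment()/name() members follow my reading of the current
> draft, but treat the exact spellings as assumptions:
>
>     #include <iostream>
>     #include <text_encoding>  // assumed header; P1885 is not yet adopted
>
>     int main() {
>         // If neither -f nor -t is given, fall back to the encoding implied
>         // by the environment (POSIX locale settings, the Windows ACP, ...).
>         std::text_encoding env = std::text_encoding::environment();
>         std::cout << "default encoding: " << env.name() << '\n';
>     }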
>
> Case conversion could exercise properties exposed via P1628
> (Unicode character properties) <https://wg21.link/p1628>.
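>
> To make the "small part of the casing equation" point concrete, a toy
> sketch of simple (1:1, context-free, locale-free) case conversion.
> simple_uppercase() is a placeholder of mine, not P1628's interface, and
> only handles ASCII so the sketch stands alone; full case conversion
> also needs context (e.g., Greek final sigma) and locale tailoring
> (e.g., Turkish dotted/dotless i), which no property table alone
> provides:
>
>     #include <string>
>
>     // Hypothetical stand-in for a per-code-point property lookup; a real
>     // implementation would consult the Unicode character database. Only
>     // ASCII is handled here so the sketch is self-contained.
>     char32_t simple_uppercase(char32_t c) {
>         return (c >= U'a' && c <= U'z') ? c - (U'a' - U'A') : c;
>     }
>
>     std::u32string to_upper_simple(const std::u32string& in) {
>         std::u32string out;
>         out.reserve(in.size());
>         for (char32_t c : in)
>             out.push_back(simple_uppercase(c));  // 1:1 mapping only; real
>                                                  // casing can change length
>         return out;
>     }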
>
> Locale support would exercise items under discussion for P2020
> (Locales, Encodings and Unicode) <https://wg21.link/p2020>. The
> implementation of the suggested utility must *not* call
> setlocale(..., ""), but instead determine the locale from the
> environment (e.g., POSIX locale settings); this implies not using
> existing standard interfaces that are dependent on the global
> locale object.
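>
> A minimal sketch of that environment-driven locale discovery on POSIX,
> assuming the usual LC_ALL / LC_<category> / LANG precedence; the
> function name and the "C" fallback are my choices:
>
>     #include <cstdlib>
>     #include <initializer_list>
>     #include <string>
>
>     // Determine the locale name for a given category from the environment,
>     // following POSIX precedence (LC_ALL, then the category-specific
>     // variable, then LANG), without touching the global locale object.
>     std::string locale_from_environment(const char* category_var = "LC_CTYPE") {
>         for (const char* var : {"LC_ALL", category_var, "LANG"}) {
>             if (const char* value = std::getenv(var); value && *value)
>                 return value;
>         }
>         return "C";  // assumed fallback when nothing is set
>     }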
>
> We don't have papers yet that would address normalization,
> transliteration, case conversion algorithms, and locale names and
> properties.
>
> Thoughts? Anyone interested in working on this?
>
> Tom.
>
> --
> SG16 mailing list
> SG16_at_[hidden] <mailto:SG16_at_[hidden]>
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2020-02-06 09:54:39