SG16

Subject: Re: Wording strategy for Unicode std::format
From: Peter Brett (pbrett_at_[hidden])
Date: 2021-04-19 04:00:50


Good, I think that deliberately forbidding implicit transcoding is the way to go here.

What should the following code do?

#include <locale>
#include <format>
std::locale::global(std::locale("en_GB"));
auto a = std::format("{:} {:L}", 10000, 20000);
auto b = std::format(u8"{:} {:L}", 10000, 20000);

Options:


  1. a = "10000 20,000", b = u8"10000 20,000". In effect, mandate that implementations supply UTF-8/16/32 locale facilities.
  2. a = "10000 20,000", b = u8"10000 20000". Silently ignore the ‘L’ option, setting up for an ABI break if we want to introduce Better Global Locale at some point in the future. It may also cause non-obvious changes in behaviour for users who start using std::format with 'L', but then want to u8-ify it later.
  3. a = "10000 20,000", b = <std::format_error>. Forbid implementers from even attempting to make 'L' option work in UTF-8/16/32 formatting.
  4. a = "10000 20,000", b = <conditionally-supported>. Many implementers will choose to throw std::format_error. However, if wchar_t is 32-bit and the wide execution encoding is UTF-32, then some implementers will actually be able to make ‘L’ Just Work in UTF-32 formatting, and it doesn’t seem unreasonable to allow them to do that.

“std::locale in its current form is pretty much useless,” may be a true statement but it doesn’t help me make progress.

Peter

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Sent: 18 April 2021 23:23
To: Peter Brett <pbrett_at_[hidden]>
Cc: SG16 <sg16_at_[hidden]>
Subject: Re: Wording strategy for Unicode std::format

> {fmt} already does exactly this, right?

{fmt} explicitly disallows any implicit transcoding at the formatting level.

> How can we word this so as to make `{L}` substitutions for UTF-8/16/32
> formatting conditionally-supported, depending on whether the
> implementation provides the necessary specializations of <locale>
> facilities?

I'm less concerned with how we word this and more with the fact that std::locale in its current form is pretty much useless. I would recommend looking into this instead of blindly extending it to new code unit types.

Cheers,
Victor

On Fri, Apr 16, 2021 at 10:08 AM Peter Brett <pbrett_at_[hidden]> wrote:
Hi all (esp. Victor),

We discussed adding C++23 support for homogeneous formatting in UTF-8,
UTF-16 and UTF-32. For C++23, we would like to allow UTF-8 format
strings with UTF-8 substitutions, UTF-16 format strings with UTF-16
substitutions, etc. In a future version of the standard (where UTF
transcoding is guaranteed to be available) we would like to extend this
to allowing e.g. UTF-32 substitutions into UTF-8 format strings.

Victor: {fmt} already does exactly this, right?

As far as I can tell, most of the wording is already in place for this,
and it will only be necessary to mandate the addition of specific
overloads and template specialisations.

My current sticking point is the way we have specified the
locale-specific form (with the `L` option). Take the `{L}` substitution
for bool, for example. In P1892 I chose to specify this in terms of
std::numpunct<charT>, but the standard only requires the standard
library to provide numpunct<char> and numpunct<wchar_t> specializations.
Similar problems arise for `L` with other standard format specifiers.
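
To make the dependency concrete, here is a minimal sketch of the kind of facet lookup that a numpunct-based specification of the bool `L` form relies on. It is not the P1892 wording or any implementation's code; `yes_no_punct`, `make_yes_no_locale`, `localized_bool` and `localized_wbool` are hypothetical names used only for illustration. The sketch covers char and wchar_t, the two specializations the standard requires; the analogous lookup for char8_t, char16_t or char32_t is not guaranteed to be available.

```cpp
#include <locale>
#include <string>

// Hypothetical facet: overrides the names that a locale-specific
// bool substitution would pick up for char.
struct yes_no_punct : std::numpunct<char> {
protected:
    std::string do_truename() const override { return "yes"; }
    std::string do_falsename() const override { return "no"; }
};

// Builds a locale that is the classic locale plus the facet above.
std::locale make_yes_no_locale() {
    return std::locale(std::locale::classic(), new yes_no_punct);
}

// Looks up the bool name through numpunct<char>.
std::string localized_bool(const std::locale& loc, bool value) {
    const auto& np = std::use_facet<std::numpunct<char>>(loc);
    return value ? np.truename() : np.falsename();
}

// Same lookup through numpunct<wchar_t>, the only other
// specialization the standard requires.
std::wstring localized_wbool(const std::locale& loc, bool value) {
    const auto& np = std::use_facet<std::numpunct<wchar_t>>(loc);
    return value ? np.truename() : np.falsename();
}
```

For example, `localized_bool(make_yes_no_locale(), true)` picks up the overridden name, while `localized_wbool` on the same locale still sees the classic wchar_t names because the custom facet only replaces numpunct<char>.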

How can we word this so as to make `{L}` substitutions for UTF-8/16/32
formatting conditionally-supported, depending on whether the
implementation provides the necessary specializations of <locale>
facilities?

Advice appreciated.

                            Peter



SG16 list run by sg16-owner@lists.isocpp.org