SG16

Subject: Re: Wording strategy for Unicode std::format
From: Victor Zverovich (victor.zverovich_at_[hidden])
Date: 2021-04-18 17:41:36


> Do we think we *need* to have a dependency between transcoding and format
> in terms of schedule?

In my opinion, not only should there be no dependency, but we should
explicitly ban implicit transcoding during formatting. Experience with
existing facilities has shown that nothing good comes of it, and people
have been happy with transcoding being banned in {fmt}.

> Same question applies for utf format string & non-utf arguments

Same, but with an even stronger "no".

> Do we think we can't provide UTF-8 support for print without worrying
> about locale? (which does the wrong thing as, UTF-8 or not, the whole
> interface assumes 1 code unit == 1 code point)?

This would create a confusing inconsistency between different formatting
functions, so I would be strongly opposed.

- Victor

On Fri, Apr 16, 2021 at 12:09 PM Corentin Jabot via SG16 <
sg16_at_[hidden]> wrote:

>
>
> On Fri, Apr 16, 2021 at 7:08 PM Peter Brett via SG16 <
> sg16_at_[hidden]> wrote:
>
>> Hi all (esp. Victor),
>>
>> We discussed adding C++23 support for homogeneous formatting in UTF-8,
>> UTF-16 and UTF-32. For C++23, we would like to allow UTF-8 format
>> strings with UTF-8 substitutions, UTF-16 format strings with UTF-16
>> substitutions, etc. In a future version of the standard (where UTF
>> transcoding is guaranteed to be available) we would like to extend this
>> to allowing e.g. UTF-32 substitutions into UTF-8 format strings.
>>
>> Victor: {fmt} already does exactly this, right?
>>
>> As far as I can tell, most of the wording is already in place for this,
>> and it will only be necessary to mandate the addition of specific
>> overloads and template specialisations.
>>
>> My current sticking point is the way we have specified the
>> locale-specific form (with the `L` option). Take the `{L}` substitution
>> for bool, for example. In P1892 I chose to specify this in terms of
>> std::numpunct<charT>, but the standard only requires the standard
>> library to provide numpunct<char> and numpunct<wchar_t> specializations.
>> Similar problems arise for `L` with other standard format specifiers.
>>
>> How can we word this so as to make `{L}` substitutions for UTF-8/16/32
>> formatting conditionally-supported, depending on whether the
>> implementation provides the necessary specializations of <locale>
>> facilities?
>>
>
> Let me ask more questions:
>
> - Do we think we *need* to have a dependency between transcoding and
> format in terms of schedule?
> - Same question applies for utf format string & non-utf arguments
> - Do we think we can't provide UTF-8 support for print without
> worrying about locale? (which does the wrong thing as, UTF-8 or not, the
> whole interface assumes 1 code unit == 1 code point)?
>
>
>
>>
>> Advice appreciated.
>>
>> Peter
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>



SG16 list run by sg16-owner@lists.isocpp.org