sg16: Re: [SG16] Wording strategy for Unicode std::format

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Thu, 22 Apr 2021 17:16:12 +0200

Hey Victor,
I hope you are doing well, thanks for your reply!
A few comments below.

On Thu, Apr 22, 2021 at 4:26 PM Victor Zverovich <victor.zverovich_at_[hidden]>
wrote:

> > Peter:
> > What should the following code do?
>
> I think (1) is the only acceptable option because all the rest are
> inconsistent with existing std::format overloads.
>
> > “std::locale in its current form is pretty much useless,” may be a true
> > statement but it doesn’t help me make progress.
>
>
> Maybe we are trying to make "progress" in the wrong direction? We don't
> have to quickly hack something together for new std::format overloads. We
> didn't have a chance to look at locale in C++20 but now is a great time.
>
> > Corentin:
> > Converting between UTF-X and UTF-Y is a lossless operation.
>
> Only valid ones. There is still a question of handling transcoding errors.
>

I think that to make progress SG16 will need to resolve "what if a string
claims to be in encoding X but isn't?".
In this case, what happens when a u8 string isn't actually valid UTF-8?
(This is a rhetorical question; I don't expect SG16 to come to an agreement
any time soon.)
Only then can we make progress - P1880 had a point.
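
To make the question concrete, here is a minimal sketch (my own
illustration, not wording from any paper) of UTF-16 to UTF-8 transcoding
where the caller picks the error policy. The names on_error, append_utf8 and
utf16_to_utf8 are hypothetical, and the output uses std::string as a byte
container so that it builds without char8_t. Well-formed input converts
losslessly; an unpaired surrogate is where a policy decision is needed.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical error policy, for illustration only.
enum class on_error { replace, fail };

// Append the UTF-8 encoding of a Unicode scalar value to `out`.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Transcode UTF-16 code units to UTF-8 bytes.
// On an unpaired surrogate: return std::nullopt if policy is on_error::fail,
// otherwise substitute U+FFFD REPLACEMENT CHARACTER.
inline std::optional<std::string> utf16_to_utf8(std::u16string_view in,
                                                on_error policy) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        const bool low = cp >= 0xDC00 && cp <= 0xDFFF;
        if (high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Well-formed surrogate pair: combine into one scalar value.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (high || low) {
            // Unpaired surrogate: the input is not valid UTF-16.
            if (policy == on_error::fail) return std::nullopt;
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Whether the default should be to fail, to replace, or something else
entirely is exactly the kind of question I mean.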

>
> > what is it that we gain by not allowing format(u8"{}", u"");
> > and format(u8"{}", U"");?
>
> We gain consistency between all std::format overloads, simple
> specification, not having to deal with transcoding errors. I am not
> suggesting that it shouldn't be possible but that it should be explicit,
> e.g.
>
> format(u8"{}", xcode(u""))
>
> With explicit approach you can easily configure error handling.
>
> > we could provide only the u8 overload
>
> Sure provided that we have transcoding facilities. I don't think u16 and
> u32 overloads are particularly useful since you can't do much with the
> result.
>
> > mandates that the existing locale be specialized for char8_t
>
> If this specialization inherits all existing locale problems then I think
> it's not a good idea.
>

If our requirements are:

   - The u8 overload should be consistent with the narrow one and
   therefore support locale
   - locale support should not be broken
   - users need u8 overloads sooner rather than later

(all of which I agree with individually), then I don't see a path forward
that satisfies all of them at once.

Non-broken Unicode locale support calls for research and
implementation experience outside of the standard.
Providing something that is consistent with, and *not more* broken than,
the narrow overload seems useful in the short term,
and does not worry me too much.

I see two possible futures:

   - We can keep the std::locale class without breaking ABI, and use it as
   a Unicode locale identifier. In that case the format interface won't have
   to change and can support more locales. Existing locales won't change if
   they are well specified. This seems optimistic.
   - We have to introduce a new locale object, and therefore a new set of
   std::format overloads.

In either case, I think adding a UTF-8 overload today that works exactly to
the extent the narrow one does is useful for our users.

>
> - Victor
>
> On Mon, Apr 19, 2021 at 3:18 AM Corentin Jabot via SG16 <
> sg16_at_[hidden]> wrote:
>
>> Talking with Peter, we realized we could provide only the u8 overload and
>> mandate that the existing locale be specialized for char8_t.
>> We believe this would:
>>
>> - Satisfy Victor's excellent remark about the need not to be
>> gratuitously inconsistent
>> - Put minimum strain on implementers
>> - Let us move forward with having a Unicode overload in 23.
>>
>

Received on 2021-04-22 10:16:26