sg16: Re: [SG16] Wording strategy for Unicode std::format

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 26 Apr 2021 23:26:31 -0400

On 4/26/21 1:35 PM, Victor Zverovich via SG16 wrote:
> > Corentin:
> > If our requirements are ... - all of which I agree with individually
> > - I don't see a path forward.
>
> I don't see any problem. Since we are in the beginning of the C++23
> cycle it makes sense to tackle more difficult problems such as locale
> first. Adding format overloads is not hard and makes more sense to do
> later when we have a better understanding and hopefully solution for
> locale issues.
std::format has already taken a dependency on std::locale. I haven't
given this much thought yet, but I am wondering whether, and how, that
limits what kinds of changes we can make going forward.
>
> > A non-broken Unicode locale support calls for research and
> > implementation experience outside of the standard.
>
> That's exactly what I intend to do, contributors are welcome.
Ohh, another volunteer willing to work on locale! That makes me almost
unreasonably happy!
>
> > Providing something that is consistent and *not more* broken than
> > the narrow overload seems useful in the short term.
>
> First, I don't think it's necessary, per my comment above. Second, it
> will likely severely constrain our design space for fixing these
> issues in the future.
>
> > Peter:
> > Just for clarity, what does {fmt} currently do?
>
> Unfortunately {fmt} cannot fix the standard library so it does
> something completely crazy in char*_t overloads, namely uses a narrow
> locale and casts characters. There has been very little interest in
> char*_t overloads though.

That lack of interest thus far doesn't surprise me. And that "something
crazy" doesn't seem that crazy given what you have to work with.

Tom.

>
> Cheers,
> Victor
>
> On Thu, Apr 22, 2021 at 8:57 AM Peter Brett <pbrett_at_[hidden]>
> wrote:
>
> Hi Victor,
>
> This is helpful, thank you. I will put full ‘L’ handling for
> UTF-8/16/32 as the preferred option in the paper.
>
> Just for clarity, what does {fmt} currently do? Obviously if it
> currently does something different then I will have to do some
> work to demonstrate implementability.
>
> Best wishes,
>
> Peter
>
> *From:* SG16 <sg16-bounces_at_[hidden]> *On Behalf Of* Victor
> Zverovich via SG16
> *Sent:* 22 April 2021 15:27
> *To:* SG16 <sg16_at_[hidden]>
> *Cc:* Victor Zverovich <victor.zverovich_at_[hidden]>
> *Subject:* Re: [SG16] Wording strategy for Unicode std::format
>
> > Peter:
>
> > What should the following code do?
>
> I think (1) is the only acceptable option because all the rest are
> inconsistent with existing std::format overloads.
>
> > “std::locale in its current form is pretty much useless,” may be
> > a true statement but it doesn’t help me make progress.
>
> Maybe we are trying to make "progress" in the wrong direction? We
> don't have to quickly hack something together for new std::format
> overloads. We didn't have a chance to look at locale in C++20 but
> now is a great time.
>
> > Corentin:
>
> > Converting between UTF-X and UTF-Y is a lossless operation.
>
> Only for valid input. There is still the question of how to handle
> transcoding errors.
>
> > what is it that we gain by not allowing format(u8"{}", u"");
> > and format(u8"{}", U"");?
>
> We gain consistency between all std::format overloads, a simple
> specification, and not having to deal with transcoding errors. I am
> not suggesting that it shouldn't be possible, but that it should be
> explicit, e.g.
>
> format(u8"{}", xcode(u""))
>
> With an explicit approach you can easily configure error handling.
>
> > we could provide only the u8 overload
>
> Sure, provided that we have transcoding facilities. I don't think
> u16 and u32 overloads are particularly useful since you can't do
> much with the result.
>
> > mandates that the existing locale be specialized for char8_t
>
> If this specialization inherits all existing locale problems then
> I think it's not a good idea.
>
> - Victor
>
> On Mon, Apr 19, 2021 at 3:18 AM Corentin Jabot via SG16
> <sg16_at_[hidden]> wrote:
>
> Talking with Peter, we realized we could provide only the u8
> overload and mandate that the existing locale be specialized
> for char8_t.
>
> We believe this would
>
> * Satisfy Victor's excellent remark about the need not to be
> gratuitously inconsistent
> * Put minimum strain on implementers
> * Let us move forward with having a Unicode overload in C++23.
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
>

Received on 2021-04-26 22:26:34
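
For illustration only: the thread mentions making transcoding explicit,
as in format(u8"{}", xcode(u"")), so that error handling can be
configured. The following is a minimal sketch of what such an explicit
UTF-16 to UTF-8 step might look like. The names on_error, append_utf8
and utf16_to_utf8 are hypothetical and do not come from {fmt}, from the
standard library, or from any paper discussed here. The result is held
in a std::string of UTF-8 bytes so that the sketch compiles as C++17.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical policy for ill-formed input (an assumption of this sketch).
enum class on_error { replace, fail };

// Append the UTF-8 encoding of one Unicode scalar value to `out`.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Transcode UTF-16 to UTF-8. Well-formed input converts losslessly.
// An unpaired surrogate is either replaced with U+FFFD or makes the
// whole call return std::nullopt, depending on `policy`.
inline std::optional<std::string> utf16_to_utf8(std::u16string_view in,
                                                on_error policy) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 &&
                in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                if (policy == on_error::fail) return std::nullopt;
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            if (policy == on_error::fail) return std::nullopt;
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With something along these lines, the caller chooses the error policy at
the call site and passes the transcoded result on to the formatting call,
rather than having format pick a policy implicitly.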