Hey Victor,
I hope you are doing well; thanks for your reply!
A few comments below.

On Thu, Apr 22, 2021 at 4:26 PM Victor Zverovich <victor.zverovich@gmail.com> wrote:
> Peter:
> What should the following code do?

I think (1) is the only acceptable option because all the rest are inconsistent with existing std::format overloads.

> “std::locale in its current form is pretty much useless,” may be a true statement but it doesn’t help me make progress.


Maybe we are trying to make "progress" in the wrong direction? We don't have to quickly hack something together for new std::format overloads. We didn't have a chance to look at locale in C++20, but now is a great time.

> Corentin:
> Converting between UTF-X and UTF-Y is a lossless operation.

Only for valid input. There is still the question of how to handle transcoding errors.

I think that, to make progress, SG16 will need to resolve "what happens if a string claims to be in encoding X but isn't".
In this case: what happens when a u8 string isn't actually valid UTF-8? (This is a rhetorical question; I don't expect SG16 to come to an agreement any time soon.)
Only then can we make progress; P1880 had a point.
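To make the "only for valid input" point concrete, here is a small sketch (C++17; the helper name is hypothetical, not a proposed API) of the check a UTF-16 transcoder has to make before it can promise a lossless conversion. A sequence containing an unpaired surrogate is not well-formed UTF-16, and there is no well-formed UTF-8 it could map to, so some error policy is unavoidable.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if `s` is well-formed UTF-16, i.e. every
// high surrogate (U+D800..U+DBFF) is immediately followed by a low surrogate
// (U+DC00..U+DFFF) and no low surrogate appears on its own.
constexpr bool is_valid_utf16(std::u16string_view s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {           // high surrogate
            if (i + 1 == s.size()) return false;     // pair cut off at the end
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) return false;
            ++i;                                     // consume the low half
        } else if (c >= 0xDC00 && c <= 0xDFFF) {    // unpaired low surrogate
            return false;
        }
    }
    return true;
}
```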
 

> what is it that we gain by not allowing format(u8"{}", u""); and format(u8"{}", U"");?

We gain consistency across all std::format overloads, a simpler specification, and not having to deal with transcoding errors. I am not suggesting that it shouldn't be possible, only that it should be explicit, e.g.

  format(u8"{}", xcode(u""))

With an explicit approach you can easily configure error handling.

> we could provide only the u8 overload

Sure, provided that we have transcoding facilities. I don't think the u16 and u32 overloads are particularly useful, since you can't do much with the result.

> mandate that the existing locale be specialized for char8_t

If this specialization inherits all the existing locale problems, then I don't think it's a good idea.

If our requirements are
  • the u8 overload should be consistent with the narrow one and therefore support locales
  • locale support should not be broken
  • users need u8 overloads sooner rather than later
then, although I agree with each of them individually, I don't see a path forward.

Non-broken Unicode locale support calls for research and implementation experience outside of the standard.
Providing something that is consistent and *not more* broken than the narrow overload seems useful in the short term, and does not really worry me too much.

I see two possible futures:
  • We keep the std::locale class without breaking ABI and use it as a Unicode locale identifier, in which case the std::format interface won't have to change while supporting more locales. Existing locales won't change if they are well specified. This seems optimistic.
  • We have to introduce a new locale object, and therefore a new set of std::format overloads.
In either case, I think adding a UTF-8 overload today that works exactly to the extent the narrow one does is useful for our users.
 

- Victor

On Mon, Apr 19, 2021 at 3:18 AM Corentin Jabot via SG16 <sg16@lists.isocpp.org> wrote:
Talking with Peter, we realized we could provide only the u8 overload and mandate that the existing locale be specialized for char8_t.
We believe this would
  • Address Victor's excellent remark about the need not to be gratuitously inconsistent
  • Put minimal strain on implementers
  • Let us move forward with having a Unicode overload in C++23.




--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16