Re: [SG16-Unicode] Replacement for codecvt

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 29 Aug 2019 08:25:30 -0400
Hi, Niall. See JeanHeyd’s http://wg21.link/p1629. This is the direction SG16 is currently heading for encoding conversion functions.

JeanHeyd has an implementation available and it would be great to get feedback on how well it works for you!

(I agree codecvt is unfit for nearly all purposes)

Tom.

> On Aug 29, 2019, at 6:57 AM, Niall Douglas <s_sourceforge_at_[hidden]> wrote:
>
> As SG16 knows, I've been busy reworking path_view to meet your feedback.
> I am finding std::codecvt to be a steaming pile of poo, and I was
> wondering if anyone on SG16 plans to propose a more usable, modern,
> Unicode library API?
>
> I'm not looking for much here. I just want to do the following:
>
> - UTF8 to UTF16
> - UTF8 to narrow native encoding
> - UTF8 to wide native encoding
> - UTF16 to UTF8
> - UTF16 to narrow native encoding
> - UTF16 to wide native encoding
> - narrow native encoding to UTF8
> - narrow native encoding to UTF16
> - narrow native encoding to wide native encoding
> - wide native encoding to UTF8
> - wide native encoding to UTF16
> - wide native encoding to narrow native encoding
>
> These match all the formats which filesystem::path can construct from.
>
> I also want:
>
> - Estimate of output buffer needed for some input buffer
> - Lexicographic comparison as well as reencoding
> - More choice for handling invalid UTF input than refusing to continue,
>   e.g. replacement with space characters
>
>
> As far as I can tell, std::codecvt can be beaten with a stick into
> (sometimes very inefficiently) implementing most of the above. So the
> desired functionality is present, just with an awful API which is
> extremely easy to use incorrectly, as I can attest.
>
> What I'd much prefer is something simple, like:
>
> template<class Char1T, class Char2T>
> int codecvt_compare(basic_string_view<Char1T> a,
>                     basic_string_view<Char2T> b) noexcept;
>
> And that's it for comparison. It should "just work".
>
> For reencoding:
>
> template<class DestT, class SrcT>
> struct codecvt_reencode;
>
> And it would have a call operator for feeding it more source data, so
> conversion becomes a loop of calls, each handling any surprises, until
> the conversion is done.
>
>
> Before I go ahead and implement my new API, is there anything better
> than codecvt I can use instead?
>
> Thanks,
> Niall
>
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode
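
For illustration, the codecvt_compare signature quoted above could be sketched roughly as follows. This is only a sketch under stated assumptions, not Niall's implementation and not the P1629 API: it assumes char holds UTF-8 and char16_t holds UTF-16 (no native narrow or wide encodings), it compares by code point, and it maps malformed input to U+FFFD. The helper next_code_point is a hypothetical name introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: decode one code point from UTF-8 and advance `i`.
// Assumed policy: malformed input yields U+FFFD and consumes one byte.
inline char32_t next_code_point(std::string_view s, std::size_t& i) {
    const auto byte = [&](std::size_t k) {
        return static_cast<unsigned char>(s[k]);
    };
    const unsigned char b0 = byte(i++);
    if (b0 < 0x80) return b0;  // ASCII
    int extra;
    char32_t cp;
    char32_t min;  // smallest value that legitimately needs this length
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; min = 0x10000; }
    else return 0xFFFD;  // stray continuation byte or invalid lead byte
    std::size_t j = i;
    for (int k = 0; k < extra; ++k, ++j) {
        if (j >= s.size() || (byte(j) & 0xC0) != 0x80) return 0xFFFD;
        cp = (cp << 6) | (byte(j) & 0x3F);
    }
    // Reject overlong forms, out-of-range values, and encoded surrogates.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0xFFFD;
    i = j;
    return cp;
}

// Hypothetical helper: decode one code point from UTF-16 and advance `i`.
// Assumed policy: an unpaired surrogate yields U+FFFD.
inline char32_t next_code_point(std::u16string_view s, std::size_t& i) {
    const char32_t u0 = s[i++];
    if (u0 < 0xD800 || u0 > 0xDFFF) return u0;  // not a surrogate
    if (u0 <= 0xDBFF && i < s.size()) {         // high surrogate
        const char32_t u1 = s[i];
        if (u1 >= 0xDC00 && u1 <= 0xDFFF) {     // followed by low surrogate
            ++i;
            return 0x10000 + ((u0 - 0xD800) << 10) + (u1 - 0xDC00);
        }
    }
    return 0xFFFD;
}

// Three-way lexicographic comparison by code point, so the result does not
// depend on which encoding each argument happens to use.
template <class Char1T, class Char2T>
int codecvt_compare(std::basic_string_view<Char1T> a,
                    std::basic_string_view<Char2T> b) noexcept {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        const char32_t x = next_code_point(a, i);
        const char32_t y = next_code_point(b, j);
        if (x != y) return x < y ? -1 : 1;
    }
    if (i < a.size()) return 1;   // b is a proper prefix of a
    if (j < b.size()) return -1;  // a is a proper prefix of b
    return 0;
}
```

Decoding both sides lazily avoids allocating a converted copy, which is why the function can plausibly be noexcept. Note that code point order is a design choice: it differs from UTF-16 code unit order for characters outside the BMP, and it is not a locale-aware collation.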

Received on 2019-08-29 14:25:35