
[SG16-Unicode] Replacement for codecvt

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Thu, 29 Aug 2019 11:57:49 +0100
As SG16 knows, I've been busy reworking path_view to meet your feedback.
I am finding std::codecvt to be a steaming pile of poo, and I was
wondering if anyone on SG16 plans to propose a more usable, modern,
Unicode library API?

I'm not looking for much here. I just want to do the following:

- UTF8 to UTF16
- UTF8 to narrow native encoding
- UTF8 to wide native encoding
- UTF16 to UTF8
- UTF16 to narrow native encoding
- UTF16 to wide native encoding
- narrow native encoding to UTF8
- narrow native encoding to UTF16
- narrow native encoding to wide native encoding
- wide native encoding to UTF8
- wide native encoding to UTF16
- wide native encoding to narrow native encoding

These match all the formats which filesystem::path can construct from.

I also want:

- Estimate of output buffer needed for some input buffer
- Lexicographic comparison as well as reencoding
- More choice for handling invalid UTF input than refusing to continue,
  e.g. replacement with space characters
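
To make the first and third wishes concrete, here is a minimal sketch of one conversion from the list (UTF8 to UTF16) with an output size estimate and a caller-chosen policy for invalid input. Every name in it (`utf8_to_utf16_max_units`, `decode_utf8`, `on_invalid`, `utf8_to_utf16`) is hypothetical and not part of any standard or proposed API, and it assumes `char` buffers hold UTF8.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: worst-case UTF-16 code units for n bytes of UTF-8.
// Each input byte yields at most one unit: 1-3 byte sequences give one unit,
// 4-byte sequences give two, and an invalid byte gives one replacement unit.
constexpr std::size_t utf8_to_utf16_max_units(std::size_t n) noexcept { return n; }

struct decoded { char32_t cp; std::size_t len; bool ok; };

// Decode one code point from the front of a non-empty view. Rejects overlong
// forms, encoded surrogates and values above U+10FFFF; on failure reports
// one byte consumed so the caller can resynchronise.
inline decoded decode_utf8(std::string_view s) noexcept {
    auto byte = [&](std::size_t i) { return static_cast<unsigned char>(s[i]); };
    auto cont = [&](std::size_t i) { return i < s.size() && (byte(i) & 0xC0) == 0x80; };
    const unsigned char b0 = byte(0);
    if (b0 < 0x80) return {b0, 1, true};
    if (b0 >= 0xC2 && b0 <= 0xDF && cont(1))
        return {char32_t((b0 & 0x1F) << 6 | (byte(1) & 0x3F)), 2, true};
    if ((b0 & 0xF0) == 0xE0 && cont(1) && cont(2)) {
        const char32_t cp = (b0 & 0x0F) << 12 | (byte(1) & 0x3F) << 6 | (byte(2) & 0x3F);
        if (cp >= 0x800 && (cp < 0xD800 || cp > 0xDFFF)) return {cp, 3, true};
    }
    if (b0 >= 0xF0 && b0 <= 0xF4 && cont(1) && cont(2) && cont(3)) {
        const char32_t cp = (b0 & 0x07) << 18 | (byte(1) & 0x3F) << 12 |
                            (byte(2) & 0x3F) << 6 | (byte(3) & 0x3F);
        if (cp >= 0x10000 && cp <= 0x10FFFF) return {cp, 4, true};
    }
    return {0xFFFD, 1, false};
}

enum class on_invalid { stop, replace };  // hypothetical error policy

// Returns false only when the policy is `stop` and invalid input was found.
inline bool utf8_to_utf16(std::string_view in, std::u16string& out,
                          on_invalid policy, char16_t replacement = u'\uFFFD') {
    out.clear();
    out.reserve(utf8_to_utf16_max_units(in.size()));
    while (!in.empty()) {
        const decoded d = decode_utf8(in);
        if (!d.ok) {
            if (policy == on_invalid::stop) return false;
            out.push_back(replacement);
        } else if (d.cp < 0x10000) {
            out.push_back(static_cast<char16_t>(d.cp));
        } else {
            const char32_t v = d.cp - 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
        in.remove_prefix(d.len);
    }
    return true;
}
```

Passing `u' '` as the replacement gives the "replace with space characters" behaviour mentioned above; the default of U+FFFD is only one possible choice.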


As far as I can tell, std::codecvt can be beaten with a stick into
(sometimes very inefficiently) implementing most of the above. So the
desired functionality is present, just behind an awful API which is
extremely easy to use incorrectly, as I can attest.

What I'd much prefer is something simple, like:

  template<class Char1T, class Char2T>
  int codecvt_compare(basic_string_view<Char1T> a,
                      basic_string_view<Char2T> b) noexcept;

And that's it for comparison. It should "just work".
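
One way such a comparison could "just work" across encodings is to decode both sides one code point at a time and compare the code points. The sketch below does this for UTF8 and UTF16 views only. The `next_code_point` overloads are hypothetical helpers, `char` is assumed to hold UTF8 rather than the narrow native encoding, and ill-formed input is mapped to U+FFFD as a simplifying assumption.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: pop one code point from the front of a non-empty
// UTF-8 view. Bad lead bytes, truncated sequences and bad continuation bytes
// yield U+FFFD and consume one byte. For brevity it does NOT reject overlong
// forms or encoded surrogates.
inline char32_t next_code_point(std::string_view& s) noexcept {
    const auto b0 = static_cast<unsigned char>(s.front());
    const std::size_t len = b0 < 0x80         ? 1
                          : (b0 >> 5) == 0x06 ? 2
                          : (b0 >> 4) == 0x0E ? 3
                          : (b0 >> 3) == 0x1E ? 4 : 0;
    if (len == 0 || len > s.size()) { s.remove_prefix(1); return 0xFFFD; }
    char32_t cp = len == 1 ? b0 : b0 & (0xFF >> (len + 1));
    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) { s.remove_prefix(1); return 0xFFFD; }
        cp = (cp << 6) | (b & 0x3F);
    }
    s.remove_prefix(len);
    return cp;
}

// Hypothetical helper: same contract for UTF-16. Unpaired surrogates yield
// U+FFFD and consume one code unit.
inline char32_t next_code_point(std::u16string_view& s) noexcept {
    const char16_t u = s.front();
    s.remove_prefix(1);
    if (u >= 0xD800 && u <= 0xDBFF && !s.empty() &&
        s.front() >= 0xDC00 && s.front() <= 0xDFFF) {
        const char32_t cp = 0x10000 + ((char32_t(u) - 0xD800) << 10) +
                            (char32_t(s.front()) - 0xDC00);
        s.remove_prefix(1);
        return cp;
    }
    return (u >= 0xD800 && u <= 0xDFFF) ? U'\uFFFD' : char32_t(u);
}

// Lexicographic comparison by code point: negative, zero or positive,
// in the style of std::string_view::compare.
template <class Char1T, class Char2T>
int codecvt_compare(std::basic_string_view<Char1T> a,
                    std::basic_string_view<Char2T> b) noexcept {
    while (!a.empty() && !b.empty()) {
        const char32_t ca = next_code_point(a);
        const char32_t cb = next_code_point(b);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    return a.empty() ? (b.empty() ? 0 : -1) : 1;
}
```

Comparing code points rather than code units means the ordering does not depend on which encoding each argument happens to use. Supporting the native narrow and wide encodings would need further `next_code_point` overloads, which this sketch leaves out.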

For reencoding:

  template<class DestT, class SrcT>
  struct codecvt_reencode;

And it would have a call operator for feeding it more source data, so
conversion becomes a loop: call it, handle any surprises, and repeat
until the conversion is done.
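
As a rough illustration of that looping style, here is a sketch of a single specialisation, UTF16 in and UTF8 out. Only the primary template comes from the idea above; the `result` type, the `finish` member, the U+FFFD substitution for unpaired surrogates, and the `to_utf8_chunked` driver are all assumptions made for the example, as is treating `char` as UTF8.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

template <class DestT, class SrcT>
struct codecvt_reencode;  // primary template, left undefined

// Hypothetical specialisation: UTF-16 source, UTF-8 destination.
template <>
struct codecvt_reencode<char, char16_t> {
    struct result { std::size_t read; std::size_t written; };

    // Converts as much of src as fits into dest[0, cap). cap should be at
    // least 4 so one code point always fits. A high surrogate at the end of
    // src is remembered and paired with the start of the next call's src.
    result operator()(std::u16string_view src, char* dest, std::size_t cap) noexcept {
        result r{0, 0};
        while (r.read < src.size()) {
            if (cap - r.written < 4) break;  // caller drains dest, then calls again
            const char16_t u = src[r.read];
            char32_t cp;
            if (pending_ != 0) {
                if (u >= 0xDC00 && u <= 0xDFFF) {
                    cp = 0x10000 + ((char32_t(pending_) - 0xD800) << 10) + (u - 0xDC00);
                    ++r.read;
                } else {
                    cp = 0xFFFD;  // unpaired high surrogate; u is handled next pass
                }
                pending_ = 0;
            } else if (u >= 0xD800 && u <= 0xDBFF) {
                pending_ = u;     // wait for the low surrogate
                ++r.read;
                continue;
            } else {
                cp = (u >= 0xDC00 && u <= 0xDFFF) ? 0xFFFD : u;  // unpaired low
                ++r.read;
            }
            r.written += encode(cp, dest + r.written);
        }
        return r;
    }

    // Call once after the last chunk; dest needs room for 3 bytes.
    std::size_t finish(char* dest) noexcept {
        if (pending_ == 0) return 0;
        pending_ = 0;
        return encode(0xFFFD, dest);
    }

private:
    static std::size_t encode(char32_t cp, char* out) noexcept {
        if (cp < 0x80) { out[0] = char(cp); return 1; }
        if (cp < 0x800) {
            out[0] = char(0xC0 | (cp >> 6));
            out[1] = char(0x80 | (cp & 0x3F));
            return 2;
        }
        if (cp < 0x10000) {
            out[0] = char(0xE0 | (cp >> 12));
            out[1] = char(0x80 | ((cp >> 6) & 0x3F));
            out[2] = char(0x80 | (cp & 0x3F));
            return 3;
        }
        out[0] = char(0xF0 | (cp >> 18));
        out[1] = char(0x80 | ((cp >> 12) & 0x3F));
        out[2] = char(0x80 | ((cp >> 6) & 0x3F));
        out[3] = char(0x80 | (cp & 0x3F));
        return 4;
    }
    char16_t pending_ = 0;  // buffered high surrogate, 0 if none
};

// Hypothetical driver: feeds src in pieces of `chunk` code units (chunk > 0)
// through a small fixed buffer, to show the calling loop.
inline std::string to_utf8_chunked(std::u16string_view src, std::size_t chunk) {
    codecvt_reencode<char, char16_t> conv;
    std::string out;
    char buf[8];
    while (!src.empty()) {
        std::u16string_view piece = src.substr(0, chunk);
        const std::size_t n = piece.size();
        while (!piece.empty()) {
            const auto r = conv(piece, buf, sizeof buf);
            out.append(buf, r.written);
            piece.remove_prefix(r.read);
        }
        src.remove_prefix(n);
    }
    out.append(buf, conv.finish(buf));
    return out;
}
```

Keeping the pending surrogate inside the object is what lets the caller split the input at arbitrary points without caring where code point boundaries fall.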


Before I go ahead and implement my new API, is there anything better
than codecvt I can use instead?

Thanks,
Niall

Received on 2019-08-29 12:57:56