Subject: [SG16-Unicode] Replacement for codecvt
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2019-08-29 05:57:49

As SG16 knows, I've been busy reworking path_view to meet your feedback.
I am finding std::codecvt to be a steaming pile of poo, and I was
wondering if anyone on SG16 plans to propose a more usable, modern,
Unicode library API?

I'm not looking for much here. I just want to do the following:

- UTF8 to UTF16
- UTF8 to narrow native encoding
- UTF8 to wide native encoding
- UTF16 to UTF8
- UTF16 to narrow native encoding
- UTF16 to wide native encoding
- narrow native encoding to UTF8
- narrow native encoding to UTF16
- narrow native encoding to wide native encoding
- wide native encoding to UTF8
- wide native encoding to UTF16
- wide native encoding to narrow native encoding

These match all the formats which filesystem::path can construct from.
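
For reference, here is roughly what one of the twelve conversions above
(narrow native encoding to wide native encoding) looks like with what the
standard library offers today. This is only a sketch: the function name
narrow_to_wide and the "stop at the first bad sequence" policy are
assumptions made for illustration, and the result depends on the current C
locale and on what wchar_t means on the platform.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <string_view>

// Narrow native -> wide native via std::mbrtowc and the current C locale.
// Hypothetical policy: stop at the first invalid or incomplete sequence.
inline std::wstring narrow_to_wide(std::string_view in) {
    std::wstring out;
    std::mbstate_t state{};          // initial conversion state
    const char* p = in.data();
    std::size_t left = in.size();
    while (left > 0) {
        wchar_t wc;
        const std::size_t n = std::mbrtowc(&wc, p, left, &state);
        if (n == static_cast<std::size_t>(-1) ||   // invalid sequence
            n == static_cast<std::size_t>(-2)) {   // incomplete sequence
            break;
        }
        out.push_back(wc);
        // A return value of 0 means a NUL character was decoded (one byte).
        const std::size_t used = (n == 0) ? 1 : n;
        p += used;
        left -= used;
    }
    return out;
}
```

Even this one direction needs manual state handling and sentinel return
values, and it offers no way to size the output in advance.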

I also want:

- Estimate of output buffer needed for some input buffer
- Lexicographic comparison as well as reencoding
- More choice for handling invalid UTF input than refusing to continue,
  e.g. replacement with space characters
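
For the UTF-to-UTF directions, a worst-case output estimate can be computed
from the input length alone. The sketch below uses hypothetical function
names, and it assumes that any invalid input unit is replaced by a single
BMP character (such as a space or U+FFFD). It says nothing about the native
encodings, whose bounds depend on the platform.

```cpp
#include <cstddef>

// UTF-8 -> UTF-16: a 1, 2 or 3 byte sequence becomes one UTF-16 unit and a
// 4 byte sequence becomes two, so output units never exceed input bytes.
constexpr std::size_t max_utf16_units_from_utf8(std::size_t utf8_bytes) noexcept {
    return utf8_bytes;
}

// UTF-16 -> UTF-8: a BMP unit needs at most 3 bytes and a surrogate pair
// (two units) needs 4, so 3 bytes per input unit is always enough.
constexpr std::size_t max_utf8_bytes_from_utf16(std::size_t utf16_units) noexcept {
    return utf16_units * 3;
}

static_assert(max_utf16_units_from_utf8(10) == 10, "one unit per byte at most");
static_assert(max_utf8_bytes_from_utf16(10) == 30, "three bytes per unit at most");
```

These are upper bounds, not exact sizes. An exact figure requires a pass
over the input.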

As far as I can tell, std::codecvt can be beaten with a stick into
(sometimes very inefficiently) implementing most of the above. So the
desired functionality is present, just with an awful API which is
extremely easy to use incorrectly, as I can attest.

What I'd much prefer is something simple, like:

template<class Char1T, class Char2T>
int codecvt_compare(basic_string_view<Char1T> a,
                    basic_string_view<Char2T> b) noexcept;

And that's it for comparison. It should "just work".
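
A minimal sketch of how such a comparison could be implemented for two of
the encodings follows. It makes several assumptions that are not in the
proposal: char is treated as UTF-8 and char16_t as UTF-16, ordering is by
code point value, malformed input compares as U+FFFD, and the helper
next_code_point is invented for this example. Native encodings are not
handled.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: decode one code point from UTF-8 and advance `i`.
// Malformed input yields U+FFFD and consumes a single byte.
inline char32_t next_code_point(std::string_view s, std::size_t& i) noexcept {
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    const unsigned char b0 = byte(i++);
    if (b0 < 0x80) return b0;
    int extra;
    char32_t cp, lowest;  // `lowest` rejects overlong encodings
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; lowest = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; lowest = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; lowest = 0x10000; }
    else return 0xFFFD;
    std::size_t j = i;
    for (int k = 0; k < extra; ++k, ++j) {
        if (j >= s.size() || (byte(j) & 0xC0) != 0x80) return 0xFFFD;
        cp = (cp << 6) | (byte(j) & 0x3F);
    }
    if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0xFFFD;
    i = j;
    return cp;
}

// Hypothetical helper: decode one code point from UTF-16 and advance `i`.
// An unpaired surrogate yields U+FFFD.
inline char32_t next_code_point(std::u16string_view s, std::size_t& i) noexcept {
    const char16_t u0 = s[i++];
    if (u0 < 0xD800 || u0 > 0xDFFF) return u0;
    if (u0 <= 0xDBFF && i < s.size() && s[i] >= 0xDC00 && s[i] <= 0xDFFF) {
        const char16_t u1 = s[i++];
        return 0x10000 + ((char32_t(u0) - 0xD800) << 10) + (u1 - 0xDC00);
    }
    return 0xFFFD;
}

// Returns <0, 0 or >0, comparing the decoded code point sequences.
template <class Char1T, class Char2T>
int codecvt_compare(std::basic_string_view<Char1T> a,
                    std::basic_string_view<Char2T> b) noexcept {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        const char32_t x = next_code_point(a, i);
        const char32_t y = next_code_point(b, j);
        if (x != y) return x < y ? -1 : 1;
    }
    if (i < a.size()) return 1;   // a has extra characters
    if (j < b.size()) return -1;  // b has extra characters
    return 0;
}
```

Note that code point order is a design choice: it differs from raw UTF-16
code unit order when supplementary characters are compared against
U+E000..U+FFFF.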

For reencoding:

template<class DestT, class SrcT>
struct codecvt_reencode;

And it would have a call operator for feeding it more source data, so
conversion becomes a loop: call it, handle any surprises, and repeat
until conversion is complete.
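
To make the "looping call" idea concrete, here is one way a single
specialisation (UTF-8 in, UTF-16 out) might look. Everything beyond the
codecvt_reencode declaration is an assumption for illustration: the
reencode_status and reencode_result types, the call operator's signature,
the choice of U+FFFD as the replacement character, and the utf8_to_utf16
driver.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

template <class DestT, class SrcT>
struct codecvt_reencode;  // primary template, declared as in the post

// Hypothetical status and result types, invented for this sketch.
enum class reencode_status { done, output_full, input_truncated };

struct reencode_result {
    std::size_t consumed;    // source code units read
    std::size_t written;     // destination code units written
    reencode_status status;  // why the call returned
};

// One possible specialisation: UTF-8 in, UTF-16 out.
// Malformed input is replaced with U+FFFD instead of stopping.
template <>
struct codecvt_reencode<char16_t, char> {
    reencode_result operator()(std::string_view src, char16_t* dst,
                               std::size_t cap) const noexcept {
        static constexpr char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        std::size_t i = 0, o = 0;
        while (i < src.size()) {
            const auto b0 = static_cast<unsigned char>(src[i]);
            std::size_t len = b0 < 0x80 ? 1 : (b0 & 0xE0) == 0xC0 ? 2
                            : (b0 & 0xF0) == 0xE0 ? 3
                            : (b0 & 0xF8) == 0xF0 ? 4 : 0;
            char32_t cp = 0xFFFD;
            if (len == 0) {
                len = 1;  // stray continuation byte or invalid lead byte
            } else if (i + len > src.size()) {
                // Sequence runs past the end: ask the caller for more input.
                return {i, o, reencode_status::input_truncated};
            } else {
                cp = (len == 1) ? static_cast<char32_t>(b0)
                                : static_cast<char32_t>(b0 & (0xFFu >> (len + 1)));
                for (std::size_t k = 1; k < len; ++k) {
                    const auto c = static_cast<unsigned char>(src[i + k]);
                    if ((c & 0xC0) != 0x80) { cp = 0xFFFD; len = 1; break; }
                    cp = (cp << 6) | (c & 0x3F);
                }
                // Reject overlong forms, surrogates and out-of-range values.
                if (cp < min_cp[len] || cp > 0x10FFFF ||
                    (cp >= 0xD800 && cp <= 0xDFFF)) { cp = 0xFFFD; len = 1; }
            }
            const std::size_t need = cp >= 0x10000 ? 2 : 1;
            if (o + need > cap) return {i, o, reencode_status::output_full};
            if (need == 1) {
                dst[o++] = static_cast<char16_t>(cp);
            } else {
                cp -= 0x10000;
                dst[o++] = static_cast<char16_t>(0xD800 + (cp >> 10));
                dst[o++] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
            }
            i += len;
        }
        return {i, o, reencode_status::done};
    }
};

// Usage sketch: the conversion loop, with a deliberately small buffer.
inline std::u16string utf8_to_utf16(std::string_view src) {
    codecvt_reencode<char16_t, char> conv;
    std::u16string out;
    char16_t buf[4];
    for (;;) {
        const reencode_result r = conv(src, buf, 4);
        out.append(buf, r.written);
        src.remove_prefix(r.consumed);
        if (r.status == reencode_status::done) return out;
        if (r.status == reencode_status::input_truncated) {
            out.push_back(u'\uFFFD');  // no more input is coming: replace the tail
            return out;
        }
        // output_full: buffer drained above, go round again
    }
}
```

In this sketch the converter itself is stateless. The caller keeps the
unconsumed tail of the source and passes it back in, which is one of
several ways "feeding it more source data" could be arranged.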

Before I go ahead and implement my new API, is there anything better
than codecvt I can use instead?


SG16 list run by sg16-owner@lists.isocpp.org