Subject: [SG16-Unicode] Replacement for codecvt
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2019-08-29 05:57:49
As SG16 knows, I've been busy reworking path_view to meet your feedback.
I am finding std::codecvt to be a steaming pile of poo, and I was
wondering if anyone on SG16 plans to propose a more usable, modern,
Unicode library API?
I'm not looking for much here. I just want to do the following:
- UTF8 to UTF16
- UTF8 to narrow native encoding
- UTF8 to wide native encoding
- UTF16 to UTF8
- UTF16 to narrow native encoding
- UTF16 to wide native encoding
- narrow native encoding to UTF8
- narrow native encoding to UTF16
- narrow native encoding to wide native encoding
- wide native encoding to UTF8
- wide native encoding to UTF16
- wide native encoding to narrow native encoding
These match all the formats which filesystem::path can construct from.
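To make one of the twelve conversions above concrete, here is a minimal sketch of the UTF16 to UTF8 case written by hand. The function name `utf16_to_utf8` is hypothetical and not part of any standard or proposed API, and the choice to substitute U+FFFD for unpaired surrogates is an assumption made only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not a standard API): converts UTF-16 to UTF-8.
// Assumption for this sketch: unpaired surrogates become U+FFFD.
inline std::string utf16_to_utf8(std::u16string_view in) {
    std::string out;
    out.reserve(in.size() * 3);  // worst case is 3 bytes per UTF-16 code unit
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: only valid when followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate with no preceding high surrogate
        }
        // Encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

The conversions involving the narrow or wide native encoding cannot be written this way, since they depend on the platform and locale.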
I also want:
- Estimate of output buffer needed for some input buffer
- Lexicographic comparison as well as reencoding
- More choice for handling invalid UTF input than refusing to continue,
e.g. replacement with space characters
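As a rough illustration of the buffer estimate and the invalid-input choices, here is a minimal sketch for the UTF8 to UTF16 direction only. Every name in it (`on_invalid`, `utf16_upper_bound`, `utf8_to_utf16`, `decode_utf8`) is hypothetical. It assumes that each ill-formed byte is handled individually, which is what keeps the size estimate valid.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical policy for ill-formed input (names are assumptions).
enum class on_invalid { stop, replace_with_space, replace_with_fffd };

// Upper bound on UTF-16 code units: no UTF-8 byte produces more than one.
constexpr std::size_t utf16_upper_bound(std::string_view utf8) noexcept {
    return utf8.size();
}

// Decodes one code point starting at in[i] (requires i < in.size()).
// Returns the number of bytes consumed, or 0 if the sequence is ill-formed.
inline std::size_t decode_utf8(std::string_view in, std::size_t i, char32_t& cp) noexcept {
    const auto b0 = static_cast<unsigned char>(in[i]);
    std::size_t len = 0;
    char32_t min = 0;
    if (b0 < 0x80) { cp = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;                      // stray continuation or invalid lead byte
    if (in.size() - i < len) return 0;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const auto b = static_cast<unsigned char>(in[i + k]);
        if ((b & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms, out-of-range values and encoded surrogates.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

struct utf8_to_utf16_result {
    std::u16string out;
    bool ok = true;  // false if any ill-formed input was seen
};

inline utf8_to_utf16_result utf8_to_utf16(std::string_view in, on_invalid policy) {
    utf8_to_utf16_result r;
    r.out.reserve(utf16_upper_bound(in));
    for (std::size_t i = 0; i < in.size();) {
        char32_t cp = 0;
        std::size_t n = decode_utf8(in, i, cp);
        if (n == 0) {
            r.ok = false;
            if (policy == on_invalid::stop) return r;
            cp = (policy == on_invalid::replace_with_space) ? 0x20 : 0xFFFD;
            n = 1;  // skip exactly one offending byte
        }
        if (cp < 0x10000) {
            r.out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            r.out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            r.out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += n;
    }
    return r;
}
```

Replacing one byte at a time is a simplification; other substitution strategies exist and may produce fewer replacement characters for the same input.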
As far as I can tell, std::codecvt can be beaten with a stick into
(sometimes very inefficiently) implementing most of the above. So the
desired functionality is present, just with an awful API which is
extremely prone to being used incorrectly, as I can attest to.
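For readers who have not used it, a one-shot narrow to wide conversion through std::codecvt looks roughly like the sketch below. The wrapper name `narrow_to_wide` is hypothetical, and the use of the classic "C" locale and the one-wchar_t-per-input-byte buffer sizing are assumptions made to keep the example short. The facet call itself is the standard `in()` member.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>
#include <string_view>

// Hypothetical wrapper: converts narrow text to wide text using the
// codecvt facet of the classic "C" locale (an assumption of this sketch).
inline bool narrow_to_wide(std::string_view in, std::wstring& out) {
    using facet_t = std::codecvt<wchar_t, char, std::mbstate_t>;
    const auto& cvt = std::use_facet<facet_t>(std::locale::classic());

    std::mbstate_t state{};        // conversion state, must start zeroed
    out.assign(in.size(), L'\0');  // assumes at most one wchar_t per input byte

    const char* from_next = in.data();  // set to the first unconsumed input char
    wchar_t* to_next = out.data();      // set to one past the last written output
    const std::codecvt_base::result r =
        cvt.in(state,
               in.data(), in.data() + in.size(), from_next,
               out.data(), out.data() + out.size(), to_next);

    // Trim the output to what was actually written.
    out.resize(static_cast<std::size_t>(to_next - out.data()));

    // ok: everything converted; partial: ran out of input or output space;
    // error: ill-formed input; noconv: nothing was written at all.
    return r == std::codecvt_base::ok;
}
```

Seven arguments, two in-out cursors and four result codes for a single conversion give a sense of how much the caller has to get right.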
What I'd much prefer is something simple, like:
template<class Char1T, class Char2T>
int codecvt_compare(basic_string_view<Char1T> a,
basic_string_view<Char2T> b) noexcept;
And that's it for comparison. It should "just work".
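One possible reading of that declaration is sketched below: decode both sides one code point at a time and compare the code points. This is only an illustration under stated assumptions. It treats `char` as UTF-8, supports only `char`, `char16_t` and `char32_t`, maps malformed input to U+FFFD, and skips overlong-form checks. The `sketch` namespace and the `pop` helpers are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

namespace sketch {

// Removes one code point from the front of a UTF-8 view (precondition: not
// empty). Malformed input yields U+FFFD and consumes a single byte.
// Simplification: overlong encodings are not rejected.
inline char32_t pop(std::string_view& s) noexcept {
    const auto b0 = static_cast<unsigned char>(s[0]);
    const std::size_t len = b0 < 0x80 ? 1
                          : (b0 & 0xE0) == 0xC0 ? 2
                          : (b0 & 0xF0) == 0xE0 ? 3
                          : (b0 & 0xF8) == 0xF0 ? 4 : 0;
    if (len == 0 || len > s.size()) { s.remove_prefix(1); return 0xFFFD; }
    char32_t cp = len == 1 ? b0 : b0 & (0xFF >> (len + 1));
    for (std::size_t k = 1; k < len; ++k) {
        const auto b = static_cast<unsigned char>(s[k]);
        if ((b & 0xC0) != 0x80) { s.remove_prefix(1); return 0xFFFD; }
        cp = (cp << 6) | (b & 0x3F);
    }
    s.remove_prefix(len);
    return cp;
}

// Same for UTF-16: unpaired surrogates yield U+FFFD.
inline char32_t pop(std::u16string_view& s) noexcept {
    const char32_t hi = s[0];
    s.remove_prefix(1);
    if (hi < 0xD800 || hi > 0xDFFF) return hi;
    if (hi <= 0xDBFF && !s.empty() && s[0] >= 0xDC00 && s[0] <= 0xDFFF) {
        const char32_t lo = s[0];
        s.remove_prefix(1);
        return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    }
    return 0xFFFD;
}

// UTF-32 needs no decoding.
inline char32_t pop(std::u32string_view& s) noexcept {
    const char32_t cp = s[0];
    s.remove_prefix(1);
    return cp;
}

// Returns <0, 0 or >0, ordering by code point rather than by code unit.
template <class Char1T, class Char2T>
int codecvt_compare(std::basic_string_view<Char1T> a,
                    std::basic_string_view<Char2T> b) noexcept {
    while (!a.empty() && !b.empty()) {
        const char32_t ca = pop(a);
        const char32_t cb = pop(b);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (a.empty() && b.empty()) return 0;
    return a.empty() ? -1 : 1;  // the shorter string sorts first
}

}  // namespace sketch
```

Because both parameters are deduced, callers need to pass actual views, e.g. `sketch::codecvt_compare(std::string_view("abc"), std::u16string_view(u"abc"))`. Comparing by code point is a design choice in this sketch: it makes U+FF5E sort before U+1F600 even though the UTF-16 code units would order them the other way round.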
template<class DestT, class SrcT>
And it would have a call operator, for feeding it more source data, so
conversion becomes a looping call, handling surprises, until conversion is
complete.
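To show what a stateful converter with a call operator could look like, here is a minimal sketch for the UTF16 to UTF8 direction. Everything about its shape is an assumption: the class name `converter`, the single specialisation, appending to a `std::string`, the returned count of replaced code units and the `finish` member are all hypothetical and are not taken from any proposal.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical converter template; the primary is left undefined here.
template <class DestT, class SrcT>
class converter;

// UTF-16 source, UTF-8 destination.
template <>
class converter<char, char16_t> {
    char16_t pending_high_ = 0;  // high surrogate carried over between chunks

    static void append_utf8(std::string& out, char32_t cp) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }

public:
    // Consumes one chunk and appends the converted text to `out`.
    // Returns how many ill-formed code units were replaced with U+FFFD.
    std::size_t operator()(std::u16string_view src, std::string& out) {
        std::size_t replaced = 0;
        for (char16_t u : src) {
            if (pending_high_ != 0) {
                const char16_t hi = pending_high_;
                pending_high_ = 0;
                if (u >= 0xDC00 && u <= 0xDFFF) {
                    append_utf8(out, 0x10000 + ((char32_t{hi} - 0xD800) << 10) + (u - 0xDC00));
                    continue;
                }
                append_utf8(out, 0xFFFD);  // high surrogate had no partner
                ++replaced;
            }
            if (u >= 0xD800 && u <= 0xDBFF) {
                pending_high_ = u;  // partner may arrive in the next chunk
            } else if (u >= 0xDC00 && u <= 0xDFFF) {
                append_utf8(out, 0xFFFD);  // low surrogate on its own
                ++replaced;
            } else {
                append_utf8(out, u);
            }
        }
        return replaced;
    }

    // Call once after the last chunk to flush a dangling high surrogate.
    std::size_t finish(std::string& out) {
        if (pending_high_ == 0) return 0;
        pending_high_ = 0;
        append_utf8(out, 0xFFFD);
        return 1;
    }
};
```

A caller would construct one `converter<char, char16_t>`, invoke it once per chunk of source data, and call `finish` at the end. The carried-over state is what lets a surrogate pair split across two chunks still convert correctly.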
Before I go ahead and implement my new API, is there anything better
than codecvt I can use instead?
SG16 list run by firstname.lastname@example.org