Date: Thu, 29 Aug 2019 11:57:49 +0100
As SG16 knows, I've been busy reworking path_view to meet your feedback.
I am finding std::codecvt to be a steaming pile of poo, and I was
wondering if anyone on SG16 plans to propose a more usable, modern,
Unicode library API?
I'm not looking for much here. I just want to do the following:
- UTF8 to UTF16
- UTF8 to narrow native encoding
- UTF8 to wide native encoding
- UTF16 to UTF8
- UTF16 to narrow native encoding
- UTF16 to wide native encoding
- narrow native encoding to UTF8
- narrow native encoding to UTF16
- narrow native encoding to wide native encoding
- wide native encoding to UTF8
- wide native encoding to UTF16
- wide native encoding to narrow native encoding
These match all the formats which filesystem::path can construct from.
I also want:
- Estimate of output buffer needed for some input buffer
- Lexicographic comparison as well as reencoding
- More choice for handling invalid UTF input than refusing to continue,
e.g. replacement with space characters
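To illustrate the buffer estimate item for the two UTF cases only, here is a minimal sketch. The function names are hypothetical and not taken from any existing or proposed API. The bounds assume invalid input is replaced by at most one U+FFFD (or one space) per offending code unit; native encodings are not covered, as their bounds depend on the encoding in use.

```cpp
#include <cstddef>

namespace sketch {

// Hypothetical helper: worst-case UTF-16 code units needed for utf8_bytes
// bytes of UTF-8. Every code point needs at least as many UTF-8 bytes as
// UTF-16 code units (1->1, 2->1, 3->1, 4->2), so the input length is a
// safe upper bound.
constexpr std::size_t utf8_to_utf16_max_units(std::size_t utf8_bytes) noexcept {
    return utf8_bytes;
}

// Hypothetical helper: worst-case UTF-8 bytes needed for utf16_units code
// units of UTF-16. A BMP code unit needs at most 3 bytes; a surrogate pair
// (2 units) needs 4 bytes, which is only 2 bytes per unit.
constexpr std::size_t utf16_to_utf8_max_bytes(std::size_t utf16_units) noexcept {
    return utf16_units * 3;
}

}  // namespace sketch
```

These are upper bounds, not exact sizes, so a caller would typically allocate this much and then shrink to what was actually written.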
As far as I can tell, std::codecvt can be beaten with a stick into
(sometimes very inefficiently) implementing most of the above. So the
desired functionality is present, just behind an awful API which is
extremely easy to use incorrectly, as I can attest.
What I'd much prefer is something simple, like:
template<class Char1T, class Char2T>
int codecvt_compare(basic_string_view<Char1T> a,
basic_string_view<Char2T> b) noexcept;
And that's it for comparison. It should "just work".
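For illustration, a minimal sketch of how such a comparison might behave. It assumes char holds UTF-8 and char16_t holds UTF-16, compares by code point, and maps malformed input to U+FFFD. The sketch namespace and the decode_one helper are hypothetical, and the native encodings are not handled here.

```cpp
#include <cstddef>
#include <string_view>

namespace sketch {

constexpr char32_t replacement = 0xFFFD;

// Decode one code point from non-empty UTF-8 input and advance s.
// Malformed input yields U+FFFD and consumes a single byte.
inline char32_t decode_one(std::string_view& s) noexcept {
    const auto b0 = static_cast<unsigned char>(s[0]);
    if (b0 < 0x80) { s.remove_prefix(1); return b0; }
    std::size_t len = 0;
    char32_t cp = 0, min = 0;
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    bool ok = len != 0 && s.size() >= len;
    for (std::size_t i = 1; ok && i < len; ++i) {
        const auto b = static_cast<unsigned char>(s[i]);
        ok = (b & 0xC0) == 0x80;          // must be a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms, surrogates and values beyond U+10FFFF.
    ok = ok && cp >= min && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
    s.remove_prefix(ok ? len : 1);
    return ok ? cp : replacement;
}

// Decode one code point from non-empty UTF-16 input and advance s.
// Unpaired surrogates yield U+FFFD.
inline char32_t decode_one(std::u16string_view& s) noexcept {
    const char32_t u0 = s[0];
    s.remove_prefix(1);
    if (u0 < 0xD800 || u0 > 0xDFFF) return u0;
    if (u0 <= 0xDBFF && !s.empty() && s[0] >= 0xDC00 && s[0] <= 0xDFFF) {
        const char32_t u1 = s[0];
        s.remove_prefix(1);
        return 0x10000 + ((u0 - 0xD800) << 10) + (u1 - 0xDC00);
    }
    return replacement;
}

// Returns <0, 0 or >0, ordering the two views by code point.
template <class Char1T, class Char2T>
int codecvt_compare(std::basic_string_view<Char1T> a,
                    std::basic_string_view<Char2T> b) noexcept {
    while (!a.empty() && !b.empty()) {
        const char32_t ca = decode_one(a);
        const char32_t cb = decode_one(b);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (a.empty() && b.empty()) return 0;
    return a.empty() ? -1 : 1;            // the shorter string sorts first
}

}  // namespace sketch
```

Note that code point order is a design choice in this sketch: it differs from UTF-16 code unit order for characters above the BMP, so a real API would need to say which ordering it guarantees.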
For reencoding:
template<class DestT, class SrcT>
struct codecvt_reencode;
And it would have a call operator for feeding it more source data, so
conversion becomes a loop: call it, handle any surprise (output full,
truncated or invalid input), and repeat until conversion is done.
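To make the calling pattern concrete, here is a rough sketch of one specialisation, UTF-16 to UTF-8. Everything beyond the codecvt_reencode name and its two template parameters is an assumption made for illustration: the reencode_status enum, the call operator's signature, the to_utf8 helper, and the choice of U+FFFD for invalid input.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

namespace sketch {

// Hypothetical status codes; the post does not specify any.
enum class reencode_status { done, output_full, input_truncated };

template <class DestT, class SrcT>
struct codecvt_reencode;  // only one specialisation is sketched below

// UTF-16 -> UTF-8, assuming char16_t holds UTF-16 and char holds UTF-8.
template <>
struct codecvt_reencode<char, char16_t> {
    // Converts from the front of src into [dest, dest + room), advancing
    // all three arguments past whatever was consumed or written.
    reencode_status operator()(std::u16string_view& src, char*& dest,
                               std::size_t& room) const noexcept {
        while (!src.empty()) {
            char32_t cp = src[0];
            std::size_t used = 1;
            if (cp >= 0xD800 && cp <= 0xDBFF) {          // high surrogate
                if (src.size() < 2) return reencode_status::input_truncated;
                const char32_t lo = src[1];
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                    used = 2;
                } else {
                    cp = 0xFFFD;                         // unpaired high
                }
            } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                cp = 0xFFFD;                             // unpaired low
            }
            const std::size_t need =
                cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (room < need) return reencode_status::output_full;
            static constexpr unsigned char lead[] = {0x00, 0xC0, 0xE0, 0xF0};
            // Fill continuation bytes from the back, then the lead byte.
            for (std::size_t i = need - 1; i > 0; --i) {
                dest[i] = static_cast<char>(0x80 | (cp & 0x3F));
                cp >>= 6;
            }
            dest[0] = static_cast<char>(lead[need - 1] | cp);
            dest += need;
            room -= need;
            src.remove_prefix(used);
        }
        return reencode_status::done;
    }
};

// Hypothetical caller showing the loop: call, handle the surprise, repeat.
inline std::string to_utf8(std::u16string_view src) {
    codecvt_reencode<char, char16_t> conv;
    std::string out;
    char buf[8];                                         // deliberately tiny
    for (;;) {
        char* p = buf;
        std::size_t room = sizeof buf;
        const reencode_status st = conv(src, p, room);
        out.append(buf, static_cast<std::size_t>(p - buf));
        if (st == reencode_status::done) return out;
        if (st == reencode_status::input_truncated) {
            out += "\xEF\xBF\xBD";                       // no more input: U+FFFD
            return out;
        }
        // output_full: go round again with an empty buffer
    }
}

}  // namespace sketch
```

This version keeps no state between calls and instead leaves a trailing partial sequence unconsumed in src; a stateful converter that buffers the partial sequence internally would be an equally plausible reading of the idea.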
Before I go ahead and implement my new API, is there anything better
than codecvt I can use instead?
Thanks,
Niall
Received on 2019-08-29 12:57:56