Re: [SG16-Unicode] Replacement for codecvt

From: JeanHeyd Meneide <phdofthehouse_at_[hidden]>
Date: Thu, 29 Aug 2019 17:54:18 -0400
On Thu, Aug 29, 2019 at 4:08 PM Tom Honermann <tom_at_[hidden]> wrote:

> I should really just let JeanHeyd respond here, but he has also been
> pursuing low level C interfaces that fit in with wcsrtombs and friends
> and provide translation to/from UTF encodings. Those may end up being a
> better fit for your specific needs.
>

Those are going to be fundamentally different, though. C has almost never
respected the idea of [pointer, size] input when it comes to handling
strings, and most of its encoding functions -- with the rare exception --
play nice. Even if we write the C functions they'll likely be
null-termination based and therefore just a tiny bit hostile to certain
kinds of input. I could propose [pointer, size] functions, but the entire C
committee will likely balk at them and throw them out the window.
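
To make the difference concrete, here is a minimal sketch in C. The names sized_wcstombs and terminated_wcstombs are hypothetical helpers invented for illustration, not proposed or existing standard functions. The first walks exactly n wide characters using the standard wcrtomb; the second wraps the standard, null-termination-based wcsrtombs for comparison. It assumes the default "C" locale and ASCII-range input.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not a standard function: converts exactly `n` wide
   characters from `src`, embedded nulls included. Returns the number of
   bytes written, or (size_t)-1 on an encoding error or if `dst` is too
   small to hold the result. */
size_t sized_wcstombs(char *dst, size_t dst_size, const wchar_t *src, size_t n)
{
    mbstate_t state;
    char tmp[MB_LEN_MAX];
    size_t written = 0;

    memset(&state, 0, sizeof state);
    for (size_t i = 0; i < n; ++i) {
        /* wcrtomb converts a single wide character, L'\0' included. */
        size_t len = wcrtomb(tmp, src[i], &state);
        if (len == (size_t)-1 || len > dst_size - written)
            return (size_t)-1;
        memcpy(dst + written, tmp, len);
        written += len;
    }
    return written;
}

/* Hypothetical wrapper around the standard wcsrtombs: conversion stops at
   the first L'\0' in `src`, so anything after it is never seen. Returns
   the number of bytes written, not counting the terminating null byte. */
size_t terminated_wcstombs(char *dst, size_t dst_size, const wchar_t *src)
{
    mbstate_t state;

    memset(&state, 0, sizeof state);
    return wcsrtombs(dst, &src, dst_size, &state);
}
```

Given a five-element buffer { L'a', L'b', L'\0', L'c', L'd' }, the terminated version stops after "ab", while the sized version converts all five characters, embedded null included.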

They tried before in the Annex K functions, and they crafted the horrible
"rsize" functions that had implementation-defined limits on the number of
characters you could possibly convert. This ended up resulting in RSIZE_MAX
turning out to be something ridiculously tiny on certain machines, and
surprising developers who tried to use it by serving as fun little DoS
points on certain platforms when text buffers exceeded the suddenly tiny
RSIZE_MAX constant. We can argue that RSIZE_MAX was an incredibly silly
specification to address potential overflow concerns to begin
with, but I don't think that's going to make [ptr, size] overloads
compatible with std::span sail through the C Committee any easier.

That being said, maybe the C Committee can be talked into allowing sized
overloads, but that's another 2x string functions to propose...

Received on 2019-08-29 23:54:31