Re: [SG16-Unicode] Convert between std::u8string and std::string

From: Philipp Klaus Krause <krauseph_at_[hidden]>
Date: Mon, 6 May 2019 15:38:54 +0200
On 04.05.19 at 01:44, JeanHeyd Meneide wrote:
>
> To be honest with you, the whole situation is a bit awful and -- what's
> worse -- there are no string versions of any of these functions
> for fast, efficient processing (c8srtombs/mbsrtoc8s,
> c16srtombs/mbsrtoc16s, c32srtombs/mbsrtoc32s): they are just straight up
> missing. The latter 2 in that list are being fixed by Philipp K.
> Krause's N2282
> (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2282.htm) -- you
> should write to your C and/or C++ representatives in your country (or,
> really, anyone who's listening) and tell them that we need these for
> fast, competitive implementations that hope to hold a candle to proper
> Unicode conversion utilities employed around the world. (One of the
> pushbacks surrounding that paper is "waiting for implementation
> experience and feedback", I think?) I don't know how Tom feels about
> jumping the gun and writing c8srtombs/mbsrtoc8s for the C++ standard
> before its friends ( c16srtombs/mbsrtoc16s, c32srtombs/mbsrtoc32s) are
> accepted into the C standard, but I would highly encourage that to be a
> thing we do because one-by-one code point processing is a mistake for
> efficient processing. In days gone by, the C Committee added mbsrtowcs
> and other multi-code-point functions to the C standard for a reason
> (this reason); why the C Committee is about to wait on it and make the
> same mistake is something I do not quite understand.
>
> Maybe it's just a matter of being loud and vocal enough to the Committee
> and its representatives to have it put in?

My proposal N2282 adds mbstoc16s, c16stombs, mbstoc32s and c32stombs,
that is, non-restartable functions.
Non-restartable functions have a performance and code-size advantage, are
always thread-safe, and don't read beyond the end of 0-terminated strings.
If there is sufficient demand for restartable ones (which make it easy to
handle incomplete chunks of text without the user having to manage a
buffer), one could certainly consider adding those, too.
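
To make the distinction concrete, here is a minimal sketch of what a
non-restartable whole-string conversion does, written on top of the
existing C11 restartable function mbrtoc32. The name sketch_mbstoc32s and
its signature are hypothetical (modelled loosely on mbstowcs) and are not
the interface specified in N2282. A real implementation would presumably
convert directly rather than loop over mbrtoc32 one code point at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>   /* char32_t, mbstate_t, mbrtoc32 (C11) */

/* Hypothetical helper, NOT the N2282 interface.
 * Converts the 0-terminated multibyte string src (encoding of the current
 * locale) into at most n char32_t elements in dst.
 * Returns the number of elements written, not counting the terminator,
 * or (size_t)-1 on an invalid or incomplete sequence. As with mbstowcs,
 * dst is only 0-terminated if the terminator fits within n elements. */
size_t sketch_mbstoc32s(char32_t *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    mbstate_t state = {0};            /* local initial state: nothing shared */
    size_t left = strlen(src) + 1;    /* never look past the terminator */
    size_t written = 0;

    while (written < n) {
        char32_t c;
        size_t r = mbrtoc32(&c, src, left, &state);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;        /* invalid or truncated sequence */
        if (r == 0) {                 /* converted the terminating 0 */
            dst[written] = 0;
            return written;
        }
        dst[written++] = c;
        if (r != (size_t)-3) {        /* -3: output stored, no input consumed */
            src += r;
            left -= r;
        }
    }
    return written;                   /* output full, no terminator stored */
}
```

Because the conversion state lives in a local variable and the whole string
is available at once, the caller has nothing to carry between calls. A
restartable variant would instead take the mbstate_t from the caller so that
a sequence split across two chunks of input can be resumed.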

Philipp


Received on 2019-05-06 15:39:03