Re: [SG16-Unicode] Convert between std::u8string and std::string

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 6 May 2019 10:23:53 -0400
On 5/6/19 9:38 AM, Philipp Klaus Krause wrote:
> On 04.05.19 at 01:44, JeanHeyd Meneide wrote:
>> To be honest with you, the whole situation is a bit awful and -- what's
>> worse -- is that there are no string versions of any of these functions
>> for fast, efficient processing (c8srtombs/mbsrtoc8s,
>> c16srtombs/mbsrtoc16s, c32srtombs/mbsrtoc32s): they are just straight up
>> missing. The latter 2 in that list are being fixed by Philipp K.
>> Krause's N2282
>> (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2282.htm) -- you
>> should write to your C and/or C++ representatives in your country (or,
>> really, anyone who's listening) and tell them that we need these for
>> fast, competitive implementations that hope to hold a candle to proper
>> Unicode conversion utilities employed around the world. (One of the
>> kickbacks surrounding that paper is "waiting for implementation
>> experience and feedback", I think?) I don't know how Tom feels about
>> jumping the gun and writing c8srtombs/mbsrtoc8s for the C++ standard
>> before its friends ( c16srtombs/mbsrtoc16s, c32srtombs/mbsrtoc32s) are
>> accepted into the C standard, but I would highly encourage that to be a
>> thing we do because one-by-one code point processing is a mistake for
>> efficient processing. In days gone by, the C Committee added mbsrtowcs
>> and other multiple-code point functions to the C standard for a reason
>> (this reason); why the C standard is about to wait on it to make the
>> same mistake is something I do not quite understand.
>>
>> Maybe it's just a matter of being loud and vocal enough to the Committee
>> and its representatives to have it put in?
> My proposal N2282 adds mbstoc16s, c16stombs, mbstoc32s, c32stombs. That
> is, non-restartable functions.
> Non-restartable functions have a performance / code size advantage, are
> always thread-safe, and don't read beyond the end of 0-terminated strings.
> If there is sufficient demand for restartable ones (to be able to easily
> handle incomplete chunks of text without needing the user to handle a
> buffer), one could certainly consider adding those, too.
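For illustration, here is a rough sketch of the chunk-handling case mentioned above, using only the restartable single-character function std::mbrtoc32 that already exists in <cuchar>. The helper name append_chunk is hypothetical and is not part of N2282 or of any standard. The std::mbstate_t object is what lets an incomplete trailing sequence carry over into the next chunk, and the loop shows the one-code-point-at-a-time processing that a string-level function would replace. Decoding is locale-dependent: a multi-byte sequence split across chunks only behaves as described under a locale with a multibyte encoding such as UTF-8.

```cpp
#include <cstddef>
#include <cuchar>
#include <cwchar>
#include <string>
#include <string_view>

// Hypothetical helper (an assumption for illustration, not a proposed API):
// decodes one chunk of multibyte text in the current C locale and appends
// the resulting code points to `out`. `state` must be zero-initialized
// before the first chunk and reused for every following chunk.
// Returns false if an invalid sequence is encountered.
bool append_chunk(std::string_view chunk, std::mbstate_t& state,
                  std::u32string& out) {
    const char* p = chunk.data();
    std::size_t left = chunk.size();
    while (left > 0) {
        char32_t c32 = 0;
        std::size_t n = std::mbrtoc32(&c32, p, left, &state);
        if (n == static_cast<std::size_t>(-1)) {
            return false;  // encoding error
        }
        if (n == static_cast<std::size_t>(-2)) {
            break;  // incomplete sequence: the bytes are now held in `state`
        }
        if (n == static_cast<std::size_t>(-3)) {
            out.push_back(c32);  // output produced without consuming input
            continue;
        }
        if (n == 0) {
            n = 1;  // null character decoded; assumed to occupy one byte
        }
        out.push_back(c32);
        p += n;
        left -= n;
    }
    return true;
}
```

A caller would create one `std::mbstate_t st{};` and pass it to every call, for example `append_chunk("ab", st, out)` followed by `append_chunk("c", st, out)`. A non-restartable string function would not need that state object, but the caller would then have to buffer any incomplete tail itself.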

Philipp, could you summarize the state of that proposal? It looks like
it was discussed at the Pittsburgh meeting according to N2307 [1] and
there was a request for additional implementation experience. I don't
see an updated paper in the London pre-meeting mailing. Was this
discussed there? Do you plan to continue pushing this forward?

Tom.

[1]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2307.pdf

>
> Philipp



Received on 2019-05-06 16:23:57