Subject: Re: [SG16-Unicode] Convert between std::u8string and std::string
From: Tom Honermann (tom_at_[hidden])
Date: 2019-05-06 09:23:53

On 5/6/19 9:38 AM, Philipp Klaus Krause wrote:
> Am 04.05.19 um 01:44 schrieb JeanHeyd Meneide:
>> To be honest with you, the whole situation is a bit awful and -- what's
>> worse -- is that there are no string versions of any of these functions
>> for fast, efficient processing (c8srtombs/mbsrtoc8s,
>> c16srtombs/mbsrtoc16s, c32srtombs/mbsrtoc32s): they are just straight up
>> missing. The latter 2 in that list are being fixed by Philipp K.
>> Krause's N2282
>> (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2282.htm) -- you
>> should write to your C and/or C++ representatives in your country (or,
>> really, anyone who's listening) and tell them that we need these for
>> fast, competitive implementations that hope to hold a candle to proper
>> Unicode conversion utilities employed around the world. (One of the
>> objections raised against that paper is "waiting for implementation
>> experience and feedback", I think?) I don't know how Tom feels about
>> jumping the gun and writing c8srtombs/mbsrtoc8s for the C++ standard
>> before its friends ( c16srtombs/mbsrtoc16s, c32srtombs/mbsrtoc32s) are
>> accepted into the C standard, but I would highly encourage that to be a
>> thing we do because one-by-one code point processing is a mistake for
>> efficient processing. In days gone by, the C Committee added mbsrtowcs
>> and other multiple-code point functions to the C standard for exactly
>> this reason. Why the C Committee would now wait and repeat the same
>> mistake is something I do not quite understand.
>> Maybe it's just a matter of being loud and vocal enough to the Committee
>> and its representatives to have it put in?
> My proposal N2282 adds mbstoc16s, c16stombs, mbstoc32s, c32stombs. That
> is, non-restartable functions.
> Non-restartable functions have a performance / code size advantage, are
> always thread-safe, and don't read beyond the end of 0-terminated strings.
> If there is sufficient demand for restartable ones (to be able to easily
> handle incomplete chunks of text without needing the user to handle a
> buffer), one could certainly consider adding those, too.
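
For readers following the thread, here is a minimal sketch of the kind of whole-string conversion being discussed, built only from what C++11 already provides (std::c32rtomb from <cuchar>). The helper name sketch_c32_to_mb is hypothetical and is not the interface proposed in N2282; it only illustrates that, without a string-level function, the caller has to loop one code point at a time. The output encoding is whatever the current C locale's multibyte encoding happens to be.

```cpp
#include <climits>   // MB_LEN_MAX
#include <cstddef>   // std::size_t
#include <cuchar>    // std::c32rtomb
#include <cwchar>    // std::mbstate_t
#include <string>

// Hypothetical helper (not from N2282 or any standard): converts a whole
// UTF-32 string to the current locale's multibyte encoding by calling
// std::c32rtomb once per code point.
// Returns false if some code point cannot be represented.
bool sketch_c32_to_mb(const std::u32string& in, std::string& out) {
    std::mbstate_t state{};   // local, zero-initialized conversion state
    char buf[MB_LEN_MAX];     // large enough for one converted code point
    out.clear();
    for (char32_t c : in) {
        std::size_t n = std::c32rtomb(buf, c, &state);
        if (n == static_cast<std::size_t>(-1)) {
            return false;     // encoding error reported by c32rtomb
        }
        out.append(buf, n);
    }
    return true;
}
```

In the default "C" locale, passing U"abc" should yield the three bytes "abc". Because the std::mbstate_t lives in a local variable, the helper does not rely on any shared conversion state, but it still pays one library call per code point, which is the cost the quoted messages are concerned about.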

Philipp, could you summarize the state of that proposal?  It looks like
it was discussed at the Pittsburgh meeting according to N2307 [1] and
there was a request for additional implementation experience.  I don't
see an updated paper in the London pre-meeting mailing.  Was this
discussed there?  Do you plan to continue pushing this forward?


[1]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2307.pdf

> Philipp