ISOCPP std-proposals List: Re: [std-proposals] charN_t (was: TBAA and extended floating-point types)

From: Jason McKesson <jmckesson_at_[hidden]>
Date: Mon, 1 Sep 2025 16:20:15 -0400

On Mon, Sep 1, 2025 at 3:59 PM zxuiji <gb2985_at_[hidden]> wrote:
>
> On Mon, 1 Sept 2025 at 14:37, Jason McKesson via Std-Proposals <std-proposals_at_[hidden]> wrote:
>>
>> On Mon, Sep 1, 2025 at 3:02 AM zxuiji <gb2985_at_[hidden]> wrote:
>> >
>> > Oh, au contraire, such an API SHOULD be standardised. How do you think iconv(), WideCharToMultiByte() and MultiByteToWideChar() exist? By ignoring assumptions in the data types. They're explicitly stating (with the void* pointer parts) that they give no s**ts about assumed encodings associated with whatever type is being used for the strings passed in.
>>
>> That's because they're made in a language which has no function
>> overloading. They *cannot* care about the type.
>>
>> C++ has overloading; taking advantage of that is the whole point of
>> having UTF-based character types. Unless a function is being given a
>> non-constant expression parameter that defines the encoding of the
>> input or output, there is no reason to use a `void*`.
>>
>> > It is in such cases that the "API that ignores assumptions" is standardised, at least within their own ecosystem. Frankly, I would like a standard variant that combines the best parts of both APIs. Something like:
>> >
>> > /// @brief get codepage
>> > long getstrcp( char const *name );
>> > /** @brief try to convert from source codepage (scp) to destination codepage (dcp)
>> > @return If dst is NULL or cap < 1 then the size of the allocation needed for dst to be completely converted (including final \0),
>> > otherwise returns the number of characters converted, not including the \0 character appended at the end
>> > **/
>> > long strconv( long dcp, void *dst, long cap, long scp, void const *src, long end );
>> >
>> > along with some optional inlines like these to make it easier to use when said types are used:
>> >
>> > inline long utfconv8to16( char16_t *dst, long max, char8_t const *src, long len )
>> > { return strconv( 1200, dst, max * sizeof(char16_t), 65001, src, len * sizeof(char8_t) ); }
>> > inline long utfconv8to32( char32_t *dst, ... )
>> > { return strconv( 12000, dst, ... ); }
>> > inline long utfconv16to8( char8_t *dst, ... )
>> > ...
>> >
>> > This would make things much more convenient to work with, and the API would be conveying the assumptions it's applying directly to the developer.
>>
>> Again, `filesystem::path`'s constructors are illustrative here. There
>> is no reason for any language with overloading to do things the way
>> you describe. If we have a type system, `utfconv` should be enough;
>> everything else can be taken from the types of the parameters.
>>
>> Similarly, such functions should be given `std::span<char*_t>`s, not
>> bare pointers and sizes.
>> --
>> Std-Proposals mailing list
>> Std-Proposals_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>
> Really? So you'd rather make overloads for 1000s of codepages?

Allow me to quote myself: "Unless a function is being given a
non-constant expression parameter that defines the encoding of the
input or output, there is no reason to use a `void*`."

You *can* have such a function. But the functions that specifically
convert from one particular encoding to another particular encoding
(such as your hypothetical `utfconv` functions) should use overloading
based on the types that define what those encodings are, not different
function names. There's no reason not to just have `utfconv`, and pass
it `span`s of `char8/16/32_t`s as needed, and let function overloading
handle the rest.

Received on 2025-09-01 20:20:29