Date: Tue, 2 Sep 2025 17:10:13 -0400
On Tue, Sep 2, 2025 at 4:59 PM zxuiji <gb2985_at_[hidden]> wrote:
>
> On Mon, 1 Sept 2025 at 21:20, Jason McKesson via Std-Proposals <std-proposals_at_[hidden]> wrote:
>>
>> On Mon, Sep 1, 2025 at 3:59 PM zxuiji <gb2985_at_[hidden]> wrote:
>> >
>> > On Mon, 1 Sept 2025 at 14:37, Jason McKesson via Std-Proposals <std-proposals_at_[hidden]> wrote:
>> >>
>> >> On Mon, Sep 1, 2025 at 3:02 AM zxuiji <gb2985_at_[hidden]> wrote:
>> >> >
>> >> > Au contraire, such an API SHOULD be standardised. How do you think iconv(), WideCharToMultiByte() and MultiByteToWideChar() exist? By ignoring assumptions in the data types. They're explicitly stating (with the void* pointer parameters) that they give no s**ts about assumed encodings associated with whatever type is being used for the strings passed in.
>> >>
>> >> That's because they're made in a language which has no function
>> >> overloading. They *cannot* care about the type.
>> >>
>> >> C++ has overloading; taking advantage of that is the whole point of
>> >> having UTF-based character types. Unless a function is being given a
>> >> non-constant expression parameter that defines the encoding of the
>> >> input or output, there is no reason to use a `void*`.
>> >>
>> >> > It is in such cases that the "API that ignores assumptions" is standardised, at least within their own ecosystem. Frankly, I would like a standard variant that combines the best parts of both APIs, something like:
>> >> >
>> >> > /// @brief get codepage
>> >> > long getstrcp( char const *name );
>> >> > /** @brief try to convert from source codepage (scp) to destination codepage (dcp)
>> >> > @return If dst is NULL or cap < 1, returns the size of the allocation needed for src to be completely converted into dst (including the final \0);
>> >> > otherwise returns the number of characters converted, not including the \0 character appended at the end
>> >> > **/
>> >> > long strconv( long dcp, void *dst, long cap, long scp, void const *src, long end );
>> >> >
>> >> > along with some optional inlines like these to make it easier to use when said types are used:
>> >> >
>> >> > inline long utfconv8to16( char16_t *dst, long max, char8_t const *src, long len )
>> >> > { return strconv( 1200, dst, max * sizeof(char16_t), 65001, src, len * sizeof(char8_t) ); }
>> >> > inline long utfconv8to32( char32_t *dst, ... )
>> >> > { return strconv( 12000, dst, ... ); }
>> >> > inline long utfconv16to8( char8_t *dst, ... )
>> >> > ...
>> >> >
>> >> > This would make things much more convenient to work with, and the API would convey the assumptions it applies directly to the developer.
>> >>
>> >> Again, `filesystem::path`'s constructors are illustrative here. There
>> >> is no reason for any language with overloading to do things the way
>> >> you describe. If we have a type system, `utfconv` should be enough;
>> >> everything else can be taken from the types of the parameters.
>> >>
>> >> Similarly, such functions should be given `std::span<char*_t>`s, not
>> >> bare pointers and sizes.
>> >> --
>> >> Std-Proposals mailing list
>> >> Std-Proposals_at_[hidden]
>> >> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
>> >
>> > Really? So you'd rather make overloads for 1000s of codepages?
>>
>> Allow me to quote myself: "Unless a function is being given a
>> non-constant expression parameter that defines the encoding of the
>> input or output, there is no reason to use a `void*`."
>>
>> You *can* have such a function. But the functions that specifically
>> convert from one particular encoding to another particular encoding
>> (such as your hypothetical `utfconv` functions) should use overloading
>> based on the types that define what those encodings are, not different
>> function names. There's no reason not to just have `utfconv`, and pass
>> it `span`s of `char8/16/32_t`s as needed, and let function overloading
>> handle the rest.
>
> The functions I gave were for the C used under the hood because, let's be real, no one in their right mind tries to re-implement text-encoding management for 100s if not 1000s of text encodings in C++, the bane of sanity (I would take Rust any day, and I hate that s**t).
OK, but... how does that change anything I said?
If you have a function that uses a runtime parameter to decide which
encodings it is dealing with, then it *must* use `void*`s in its
parameter lists. So your `strconv` function is perfectly fine.
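For illustration, here is a hypothetical, minimal sketch of such a runtime-dispatched converter (not the proposed `strconv` itself). The codepage ids follow the Windows convention used upthread (65001 = UTF-8, 1200 = UTF-16LE); I've added 28591 (ISO-8859-1/Latin-1) and implemented only the Latin-1-to-UTF-32 direction, purely to show why the buffers must be `void*` when the encodings are chosen at run time:

```cpp
#include <cstddef>

// Hypothetical sketch: because the source and destination encodings are
// selected by integer codepage ids at run time, the buffer parameters can
// only be void*. cap and len count bytes, mirroring the byte-counted
// sizes in the quoted strconv() sketch; the return value is the number
// of characters converted.
long strconv_sketch(long dcp, void* dst, long cap,
                    long scp, const void* src, long len) {
    if (scp == 28591 && dcp == 12000) {
        // Latin-1 -> UTF-32LE: each source byte maps directly to a code point.
        const unsigned char* in = static_cast<const unsigned char*>(src);
        char32_t* out = static_cast<char32_t*>(dst);
        long n = 0;
        for (; n < len && (n + 1) * (long)sizeof(char32_t) <= cap; ++n)
            out[n] = in[n];
        return n;
    }
    return -1; // every other pair is unsupported in this sketch
}
```

A real implementation would of course dispatch over a table of codecs rather than hard-code one pair, but the type-erased signature is the point.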
But the set of Unicode encodings is finite, and very small.
And given that these represent the majority of encoding conversions
people need to deal with, we need conversion functions specific to
those encodings.
Such functions should be overloaded, such that the encodings they are
going from/to are built into the type system, not the function's name.
So `utfconv`, not `utfconv8to16`. I don't care if C has a long history
of functions like `utfconv8to16`; that's not a good reason for them to
exist in C++.
Received on 2025-09-02 21:10:26