Re: [std-proposals] charN_t (was: TBAA and extended floating-point types)

From: zxuiji <gb2985_at_[hidden]>
Date: Tue, 2 Sep 2025 22:13:30 +0100
The functions I gave were for the C used under the hood because, let's be
real, no one in their right mind tries to re-implement text encoding
management for 100s if not 1000s of text encodings in C++, the bane of
sanity (I would take Rust any day, and I hate that s**t).

On Mon, 1 Sept 2025 at 21:20, Jason McKesson via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> On Mon, Sep 1, 2025 at 3:59 PM zxuiji <gb2985_at_[hidden]> wrote:
> >
> > On Mon, 1 Sept 2025 at 14:37, Jason McKesson via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
> >>
> >> On Mon, Sep 1, 2025 at 3:02 AM zxuiji <gb2985_at_[hidden]> wrote:
> >> >
> >> > Au contraire, such an API SHOULD be standardised. How do you think
> iconv(), WideCharToMultiByte() and MultiByteToWideChar() exist? By ignoring
> assumptions in the data types. They're explicitly stating (with the void*
> pointer parts) that they give no s**ts about assumed encodings associated
> with whatever type is being used for the strings passed in.
> >>
> >> That's because they're made in a language which has no function
> >> overloading. They *cannot* care about the type.
> >>
> >> C++ has overloading; taking advantage of that is the whole point of
> >> having UTF-based character types. Unless a function is being given a
> >> non-constant expression parameter that defines the encoding of the
> >> input or output, there is no reason to use a `void*`.
> >>
> >> > It is in such cases that the "API that ignores assumptions" is
> standardised, at least within their own ecosystem. Frankly, I would like a
> standard variant that combines the best parts of both APIs, something
> like:
> >> >
> >> > /// @brief get codepage
> >> > long getstrcp( char const *name );
> >> > /** @brief try to convert from source codepage (scp) to destination
> codepage (dcp)
> >> > @return If dst is NULL or cap < 1 then the size of the allocation
> needed for dst to be completely converted (including final \0),
> >> > otherwise returns the number of characters converted, not including
> the \0 character appended at the end
> >> > **/
> >> > long strconv( long dcp, void *dst, long cap, long scp, void const
> *src, long end );
> >> >
> >> > along with some optional inlines like these to make it easier to use
> when said types are used:
> >> >
> >> > inline long utfconv8to16( char16_t *dst, long max, char8_t const *src,
> long len )
> >> > { return strconv( 1200, dst, max * sizeof(char16_t), 65001, src,
> len * sizeof(char8_t) ); }
> >> > inline long utfconv8to32( char32_t *dst, ... )
> >> > { return strconv( 12000, dst, ... ); }
> >> > inline long utfconv16to8( char8_t *dst, ... )
> >> > ...
> >> >
> >> > Would make things much more convenient to work with and the API would
> be conveying the assumptions it's applying directly to the developer
> >>
> >> Again, `filesystem::path`'s constructors are illustrative here. There
> >> is no reason for any language with overloading to do things the way
> >> you describe. If we have a type system, `utfconv` should be enough;
> >> everything else can be taken from the types of the parameters.
> >>
> >> Similarly, such functions should be given `std::span<char*_t>`s, not
> >> bare pointers and sizes.
> >> --
> >> Std-Proposals mailing list
> >> Std-Proposals_at_[hidden]
> >> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
> >
> > Really? So you'd rather make overloads for 1000s of codepages?
>
> Allow me to quote myself: "Unless a function is being given a
> non-constant expression parameter that defines the encoding of the
> input or output, there is no reason to use a `void*`."
>
> You *can* have such a function. But the functions that specifically
> convert from one particular encoding to another particular encoding
> (such as your hypothetical `utfconv` functions) should use overloading
> based on the types that define what those encodings are, not different
> function names. There's no reason not to just have `utfconv`, and pass
> it `span`s of `char8/16/32_t`s as needed, and let function overloading
> handle the rest.

Received on 2025-09-02 20:59:19