Date: Wed, 3 Sep 2025 03:47:18 +0100
On Tue, 2 Sept 2025 at 22:10, Jason McKesson via Std-Proposals <
std-proposals_at_[hidden]> wrote:
> On Tue, Sep 2, 2025 at 4:59 PM zxuiji <gb2985_at_[hidden]> wrote:
> >
> > On Mon, 1 Sept 2025 at 21:20, Jason McKesson via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
> >>
> >> On Mon, Sep 1, 2025 at 3:59 PM zxuiji <gb2985_at_[hidden]> wrote:
> >> >
> >> > On Mon, 1 Sept 2025 at 14:37, Jason McKesson via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
> >> >>
> >> >> On Mon, Sep 1, 2025 at 3:02 AM zxuiji <gb2985_at_[hidden]> wrote:
> >> >> >
> >> >> > Au contraire, such an API SHOULD be standardised. How do you
> >> >> > think iconv(), WideCharToMultiByte() and MultiByteToWideChar()
> >> >> > exist? By ignoring assumptions in the data types. They're
> >> >> > explicitly stating (with the void* pointer parts) that they give
> >> >> > no s**ts about assumed encodings associated with whatever type is
> >> >> > being used for the strings passed in.
> >> >>
> >> >> That's because they're made in a language which has no function
> >> >> overloading. They *cannot* care about the type.
> >> >>
> >> >> C++ has overloading; taking advantage of that is the whole point of
> >> >> having UTF-based character types. Unless a function is being given a
> >> >> non-constant expression parameter that defines the encoding of the
> >> >> input or output, there is no reason to use a `void*`.
> >> >>
> >> >> > It is in such cases that the "API that ignores assumptions" is
> >> >> > standardised, at least within their own ecosystem. Frankly, I
> >> >> > would like a standard variant that combines the best parts of
> >> >> > both APIs. Something like:
> >> >> >
> >> >> > /// @brief get codepage
> >> >> > long getstrcp( char const *name );
> >> >> > /** @brief try to convert from source codepage (scp) to
> >> >> >     destination codepage (dcp)
> >> >> > @return If dst is NULL or cap < 1 then the size of the allocation
> >> >> >     needed for dst to be completely converted (including final \0),
> >> >> >     otherwise returns the number of characters converted, not
> >> >> >     including the \0 character appended at the end
> >> >> > **/
> >> >> > long strconv( long dcp, void *dst, long cap, long scp,
> >> >> >     void const *src, long end );
> >> >> >
> >> >> > along with some optional inlines like these to make it easier to
> >> >> > use when said types are used:
> >> >> >
> >> >> > inline long utfconv8to16( char16_t *dst, long max,
> >> >> >     char8_t const *src, long len )
> >> >> > { return strconv( 1200, dst, max * sizeof(char16_t), 65001,
> >> >> >     src, len * sizeof(char8_t) ); }
> >> >> > inline long utfconv8to32( char32_t *dst, ... )
> >> >> > { return strconv( 12000, dst, ... ); }
> >> >> > inline long utfconv16to8( char8_t *dst, ... )
> >> >> > ...
> >> >> >
> >> >> > That would make things much more convenient to work with, and the
> >> >> > API would convey the assumptions it is applying directly to the
> >> >> > developer.
> >> >>
> >> >> Again, `filesystem::path`'s constructors are illustrative here. There
> >> >> is no reason for any language with overloading to do things the way
> >> >> you describe. If we have a type system, `utfconv` should be enough;
> >> >> everything else can be taken from the types of the parameters.
> >> >>
> >> >> Similarly, such functions should be given `std::span<char*_t>`s, not
> >> >> bare pointers and sizes.
> >> >> --
> >> >> Std-Proposals mailing list
> >> >> Std-Proposals_at_[hidden]
> >> >> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
> >> >
> >> > Really? So you'd rather make overloads for 1000s of codepages?
> >>
> >> Allow me to quote myself: "Unless a function is being given a
> >> non-constant expression parameter that defines the encoding of the
> >> input or output, there is no reason to use a `void*`."
> >>
> >> You *can* have such a function. But the functions that specifically
> >> convert from one particular encoding to another particular encoding
> >> (such as your hypothetical `utfconv` functions) should use overloading
> >> based on the types that define what those encodings are, not different
> >> function names. There's no reason not to just have `utfconv`, and pass
> >> it `span`s of `char8/16/32_t`s as needed, and let function overloading
> >> handle the rest.
> >
> > The functions I gave were for the C used under the hood because, let's
> > be real, no one in their right mind tries to re-implement text encoding
> > management for 100s if not 1000s of text encodings in C++, the bane of
> > sanity (I would take Rust any day and I hate that s**t)
>
> OK, but... how does that change anything I said?
>
> If you have a function that uses a runtime parameter to decide which
> encodings it is dealing with, then it *must* use `void*`s in its
> parameter lists. So your `strconv` function is perfectly fine.
>
> But the number of Unicode encodings is a finite, and very small, set.
> And given that these represent the majority of encoding conversions
> people need to deal with, we need conversion functions specific to
> those encodings.
>
> Such functions should be overloaded, such that the encodings they are
> going from/to are built into the type system, not the function's name.
> So `utfconv`, not `utfconv8to16`. I don't care if C has a long history
> of functions like `utfconv8to16`; that's not a good reason for them to
> exist in C++.
>
Because it's convenient to have them in C land. I don't care if a
subsequent utfconv function in C++ uses them under the hood, so long as,
if and when strconv is inevitably suggested to the C committee, the
utfconvXtoY inlines are suggested at the same time.
Received on 2025-09-03 02:33:06