Re: [SG16] iconv-style interface for transcoding functions

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Thu, 28 Jan 2021 13:00:51 +0100
On Thu, Jan 28, 2021 at 12:27 PM Peter Brett via SG16 <sg16_at_[hidden]>
wrote:

> Hi Jens,
>
> Thank you for sending this suggestion through. It'll be good input for a
> discussion at the start of our next meeting.
>
> One of the features in the current paper that would be lost is described
> in (3.5):
>
> (3.5) - If output_size is not NULL, then *output_size will be
> decremented the amount of code units that would have
> been written to *output (even if output was NULL). If
> the output is exhausted (*output_size will be
> decremented below zero), the function returns
> MCHAR_INSUFFICIENT_OUTPUT.
>
> This allows the restartable transcoding functions to be used to measure
> the amount of space required to store the results of a transcoding
> operation without pre-allocating a buffer.
>
> This is valuable. How would it be provided in a hypothetical [begin, end)
> pointer interface?
>

I agree with Jens that letting callers assume a large enough buffer is a
security nightmare.
I agree with you that computing the required size may be useful, though
only to some extent: depending on the encodings it is an O(n) operation,
so over-allocating a buffer might actually be faster than working out
exactly how much space is needed.

In some APIs (for example in Win32), if output is NULL and output_size is
not NULL, then *output_size is set to the required output buffer size.
The combination that is problematic here is output not NULL and output_size
NULL.
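
To make that convention concrete, here is a minimal sketch, not the
interface from the paper. Everything in it is hypothetical and chosen
only for illustration: the names latin1_to_utf8, to_utf8 and
transcode_status, and the choice of Latin-1 to UTF-8 as the conversion.
A NULL output with a non-NULL output_size is a size query, and a NULL
output_size is rejected outright rather than read as "there is enough
space".

```cpp
#include <cstddef>
#include <string>

// Hypothetical status codes, for illustration only.
enum class transcode_status { ok, insufficient_output, invalid_argument };

// Hypothetical Latin-1 -> UTF-8 transcoder using the size-query convention:
// output == nullptr with output_size != nullptr reports the required size
// in *output_size and writes nothing.
transcode_status latin1_to_utf8(const unsigned char* input, std::size_t input_size,
                                char* output, std::size_t* output_size)
{
    // Assumption of this sketch: a null size is always an error, so the
    // "output not NULL, output_size NULL" combination cannot overflow.
    if (output_size == nullptr)
        return transcode_status::invalid_argument;

    // O(n) measuring pass: one code unit for ASCII, two otherwise.
    std::size_t needed = 0;
    for (std::size_t i = 0; i < input_size; ++i)
        needed += input[i] < 0x80 ? 1 : 2;

    if (output == nullptr) {            // size query only
        *output_size = needed;
        return transcode_status::ok;
    }
    if (*output_size < needed)
        return transcode_status::insufficient_output;

    char* p = output;
    for (std::size_t i = 0; i < input_size; ++i) {
        const unsigned char c = input[i];
        if (c < 0x80) {
            *p++ = static_cast<char>(c);
        } else {
            *p++ = static_cast<char>(0xC0 | (c >> 6));
            *p++ = static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    *output_size = needed;              // number of code units written
    return transcode_status::ok;
}

// Usage: measure first, then allocate exactly and convert.
std::string to_utf8(const std::string& latin1)
{
    const unsigned char* in = reinterpret_cast<const unsigned char*>(latin1.data());
    std::size_t n = 0;
    latin1_to_utf8(in, latin1.size(), nullptr, &n);
    std::string out(n, '\0');
    latin1_to_utf8(in, latin1.size(), &out[0], &n);
    return out;
}
```

Note that the measuring pass walks the whole input once before any
conversion happens, which is the O(n) cost mentioned above.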

As for the performance of [begin, end) versus the current design, I would
like to see benchmarks.
Intuitively, I'm inclined to think that on a sufficiently large buffer the
difference is unlikely to be meaningful. But again, that is a vacuous
statement without numbers.
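
For anyone who wants to produce such numbers, the [begin, end) side of
the comparison could look roughly like the sketch below. It is an
assumption about the shape of such an interface, not a proposal; the
names latin1_to_utf8_range and range_status are hypothetical, and
Latin-1 to UTF-8 is used only because it is short. Both begin pointers
are advanced by the function, and the per-character space check is
skipped when the output range is known up front to cover the worst case.

```cpp
#include <cstddef>

// Hypothetical status codes, for illustration only.
enum class range_status { ok, insufficient_output };

// Hypothetical [begin, end) transcoder: in_begin and out_begin are updated
// to point just past what was consumed and written, so a caller can resume
// with a fresh output range after insufficient_output.
range_status latin1_to_utf8_range(const unsigned char*& in_begin,
                                  const unsigned char* in_end,
                                  char*& out_begin, char* out_end)
{
    const std::size_t in_n = static_cast<std::size_t>(in_end - in_begin);
    const std::size_t out_n = static_cast<std::size_t>(out_end - out_begin);

    // Worst case is two output code units per input code unit. If the
    // output range covers that, no per-character length check is needed.
    const bool ample = out_n / 2 >= in_n;

    while (in_begin != in_end) {
        const unsigned char c = *in_begin;
        const std::size_t need = c < 0x80 ? 1 : 2;
        if (!ample && static_cast<std::size_t>(out_end - out_begin) < need)
            return range_status::insufficient_output;
        if (c < 0x80) {
            *out_begin++ = static_cast<char>(c);
        } else {
            *out_begin++ = static_cast<char>(0xC0 | (c >> 6));
            *out_begin++ = static_cast<char>(0x80 | (c & 0x3F));
        }
        ++in_begin;
    }
    return range_status::ok;
}
```

Whether hoisting the check like this makes a measurable difference
against the pointer + length form is exactly what a benchmark would have
to show.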


> Best regards,
>
> Peter
>
>
> > -----Original Message-----
> > From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Jens Maurer via
> SG16
> > Sent: 27 January 2021 21:32
> > To: SG16 <sg16_at_[hidden]>
> > Cc: Jens Maurer <Jens.Maurer_at_[hidden]>
> > Subject: [SG16] iconv-style interface for transcoding functions
> >
> > In today's teleconference, JeanHeyd suggested that the
> > "pointer + length" interface would allow to pass nullptr
> > for the output length, allowing the user to assert
> > "there is enough space", which, in turn, allows to forego
> > range checking during transcoding.
> >
> > At least the version of iconv on my current Ubuntu
> > system does not offer precedent for these semantics;
> > passing nullptr for the output length just crashes.
> > Test case:
> >
> > #include <iconv.h>
> > #include <stdlib.h>
> >
> > int main()
> > {
> >   iconv_t cd = iconv_open("utf-8", "utf-8");
> >   char in[] = "abcd";
> >   char *pin = in;
> >   size_t nin = 4;
> >   char *pout = (char*)malloc(100);
> >   size_t nout = 100;
> >   // nullptr is passed for the output length instead of &nout
> >   size_t n = iconv(cd, &pin, &nin, &pout, nullptr);
> > }
> >
> > This behavior is consistent with the description in
> > the man page, where no mention of special handling
> > of nullptr length arguments appears.
> > Plus the POSIX specification agrees:
> >
> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
> >
> > I'd also like to point out that interfaces that
> > assume "there will be enough space" are prone to
> > misuse, admitting buffer overflows. I'd also
> > like to point out that the perceived run-time
> > overhead of the extra length check is partially
> > mitigated by
> >
> > - the necessity to check the length pointer for nullptr
> > and branch to a special implementation
> >
> > - in a [begin, end) iterator range implementation,
> > the ability to determine the available space and omit
> > some or all length checks if ample space is provided.
> >
> > In short, I believe the core interface should treat in
> > [begin, end) iterator ranges, where "begin" is updated
> > by the function. If that doesn't materialize (for whatever
> > reason), I expressly do not want thin decorators to
> > be standardized, ballooning the number of functions even
> > more.
> >
> > Jens
> > --
> > SG16 mailing list
> > SG16_at_[hidden]
> >
> > https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2021-01-28 06:01:04