Re: [SG16] iconv-style interface for transcoding functions

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Thu, 28 Jan 2021 19:27:15 +0100
On 28/01/2021 12.27, Peter Brett wrote:
> Hi Jens,
>
> Thank you for sending this suggestion through. It'll be good input for a discussion at the start of our next meeting.
>
> One of the features in the current paper that would be lost is described in (3.5):
>
> (3.5) - If output_size is not NULL, then *output_size will be
> decremented the amount of code units that would have
> been written to *output (even if output was NULL). If
> the output is exhausted (*output_size will be
> decremented below zero), the function returns
> MCHAR_INSUFFICIENT_OUTPUT.

Wait, *output_size is of type size_t, so it can never become negative.

> This allows the restartable transcoding functions to be used to measure the amount of space required to store the results of a transcoding operation without pre-allocating a buffer.

So, if I have a large input buffer, but a small-ish output buffer
(maybe I need to fill one 4k page of memory at a time), I'll always
pay for processing the entire input buffer, just to do the count?
I'm strongly opposed to having such an interface.
(Within this interface design, I'd soften to weakly opposed
if this special behavior applied only when output_ptr == nullptr,
i.e. when no output is produced.)

(Also, iconv doesn't do that.)
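
To make the cost argument concrete, here is a minimal sketch, not the interface from the paper. Both functions and the `measure` helper are hypothetical names, and the "transcoding" is a plain byte copy standing in for real work. Variant A stops as soon as the output is full, as iconv does. Variant B keeps walking the rest of the input to report how much more output would be needed. `measure` then fills a small page repeatedly and adds up how many input units each variant had to examine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Variant A (hypothetical): stop once the output is full.
// Returns the number of input units examined by this call.
std::size_t convert_stop_when_full(const char** in, std::size_t* in_size,
                                   char** out, std::size_t* out_size) {
    std::size_t examined = 0;
    while (*in_size > 0 && *out_size > 0) {
        *(*out)++ = *(*in)++;  // identity "transcoding" of one unit
        --*in_size;
        --*out_size;
        ++examined;
    }
    return examined;
}

// Variant B (hypothetical): after the output is full, keep scanning the
// remaining input (without consuming it) to count the output still needed.
std::size_t convert_and_count_rest(const char** in, std::size_t* in_size,
                                   char** out, std::size_t* out_size,
                                   std::size_t* still_needed) {
    std::size_t examined = convert_stop_when_full(in, in_size, out, out_size);
    *still_needed = 0;
    for (std::size_t i = 0; i < *in_size; ++i) {
        ++*still_needed;  // a real transcoder would have to decode here
        ++examined;
    }
    return examined;
}

struct Work {
    std::size_t stop_when_full;  // units examined in total by variant A
    std::size_t count_rest;      // units examined in total by variant B
};

// Transcode input_len units through a page of `chunk` units at a time.
Work measure(std::size_t input_len, std::size_t chunk) {
    assert(chunk > 0);  // a zero-sized page would never make progress
    std::vector<char> input(input_len, 'a');
    std::vector<char> page(chunk);
    Work w{0, 0};

    const char* in = input.data();
    std::size_t n = input.size();
    while (n > 0) {
        char* out = page.data();
        std::size_t room = page.size();
        w.stop_when_full += convert_stop_when_full(&in, &n, &out, &room);
    }

    in = input.data();
    n = input.size();
    while (n > 0) {
        char* out = page.data();
        std::size_t room = page.size();
        std::size_t still_needed = 0;
        w.count_rest +=
            convert_and_count_rest(&in, &n, &out, &room, &still_needed);
    }
    return w;
}
```

With 16 input units and a 4-unit page, variant A examines 16 units in total, while variant B examines 16 + 12 + 8 + 4 = 40, because every call rescans whatever input is left. The gap grows roughly quadratically in the number of pages.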

> This is valuable. How would it be provided in a hypothetical [begin, end) pointer interface?

Having the result in a negative size_t certainly doesn't work, either.

Hypothetically, you invoke this special behavior with end==nullptr
and just increment "begin" as far as you need to convey the count.
Except that this is undefined behavior, because "begin" might now
point beyond its array.

An alternative would be a separate set of functions
that does just the counting (no output), but the zoo we have
for transcoding is already large.
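
As an illustration of what such a count-only function could look like, here is a minimal sketch for one encoding pair, UTF-8 to UTF-16. The name `utf16_units_for_utf8` is hypothetical and does not come from the paper, and the sketch assumes the input is well-formed UTF-8, so it does no error reporting. Every non-continuation byte starts one code point; a lead byte of 0xF0 or above starts a four-byte sequence, which needs a surrogate pair in UTF-16.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical count-only helper: number of UTF-16 code units needed to hold
// the UTF-8 text in [begin, end). Assumes well-formed UTF-8 (no validation).
std::size_t utf16_units_for_utf8(const unsigned char* begin,
                                 const unsigned char* end) {
    std::size_t units = 0;
    for (const unsigned char* p = begin; p != end; ++p) {
        const unsigned char b = *p;
        if ((b & 0xC0) == 0x80) {
            continue;  // continuation byte: belongs to a sequence already counted
        }
        // Lead bytes 0xF0 and above start 4-byte sequences (code points above
        // U+FFFF), which take two UTF-16 code units; everything else takes one.
        units += (b >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

Because the function takes a [begin, end) range and returns the count by value, it needs neither a null output pointer nor a size that goes "below zero", at the price of one more function per encoding pair.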

Regardless of the outcome of this discussion, please add the arguments (on both sides)
to the paper, together with the conclusion.

Jens


> Best regards,
>
> Peter
>
>
>> -----Original Message-----
>> From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Jens Maurer via SG16
>> Sent: 27 January 2021 21:32
>> To: SG16 <sg16_at_[hidden]>
>> Cc: Jens Maurer <Jens.Maurer_at_[hidden]>
>> Subject: [SG16] iconv-style interface for transcoding functions
>>
>> In today's teleconference, JeanHeyd suggested that the
>> "pointer + length" interface would allow passing nullptr
>> for the output length, letting the user assert
>> "there is enough space", which, in turn, allows range
>> checking to be skipped during transcoding.
>>
>> At least the version of iconv on my current Ubuntu
>> system does not offer precedent for these semantics;
>> passing nullptr for the output length just crashes.
>> Test case:
>>
>> #include <iconv.h>
>> #include <stdlib.h>
>>
>> int main()
>> {
>>   iconv_t cd = iconv_open("utf-8", "utf-8");
>>   char in[] = "abcd";
>>   char *pin = in;
>>   size_t nin = 4;
>>   char *pout = (char*)malloc(100);
>>   size_t nout = 100;
>>   size_t n = iconv(cd, &pin, &nin, &pout, nullptr);
>> }
>>
>> This behavior is consistent with the description in
>> the man page, where no mention of special handling
>> of nullptr length arguments appears.
>> Plus the POSIX specification agrees:
>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
>>
>> I'd also like to point out that interfaces that
>> assume "there will be enough space" are prone to
>> misuse, admitting buffer overflows. I'd also
>> like to point out that the perceived run-time
>> overhead of the extra length check is partially
>> mitigated by
>>
>> - the necessity to check the length pointer for nullptr
>> and branch to a special implementation
>>
>> - in a [begin, end) iterator range implementation,
>> the ability to determine the available space and omit
>> some or all length checks if ample space is provided.
>>
>> In short, I believe the core interface should deal in
>> [begin, end) iterator ranges, where "begin" is updated
>> by the function. If that doesn't materialize (for whatever
>> reason), I expressly do not want thin decorators to
>> be standardized, ballooning the number of functions even
>> more.
>>
>> Jens

Received on 2021-01-28 12:27:25