sg16

Re: std::vprint_unicode vs invalid code units

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Wed, 22 Feb 2023 21:24:54 +0000
On Wed, 22 Feb 2023 at 20:05, Victor Zverovich <victor.zverovich_at_[hidden]> wrote:

> > My concern is that if I silently substitute U+FFFD for invalid code
> > units when transcoding, does that count as "diagnosing" it?
>
> Good catch. I don't think substitution counts as diagnosing. The
> "Recommended practice" clause was added per the following SG16 vote but I
> think we forgot to remove the earlier recommendation:
>
> Poll 7: P2093R6: <print> facility implementors are encouraged to
> substitute U+FFFD replacement characters following Unicode guidance when
> output is directed to a device and transcoding is necessary.
> Attendance: 9 (1 abstention)
>
>  SF  F  N  A  SA
>   2  5  0  0   1
> Consensus: Consensus in favor.
>
> I suggest striking "and implementations are encouraged to diagnose it",
> possibly as an LWG issue.
>

OK, thanks. I'll file an issue.
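
For illustration only, here is a minimal C++ sketch of the substitution described in Poll 7 and the "Recommended practice" wording: transcoding UTF-8 to UTF-16 while replacing each maximal subpart of an ill-formed sequence with one U+FFFD, assuming the "U+FFFD Substitution of Maximal Subparts" practice from Chapter 3 of the Unicode Standard is the guidance meant. The names `decode_one` and `utf8_to_utf16_replacing` are hypothetical; this is not the libstdc++ or {fmt} code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

constexpr char32_t replacement_char = 0xFFFD;

// Hypothetical helper: decodes the UTF-8 sequence starting at s[i].
// On success, returns the code point and sets len to the sequence length.
// On an ill-formed sequence, returns U+FFFD and sets len to the length of
// its maximal subpart, so each maximal subpart yields exactly one U+FFFD.
char32_t decode_one(std::string_view s, std::size_t i, std::size_t& len) {
    assert(i < s.size());
    const auto b0 = static_cast<unsigned char>(s[i]);
    if (b0 < 0x80) { len = 1; return b0; }   // ASCII

    std::size_t need;                    // total length of a valid sequence
    char32_t cp;                         // payload bits of the lead byte
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the 2nd byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        need = 2; cp = b0 & 0x1F;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        need = 3; cp = b0 & 0x0F;
        if (b0 == 0xE0) lo = 0xA0;       // reject overlong encodings
        if (b0 == 0xED) hi = 0x9F;       // reject surrogates D800..DFFF
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        need = 4; cp = b0 & 0x07;
        if (b0 == 0xF0) lo = 0x90;       // reject overlong encodings
        if (b0 == 0xF4) hi = 0x8F;       // reject code points above U+10FFFF
    } else {
        len = 1;                         // 0x80..0xC1 or 0xF5..0xFF never lead
        return replacement_char;
    }

    std::size_t k = 1;
    for (; k < need && i + k < s.size(); ++k) {
        const auto b = static_cast<unsigned char>(s[i + k]);
        if (b < lo || b > hi) break;     // stop at the first unexpected byte
        cp = (cp << 6) | (b & 0x3F);
        lo = 0x80; hi = 0xBF;            // later trail bytes use the full range
    }
    len = k;                             // whole sequence, or its maximal subpart
    return k == need ? cp : replacement_char;
}

// Transcodes UTF-8 to UTF-16, substituting U+FFFD for ill-formed input.
std::u16string utf8_to_utf16_replacing(std::string_view s) {
    std::u16string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size();) {
        std::size_t len = 0;
        char32_t cp = decode_one(s, i, len);
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                         // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

With this rule, `"\xC0\xAF"` becomes two U+FFFD (neither byte can start a valid sequence), while a truncated `"\xE2\x82"` becomes a single U+FFFD because both bytes belong to one maximal subpart.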


>
> > And how should I diagnose invalid code units when not transcoding?
> > For Windows, should I write it out with substitutions, then throw/abort?
> > Or should substitutions be done for POSIX too, irrespective of whether
> > transcoding is needed?
>
> You can, but don't have to, do substitution on platforms where transcoding
> is not performed. For example, {fmt} doesn't do it on POSIX.
>

Yeah, I've got code to do it, but I'm undecided whether to enable it for
POSIX.
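
If substitution were enabled on POSIX, where the output stays UTF-8 and no transcoding happens, one possible approach is a byte-level pass that copies well-formed sequences unchanged and writes EF BF BD (U+FFFD in UTF-8) for each maximal subpart of an ill-formed sequence. This is a hypothetical sketch, not the code mentioned above; `scan_utf8` and `sanitize_utf8` are illustrative names, and the maximal-subpart rule is assumed from Chapter 3 of the Unicode Standard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: examines the UTF-8 sequence starting at s[i].
// Returns true if it is well-formed, with len set to its length; otherwise
// returns false with len set to the length of its maximal subpart (>= 1).
bool scan_utf8(std::string_view s, std::size_t i, std::size_t& len) {
    assert(i < s.size());
    const auto b0 = static_cast<unsigned char>(s[i]);
    if (b0 < 0x80) { len = 1; return true; }  // ASCII

    std::size_t need;                    // total length of a valid sequence
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the 2nd byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        need = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        need = 3;
        if (b0 == 0xE0) lo = 0xA0;       // reject overlong encodings
        if (b0 == 0xED) hi = 0x9F;       // reject surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        need = 4;
        if (b0 == 0xF0) lo = 0x90;       // reject overlong encodings
        if (b0 == 0xF4) hi = 0x8F;       // reject code points above U+10FFFF
    } else {
        len = 1;                         // 0x80..0xC1 or 0xF5..0xFF
        return false;
    }

    std::size_t k = 1;
    for (; k < need && i + k < s.size(); ++k) {
        const auto b = static_cast<unsigned char>(s[i + k]);
        if (b < lo || b > hi) break;     // stop at the first unexpected byte
        lo = 0x80; hi = 0xBF;            // later trail bytes use the full range
    }
    len = k;
    return k == need;
}

// Copies well-formed sequences unchanged and replaces each maximal subpart
// of an ill-formed sequence with the UTF-8 encoding of U+FFFD.
std::string sanitize_utf8(std::string_view s) {
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size();) {
        std::size_t len = 0;
        if (scan_utf8(s, i, len)) out.append(s.data() + i, len);
        else out.append("\xEF\xBF\xBD");
        i += len;
    }
    return out;
}
```

Copying the original bytes of valid sequences, rather than decoding and re-encoding them, keeps the common all-valid case cheap.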



>
> HTH,
> Victor
>
>
> On Wed, Feb 22, 2023 at 9:20 AM Jonathan Wakely via SG16 <sg16_at_[hidden]> wrote:
>
>> [...] if out contains invalid code units, the behavior is undefined and
>> implementations are encouraged to diagnose it.
>> [...]
>> Recommended practice: If invoking the native Unicode API requires
>> transcoding, implementations should substitute invalid code units with
>> U+FFFD REPLACEMENT CHARACTER
>>
>> This is a *recommended practice* for handling UB, which seems novel, but
>> that's OK. I think it's good to say "undefined, but it should really be
>> diagnosed".
>>
>> My concern is that if I silently substitute U+FFFD for invalid code units
>> when transcoding, does that count as "diagnosing" it? If not, I don't see
>> how I can meet both the encouragement to diagnose the UB and the
>> encouragement to do the U+FFFD substitutions.
>>
>> And how should I diagnose invalid code units when not transcoding? It
>> would seem strange to throw an exception or abort for POSIX (where no
>> transcoding happens) but silently substitute for U+FFFD on Windows (where
>> transcoding to UTF-16 happens).
>>
>> For Windows, should I write it out with substitutions, then throw/abort?
>>
>> Or should substitutions be done for POSIX too, irrespective of whether
>> transcoding is needed?
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>
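
On the question in the original message of how to diagnose invalid code units when no transcoding happens: any diagnosis first has to locate them. The sketch below, assuming the well-formed UTF-8 byte sequences defined in Chapter 3 of the Unicode Standard, returns the byte offset of the first invalid code unit and leaves the response (assertion, exception, or nothing) to the caller, since that choice is exactly what the thread leaves open. `well_formed_len` and `first_invalid_offset` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: returns the length of the well-formed UTF-8 sequence
// starting at s[i], or 0 if the bytes starting there are ill-formed.
std::size_t well_formed_len(std::string_view s, std::size_t i) {
    assert(i < s.size());
    const auto b0 = static_cast<unsigned char>(s[i]);
    if (b0 < 0x80) return 1;             // ASCII

    std::size_t need;                    // total length of a valid sequence
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the 2nd byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        need = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        need = 3;
        if (b0 == 0xE0) lo = 0xA0;       // reject overlong encodings
        if (b0 == 0xED) hi = 0x9F;       // reject surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        need = 4;
        if (b0 == 0xF0) lo = 0x90;       // reject overlong encodings
        if (b0 == 0xF4) hi = 0x8F;       // reject code points above U+10FFFF
    } else {
        return 0;                        // 0x80..0xC1 or 0xF5..0xFF
    }

    for (std::size_t k = 1; k < need; ++k) {
        if (i + k >= s.size()) return 0; // truncated sequence
        const auto b = static_cast<unsigned char>(s[i + k]);
        if (b < lo || b > hi) return 0;  // unexpected byte
        lo = 0x80; hi = 0xBF;            // later trail bytes use the full range
    }
    return need;
}

// Returns the byte offset of the first invalid code unit, or std::nullopt
// if the whole string is valid UTF-8.
std::optional<std::size_t> first_invalid_offset(std::string_view s) {
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t len = well_formed_len(s, i);
        if (len == 0) return i;
        i += len;
    }
    return std::nullopt;
}
```

A purely hypothetical use would be to call this only in checked or debug builds on POSIX, so that the release-mode write path stays a plain byte copy.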

Received on 2023-02-22 21:25:09