
sg16


Re: std::vprint_unicode vs invalid code units

From: Victor Zverovich <victor.zverovich_at_[hidden]>
Date: Wed, 22 Feb 2023 12:04:49 -0800
> My concern is that if I silently substitute invalid code units for U+FFFD
> when transcoding, does that count as "diagnosing" it?

Good catch. I don't think substitution counts as diagnosing. The
"Recommended practice" clause was added per the following SG16 vote, but I
think we forgot to remove the earlier recommendation:

Poll 7: P2093R6: <print> facility implementors are encouraged to substitute
U+FFFD replacement characters following Unicode guidance when output is
directed to a device and transcoding is necessary.
Attendance: 9 (1 abstention)

SF  F  N  A  SA
 2  5  0  0   1
Consensus: Consensus in favor.

I suggest striking "and implementations are encouraged to diagnose it",
possibly as an LWG issue.
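The poll refers to Unicode guidance for U+FFFD substitution. Assuming that means the Unicode Standard's practice of replacing each maximal subpart of an ill-formed sequence with one U+FFFD (Chapter 3.9), a minimal sketch of the UTF-8 to UTF-16 transcoding a Windows implementation might perform is shown here. The names `decode_one` and `utf8_to_utf16_replacing` are hypothetical; this is not the code from P2093 or {fmt}.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helpers, not from P2093 or {fmt}. They follow the Unicode
// "maximal subpart" practice: each maximal subpart of an ill-formed sequence
// becomes exactly one U+FFFD.
constexpr char32_t kReplacement = 0xFFFD;

// Decodes the code point starting at s[i] and stores the number of bytes
// consumed in len. On an ill-formed sequence, returns U+FFFD and sets len to
// the length of the maximal subpart (always at least 1).
char32_t decode_one(std::string_view s, std::size_t i, std::size_t& len) {
  assert(i < s.size());
  const auto b0 = static_cast<unsigned char>(s[i]);
  len = 1;
  if (b0 < 0x80) return b0;  // ASCII

  std::size_t need = 0;                // total length of the sequence
  unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
  char32_t cp = 0;
  if (b0 >= 0xC2 && b0 <= 0xDF) {
    need = 2; cp = b0 & 0x1F;
  } else if (b0 >= 0xE0 && b0 <= 0xEF) {
    need = 3; cp = b0 & 0x0F;
    if (b0 == 0xE0) lo = 0xA0;       // reject overlong forms
    else if (b0 == 0xED) hi = 0x9F;  // reject UTF-16 surrogates
  } else if (b0 >= 0xF0 && b0 <= 0xF4) {
    need = 4; cp = b0 & 0x07;
    if (b0 == 0xF0) lo = 0x90;       // reject overlong forms
    else if (b0 == 0xF4) hi = 0x8F;  // reject values above U+10FFFF
  } else {
    return kReplacement;  // C0, C1, F5..FF, or a stray continuation byte
  }

  for (std::size_t k = 1; k < need; ++k) {
    if (i + k >= s.size()) return kReplacement;  // truncated at end of input
    const auto b = static_cast<unsigned char>(s[i + k]);
    const unsigned char l = (k == 1) ? lo : 0x80;
    const unsigned char h = (k == 1) ? hi : 0xBF;
    if (b < l || b > h) return kReplacement;  // len bytes form the maximal subpart
    cp = (cp << 6) | (b & 0x3F);
    ++len;
  }
  return cp;
}

// Transcodes UTF-8 to UTF-16, replacing ill-formed input instead of failing.
std::u16string utf8_to_utf16_replacing(std::string_view s) {
  std::u16string out;
  out.reserve(s.size());
  for (std::size_t i = 0; i < s.size();) {
    std::size_t len = 0;
    char32_t cp = decode_one(s, i, len);
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {  // encode as a surrogate pair
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  return out;
}
```

With this approach the bytes `61 F1 80 80 E1 80 C2 62 80 63 80 BF 64` become `a`, three U+FFFD, `b`, one U+FFFD, `c`, two U+FFFD, `d`: a truncated prefix such as `F1 80 80` collapses to a single replacement, while each stray continuation byte gets its own.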

> And how should I diagnose invalid code units when not transcoding?
> For Windows, should I write it out with substitutions, then throw/abort?
> Or should substitutions be done for POSIX too, irrespective of whether
> transcoding?

You can, but don't have to, do substitution on platforms where transcoding
is not performed. For example, {fmt} doesn't do it on POSIX.
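For illustration, a minimal sketch of what diagnosing invalid code units could look like on a platform that writes bytes through unchanged: validate the UTF-8 first and report the offset of the first ill-formed sequence. The names `first_invalid_utf8` and `require_valid_utf8`, and the choice to throw `std::invalid_argument`, are assumptions for this example; neither P2093 nor {fmt} is claimed to work this way.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: returns the byte offset of the first ill-formed UTF-8
// sequence in s, or std::nullopt if all of s is well-formed UTF-8.
std::optional<std::size_t> first_invalid_utf8(std::string_view s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const auto b0 = static_cast<unsigned char>(s[i]);
    if (b0 < 0x80) { ++i; continue; }  // ASCII

    std::size_t need = 0;                // total length of the sequence
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
      need = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
      need = 3;
      if (b0 == 0xE0) lo = 0xA0;       // reject overlong forms
      else if (b0 == 0xED) hi = 0x9F;  // reject UTF-16 surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
      need = 4;
      if (b0 == 0xF0) lo = 0x90;       // reject overlong forms
      else if (b0 == 0xF4) hi = 0x8F;  // reject values above U+10FFFF
    } else {
      return i;  // C0, C1, F5..FF, or a stray continuation byte
    }
    if (s.size() - i < need) return i;  // sequence truncated by end of input
    assert(i + need <= s.size());       // whole candidate sequence is in bounds
    for (std::size_t k = 1; k < need; ++k) {
      const auto b = static_cast<unsigned char>(s[i + k]);
      const unsigned char l = (k == 1) ? lo : 0x80;
      const unsigned char h = (k == 1) ? hi : 0xBF;
      if (b < l || b > h) return i;
    }
    i += need;
  }
  return std::nullopt;
}

// Hypothetical diagnosing wrapper: throws instead of writing ill-formed text.
void require_valid_utf8(std::string_view s) {
  if (const auto pos = first_invalid_utf8(s)) {
    throw std::invalid_argument("ill-formed UTF-8 at byte offset " +
                                std::to_string(*pos));
  }
}
```

Because validation is a separate pass, this costs one extra scan of the output even when the text is valid.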

HTH,
Victor


On Wed, Feb 22, 2023 at 9:20 AM Jonathan Wakely via SG16 <sg16_at_[hidden]> wrote:

> [...] if out contains invalid code units, the behavior is undefined and
> implementations are encouraged to diagnose it.
> [...]
> Recommended practice: If invoking the native Unicode API requires
> transcoding, implementations should substitute invalid code units with
> U+FFFD replacement character
>
> This is a *recommended practice* for handling UB, which seems novel, but
> that's OK. I think it's good to say "undefined, but it should really be
> diagnosed".
>
> My concern is that if I silently substitute invalid code units for U+FFFD
> when transcoding, does that count as "diagnosing" it? If not, I don't see
> how I can meet the encouragement to diagnose the UB, and also meet the
> encouragement to do the U+FFFD substitutions.
>
> And how should I diagnose invalid code units when not transcoding? It
> would seem strange to throw an exception or abort for POSIX (where no
> transcoding happens) but silently substitute for U+FFFD on Windows (where
> transcoding to UTF-16 happens).
>
> For Windows, should I write it out with substitutions, then throw/abort?
>
> Or should substitutions be done for POSIX too, irrespective of whether
> transcoding?
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2023-02-22 20:05:02