> My concern is that if I silently substitute invalid code units for U+FFFD when transcoding, does that count as "diagnosing" it?

Good catch. I don't think substitution counts as diagnosing. The "Recommended practice" clause was added per the following SG16 vote, but I think we forgot to remove the earlier encouragement to diagnose:

Poll 7: P2093R6: <print> facility implementors are encouraged to substitute U+FFFD replacement characters following Unicode guidance when output is directed to a device and transcoding is necessary.
Attendance: 9 (1 abstention)

SF F N A SA
2 5 0 0 1
Consensus: Consensus in favor.

I suggest striking "and implementations are encouraged to diagnose it", possibly as an LWG issue.

> And how should I diagnose invalid code units when not transcoding?
> For Windows, should I write it out with substitutions, then throw/abort?
> Or should substitutions be done for POSIX too, irrespective of whether transcoding?

You can, but don't have to, do substitution on platforms where no transcoding is performed. For example, {fmt} doesn't do it on POSIX.
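
To make the transcoding case concrete, here is a minimal sketch of UTF-8 to UTF-16 conversion that emits one U+FFFD per maximal invalid subsequence, which is my reading of the Unicode guidance the poll refers to. The function name to_utf16_lossy is hypothetical; this is not the {fmt} implementation and not proposed wording, just an illustration of the behavior.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not from {fmt} or the standard library): transcodes
// UTF-8 to UTF-16, replacing each maximal invalid subsequence with U+FFFD.
std::u16string to_utf16_lossy(std::string_view in) {
    constexpr char16_t replacement = u'\uFFFD';
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const auto b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) {  // ASCII maps directly
            out.push_back(static_cast<char16_t>(b0));
            ++i;
            continue;
        }
        // Sequence length and the permitted range of the second byte,
        // following the well-formed UTF-8 byte sequence table.
        int len = 0;
        unsigned char lo = 0x80, hi = 0xBF;
        char32_t cp = 0;
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2; cp = b0 & 0x1F;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3; cp = b0 & 0x0F;
            if (b0 == 0xE0) lo = 0xA0;  // rejects overlong forms
            if (b0 == 0xED) hi = 0x9F;  // rejects surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4; cp = b0 & 0x07;
            if (b0 == 0xF0) lo = 0x90;  // rejects overlong forms
            if (b0 == 0xF4) hi = 0x8F;  // rejects values above U+10FFFF
        } else {  // stray continuation byte or invalid lead byte
            out.push_back(replacement);
            ++i;
            continue;
        }
        std::size_t j = i + 1;
        bool ok = true;
        for (int k = 1; k < len; ++k, ++j) {
            if (j >= in.size()) { ok = false; break; }
            const auto b = static_cast<unsigned char>(in[j]);
            if (b < lo || b > hi) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F);
            lo = 0x80; hi = 0xBF;  // later bytes use the plain range
        }
        if (!ok) {
            // The offending byte at j is not consumed; it is re-examined
            // as the start of the next sequence.
            out.push_back(replacement);
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i = j;
    }
    return out;
}
```

With this sketch, a truncated sequence such as "\xE2\x82" yields a single U+FFFD, while "\xC0\xAF" yields two, since neither byte can start or continue a well-formed sequence. Where no transcoding happens, the bytes would simply be written through unchanged.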

HTH,
Victor


On Wed, Feb 22, 2023 at 9:20 AM Jonathan Wakely via SG16 <sg16@lists.isocpp.org> wrote:
[...] if out contains invalid code units, the behavior is undefined and implementations
are encouraged to diagnose it.
[...]
Recommended practice: If invoking the native Unicode API requires transcoding, implementations
should substitute invalid code units with U+FFFD REPLACEMENT CHARACTER

This is a recommended practice for handling UB, which seems novel, but that's OK. I think it's good to say "undefined, but it should really be diagnosed".

My concern is that if I silently substitute invalid code units for U+FFFD when transcoding, does that count as "diagnosing" it? If not, I don't see how I can meet the encouragement to diagnose the UB, and also meet the encouragement to do the U+FFFD substitutions.

And how should I diagnose invalid code units when not transcoding? It would seem strange to throw an exception or abort for POSIX (where no transcoding happens) but silently substitute for U+FFFD on Windows (where transcoding to UTF-16 happens).

For Windows, should I write it out with substitutions, then throw/abort?

Or should substitutions be done for POSIX too, irrespective of whether transcoding?

--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16