sg16

std::vprint_unicode vs invalid code units

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Wed, 22 Feb 2023 17:20:04 +0000
[...] if out contains invalid code units, the behavior is undefined and
implementations are encouraged to diagnose it.
[...]
Recommended practice: If invoking the native Unicode API requires
transcoding, implementations should substitute invalid code units with
U+FFFD REPLACEMENT CHARACTER

This is a *recommended practice* for handling UB, which seems novel, but
that's OK. I think it's good to say "undefined, but it should really be
diagnosed".

My concern is this: if I silently replace invalid code units with U+FFFD
when transcoding, does that count as "diagnosing" it? If not, I don't see
how I can meet the encouragement to diagnose the UB, and also meet the
encouragement to do the U+FFFD substitutions.

And how should I diagnose invalid code units when not transcoding? It would
seem strange to throw an exception or abort for POSIX (where no transcoding
happens) but silently substitute U+FFFD on Windows (where transcoding
to UTF-16 happens).

For Windows, should I write it out with substitutions, then throw/abort?

Or should substitutions be done for POSIX too, irrespective of whether
transcoding happens?
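
To make the last option concrete, here is a minimal sketch of doing the
substitution on UTF-8 input without any transcoding, while still counting
the substitutions so the caller could diagnose afterwards. The function name
replace_invalid_utf8 is hypothetical, not part of any library or of the
proposed wording, and replacing each maximal ill-formed subsequence with a
single U+FFFD is an assumed policy here, not something the quoted text
requires.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper (illustration only). Copies well-formed UTF-8 through
// unchanged and replaces each maximal ill-formed subsequence with U+FFFD.
// Returns the sanitized text and the number of substitutions performed.
std::pair<std::string, std::size_t> replace_invalid_utf8(std::string_view in)
{
    std::string out;
    std::size_t substitutions = 0;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;                 // expected sequence length, 0 = bad lead byte
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
        if (b0 <= 0x7F) {
            len = 1;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;       // rejects overlong encodings
            if (b0 == 0xED) hi = 0x9F;       // rejects UTF-16 surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;       // rejects overlong encodings
            if (b0 == 0xF4) hi = 0x8F;       // rejects values above U+10FFFF
        }
        // n counts the bytes of this sequence accepted so far.
        std::size_t n = (len == 0) ? 0 : 1;
        while (n != 0 && n < len && i + n < in.size()) {
            const unsigned char b = static_cast<unsigned char>(in[i + n]);
            if (b < lo || b > hi) break;
            lo = 0x80;                       // later bytes are plain continuation bytes
            hi = 0xBF;
            ++n;
        }
        if (len != 0 && n == len) {
            out.append(in.substr(i, n));     // complete, well-formed sequence
            i += n;
        } else {
            out += "\xEF\xBF\xBD";           // U+FFFD encoded as UTF-8
            ++substitutions;
            i += (n == 0) ? 1 : n;           // skip the ill-formed prefix only
        }
    }
    return {std::move(out), substitutions};
}
```

With something like this, a POSIX implementation could write the sanitized
text and then decide what to do when the count is nonzero (nothing, throw,
or abort), which is exactly the policy question above.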

Received on 2023-02-22 17:20:20