> My concern is that if I silently substitute invalid code units for U+FFFD when transcoding, does that count as "diagnosing" it?
Good catch. I don't think substitution counts as diagnosing. The "Recommended practice" clause was added as a result of the following SG16 poll, but I think we forgot to remove the earlier encouragement to diagnose:
Poll 7: P2093R6: <print> facility implementors are encouraged to substitute U+FFFD replacement characters following Unicode guidance when output is directed to a device and transcoding is necessary.
Attendance: 9 (1 abstention)
SF F N A SA
2 5 0 0 1
Consensus: Consensus in favor.
I suggest striking "and implementations are encouraged to diagnose it", possibly as an LWG issue.
> And how should I diagnose invalid code units when not transcoding?
> For Windows, should I write it out with substitutions, then throw/abort?
> Or should substitutions be done for POSIX too, irrespective of whether transcoding?
You can, but don't have to, do substitution on platforms where no transcoding is performed. For example, {fmt} doesn't do it on POSIX.
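For illustration only, here is a rough C++ sketch of what the substitution in the poll above might look like when transcoding UTF-8 to UTF-16 (as on Windows). It assumes the Unicode Standard's "U+FFFD Substitution of Maximal Subparts" practice (Chapter 3), where each maximal prefix of an ill-formed sequence becomes one U+FFFD. The names `Decoded`, `decode_one` and `utf8_to_utf16` are hypothetical; this is not {fmt}'s or any standard library's actual code.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper type: the result of decoding one UTF-8 sequence.
struct Decoded {
  char32_t cp;      // decoded scalar value, or U+FFFD if ill-formed
  std::size_t len;  // number of bytes consumed (always >= 1)
};

// Decodes the sequence starting at s[i] (requires i < s.size()) using the
// well-formed byte ranges of Unicode Table 3-7. On error it consumes only
// the maximal subpart of a well-formed sequence (at least one byte).
Decoded decode_one(std::string_view s, std::size_t i) {
  auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
  const unsigned char c0 = byte(i);
  if (c0 < 0x80) return {c0, 1};  // ASCII

  std::size_t trail = 0;               // continuation bytes expected
  char32_t cp = 0;
  unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the 2nd byte
  if (c0 >= 0xC2 && c0 <= 0xDF) {
    trail = 1; cp = c0 & 0x1F;
  } else if (c0 >= 0xE0 && c0 <= 0xEF) {
    trail = 2; cp = c0 & 0x0F;
    if (c0 == 0xE0) lo = 0xA0;       // reject overlong forms
    else if (c0 == 0xED) hi = 0x9F;  // reject encoded surrogates
  } else if (c0 >= 0xF0 && c0 <= 0xF4) {
    trail = 3; cp = c0 & 0x07;
    if (c0 == 0xF0) lo = 0x90;       // reject overlong forms
    else if (c0 == 0xF4) hi = 0x8F;  // reject values above U+10FFFF
  } else {
    return {U'\uFFFD', 1};  // 80..C1 and F5..FF never start a sequence
  }

  std::size_t len = 1;
  for (std::size_t k = 0; k < trail; ++k, ++len) {
    if (i + len >= s.size()) return {U'\uFFFD', len};  // truncated input
    const unsigned char c = byte(i + len);
    if (c < lo || c > hi) return {U'\uFFFD', len};     // maximal subpart ends
    cp = (cp << 6) | (c & 0x3F);
    lo = 0x80; hi = 0xBF;  // only the 2nd byte has a narrowed range
  }
  return {cp, len};
}

// Transcodes UTF-8 to UTF-16, replacing each maximal ill-formed
// subsequence with a single U+FFFD replacement character.
std::u16string utf8_to_utf16(std::string_view s) {
  std::u16string out;
  out.reserve(s.size());
  for (std::size_t i = 0; i < s.size();) {
    const Decoded d = decode_one(s, i);
    i += d.len;
    if (d.cp < 0x10000) {
      out.push_back(static_cast<char16_t>(d.cp));
    } else {  // supplementary plane: encode as a surrogate pair
      const char32_t v = d.cp - 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
  }
  return out;
}
```

For example, the bytes 61 E1 80 62 FF become U+0061, U+FFFD, U+0062, U+FFFD: the truncated sequence E1 80 yields one replacement character and the stray FF yields another, while an encoded surrogate such as ED A0 80 yields three.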
HTH,
Victor

On Wed, Feb 22, 2023 at 9:20 AM Jonathan Wakely via SG16 <sg16@lists.isocpp.org> wrote:
> [...] if out contains invalid code units, the behavior is undefined and implementations are encouraged to diagnose it.
> [...]
> Recommended practice: If invoking the native Unicode API requires transcoding, implementations should substitute invalid code units with U+FFFD replacement character.
>
> This is a recommended practice for handling UB, which seems novel, but that's OK. I think it's good to say "undefined, but it should really be diagnosed".
>
> My concern is that if I silently substitute invalid code units for U+FFFD when transcoding, does that count as "diagnosing" it? If not, I don't see how I can meet the encouragement to diagnose the UB, and also meet the encouragement to do the U+FFFD substitutions.
>
> And how should I diagnose invalid code units when not transcoding? It would seem strange to throw an exception or abort for POSIX (where no transcoding happens) but silently substitute for U+FFFD on Windows (where transcoding to UTF-16 happens).
>
> For Windows, should I write it out with substitutions, then throw/abort?
> Or should substitutions be done for POSIX too, irrespective of whether transcoding?

--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16