Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: William M. (Mike) Miller <william.m.miller_at_[hidden]>
Date: Thu, 9 Jun 2022 14:41:23 -0400

On Thu, Jun 9, 2022 at 2:02 PM Corentin via Core <core_at_[hidden]> wrote:

> Mike
> > I prefer option 2, but restoring some of the wording your tweak deleted
> > from Hubert's suggestion <http://lists.isocpp.org/core/2022/03/12140.php>:
>
> I removed that sentence because we specify in the next paragraph "If a
> source file is determined to be a UTF-8 source file, then it shall be a
> well-formed UTF-8 code unit sequence" - I'd rather not repeat things
> multiple times.
>

The shorter wording in your original message leaves undefined what a "UTF-8
source file" is. The wording I want to see restored provides that
definition: it's a physical source file that consists of a sequence of
UTF-8 code units.

I'm not bothered by the repetition between the first and second paragraphs;
the first says that a UTF-8 source file is a sequence of UTF-8 code units,
and the second adds the further constraint that the sequence must be
well-formed. That seems reasonable to me.
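
To make the distinction concrete: any sequence of bytes can be viewed as a
sequence of 8-bit code units, but only some such sequences are well-formed
UTF-8. The following is a minimal sketch of a well-formedness check, assuming
the byte ranges of Table 3-7 in the Unicode Standard. The function name
is_well_formed_utf8 is hypothetical and is not part of P2295 or of any
proposed wording.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, for illustration only: returns true if `bytes` is a
// well-formed UTF-8 code unit sequence (ranges per Unicode Table 3-7).
bool is_well_formed_utf8(std::string_view bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) {  // single code unit (ASCII)
            ++i;
            continue;
        }
        std::size_t trailing = 0;            // number of continuation units
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the 2nd unit
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            trailing = 1;                    // 0xC0 and 0xC1 would be overlong
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            trailing = 2;
            if (b0 == 0xE0) lo = 0xA0;       // rejects overlong encodings
            if (b0 == 0xED) hi = 0x9F;       // rejects surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            trailing = 3;
            if (b0 == 0xF0) lo = 0x90;       // rejects overlong encodings
            if (b0 == 0xF4) hi = 0x8F;       // rejects values above U+10FFFF
        } else {
            return false;                    // stray continuation or invalid lead
        }
        if (n - i <= trailing) return false; // sequence is truncated
        for (std::size_t k = 1; k <= trailing; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            const unsigned char min = (k == 1) ? lo : 0x80;
            const unsigned char max = (k == 1) ? hi : 0xBF;
            if (c < min || c > max) return false;
        }
        i += trailing + 1;
    }
    return true;
}
```

Under this sketch the bytes C3 A9 (U+00E9) pass, while C0 80 (an overlong
encoding) and ED A0 80 (a surrogate) are sequences of code units that fail
the further well-formedness constraint.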

If you wanted to avoid even that level of repetition, you could change
paragraph 2 to say something like

If a source file is determined to be a UTF-8 source file, its code unit
sequence shall be well-formed and its content is decoded...


I have a slight preference for the longer version, though, since it makes
clear that "well-formed" is referring to the UTF-8 requirements.

-- 
William M. (Mike) Miller | Edison Design Group
william.m.miller_at_[hidden]

Received on 2022-06-09 18:41:35