Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: William M. (Mike) Miller
Date: Fri, 10 Jun 2022 11:16:00 -0400
On Fri, Jun 10, 2022 at 11:05 AM Hubert Tong via Core <core_at_[hidden]> wrote:

> I've merged the suggestions (add "physical", use the parenthetical for the
> non-UTF-8 case, use plural form for designating, have wider-scope
> implementation-defined wording for non-UTF-8 case that encompasses the
> permission from the parenthetical):

I'm happy with this, with one exception noted below:

> An implementation shall support physical source files that are a sequence
> of UTF-8 code units (UTF-8 source files). It may also support an
> implementation-defined set of other kinds of physical source files, and, if
> so, the kind of a physical source file is determined in an
> implementation-defined manner, which includes a means of designating
> physical source files as UTF-8 source files, independent of their content.
> [Note: In other words, recognizing the U+FEFF Byte Order Mark is not
> sufficient. --end note]
> If a physical source file is designated or otherwise determined

Per the preceding paragraph, "determined" includes "designated" -
"designating" is one mechanism for "determining" - so I'd be happier if
this were shortened to just "...file is determined..."

> to be a UTF-8 source file, then it shall be a well-formed UTF-8 code unit
> sequence and it is decoded to produce a sequence of UCS scalar values that
> constitutes the sequence of elements of the translation character set. For
> any other kind of physical source file supported by the implementation,
> characters are mapped, in an implementation-defined manner, to a sequence
> of translation character set elements (introducing new-line characters for
> end-of-line indicators).
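
For illustration only (this is not part of the proposed wording): a minimal sketch of what "shall be a well-formed UTF-8 code unit sequence and it is decoded to produce a sequence of UCS scalar values" could look like in an implementation. The name decode_utf8 is hypothetical, and the sketch assumes the file has already been determined to be a UTF-8 source file. It does not address U+FEFF handling or end-of-line processing.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the paper): decodes the bytes of a physical
// source file that has been determined to be a UTF-8 source file.
// Returns the sequence of UCS scalar values, or std::nullopt if the input is
// not a well-formed UTF-8 code unit sequence.
std::optional<std::vector<char32_t>> decode_utf8(std::string_view bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const auto b0 = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;      // scalar value being assembled
        std::size_t len = 0;  // number of code units in this sequence
        char32_t min = 0;     // smallest value allowed for this length
        if (b0 < 0x80) {
            cp = b0; len = 1; min = 0;
        } else if ((b0 & 0xE0) == 0xC0) {
            cp = b0 & 0x1F; len = 2; min = 0x80;
        } else if ((b0 & 0xF0) == 0xE0) {
            cp = b0 & 0x0F; len = 3; min = 0x800;
        } else if ((b0 & 0xF8) == 0xF0) {
            cp = b0 & 0x07; len = 4; min = 0x10000;
        } else {
            return std::nullopt;  // stray continuation byte or invalid lead byte
        }
        if (bytes.size() - i < len) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return std::nullopt;                    // overlong encoding
        if (cp > 0x10FFFF) return std::nullopt;               // beyond the UCS range
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // surrogate, not a scalar value
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Under this sketch, an ill-formed input is rejected as a whole rather than repaired with replacement characters, which is one reading of "shall be a well-formed UTF-8 code unit sequence"; how an implementation actually diagnoses such a file is outside what the wording above specifies.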

Received on 2022-06-10 15:16:12