sg16

Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Fri, 10 Jun 2022 11:29:38 -0400
On Fri, Jun 10, 2022 at 11:16 AM William M. (Mike) Miller <
william.m.miller_at_[hidden]> wrote:

> On Fri, Jun 10, 2022 at 11:05 AM Hubert Tong via Core <
> core_at_[hidden]> wrote:
>
>> I've merged the suggestions (add "physical", use the parenthetical for
>> the non-UTF-8 case, use plural form for designating, have wider-scope
>> implementation-defined wording for non-UTF-8 case that encompasses the
>> permission from the parenthetical):
>>
>
> I'm happy with this, with one exception noted below:
>

Change made; I've also made the parenthetical about end-of-line indicators
into a note:

An implementation shall support physical source files that are a sequence
of UTF-8 code units (UTF-8 source files). It may also support an
implementation-defined set of other kinds of physical source files, and, if
so, the kind of a physical source file is determined in an
implementation-defined manner, which includes a means of designating
physical source files as UTF-8 source files, independent of their content.
[Note: In other words, recognizing the U+FEFF Byte Order Mark is not
sufficient. --end note]

If a physical source file is determined to be a UTF-8 source file, then it
shall be a well-formed UTF-8 code unit sequence and it is decoded to
produce a sequence of UCS scalar values that constitutes the sequence of
elements of the translation character set. For any other kind of physical
source file supported by the implementation, characters are mapped, in an
implementation-defined manner, to a sequence of translation character set
elements. [Note: This can introduce new-line characters for end-of-line
indicators. --end note]
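
For illustration only, here is a minimal sketch of what "shall be a
well-formed UTF-8 code unit sequence and it is decoded to produce a sequence
of UCS scalar values" could look like in code. The function name decode_utf8
and its signature are hypothetical and not part of the proposed wording. The
sketch rejects overlong encodings, surrogates, values above U+10FFFF, and
truncated sequences, and it does no special handling of a leading U+FEFF.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the proposed wording): decodes a byte
// sequence as UTF-8 and returns the UCS scalar values, or std::nullopt if
// the input is not a well-formed UTF-8 code unit sequence.
std::optional<std::vector<char32_t>> decode_utf8(std::string_view bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;   // total code units in this sequence
        char32_t cp = 0;       // scalar value being assembled
        char32_t min = 0;      // smallest value allowed for this length
        if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return std::nullopt;     // stray continuation or invalid lead byte

        if (bytes.size() - i < len) return std::nullopt;  // truncated sequence

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min) return std::nullopt;                      // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // surrogate
        if (cp > 0x10FFFF) return std::nullopt;                 // out of range

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Under this sketch, a file designated as UTF-8 whose bytes fail to decode
would be ill-formed rather than silently reinterpreted in another encoding,
which is how I read the "shall" above.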


>
>
>> An implementation shall support physical source files that are a sequence
>> of UTF-8 code units (UTF-8 source files). It may also support an
>> implementation-defined set of other kinds of physical source files, and, if
>> so, the kind of a physical source file is determined in an
>> implementation-defined manner, which includes a means of designating
>> physical source files as UTF-8 source files, independent of their content.
>> [Note: In other words, recognizing the U+FEFF Byte Order Mark is not
>> sufficient. --end note]
>>
>> If a physical source file is designated or otherwise determined
>>
>
> Per the preceding paragraph, "determined" includes "designated" -
> "designating" is one mechanism for "determining" - so I'd be happier if
> this were shortened to just "...file is determined..."
>
>
>> to be a UTF-8 source file, then it shall be a well-formed UTF-8 code unit
>> sequence and it is decoded to produce a sequence of UCS scalar values that
>> constitutes the sequence of elements of the translation character set. For
>> any other kind of physical source file supported by the implementation,
>> characters are mapped, in an implementation-defined manner, to a sequence
>> of translation character set elements (introducing new-line characters for
>> end-of-line indicators).
>>
>
>

Received on 2022-06-10 15:30:07