sg16

Re: [isocpp-core] P2295 Support for UTF-8 as a portable source file encoding

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Sat, 11 Jun 2022 17:27:28 -0400
On Sat, Jun 11, 2022 at 4:01 AM Corentin via Core <core_at_[hidden]>
wrote:

> New draft, using that wording, except that I'm not touching the end of
> line indicators, so that we can do that in P2348
> https://isocpp.org/files/papers/D2295R6.pdf
>

The changes look fine to me, including the change to "input".
For the second paragraph, it would be a better transition from the first if
the new sentence instead said:
If the first <ins>translation </ins>character <ins>in the sequence </ins>is
U+FEFF BYTE ORDER MARK, it is deleted.
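
For readers following the wording, here is a minimal C++17 sketch of the two steps discussed in this thread: a UTF-8 source file must be a well-formed UTF-8 code unit sequence that is decoded to UCS scalar values, and a leading U+FEFF is then deleted. The function names decode_utf8 and to_translation_characters are hypothetical and come from neither the paper nor any compiler; the sketch ignores end-of-line handling and is only an illustration, not normative behavior.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: decode a UTF-8 code unit sequence into UCS scalar
// values. Returns std::nullopt if the sequence is not well-formed UTF-8.
std::optional<std::vector<char32_t>> decode_utf8(std::string_view bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const auto b0 = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        char32_t lowest = 0;   // smallest value allowed for this length
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; lowest = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; lowest = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; lowest = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; lowest = 0x10000; }
        else return std::nullopt;     // stray continuation or invalid lead byte

        if (bytes.size() - i < len) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < lowest) return std::nullopt;                   // overlong form
        if (cp > 0x10FFFF) return std::nullopt;                 // out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // surrogate
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: decode, then delete the first character if (and only
// if) it is U+FEFF, mirroring the sentence proposed above.
std::optional<std::vector<char32_t>>
to_translation_characters(std::string_view bytes) {
    auto chars = decode_utf8(bytes);
    if (chars && !chars->empty() && chars->front() == char32_t{0xFEFF}) {
        chars->erase(chars->begin());
    }
    return chars;
}
```

Note that the sketch removes U+FEFF only after the file has already been treated as UTF-8; it does not use the mark to decide the kind of the file, which the note in the quoted wording says is not sufficient.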


>
> On Fri, Jun 10, 2022 at 5:32 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
>> On 10/06/2022 17.16, William M. (Mike) Miller via Core wrote:
>> > On Fri, Jun 10, 2022 at 11:05 AM Hubert Tong via Core
>> > <core_at_[hidden] <mailto:core_at_[hidden]>> wrote:
>> >
>> > I've merged the suggestions (add "physical", use the parenthetical for
>> > the non-UTF-8 case, use plural form for designating, have wider-scope
>> > implementation-defined wording for non-UTF-8 case that encompasses the
>> > permission from the parenthetical):
>> >
>> >
>> > I'm happy with this, with one exception noted below:
>> >
>> >
>> > An implementation shall support physical source files that are a
>> > sequence of UTF-8 code units (UTF-8 source files). It may also support
>> > an implementation-defined set of other kinds of physical source files,
>> > and, if so, the kind of a physical source file is determined in an
>> > implementation-defined manner, which includes a means of designating
>> > physical source files as UTF-8 source files, independent of their
>> > content. [Note: In other words, recognizing the U+FEFF Byte Order Mark
>> > is not sufficient. --end note]
>> >
>> > If a physical source file is designated or otherwise determined
>> >
>> >
>> > Per the preceding paragraph, "determined" includes "designated" -
>> > "designating" is one mechanism for "determining" - so I'd be happier
>> > if this were shortened to just "...file is determined..."
>>
>> Agreed.
>>
>> Jens
>>
>>
>> >
>> > to be a UTF-8 source file, then it shall be a well-formed UTF-8 code
>> > unit sequence and it is decoded to produce a sequence of UCS scalar
>> > values that constitutes the sequence of elements of the translation
>> > character set. For any other kind of physical source file supported by
>> > the implementation, characters are mapped, in an implementation-defined
>> > manner, to a sequence of translation character set elements
>> > (introducing new-line characters for end-of-line indicators).
>> >
>> >
>> >
>> > _______________________________________________
>> > Core mailing list
>> > Core_at_[hidden]
>> > Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
>> > Link to this post: http://lists.isocpp.org/core/2022/06/12698.php
>>
>> _______________________________________________
> Core mailing list
> Core_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
> Link to this post: http://lists.isocpp.org/core/2022/06/12702.php
>

Received on 2022-06-11 21:27:56