On Tue, 8 Sep 2020 at 17:18, Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:

This is your friendly reminder that an SG16 telecon will be held tomorrow, Wednesday September 9th, at 19:30 UTC (timezone conversion).

This meeting will be conducted via Zoom. To attend, visit https://iso.zoom.us/j/8414530059 at the start of the meeting. Please contact me privately if necessary for the meeting password.

The agenda is:

P2178R1: Misc lexing and string handling improvements

Discuss proposal 1: Mandating support for UTF-8 encoded source files in phase 1

P2194R0: The character set of C++ source code is Unicode

For the UTF-8 discussion, please take some time ahead of the meeting to consider the following concerns:

Migration strategies for non-UTF-8 projects to transition to UTF-8, possibly incrementally.

We are not suggesting a change that would require migrating any existing code.

Migration strategies for implementors to transition system headers to UTF-8, possibly incrementally.

Ditto

Support for differently encoded source files within a single translation unit.

P2178R1 does not propose anything related to that.

Support for differently encoded primary source files within a single project.

P2178R1 would have no bearing on that (notably because the internal representation is not observable by the program)

Error handling for ill-formed UTF-8 sequences in each of:

Comments

String literals

Elsewhere.

Proposal 1 of P2178R1 steers clear of that issue, but it is definitely worth discussing.

Handling of BOMs.

Whether an in-source encoding annotation is needed and what form it should take:

A magic comment (like Python)

A pragma directive (like xlC)

This is an orthogonal concern (one which Tom is working on, and which is worth discussing, but it falls outside the scope of P2178).
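To make the BOM and ill-formed-sequence items above a bit more concrete, here is a minimal sketch of what detecting them in a source buffer could look like. The names `has_utf8_bom` and `first_ill_formed` are made up for illustration, and nothing here is proposed by P2178R1. The byte ranges follow the Unicode Standard's table of well-formed UTF-8 byte sequences. What an implementation should do once it finds such a sequence (diagnose, substitute, or tolerate it in comments) is exactly the open question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if the buffer starts with the UTF-8 BOM (EF BB BF).
inline bool has_utf8_bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Hypothetical helper: offset of the first ill-formed UTF-8 sequence,
// or std::string::npos if the whole buffer is well-formed.
inline std::size_t first_ill_formed(const std::string& bytes) {
    const std::size_t n = bytes.size();
    auto at = [&](std::size_t k) { return static_cast<unsigned char>(bytes[k]); };

    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = at(i);
        if (b0 <= 0x7F) {  // ASCII, one code unit
            ++i;
            continue;
        }

        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the second byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;  // reject overlong 3-byte forms
            if (b0 == 0xED) hi = 0x9F;  // reject UTF-16 surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;  // reject overlong 4-byte forms
            if (b0 == 0xF4) hi = 0x8F;  // reject values above U+10FFFF
        } else {
            return i;  // stray continuation byte, C0/C1, or F5..FF
        }

        if (i + len > n) return i;  // sequence truncated by end of buffer
        if (at(i + 1) < lo || at(i + 1) > hi) return i;
        for (std::size_t k = 2; k < len; ++k) {
            if (at(i + k) < 0x80 || at(i + k) > 0xBF) return i;
        }
        i += len;
    }
    return std::string::npos;
}
```

A lexer could call `first_ill_formed` on the whole file and then decide what to do based on whether the reported offset lies in a comment, a string literal, or elsewhere. That policy is deliberately left out of the sketch.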

Current status (C++20): the encoding of the source file is picked in an implementation-defined manner from an implementation-defined set of encodings.

Proposed by P2178: the encoding of the source file is picked in an implementation-defined manner from an implementation-defined set of encodings which includes UTF-8. No further restriction is added to the set of implementation-defined encodings or to the implementation-defined mechanism that determines the encoding. There just must be _some_ documented way for the compiler to interpret files as UTF-8.

That's it.
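As an illustration of "some documented way": existing implementations already have one, for example GCC's `-finput-charset=UTF-8` and MSVC's `/utf-8` or `/source-charset:utf-8`, while Clang assumes UTF-8 source. The sketch below assumes the file is saved as UTF-8 without a BOM. The name `e_acute_units` is hypothetical and only there for illustration.

```cpp
#include <cstddef>

// Assumption: this file is stored as UTF-8, so the "é" below is the two
// bytes C3 A9 on disk.
//
// If the compiler decodes the file as UTF-8, the u8 literal holds two code
// units plus the terminating null. If it decodes the file as a single-byte
// encoding such as Windows-1252 instead, those two bytes are read as two
// separate characters and the literal becomes longer, so the assertion fails.
constexpr std::size_t e_acute_units = sizeof(u8"é") - 1;

static_assert(e_acute_units == 2,
              "source file was not interpreted as UTF-8");
```

Under P2178 proposal 1, every implementation would need to document at least one configuration in which this translation unit compiles. Implementations remain free to pick a different default.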

A very rough draft of a paper discussing these concerns is available at https://rawgit.com/tahonermann/sg16/master/papers/dyyyyr0-utf-8-source-files.html. We will *not* discuss this paper at this meeting, but the Existing Practice section may be informative (please ignore the rest of the draft for now).

No decisions will be made at this meeting, but direction polls are expected.

Tom.
--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16