Hi Tom,


Having reviewed the paper, I’m struggling to understand how most of those concerns are pertinent to discussing P2178R1 proposal 1.

This is possibly about proactively addressing the assumptions that non-committee members may make when they hear that "UTF-8" source is "completely okay" for C++. Tom's questions point out all sorts of caveats. For example, a compiler might claim that UTF-8 source is supported via a flag that requires every file processed in that invocation to be UTF-8 encoded. That is going to cause problems for header inclusion.
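To make the header inclusion caveat concrete, here is a rough sketch (in Python purely for brevity). The file names, contents, and the `decode_translation_unit` helper are all made up for illustration; it only assumes that one encoding setting is applied to every file read during an invocation.

```python
def decode_translation_unit(files, encoding="utf-8"):
    """Decode every file with one encoding; return the names that fail.

    Hypothetical helper: models a compiler flag that applies a single
    source encoding to all files read in one invocation.
    """
    failures = []
    for name, raw in files.items():
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            failures.append(name)
    return failures


# Hypothetical inputs: a UTF-8 primary source file that includes a
# header saved as Windows-1252 (0xA9 is the copyright sign there, but
# a lone continuation byte in UTF-8).
files = {
    "main.cpp": b'const char* s = "\xc3\xa9";\n',
    "legacy.h": b"// \xa9 Example Corp\n",
}

print(decode_translation_unit(files))
```

Only the legacy header is rejected, even though the project's own source file is valid UTF-8.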

The motivation for requiring implementations to consume UTF-8 encoded source is user benefit. These questions are pertinent to ensuring that the benefits are actually realized.

Hubert's perception is exactly right.  I'm raising these questions because I believe more analysis is needed before we proceed in any specific direction.  Proposal 1 lacks sufficient analysis to inform direction other than to say, "we want to support UTF-8".  That is ok; proposal 1 clearly wasn't intended as a final proposal as written.  It is clear to me that we have consensus for support of UTF-8, but there are devils lurking in the details that we have yet to exorcise.  The discussion tomorrow will focus on those devils.



I’m not averse to talking about them, because they are important and need to be addressed at some point, but it feels like giving them the attention that they deserve would not leave time for discussing P2194R0.

That is possible.  I am ok with not getting to P2194 tomorrow, or only getting a start on it, if the UTF-8 discussion is productive.  If it becomes unproductive, we'll switch.


Please could we consider scheduling a discussion of these points for another meeting when your draft paper is ready to discuss in detail?

My goal is that the discussion will help to inform further development of that draft paper and to attract collaborators.  Just as P2194 is the evolution of P2178 proposal 9, I hope that draft will become the evolution of proposal 1.  That implies that it must reflect consensus opinions as well as dissenting ones and offer a choice of options to present to EWG.



This is your friendly reminder that an SG16 telecon will be held tomorrow, Wednesday September 9th, at 19:30 UTC (timezone conversion).

This meeting will be conducted via Zoom.  To attend, visit https://iso.zoom.us/j/8414530059 at the start of the meeting.  Please contact me privately if necessary for the meeting password.

The agenda is:

    • Discuss proposal 1: Mandating support for UTF-8 encoded source files in phase 1

For the UTF-8 discussion, please take some time ahead of the meeting to consider the following concerns:

  • Migration strategies for non-UTF-8 projects to transition to UTF-8, possibly incrementally.
  • Migration strategies for implementors to transition system headers to UTF-8, possibly incrementally.
  • Support for differently encoded source files within a single translation unit.
  • Support for differently encoded primary source files within a single project.
  • Error handling for ill-formed UTF-8 sequences in each of:
    • Comments
    • String literals
    • Elsewhere.
  • Handling of BOMs.
  • Whether an in-source encoding annotation is needed and what form it should take:
    • A magic comment (like Python)
    • A pragma directive (like xlC)
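
As a starting point for thinking about the BOM and ill-formed sequence items above, here is a minimal sketch (in Python for brevity, not a proposal for how an implementation should behave). The helper names `strip_bom` and `first_ill_formed_offset` are hypothetical. The sketch only locates a problem byte; it does not classify whether that byte sits in a comment, a string literal, or elsewhere, which is exactly the part that needs discussion.

```python
import codecs
from typing import Optional, Tuple

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return the data without a leading UTF-8 BOM, and whether one was present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False


def first_ill_formed_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first ill-formed UTF-8 sequence, or None."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical source bytes: a BOM followed by a comment containing a
# Latin-1 encoded e-acute (0xE9), which is not valid UTF-8 on its own.
body, had_bom = strip_bom(b"\xef\xbb\xbf// caf\xe9\n")
print(had_bom, first_ill_formed_offset(body))
```

Whether the implementation should reject such a file, substitute U+FFFD, or pass the bytes through untouched inside comments is one of the open questions.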

A very rough draft of a paper discussing these concerns is available at https://rawgit.com/tahonermann/sg16/master/papers/dyyyyr0-utf-8-source-files.html.  We will *not* discuss this paper at this meeting, but the Existing Practice section may be informative (please ignore the rest of the draft for now).

No decisions will be made at this meeting, but direction polls are expected.


