On Wed, Nov 17, 2021 at 6:40 PM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:

The addition of a table of contents and numbered sections would make it easier to discuss and provide feedback on the paper.

In the "Non-encodable character-literals" section:

Please add a brief description of what non-encodable characters are. Be sure to mention both the translation character set and the dependency on the choice of literal encoding. Provide an example.

"We believe an implementation should not be able to alter that meaning". This seems out of context here; the paper has not yet explained the problem it purports to address. Explain why substitution characters are a source of problems.

In the "Impact on the standard and implementations" section:

For the uninitiated, the fact that Clang uses UTF-8 doesn't mean anything here. The intended point is that the problem of non-encodable characters doesn't happen when the literal encoding is UTF-8. Please say that.

There is no example provided for gcc; just the diagnostic. How would a reader of the paper reproduce that diagnostic? Does gcc emit an error or warning by default? Yes, the Compiler Explorer link is present, but that comes later, so this makes for confusing presentation.

The MSVC example depends on the choice of literal encoding; no warning would be emitted when targeting UTF-8.

The MSVC example is confusing; what is ' ?? ?? ', 00H? Are those spaces? How do the '?' characters map back to the string? This needs more explanation of what is happening (so that you can then explain why it is bad).

In the "Are we removing a capability?" section:

The section title is confusing. The paper hasn't clearly articulated at this point what the proposal is.

"exact nature" => "choice"

I don't believe the following is true (or if it is, it needs some explanation): "in general the relying on non-encodable characters to detect the literal encoding is non-portable as it can only work on windows". (I don't think I've seen code that uses non-encodable characters to try to detect the literal encoding; I have seen code that checks what integer value a character is mapped to).
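
To illustrate the kind of check meant by "what integer value a character is mapped to", here is a minimal sketch. It is not taken from the paper or from any real code base; the names LiteralFamily and guess_literal_family are hypothetical, and the sketch only distinguishes ASCII-compatible encodings from EBCDIC by looking at the value of 'A'.

```cpp
// Hypothetical helper for illustration only (assumed names, not from the paper).
// It guesses the family of the ordinary literal encoding from the integer
// value that the basic character 'A' is mapped to.
enum class LiteralFamily { AsciiCompatible, Ebcdic, Unknown };

constexpr LiteralFamily guess_literal_family() {
    // 'A' is 0x41 in ASCII and in ASCII-compatible encodings such as UTF-8.
    if (static_cast<unsigned char>('A') == 0x41) return LiteralFamily::AsciiCompatible;
    // 'A' is 0xC1 in the common EBCDIC code pages.
    if (static_cast<unsigned char>('A') == 0xC1) return LiteralFamily::Ebcdic;
    return LiteralFamily::Unknown;
}
```

Note that this only inspects a character from the basic character set, which is always encodable; it does not rely on non-encodable characters at all.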

"? can be inserted in string and character literals". What does this have to do with this section? Don't leave your readers wondering how something relates; draw the picture for them. In this case, I would just drop this bullet; no one intentionally uses non-encodable characters to write "?".

"u8 strings can be used portably". Sure, why is that relevant?

"If the author of the code does not care about the content of a string being preserved, then presumably that character can be removed". This is projecting an intent that the code author probably would not agree with.

In the "Multi character literals" section:

"However, ’é’ (e, ACUTE ACCENT) or ’ ’ (grapheme cluster), read as single characters". The use of "read" here is confusing. I suggest, "However, ’é’ (e, ACUTE ACCENT) or ’ ’ (grapheme cluster), visually present as single characters, but might not be represented with a single code unit (char)".
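
The point about code units can be made concrete with a small sketch. It assumes UTF-8 (via u8 literals) so that the counts do not depend on the ordinary literal encoding, and it uses universal-character-names so that the source file encoding does not matter. The helper name code_units is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// 'e' followed by U+0301 COMBINING ACUTE ACCENT: presents as one character,
// but is two code points and three UTF-8 code units.
static_assert(code_units(u8"e\u0301") == 3, "e + combining acute is 3 code units");

// U+00E9 LATIN SMALL LETTER E WITH ACUTE: one code point, two UTF-8 code units.
static_assert(code_units(u8"\u00E9") == 2, "precomposed e-acute is 2 code units");

// Two regional indicator symbols forming one flag grapheme cluster:
// two code points, eight UTF-8 code units.
static_assert(code_units(u8"\U0001F1EB\U0001F1F7") == 8, "flag is 8 code units");
```

None of these fits in a single char, which is why such a literal cannot be treated as an ordinary single-character literal.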

"dificile"

"fcould"

"into an int. in any sensible way"

In the "Impact on the standard and implementations" section:

I don't know what is meant by "Unicode" in "No compiler emit a warning for Unicode in multi-character literals".

In the "Feature macro" section:

"because the transformation to characters literals and string literals is not observable by the program". I agree that there is no need for a feature test macro, but not for this reason. From my perspective, a feature test macro is unnecessary because the proposal makes some currently well-formed code ill-formed and does not introduce any new syntax.

Wording for Character Literals:

The struck note needs to be retained for multicharacter literals.

Hey Tom,

Thanks for the feedback.

I will address the typos in a future revision (I hope you understand getting feedback right before the meeting doesn't give me time to address it),