On 11/17/21 12:34 PM, Corentin Jabot wrote:



On Wed, Nov 17, 2021 at 6:40 PM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:

The addition of a table of contents and numbered sections would make it easier to discuss and provide feedback on the paper.

In the "Non-encodable character-literals" section:

  • Please add a brief description of what non-encodable characters are. Be sure to mention both the translation character set and the dependency on the choice of literal encoding. Provide an example.
  • "We believe an implementation should not be able to alter that meaning". This seems out of context here; the paper has not yet explained the problem it purports to address. Explain why substitution characters are a source of problems.
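For what it's worth, here is a rough sketch of the kind of example I have in mind. It is not taken from the paper; the LiteralEncoding enumeration and the function names are hypothetical, and the three encodings are only a simplified model. It shows that whether a character is encodable depends on the choice of literal encoding, and that a character literal additionally requires a single code unit.

```cpp
// Hypothetical, simplified model of three literal encodings (an assumption
// for illustration; real implementations support many more).
enum class LiteralEncoding { Ascii, Latin1, Utf8 };

// True if c is a Unicode scalar value (any code point except surrogates).
constexpr bool is_scalar_value(char32_t c) {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// True if c has a representation in a string literal using encoding enc.
constexpr bool encodable_in_string(LiteralEncoding enc, char32_t c) {
    switch (enc) {
        case LiteralEncoding::Ascii:  return c <= 0x7F;
        case LiteralEncoding::Latin1: return c <= 0xFF;
        case LiteralEncoding::Utf8:   return is_scalar_value(c);
    }
    return false;
}

// A character literal additionally needs c to fit in a single code unit.
constexpr bool encodable_in_char_literal(LiteralEncoding enc, char32_t c) {
    if (enc == LiteralEncoding::Utf8) return c <= 0x7F;
    return encodable_in_string(enc, c);
}
```

Under this model, U+20AC (EURO SIGN) is non-encodable in a Latin-1 string but fine in a UTF-8 string, and U+00E9 is encodable in a Latin-1 character literal but not in a UTF-8 one.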

In the "Impact on the standard and implementations" section:

  • For the uninitiated, the fact that Clang uses UTF-8 doesn't mean anything here. The intended point is that the problem of non-encodable characters doesn't happen when the literal encoding is UTF-8. Please say that.
  • There is no example provided for gcc; just the diagnostic. How would a reader of the paper reproduce that diagnostic? Does gcc emit an error or warning by default? Yes, the Compiler Explorer link is present, but that comes later, so this makes for confusing presentation.
  • The MSVC example depends on the choice of literal encoding; no warning would be emitted when targeting UTF-8.
  • The MSVC example is confusing; what is ' ?? ?? ', 00H? Are those spaces? How do the '?' characters map back to the string? This needs more explanation of what is happening (so that you can then explain why it is bad).

In the "Are we removing a capability?" section:

  • The section title is confusing. The paper hasn't clearly articulated at this point what the proposal is.
  • "exact nature" => "choice"
  • I don't believe the following is true (or if it is, it needs some explanation): "in general the relying on non-encodable characters to detect the literal encoding is
    non-portable as it can only work on windows". (I don't think I've seen code that uses non-encodable characters to try to detect the literal encoding; I have seen code that checks what integer value a character is mapped to).
  • "? can be inserted in string and character literals". What does this have to do with this section? Don't leave your readers wondering how something relates; draw the picture for them. In this case, I would just drop this bullet; no one intentionally uses non-encodable characters to write "?".
  • "u8 strings can be used portably". Sure, why is that relevant?
  • "If the author of the code does not care about the content of a string being preserved, then presumably that character can be removed". This is projecting an intent that the code author probably would not agree with.
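As an illustration of the detection technique I have seen (checking what integer value a character is mapped to), here is a minimal sketch. The function names are hypothetical, and the checks are only heuristics: they assume the common EBCDIC code pages, where 'A' is 0xC1 and '0' is 0xF0.

```cpp
// Heuristic: in ASCII-compatible literal encodings, 'A' is 0x41, 'a' is 0x61
// and '0' is 0x30.
constexpr bool literal_encoding_looks_like_ascii() {
    return 'A' == 0x41 && 'a' == 0x61 && '0' == 0x30;
}

// Heuristic: in common EBCDIC code pages, 'A' is 0xC1 and '0' is 0xF0.
// The cast avoids a false negative where plain char is signed.
constexpr bool literal_encoding_looks_like_ebcdic() {
    return static_cast<unsigned char>('A') == 0xC1 &&
           static_cast<unsigned char>('0') == 0xF0;
}

// The two heuristics can never both hold for the same literal encoding.
static_assert(!(literal_encoding_looks_like_ascii() &&
                literal_encoding_looks_like_ebcdic()),
              "a literal encoding cannot match both heuristics");
```

Note that nothing here relies on non-encodable characters; only characters that every literal encoding can represent are inspected.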

In the "Multi character literals" section:

  • "However, ’é’ (e, ACUTE ACCENT) or ’ ’ (grapheme cluster), read as single characters". The use of "read" here is confusing. I suggest, "However, ’é’ (e, ACUTE ACCENT) or ’ ’ (grapheme cluster), visually present as single characters, but might not be represented with a single code unit (char)".
  • "dificile"
  • "fcould"
  • "into an int. in any sensible way"
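To make the "single character versus single code unit" point concrete, here is a small sketch; the helper name code_units is hypothetical. It uses universal-character-names so that the result does not depend on the source file encoding, and it relies only on the fact that u8 literals are UTF-8 and U literals are UTF-32.

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
// Char is deduced, so this works whether u8 literals are char or char8_t.
template <class Char, std::size_t N>
constexpr std::size_t code_units(const Char (&)[N]) {
    return N - 1;
}

// Precomposed U+00E9 needs two UTF-8 code units.
static_assert(code_units(u8"\u00E9") == 2, "U+00E9 is two UTF-8 code units");

// 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points,
// three UTF-8 code units, yet it presents as one character.
static_assert(code_units(U"e\u0301") == 2, "two UTF-32 code units");
static_assert(code_units(u8"e\u0301") == 3, "three UTF-8 code units");
```

Neither form fits in a single char when the literal encoding is UTF-8, which is why such a character literal cannot be treated as an ordinary single-code-unit literal.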

In the "Impact on the standard and implementations" section:

  • I don't know what is meant by "Unicode" in "No compiler emit a warning for Unicode in multi-character literals".

In the "Feature macro" section:

  • "because the transformation to characters literals and string literals is not observable by the program". I agree that there is no need for a feature test macro, but not for this reason. From my perspective, the lack of a feature test macro is motivated by the proposal making some currently well-formed code ill-formed and the lack of introduction of any new syntax.

Wording for Character Literals:

  • The struck note needs to be retained for multicharacter literals.

Hey Tom,
Thanks for the feedback.
I will address the typos in a future revision (I hope you understand getting feedback right before the meeting doesn't give me time to address it),
Of course :)
however on this point.
Can you clarify which note you mean, and why you think it needs to be retained?

This one:

[ Note: The associated character encoding for ordinary and wide character literals determines encodability, but does not determine the value of non-encodable ordinary or wide character literals or ordinary or wide multicharacter literals. The examples in [lex.ccon.literal] for non-encodable ordinary and wide character literals assume that the specified character lacks representation in the execution character set or execution wide-character set, respectively, or that encoding it would require more than one code unit. — end note ]

It's non-normative, so I guess "needs to be retained" is not true; the following paragraph states that multicharacter literals have an implementation-defined value. I was thinking that the note is still helpful to explain why a multicharacter literal has an associated encoding that is not actually used to determine its value, but perhaps that isn't a big deal.

Also, you missed an R2 that followed Hubert's and Jens's feedback: https://isocpp.org/files/papers/D1854R2.pdf

Thanks, I did see it, but not until after I had started writing the feedback above. I did check what I wrote against it, but missed reflecting that in the email subject.

Tom.


Regards,
Corentin



I find the presentation in this paper confusing. In general, I suggest writing your papers with the following outline in order to guide the reader through the problem and ultimately to the proposed solution; maintain a clear separation of concerns between sections.

  1. Abstract
  2. Problem
  3. Motivation
  4. Option(s)/Solution(s)
  5. Proposal
  6. Impact
  7. Wording

Tom.
