sg16: [SG16] Feedback on P1854R1: Conversion to literal encoding should not lead to loss of meaning

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 17 Nov 2021 11:40:45 -0600

The addition of a table of contents and numbered sections would make it
easier to discuss and provide feedback on the paper.

In the "Non-encodable character-literals" section:

  * Please add a brief description of what non-encodable characters are.
    Be sure to mention both the translation character set and the
    dependency on the choice of literal encoding. Provide an example.
  * "We believe an implementation should not be able to alter that
    meaning". This seems out of context here; the paper has not yet
    explained the problem it purports to address. Explain why
    substitution characters are a source of problems.
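
As a rough illustration of the kind of example that would help here, the
sketch below simulates the substitution problem. It does not reproduce any
compiler's behavior: it assumes ISO-8859-1 as a stand-in for a single-byte
literal encoding, and the function name narrow_to_latin1 is made up for this
sketch.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not from the paper or any compiler): converts code
// points to an assumed single-byte literal encoding (ISO-8859-1) and
// substitutes '?' for every code point that the encoding cannot represent.
std::string narrow_to_latin1(std::u32string_view text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            // Encodable: ISO-8859-1 maps U+0000..U+00FF to the same byte value.
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            // Non-encodable: the original character is silently lost.
            out.push_back('?');
        }
    }
    return out;
}
```

With this simulation, U"5\u20AC" (5 followed by EURO SIGN) and U"5?" both
convert to "5?", so the program can no longer tell what the author wrote.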

In the "Impact on the standard and implementations" section:

  * For the uninitiated, the fact that Clang uses UTF-8 doesn't mean
    anything here. The intended point is that the problem of
    non-encodable characters doesn't happen when the literal encoding is
    UTF-8. Please say that.
  * There is no example provided for gcc; just the diagnostic. How would
    a reader of the paper reproduce that diagnostic? Does gcc emit an
    error or warning by default? Yes, the Compiler Explorer link is
    present, but that comes later, so this makes for confusing presentation.
  * The MSVC example depends on the choice of literal encoding; no
    warning would be emitted when targeting UTF-8.
  * The MSVC example is confusing; what is ' ?? ?? ', 00H? Are those
    spaces? How do the '?' characters map back to the string? This needs
    more explanation of what is happening (so that you can then explain
    why it is bad).

In the "Are we removing a capability?" section:

  * The section title is confusing. The paper hasn't clearly articulated
    at this point what the proposal is.
  * "exact nature" => "choice"
  * I don't believe the following is true (or if it is, it needs some
    explanation): "in general the relying on non-encodable characters to
    detect the literal encoding is non-portable as it can only work on
    windows". (I don't think I've
    seen code that uses non-encodable characters to try to detect the
    literal encoding; I have seen code that checks what integer value a
    character is mapped to).
  * "? can be inserted in string and character literals". What does this
    have to do with this section? Don't leave your readers wondering how
    something relates; draw the picture for them. In this case, I would
    just drop this bullet; no one intentionally uses non-encodable
    characters to write "?".
  * "u8 strings can be used portably". Sure, why is that relevant?
  * "If the author of the code does not care about the content of a
    string being preserved, then presumably that character can be
    removed". This is projecting an intent that the code author probably
    would not agree with.
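
For reference, the kind of detection code I have seen checks the integer
value that an ordinary character maps to. A minimal sketch follows; the
function names are invented for illustration, and the test only
distinguishes ASCII-compatible encodings from common EBCDIC code pages.

```cpp
#include <cassert>

// Hypothetical helpers (names are illustrative only). Each compares the
// code unit values of a few ordinary character literals against the values
// those characters have in a known encoding family.
constexpr bool literal_encoding_looks_ascii() {
    return static_cast<unsigned char>('A') == 0x41 &&
           static_cast<unsigned char>('a') == 0x61 &&
           static_cast<unsigned char>('0') == 0x30;
}

constexpr bool literal_encoding_looks_ebcdic() {
    return static_cast<unsigned char>('A') == 0xC1 &&
           static_cast<unsigned char>('a') == 0x81 &&
           static_cast<unsigned char>('0') == 0xF0;
}

// The two checks are mutually exclusive by construction.
static_assert(!(literal_encoding_looks_ascii() && literal_encoding_looks_ebcdic()),
              "a literal encoding cannot match both families");
```

Nothing here relies on non-encodable characters; every character tested is
in the basic character set and is encodable in any literal encoding.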

In the "Multi character literals" section:

  * "However, ’é’ (e, ACUTE ACCENT) or ’ ’ (grapheme cluster), read as
    single characters". The use of "read" here is confusing. I suggest,
    "However, ’é’ (e, ACUTE ACCENT) or ’ ’ (grapheme cluster), visually
    present as single characters, but might not be represented with a
    single code unit (char)".
  * "dificile"
  * "fcould"
  * "into an int. in any sensible way"
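
The code unit point could be made concrete with a small example. The sketch
below uses UTF-8 (u8) string literals so that the counts do not depend on
the ordinary literal encoding; the helper name code_units is made up for
this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal, excluding
// the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// A plain 'e' is a single UTF-8 code unit.
static_assert(code_units(u8"e") == 1, "one code unit");
// U+00E9 (precomposed e with acute) visually presents as one character but
// needs two UTF-8 code units.
static_assert(code_units(u8"\u00E9") == 2, "two code units");
// 'e' followed by U+0301 COMBINING ACUTE ACCENT renders the same way but
// needs three UTF-8 code units.
static_assert(code_units(u8"e\u0301") == 3, "three code units");
```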

In the "Impact on the standard and implementations" section:

  * I don't know what is meant by "Unicode" in "No compiler emit a
    warning for Unicode in multi-character literals".

In the "Feature macro" section:

  * "because the transformation to characters literals and string
    literals is not observable by the program". I agree that there is no
    need for a feature test macro, but not for this reason. From my
    perspective, the lack of a feature test macro is motivated by the
    proposal making some currently well-formed code ill-formed and the
    lack of introduction of any new syntax.

Wording for Character Literals:

  * The struck note needs to be retained for multicharacter literals.

I find the presentation in this paper confusing. In general, I suggest
writing your papers with the following outline in order to guide the
reader through the problem and ultimately to the proposed solution;
maintain a clear separation of concerns between sections.

1. Abstract
2. Problem
3. Motivation
4. Option(s)/Solution(s)
5. Proposal
6. Impact
7. Wording

Tom.

Received on 2021-11-17 11:40:48