Date: Wed, 17 Nov 2021 11:40:45 -0600
The addition of a table of contents and numbered sections would make it
easier to discuss and provide feedback on the paper.
In the "Non-encodable character-literals" section:
* Please add a brief description of what non-encodable characters are.
Be sure to mention both the translation character set and the
dependency on the choice of literal encoding. Provide an example.
* "We believe an implementation should not be able to alter that
meaning". This seems out of context here; the paper has not yet
explained the problem it purports to address. Explain why
substitution characters are a source of problems.
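One possible example of the kind requested in the first bullet is U+20AC EURO SIGN. The character is in the translation character set, but whether it can be encoded depends on the literal encoding. UTF-8 encodes it as E2 82 AC and Windows-1252 as the single byte 0x80, while ISO-8859-1 has no mapping for it at all. The sketch below is illustrative only and is not taken from the paper; the names euro and euro_code_units are hypothetical.

```cpp
#include <cstddef>

// U+20AC EURO SIGN, written as a universal-character-name so that the
// encoding of the source file does not matter.
//   UTF-8 literal encoding:        E2 82 AC (3 code units)
//   Windows-1252 literal encoding: 80       (1 code unit)
//   ISO-8859-1 literal encoding:   no mapping; the character is non-encodable
constexpr char euro[] = "\u20AC";

// Number of code units the implementation used to encode the literal above,
// excluding the terminating null character.
constexpr std::size_t euro_code_units() { return sizeof(euro) - 1; }
```

GCC and Clang use a UTF-8 literal encoding by default, and with those defaults euro_code_units() is 3. When the literal encoding cannot represent the character, an implementation may reject the literal. Alternatively, as in the MSVC example in the paper, it may substitute '?'. That substitution is the behavior the paper then needs to explain as problematic.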
In the "Impact on the standard and implementations" section:
* For the uninitiated, the fact that Clang uses UTF-8 doesn't mean
anything here. The intended point is that the problem of
non-encodable characters doesn't happen when the literal encoding is
UTF-8. Please say that.
* There is no example provided for gcc; just the diagnostic. How would
a reader of the paper reproduce that diagnostic? Does gcc emit an
error or warning by default? Yes, the Compiler Explorer link is
present, but that comes later, so this makes for confusing presentation.
* The MSVC example depends on the choice of literal encoding; no
warning would be emitted when targeting UTF-8.
* The MSVC example is confusing; what is ' ?? ?? ', 00H? Are those
spaces? How do the '?' characters map back to the string? This needs
more explanation of what is happening (so that you can then explain
why it is bad).
In the "Are we removing a capability?" section:
* The section title is confusing. The paper hasn't clearly articulated
at this point what the proposal is.
* "exact nature" => "choice"
* I don't believe the following is true (or if it is, it needs some
explanation): "in general the relying on non-encodable characters to
detect the literal encoding is
non-portable as it can only work on windows". (I don't think I've
seen code that uses non-encodable characters to try to detect the
literal encoding; I have seen code that checks what integer value a
character is mapped to).
* "? can be inserted in string and character literals". What does that have
to do with this section? Don't leave your readers wondering how
something relates; draw the picture for them. In this case, I would
just drop this bullet; no one intentionally uses non-encodable
characters to write "?".
* "u8 strings can be used portably". Sure, but why is that relevant?
* "If the author of the code does not care about the content of a
string being preserved, then presumably that character can be
removed". This is projecting an intent that the code author probably
would not agree with.
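Code that checks the integer value a character maps to usually looks something like the following. This is an illustrative sketch; literal_family and detect_literal_family are hypothetical names. It uses only members of the basic character set, which are always encodable. It tells ASCII-compatible encodings apart from EBCDIC by the values that 'A' and 'a' map to.

```cpp
// Classify the ordinary literal encoding by the integer values that basic
// characters map to. These characters are always encodable, so the check
// never relies on how non-encodable characters are handled.
enum class literal_family { ascii_compatible, ebcdic, unknown };

constexpr literal_family detect_literal_family() {
    // Compare as unsigned char so the result does not depend on whether
    // plain char is signed (EBCDIC letters have the high bit set).
    return (static_cast<unsigned char>('A') == 0x41 &&
            static_cast<unsigned char>('a') == 0x61)
               ? literal_family::ascii_compatible
           : (static_cast<unsigned char>('A') == 0xC1 &&
              static_cast<unsigned char>('a') == 0x81)
               ? literal_family::ebcdic
               : literal_family::unknown;
}
```

Because the function is constexpr, the result can be checked at compile time, for example with a static_assert that rejects an unrecognized encoding.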
In the "Multi character literals" section:
* "However, ’é’ (e, ACUTE ACCENT) or ’ ’ (grapheme cluster), read as
single characters". The use of "read" here is confusing. I suggest,
"However, ’é’ (e, ACUTE ACCENT) or ’ ’ (grapheme cluster), visually
present as single characters, but might not be represented with a
single code unit (char)".
* "dificile"
* "fcould"
* "into an int. in any sensible way"
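The suggested wording can be made concrete with a small sketch. It assumes C++11 or later, and the variable names are hypothetical. Both literals below display as é, but neither fits in a single code unit. u8 literals are always UTF-8, so these counts do not depend on the implementation's choice of ordinary literal encoding.

```cpp
#include <cstddef>

// Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE, UTF-8 C3 A9.
constexpr std::size_t precomposed_units = sizeof(u8"\u00E9") - 1;

// Decomposed form: U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT
// (UTF-8 CC 81). Together they form a single grapheme cluster.
constexpr std::size_t decomposed_units = sizeof(u8"e\u0301") - 1;

static_assert(precomposed_units == 2, "U+00E9 needs two UTF-8 code units");
static_assert(decomposed_units == 3, "e + U+0301 needs three UTF-8 code units");
```

So when the ordinary literal encoding is UTF-8, neither form written between single quotes can yield a single char.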
In the "Impact on the standard and implementations" section:
* I don't know what is meant by "Unicode" in "No compiler emit a
warning for Unicode in multi-character literals".
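For readers unfamiliar with the feature, here is a sketch of an ordinary multicharacter literal; the names are hypothetical. In C++20 such a literal is conditionally-supported, has type int, and has an implementation-defined value. The GCC manual describes how it computes that value: it shifts the value accumulated so far left by the width of a character and ORs in the next character. With an ASCII-compatible encoding, 'ab' is therefore 0x6162. GCC warns about such literals by default (-Wmultichar).

```cpp
#include <climits>

// An ordinary multicharacter literal: conditionally-supported, type int,
// implementation-defined value. Expect a -Wmultichar warning from GCC.
constexpr int ab = 'ab';

// The value GCC documents for this literal: each character is shifted in
// from the right, one character width (CHAR_BIT bits) at a time.
constexpr int gcc_expected_ab = ('a' << CHAR_BIT) | 'b';
```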
In the "Feature macro" section:
* "because the transformation to characters literals and string
literals is not observable by the program". I agree that there is no
need for a feature test macro, but not for this reason. From my
perspective, the lack of a feature test macro is motivated by the
proposal making some currently well-formed code ill-formed and the
lack of introduction of any new syntax.
Wording for Character Literals:
* The struck note needs to be retained for multicharacter literals.
I find the presentation in this paper confusing. In general, I suggest
writing your papers with the following outline in order to guide the
reader through the problem and ultimately to the proposed solution;
maintain a clear separation of concerns between sections.
1. Abstract
2. Problem
3. Motivation
4. Option(s)/Solution(s)
5. Proposal
6. Impact
7. Wording
Tom.
Received on 2021-11-17 11:40:48