sg16: Re: [SG16] Feedback on P1854R1: Conversion to literal encoding should not lead to loss of meaning

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 17 Nov 2021 19:34:39 +0100

On Wed, Nov 17, 2021 at 6:40 PM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:

> The addition of a table of contents and numbered sections would make it
> easier to discuss and provide feedback on the paper.
>
> In the "Non-encodable character-literals" section:
>
> - Please add a brief description of what non-encodable characters are.
> Be sure to mention both the translation character set and the dependency on
> the choice of literal encoding. Provide an example.
> - "We believe an implementation should not be able to alter that
> meaning". This seems out of context here; the paper has not yet explained
> the problem it purports to address. Explain why substitution characters are
> a source of problems.
>
> In the "Impact on the standard and implementations" section:
>
> - For the uninitiated, the fact that Clang uses UTF-8 doesn't mean
> anything here. The intended point is that the problem of non-encodable
> characters doesn't happen when the literal encoding is UTF-8. Please say
> that.
> - There is no example provided for gcc; just the diagnostic. How would
> a reader of the paper reproduce that diagnostic? Does gcc emit an error or
> warning by default? Yes, the Compiler Explorer link is present, but that
> comes later, so this makes for confusing presentation.
> - The MSVC example depends on the choice of literal encoding; no
> warning would be emitted when targeting UTF-8.
> - The MSVC example is confusing; what is ' ?? ?? ', 00H? Are those
> spaces? How do the '?' characters map back to the string? This needs more
> explanation of what is happening (so that you can then explain why it is
> bad).
>
> In the "Are we removing a capability?" section:
>
> - The section title is confusing. The paper hasn't clearly articulated
> at this point what the proposal is.
> - "exact nature" => "choice"
> - I don't believe the following is true (or if it is, it needs some
> explanation): "in general the relying on non-encodable characters to detect
> the literal encoding is
> non-portable as it can only work on windows". (I don't think I've seen
> code that uses non-encodable characters to try to detect the literal
> encoding; I have seen code that checks what integer value a character is
> mapped to).
> - "? can be inserted in string and character literals". What does that have
> to do with this section? Don't leave your readers wondering how something
> relates; draw the picture for them. In this case, I would just drop this
> bullet; no one intentionally uses non-encodable characters to write "?".
> - "u8 strings can be used portably". Sure, but why is that relevant?
> - "If the author of the code does not care about the content of a
> string being preserved, then presumably that character can be removed".
> This is projecting an intent that the code author probably would not agree
> with.
>
> In the "Multi character literals" section:
>
> - "However, ’é’ (e, ACUTE ACCENT) or ’ ’ (grapheme cluster), read as
> single characters". The use of "read" here is confusing. I suggest,
> "However, ’é’ (e, ACUTE ACCENT) or ’ ’ (grapheme cluster), visually present
> as single characters, but might not be represented with a single code unit
> (char)".
> - "dificile"
> - "fcould"
> - "into an int. in any sensible way"
>
> In the "Impact on the standard and implementations" section:
>
> - I don't know what is meant by "Unicode" in "No compiler emit a
> warning for Unicode in multi-character literals".
>
> In the "Feature macro" section:
>
> - "because the transformation to characters literals and string
> literals is not observable by the program". I agree that there is no need
> for a feature test macro, but not for this reason. From my perspective, the
> lack of a feature test macro is motivated by the proposal making some
> currently well-formed code ill-formed and the lack of introduction of any
> new syntax.
>
> Wording for Character Literals:
>
> - The struck note needs to be retained for multicharacter literals.
>
>
Hey Tom,
Thanks for the feedback.
I will address the typos in a future revision (I hope you understand that
getting feedback right before the meeting doesn't give me time to address
it). However, regarding this point: can you clarify which note you mean and
why you think it needs to be retained?
Also, you missed R2, which follows Hubert's and Jens's feedback:
https://isocpp.org/files/papers/D1854R2.pdf

Regards,
Corentin

>
> I find the presentation in this paper confusing. In general, I suggest
> writing your papers with the following outline in order to guide the reader
> through the problem and ultimately to the proposed solution; maintain a
> clear separation of concerns between sections.
>
> 1. Abstract
> 2. Problem
> 3. Motivation
> 4. Option(s)/Solution(s)
> 5. Proposal
> 6. Impact
> 7. Wording
>
> Tom.
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>

Received on 2021-11-17 12:34:52