Re: [SG16] Feedback on P1854R1: Conversion to literal encoding should not lead to loss of meaning

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 17 Nov 2021 14:11:48 -0500
On 11/17/21 12:34 PM, Corentin Jabot wrote:
>
>
>
> On Wed, Nov 17, 2021 at 6:40 PM Tom Honermann via SG16
> <sg16_at_[hidden]> wrote:
>
> The addition of a table of contents and numbered sections would
> make it easier to discuss and provide feedback on the paper.
>
> In the "Non-encodable character-literals" section:
>
> * Please add a brief description of what non-encodable
> characters are. Be sure to mention both the translation
> character set and the dependency on the choice of literal
> encoding. Provide an example.
> * "We believe an implementation should not be able to alter that
> meaning". This seems out of context here; the paper has not
> yet explained the problem it purports to address. Explain why
> substitution characters are a source of problems.
>
> In the "Impact on the standard and implementations" section:
>
> * For the uninitiated, the fact that Clang uses UTF-8 doesn't
> mean anything here. The intended point is that the problem of
> non-encodable characters doesn't happen when the literal
> encoding is UTF-8. Please say that.
> * There is no example provided for gcc; just the diagnostic. How
> would a reader of the paper reproduce that diagnostic? Does
> gcc emit an error or warning by default? Yes, the Compiler
> Explorer link is present, but that comes later, so this makes
> for confusing presentation.
> * The MSVC example depends on the choice of literal encoding; no
> warning would be emitted when targeting UTF-8.
> * The MSVC example is confusing; what is ' ?? ?? ', 00H? Are
> those spaces? How do the '?' characters map back to the
> string? This needs more explanation of what is happening (so
> that you can then explain why it is bad).
>
> In the "Are we removing a capability?" section:
>
> * The section title is confusing. The paper hasn't clearly
> articulated at this point what the proposal is.
> * "exact nature" => "choice"
> * I don't believe the following is true (or if it is, it needs
> some explanation): "in general the relying on non-encodable
> characters to detect the literal encoding is
> non-portable as it can only work on windows". (I don't think
> I've seen code that uses non-encodable characters to try to
> detect the literal encoding; I have seen code that checks what
> integer value a character is mapped to).
> * "? can be inserted in string and character literals". What
> does that have to do with this section? Don't leave your readers
> wondering how something relates; draw the picture for them. In
> this case, I would just drop this bullet; no one intentionally
> uses non-encodable characters to write "?".
> * "u8 strings can be used portably". Sure, why is that relevant?
> * "If the author of the code does not care about the content of
> a string being preserved, then presumably that character can
> be removed". This is projecting an intent that the code author
> probably would not agree with.
>
> In the "Multi character literals" section:
>
> * "However, ’é’ (e, ACUTE ACCENT) or ’ ’ (grapheme cluster),
> read as single characters". The use of "read" here is
> confusing. I suggest, "However, ’é’ (e, ACUTE ACCENT) or ’ ’
> (grapheme cluster), visually present as single characters, but
> might not be represented with a single code unit (char)".
> * "dificile"
> * "fcould"
> * "into an int. in any sensible way"
>
> In the "Impact on the standard and implementations" section:
>
> * I don't know what is meant by "Unicode" in "No compiler emit a
> warning for Unicode in multi-character literals".
>
> In the "Feature macro" section:
>
> * "because the transformation to characters literals and string
> literals is not observable by the program". I agree that there
> is no need for a feature test macro, but not for this reason.
> From my perspective, the lack of a feature test macro is
> motivated by the proposal making some currently well-formed
> code ill-formed and the lack of introduction of any new syntax.
>
> Wording for Character Literals:
>
> * The struck note needs to be retained for multicharacter literals.
>
>
> Hey Tom,
> Thanks for the feedback.
> I will address the typos in a future revision (I hope you understand
> getting feedback right before the meeting doesn't give me time to
> address it),
Of course :)
> however, on this point:
> Can you clarify which note you mean and why you think it needs to be retained?

This one:

[ Note: The associated character encoding for ordinary and wide
character literals determines encodability, but does not determine the
value of non-encodable ordinary or wide character literals or ordinary
or wide multicharacter literals. The examples in [lex.ccon.literal] for
non-encodable ordinary and wide character literals assume that the
specified character lacks representation in the execution character set
or execution wide-character set, respectively, or that encoding it would
require more than one code unit. — end note ]

It's non-normative, so I guess "needs to be retained" is not true; the
following paragraph states that multicharacter literals have an
implementation-defined value. I was thinking that the note is still
helpful to explain why a multicharacter literal has an associated
encoding that is not actually used to determine its value, but perhaps
that isn't a big deal.

> Also, you missed an R2 following Hubert and Jens feedback
> https://isocpp.org/files/papers/D1854R2.pdf

Thanks, I did see it, but not until after I had started writing the
feedback above. I did check what I wrote against it, but missed
reflecting that in the email subject.

Tom.

>
> Regards,
> Corentin
>
>
> I find the presentation in this paper confusing. In general, I
> suggest writing your papers with the following outline in order to
> guide the reader through the problem and ultimately to the
> proposed solution; maintain a clear separation of concerns between
> sections.
>
> 1. Abstract
> 2. Problem
> 3. Motivation
> 4. Option(s)/Solution(s)
> 5. Proposal
> 6. Impact
> 7. Wording
>
> Tom.
>
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>


Received on 2021-11-17 13:11:52