sg16: Re: [SG16] On the character encoding of diagnostic text

From: Peter Brett <pbrett_at_[hidden]>
Date: Tue, 8 Sep 2020 09:16:41 +0000

Hi Aaron,

Thank you for sharing this updated draft. It looks good to me now that the tweak suggested by JF has been applied.

I think the same approach could be applied successfully to the C++ standard.

Best regards,

Peter

> -----Original Message-----
> From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Aaron Ballman via
> SG16
> Sent: 07 September 2020 15:14
> To: Tom Honermann <tom_at_[hidden]>
> Cc: Aaron Ballman <aaron_at_aaronballman.com>; SG16 <sg16_at_lists.isocpp.org>
> Subject: Re: [SG16] On the character encoding of diagnostic text
>
> On Thu, Sep 3, 2020 at 10:41 PM Tom Honermann <tom_at_honermann.net> wrote:
> >
> > On 9/3/20 6:59 PM, JF Bastien wrote:
> >
> > On Thu, Sep 3, 2020 at 3:36 PM Tom Honermann via SG16
> <sg16_at_[hidden]> wrote:
> >>
> >> On 9/2/20 5:33 AM, Peter Brett via SG16 wrote:
> >>
> >> Hi all,
> >>
> >>
> >>
> >> We allow Unicode identifiers (if/when P1949 is adopted, UAX31
> identifiers). Implementations will therefore need to have a mechanism for
> communicating those identifiers to the user via their diagnostics. Let us
> assume that such a mechanism exists as a necessary implementation detail of
> any reasonable C++ implementation.
> >>
> >>
> >>
> >> By the point at which static_assert() is evaluated, its string argument
> will have already been converted to its associated implementation-defined
> literal encoding. For some implementations, this may be a lossy conversion.
> >>
> >>
> >>
> >> I am wary of mandating specific handling of this in the standard because
> the way in which diagnostics are communicated to the user seems to be
> something that really should be a quality of implementation issue.
> >>
> >>
> >>
> >> If we were to adjust the standard, then the adjustment should not
> preclude constexpr computation of the static_assert message, in anticipation
> of reflection making it possible to format type names into it with constexpr
> std::format.
> >>
> >>
> >>
> >> static_assert(std::is_base_of_v<MyBase, Arg>,
> >>               std::format("Cannot my_cast to {} because {} is not derived from MyBase",
> >>                           /* reflection expressions here */));
> >>
> >>
> >>
> >> We must not confuse two separate concerns:
> >>
> >> 1. Whether implementations correctly process strings from their internal
> >>    representation for display in diagnostic messages.
> >> 2. Whether implementations correctly handle situations in which the literal
> >>    encoding and the encoding required for displaying diagnostic messages
> >>    are different.
> >>
> >>
> >>
> >> I am strongly opposed to a solution that restricts static_assert()
> messages to the basic source character set.
> >>
> >> I would also be strongly opposed to a solution that prohibits characters
> outside of the basic source character set, but I think it would be
> reasonable to specify that characters outside the basic source character set
> may be subject to substitution (potentially lossy), presentation in non-
> glyph form (as a UCN), or perhaps even dropped (mildly opposed). Is that a
> viewpoint that you could support?
> >
> >
> > I don't think we should specify what happens to the string. Rather we
> should specify what kind of string literals are accepted (and I'd accept any
> valid string literal).
> >
> > First, what happens to diagnostics is outside the abstract machine, we
> don't legislate that. Second, it's not the source character set nor is it
> the execution one. What I mean by this is that the source character set is
> what the compiler consumes and my editor shows, but diagnostics are what my
> shell shows (that's not the compiler, nor is it the editor), but it can be
> in an IDE. Imagine that I run clang in my favorite PDP-11 shell emulator...
> clang might be nice to check if Unicode is supported and then escape what's
> not supported, but does the Standard need to say anything? Now imagine I
> pipe stderr to /dev/null, have I now made my compiler non-conformant? What
> if I pipe it to a file? I can't see it unless I open the file... is it still
> conformant? What about diagnostics in an IDE, where I only see diagnostics
> for the code currently open in the IDE, the others are "hidden". Say the IDE
> colors the diagnostics, is it still conformant? etc. The Standard doesn't
> care about any of this, it's not useful for us to care, let's not say
> anything. Trying to say something is legislating away implementation
> freedom, let's just trust that implementations aren't adversarial and they're
> actually trying to help users.
> >
> > I agree with all of this, but there does seem to be some genuine interest
> in improving something here. The questions I've posed have been intended to
> probe those interests.
> >
> > For reference, since I don't think it has been pointed out yet, here is
> the relevant wording from the C++ standard. What changes would make people
> more happy than the status quo?
> >
> > [intro.compliance]p2.2:
> >
> > If a program contains a violation of any diagnosable rule or an occurrence
> of a construct described in this document as “conditionally-supported”
> when the implementation does not support that construct, a conforming
> implementation shall issue at least one diagnostic message.
> >
> > [dcl.pre]p6:
> >
> > In a static_assert-declaration, the constant-expression shall be a
> contextually converted constant expression of type bool. If the value of
> the expression when so converted is true, the declaration has no effect.
> Otherwise, the program is ill-formed, and the resulting diagnostic message
> ([intro.compliance]) shall include the text of the string-literal, if one is
> supplied, except that characters not in the basic source character set are
> not required to appear in the diagnostic message. [Example:
> > static_assert(sizeof(int) == sizeof(void*), "wrong pointer size");
> > — end example]
> >
> > [dcl.attr.deprecated]p1:
> >
> > The attribute-token deprecated can be used to mark names and entities
> whose use is still allowed, but is discouraged for some reason. [Note: In
> particular, deprecated is appropriate for names and entities that are deemed
> obsolescent or unsafe. — end note] It shall appear at most once in each
> attribute-list. An attribute-argument-clause may be present and, if
> present, it shall have the form:
> > ( string-literal )
> > [Note: The string-literal in the attribute-argument-clause could be used
> to explain the rationale for deprecation and/or to suggest a replacing
> entity. — end note]
> >
> > [dcl.attr.deprecated]p4:
> >
> > Recommended practice: Implementations should use the deprecated attribute
> to produce a diagnostic message in case the program refers to a name or
> entity other than to declare it, after a declaration that specifies the
> attribute. The diagnostic message should include the text provided within
> the attribute-argument-clause of any deprecated attribute applied to the
> name or entity.
> >
> > [dcl.attr.nodiscard]p4:
> >
> > Recommended practice: Appearance of a nodiscard call as a potentially-
> evaluated discarded-value expression ([expr.prop]) is discouraged unless
> explicitly cast to void. Implementations should issue a warning in such
> cases. [Note: This is typically because discarding the return value of a
> nodiscard call has surprising consequences. — end note] The string-literal
> in a nodiscard attribute-argument-clause should be used in the message of
> the warning as the rationale for why the result should not be discarded.
> >
> > [cpp.error]p1:
> >
> > A preprocessing directive of the form
> > # error pp-tokens(opt) new-line
> > causes the implementation to produce a diagnostic message that includes
> the specified sequence of preprocessing tokens, and renders the program ill-
> formed.
> >
> > At a minimum, there do appear to be opportunities to improve consistency
> in wording here. We have an interesting mix of "shall", "should", "could",
> and ... "causes".
>
> With the exception of "causes", I think the words of power are
> generally correct (in C++). You use "should" in Recommended Practice
> sections to give normative suggestions and "shall" in Semantic sections
> to give normative requirements, which explains the difference between
> the attributes and the static_assert wording. The C wording had a bit
> of may/should confusion that I am correcting with my paper for that
> committee.
>
> I've attached an updated (rewritten) version of the paper that goes in
> the direction I've understood from SG16 so far. I do still have an
> open question as to whether anyone would like to see #error get some
> words to roll back to an earlier phase of translation like we do for
> _Pragma. There is implementation divergence in this space already with
> an example that may be compelling to some:
> https://godbolt.org/z/zP44xE
>
> ~Aaron
>
> >
> > Tom.
> >
> >
> >
> >>
> >> Tom.
> >>
> >>
> >>
> >> Best regards,
> >>
> >>
> >>
> >> Peter
> >>
> >>
> >>
> >>
> >>
> >> From: SG16 <sg16-bounces_at_lists.isocpp.org> On Behalf Of Martinho
> Fernandes via SG16
> >> Sent: 01 September 2020 18:16
> >> To: sg16_at_lists.isocpp.org
> >> Cc: Martinho Fernandes <rmf_at_mozilla.com>
> >> Subject: Re: [SG16] On the character encoding of diagnostic text
> >>
> >>
> >>
> >> On Tue, Sep 1, 2020 at 7:05 PM Aaron Ballman via SG16
> <sg16_at_lists.isocpp.org> wrote:
> >>
> >> On Tue, Sep 1, 2020 at 12:08 PM Alisdair Meredith via SG16
> >> <sg16_at_[hidden]> wrote:
> >> >
> >> > For a cross compiler, the basic execution character set should
> correspond to the target platform, but the diagnostics character set should
> be for the host?
> >>
> >> That matches my understanding.
> >>
> >> I suppose a question I could add is whether anyone would like to see a
> >> new character set introduced for diagnostics. My intuition is that it
> >> would be a pretty heavy hammer to bring to bear and that the basic
> >> source character set is probably Good Enough (tm).
> >>
> >>
> >>
> >> Wouldn't these diagnostics be the place people are more likely to use
> non-basic source characters, though? When it comes to identifiers people
> will sometimes compromise and restrict themselves and e.g. avoid diacritics,
> but in error messages I feel like it makes a lot more sense to want to write
> with the full expression of their native script.
> >>
> >>
> >>
> >> --
> >> SG16 mailing list
> >> SG16_at_[hidden]cpp.org
> >>
> >> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
> >
> >
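
One option raised in the thread above is to present characters outside the basic source character set in non-glyph form, as UCNs. The sketch below only illustrates that idea; it is not how any particular compiler formats its diagnostics. The function name escape_as_ucn is hypothetical, the input is assumed to be UTF-8, and plain 7-bit ASCII stands in for the basic source character set (which is actually a little smaller).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: returns a copy of `utf8` in which every code point
// above U+007F is written as a universal-character-name (\uXXXX or
// \UXXXXXXXX). Assumption: the input is UTF-8. Malformed or truncated
// sequences become '?'. Overlong forms and surrogates are not rejected,
// to keep the sketch short.
std::string escape_as_ucn(std::string_view utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {  // ASCII passes through unchanged
            out.push_back(static_cast<char>(lead));
            ++i;
            continue;
        }
        std::size_t extra = 0;  // number of continuation bytes expected
        std::uint32_t cp = 0;   // decoded code point
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else { out.push_back('?'); ++i; continue; }  // stray byte
        if (i + extra >= utf8.size()) {  // sequence cut off at end of input
            out.push_back('?');
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3Fu);
        }
        if (!ok) { out.push_back('?'); ++i; continue; }
        assert(cp <= 0x1FFFFF);  // at most 21 bits can have been decoded
        char buf[11];            // "\UXXXXXXXX" plus terminating NUL
        std::snprintf(buf, sizeof buf, cp <= 0xFFFF ? "\\u%04X" : "\\U%08X",
                      static_cast<unsigned>(cp));
        out += buf;
        i += extra + 1;
    }
    return out;
}
```

With this sketch, a message whose bytes are "caf\xC3\xA9" is rendered as the text caf\u00E9, while a message such as "wrong pointer size" passes through unchanged. Whether an implementation substitutes, escapes, or drops such characters remains a quality-of-implementation choice, as discussed above.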

Received on 2020-09-08 04:20:23
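
For readers less familiar with the wording quoted in the thread, the following minimal sketch places the diagnostic-producing constructs side by side. All function names are hypothetical; only the static_assert, attribute, and #error syntax comes from the standard. The code is written so that it compiles without diagnostics, and the comments note what would trigger one.

```cpp
#include <cassert>

// [dcl.pre]p6: if this condition were false, the program would be ill-formed
// and the diagnostic "shall include" the text of the string-literal.
static_assert(sizeof(char) == 1, "char must occupy exactly one byte");

// [dcl.attr.deprecated]p4: only recommended practice. An implementation
// "should" mention the message if legacy_add is referred to after this
// declaration; nothing in this sketch calls it.
[[deprecated("use checked_add instead")]]
inline int legacy_add(int a, int b) { return a + b; }

// [dcl.attr.nodiscard]p4: also recommended practice. A warning is encouraged
// if the returned value is discarded.
[[nodiscard]] inline int checked_add(int a, int b) { return a + b; }

// [cpp.error]p1: uncommenting the next line would render the program
// ill-formed, with a diagnostic that includes the tokens after #error.
// #error "unsupported target"
```

Calling checked_add(2, 3) and using the result produces no diagnostic, whereas calling legacy_add, or discarding the result of checked_add, would typically produce a warning that the implementation is encouraged, but not required, to issue.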