Re: [SG16] On the character encoding of diagnostic text

From: JF Bastien <cxx_at_[hidden]>
Date: Mon, 7 Sep 2020 07:58:10 -0700
Looks good. Minor suggestion:

> Otherwise, the constraint is violated and the implementation shall
> produce a diagnostic message that includes the text of the string literal,
> if present.

The “shall” is important here because it’s about “you shall diagnose”, but
I think you could add a “should” after it to legislate on the content. Say:

> shall produce a diagnostic message, which should include the text


On Mon, Sep 7, 2020 at 7:14 AM Aaron Ballman <aaron_at_[hidden]> wrote:

> On Thu, Sep 3, 2020 at 10:41 PM Tom Honermann <tom_at_[hidden]> wrote:
> >
> > On 9/3/20 6:59 PM, JF Bastien wrote:
> >
> > On Thu, Sep 3, 2020 at 3:36 PM Tom Honermann via SG16 <sg16_at_[hidden]> wrote:
> >>
> >> On 9/2/20 5:33 AM, Peter Brett via SG16 wrote:
> >>
> >> Hi all,
> >>
> >> We allow Unicode identifiers (if/when P1949 is adopted, UAX31
> >> identifiers). Implementations will therefore need to have a mechanism for
> >> communicating those identifiers to the user via their diagnostics. Let us
> >> assume that such a mechanism exists as a necessary implementation detail of
> >> any reasonable C++ implementation.
> >>
> >> By the point at which static_assert() is evaluated, its string argument
> >> will have already been converted to its associated implementation-defined
> >> literal encoding. For some implementations, this may be a lossy conversion.
> >>
> >> I am wary of mandating specific handling of this in the standard
> >> because the way in which diagnostics are communicated to the user seems to
> >> be something that really should be a quality-of-implementation issue.
> >>
> >> If we were to adjust the standard, then the adjustment should not
> >> preclude constexpr computation of the static_assert message, in
> >> anticipation of reflection making it possible to format type names into it
> >> with constexpr std::format.
> >>
> >> static_assert(std::is_base_of_v<MyBase, Arg>,
> >>     std::format("Cannot my_cast to {} because {} is not derived from MyBase",
> >>         /* reflection expressions here */));
> >>
> >> We must not confuse two separate concerns:
> >>
> >> 1. Whether implementations correctly process strings from their internal
> >>    representation for display in diagnostic messages.
> >> 2. Whether implementations correctly handle situations in which the
> >>    literal encoding and the encoding required for displaying diagnostic
> >>    messages are different.
> >>
> >> I am strongly opposed to a solution that restricts static_assert()
> >> messages to the basic source character set.
> >>
> >> I would also be strongly opposed to a solution that prohibits
> >> characters outside of the basic source character set, but I think it would
> >> be reasonable to specify that characters outside the basic source character
> >> set may be subject to substitution (potentially lossy), presentation in
> >> non-glyph form (as a UCN), or perhaps even being dropped (mildly opposed).
> >> Is that a viewpoint that you could support?
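The UCN option Tom lists can be sketched as follows. This is a minimal illustration, not any compiler's actual behavior: it assumes the message text is available as UTF-8, and the names `is_basic_source_char`, `decode_utf8`, `append_ucn`, and `escape_for_diagnostic` are hypothetical. Malformed UTF-8 is replaced with U+FFFD, which is one form of the lossy substitution mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: true if cp is in the C++20 basic source character set
// ([lex.charset]): space, horizontal tab, vertical tab, form feed, new-line,
// and 91 graphic characters.
bool is_basic_source_char(char32_t cp) {
    switch (cp) {
    case U' ': case U'\t': case U'\v': case U'\f': case U'\n':
        return true;
    case U'$': case U'@': case U'`':
        return false;  // printable ASCII, but not in the C++20 basic set
    default:
        return cp >= 0x21 && cp <= 0x7E;  // the remaining 91 graphic characters
    }
}

// Hypothetical helper: decodes one UTF-8 sequence starting at s[i] and
// advances i past it. Malformed input yields U+FFFD and skips one byte.
char32_t decode_utf8(std::string_view s, std::size_t& i) {
    constexpr char32_t replacement = 0xFFFD;
    const auto at = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    const unsigned char lead = at(i);
    std::size_t len = 0;
    char32_t cp = 0;
    if (lead < 0x80) { ++i; return lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else { ++i; return replacement; }
    if (s.size() - i < len) { ++i; return replacement; }  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = at(i + k);
        if ((b & 0xC0) != 0x80) { ++i; return replacement; }  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong encodings, UTF-16 surrogates, and values above U+10FFFF.
    constexpr char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        ++i;
        return replacement;
    }
    i += len;
    return cp;
}

// Hypothetical helper: appends \uXXXX (BMP) or \UXXXXXXXX (other planes).
void append_ucn(std::string& out, char32_t cp) {
    static constexpr char hex[] = "0123456789ABCDEF";
    const int digits = cp <= 0xFFFF ? 4 : 8;
    out += '\\';
    out += digits == 4 ? 'u' : 'U';
    for (int shift = (digits - 1) * 4; shift >= 0; shift -= 4) {
        out += hex[(cp >> shift) & 0xF];
    }
}

// Keeps basic source characters as-is and renders every other code point
// in non-glyph form as a UCN.
std::string escape_for_diagnostic(std::string_view utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const char32_t cp = decode_utf8(utf8, i);
        if (is_basic_source_char(cp)) {
            out += static_cast<char>(cp);
            continue;
        }
        assert(cp <= 0x10FFFF);  // decode_utf8 never produces larger values
        append_ucn(out, cp);
    }
    return out;
}
```

Because the C++20 basic source character set excludes `$`, `@`, and the backtick, this sketch escapes those too, even though they are printable ASCII; a real implementation could reasonably choose to keep them.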
> >
> > I don't think we should specify what happens to the string. Rather we
> > should specify what kind of string literals are accepted (and I'd accept
> > any valid string literal).
> >
> > First, what happens to diagnostics is outside the abstract machine; we
> > don't legislate that. Second, it's neither the source character set nor
> > the execution one. What I mean by this is that the source character set is
> > what the compiler consumes and my editor shows, but diagnostics are what my
> > shell shows (that's not the compiler, nor is it the editor), or they can be
> > in an IDE. Imagine that I run clang in my favorite PDP-11 shell emulator...
> > clang might be nice and check whether Unicode is supported and then escape
> > what's not supported, but does the Standard need to say anything? Now
> > imagine I pipe stderr to /dev/null: have I now made my compiler
> > non-conformant? What if I pipe it to a file? I can't see it unless I open
> > the file... is it still conformant? What about diagnostics in an IDE, where
> > I only see diagnostics for the code currently open in the IDE and the
> > others are "hidden"? Say the IDE colors the diagnostics: is it still
> > conformant? Etc. The Standard doesn't care about any of this, and it's not
> > useful for us to care, so let's not say anything. Trying to say something
> > is legislating away implementation freedom; let's just trust that
> > implementations aren't adversarial and are actually trying to help users.
> >
> > I agree with all of this, but there does seem to be some genuine
> > interest in improving something here. The questions I've posed have been
> > intended to probe those interests.
> >
> > For reference, since I don't think it has been pointed out yet, here is
> > the relevant wording from the C++ standard. What changes would make people
> > happier than the status quo?
> >
> > [intro.compliance]p2.2:
> >
> > If a program contains a violation of any diagnosable rule or an
> > occurrence of a construct described in this document as
> > “conditionally-supported” when the implementation does not support that
> > construct, a conforming implementation shall issue at least one diagnostic
> > message.
> >
> > [dcl.pre]p6:
> >
> > In a static_assert-declaration, the constant-expression shall be a
> > contextually converted constant expression of type bool. If the value of
> > the expression when so converted is true, the declaration has no effect.
> > Otherwise, the program is ill-formed, and the resulting diagnostic message
> > ([intro.compliance]) shall include the text of the string-literal, if one
> > is supplied, except that characters not in the basic source character set
> > are not required to appear in the diagnostic message. [Example:
> >
> > static_assert(sizeof(int) == sizeof(void*), "wrong pointer size");
> >
> > — end example]
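To make the [dcl.pre]p6 exception concrete, here is a small sketch whose messages contain characters outside the basic source character set, the case Martinho raises later in the thread. It assumes a UTF-8 source file that the compiler reads as UTF-8 (as GCC and Clang typically do; MSVC needs `/utf-8`), and `fits_in_pointer` is a hypothetical name. Under the quoted wording, a conforming implementation must reproduce the basic characters of a failing message but may omit ≠, è, and ê.

```cpp
#include <climits>

// The message mixes basic source characters with characters outside the
// basic source character set. Under the quoted [dcl.pre]p6, a failing
// assertion must show the basic characters; the others are optional.
static_assert(CHAR_BIT == 8, "CHAR_BIT ≠ 8 : cette bibliothèque suppose des octets de 8 bits");

// Hypothetical check: T must fit in the storage of a pointer.
template <typename T>
constexpr bool fits_in_pointer() {
    static_assert(sizeof(T) <= sizeof(void*),
                  "le type est trop grand pour être stocké dans un pointeur");
    return true;
}
```

Instantiating `fits_in_pointer<long double>()` on a target where `long double` is larger than a pointer (for example, 16 versus 8 bytes on x86-64 Linux) would trigger the second diagnostic.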
> >
> > [dcl.attr.deprecated]p1:
> >
> > The attribute-token deprecated can be used to mark names and entities
> > whose use is still allowed, but is discouraged for some reason. [Note: In
> > particular, deprecated is appropriate for names and entities that are
> > deemed obsolescent or unsafe. — end note] It shall appear at most once in
> > each attribute-list. An attribute-argument-clause may be present and, if
> > present, it shall have the form:
> >
> > ( string-literal )
> >
> > [Note: The string-literal in the attribute-argument-clause could be used
> > to explain the rationale for deprecation and/or to suggest a replacing
> > entity. — end note]
> >
> > [dcl.attr.deprecated]p4:
> >
> > Recommended practice: Implementations should use the deprecated
> > attribute to produce a diagnostic message in case the program refers to a
> > name or entity other than to declare it, after a declaration that specifies
> > the attribute. The diagnostic message should include the text provided
> > within the attribute-argument-clause of any deprecated attribute applied to
> > the name or entity.
> >
> > [dcl.attr.nodiscard]p4:
> >
> > Recommended practice: Appearance of a nodiscard call as a
> > potentially-evaluated discarded-value expression ([expr.prop]) is
> > discouraged unless explicitly cast to void. Implementations should issue a
> > warning in such cases. [Note: This is typically because discarding the
> > return value of a nodiscard call has surprising consequences. — end note]
> > The string-literal in a nodiscard attribute-argument-clause should be used
> > in the message of the warning as the rationale for why the result should
> > not be discarded.
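The two Recommended practice paragraphs above can be exercised with C++20 attribute arguments. This is a sketch only: `old_parse_port` and `parse_port` are hypothetical functions, and the rationale strings deliberately contain non-basic characters (à, é, ’). Because the wording is "should", whether those characters survive into the warning is left to the implementation.

```cpp
// Hypothetical legacy API; any use after this declaration should be diagnosed,
// and the diagnostic should include the attribute's text.
[[deprecated("utilisez parse_port() à la place")]]
int old_parse_port(const char* text);

// Hypothetical parser returning a TCP port number, or -1 on error. The
// nodiscard rationale should appear in the warning for a discarded call.
[[nodiscard("le résultat peut valoir -1 en cas d’erreur")]]
constexpr int parse_port(const char* text) {
    if (*text == '\0') return -1;
    int value = 0;
    for (; *text != '\0'; ++text) {
        if (*text < '0' || *text > '9') return -1;
        value = value * 10 + (*text - '0');
        if (value > 65535) return -1;  // out of range; also stops int overflow
    }
    return value;
}
```

Writing `parse_port("80");` as a discarded-value expression should draw a warning that quotes the rationale, while `(void)parse_port("80");` is the explicit cast to void that the quoted wording exempts.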
> >
> > [cpp.error]p1:
> >
> > A preprocessing directive of the form
> >
> > # error pp-tokens_opt new-line
> >
> > causes the implementation to produce a diagnostic message that includes
> > the specified sequence of preprocessing tokens, and renders the program
> > ill-formed.
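Unlike [dcl.pre]p6, the [cpp.error] wording quoted above says nothing about characters outside the basic source character set. A minimal sketch, assuming a UTF-8 source file; `bytes_for_bits` is a hypothetical helper. The `#error` sits in a group that is skipped whenever `CHAR_BIT` is 8, so the block compiles on typical platforms.

```cpp
#include <climits>

// Hypothetical portability guard for code that assumes 8-bit bytes. The
// message is a string literal, so it is lexed as one pp-token; an unquoted
// apostrophe (as in "can't") would be an unmatched ' character, which C++20
// makes undefined behavior ([lex.pptoken]).
#if CHAR_BIT != 8
#error "Cette bibliothèque suppose CHAR_BIT == 8 (octets de 8 bits)"
#endif

// Hypothetical helper relying on the guarded assumption: the number of
// 8-bit bytes needed to hold the given number of bits.
constexpr unsigned bytes_for_bits(unsigned bits) { return (bits + 7) / 8; }
```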
> >
> > At a minimum, there do appear to be opportunities to improve consistency
> > in wording here. We have an interesting mix of "shall", "should", "could",
> > and ... "causes".
>
> With the exception of "causes", I think the words of power are
> generally correct (in C++). You use "should" in Recommended Practice
> sections to give normative suggestions and "shall" in Semantic sections
> to give normative requirements, which explains the difference between
> the attributes and the static_assert wording. The C wording had a bit
> of may/should confusion that I am correcting with my paper for that
> committee.
>
> I've attached an updated (rewritten) version of the paper that goes in
> the direction I've understood from SG16 so far. I do still have an
> open question as to whether anyone would like to see #error get some
> words to roll back to an earlier phase of translation like we do for
> _Pragma. There is implementation divergence in this space already with
> an example that may be compelling to some:
> https://godbolt.org/z/zP44xE
>
> ~Aaron
>
> >
> > Tom.
> >
> >>
> >> Tom.
> >>
> >> Best regards,
> >>
> >> Peter
> >>
> >> From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Martinho Fernandes via SG16
> >> Sent: 01 September 2020 18:16
> >> To: sg16_at_[hidden]
> >> Cc: Martinho Fernandes <rmf_at_[hidden]>
> >> Subject: Re: [SG16] On the character encoding of diagnostic text
> >>
> >> On Tue, Sep 1, 2020 at 7:05 PM Aaron Ballman via SG16 <sg16_at_[hidden]> wrote:
> >>
> >> On Tue, Sep 1, 2020 at 12:08 PM Alisdair Meredith via SG16
> >> <sg16_at_[hidden]> wrote:
> >> >
> >> > For a cross compiler, the basic execution character set should
> >> > correspond to the target platform, but the diagnostics character set
> >> > should be for the host?
> >>
> >> That matches my understanding.
> >>
> >> I suppose a question I could add is whether anyone would like to see a
> >> new character set introduced for diagnostics. My intuition is that it
> >> would be a pretty heavy hammer to bring to bear and that the basic
> >> source character set is probably Good Enough (tm).
> >>
> >> Wouldn't these diagnostics be the place people are more likely to use
> >> non-basic source characters, though? When it comes to identifiers, people
> >> will sometimes compromise and restrict themselves, e.g. by avoiding
> >> diacritics, but in error messages I feel like it makes a lot more sense to
> >> want to write with the full expression of their native script.
> >>
> >> --
> >> SG16 mailing list
> >> SG16_at_[hidden]
> >> https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2020-09-07 10:01:52