Date: Thu, 3 Sep 2020 22:41:45 -0400
On 9/3/20 6:59 PM, JF Bastien wrote:
>
>
> On Thu, Sep 3, 2020 at 3:36 PM Tom Honermann via SG16
> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
>
> On 9/2/20 5:33 AM, Peter Brett via SG16 wrote:
>>
>> Hi all,
>>
>> We allow Unicode identifiers (if/when P1949 is adopted, UAX31
>> identifiers). Implementations will therefore need to have a
>> mechanism for communicating those identifiers to the user via
>> their diagnostics. Let us assume that such mechanism exists as a
>> necessary implementation detail of any reasonable C++ implementation.
>>
>> By the point at which static_assert() is evaluated, its string
>> argument will have already been converted to its associated
>> implementation-defined literal encoding. For some
>> implementations, this may be a lossy conversion.
>>
>> I am wary of mandating specific handling of this in the standard
>> because the way in which diagnostics are communicated to the user
>> seems to be something that really should be a quality of
>> implementation issue.
>>
>> If we were to adjust the standard, then the adjustment should not
>> preclude constexpr computation of the static_assert message, in
>> anticipation of reflection making it possible to format type
>> names into it with constexpr std::format.
>>
>> static_assert(std::is_base_of_v<MyBase, Arg>,
>> std::format("Cannot my_cast to {} because {} is not derived
>> from MyBase",
>> /* reflection expressions here */));
>>
>> We must not confuse 2 separate concerns:
>>
>> 1. Whether implementations correctly process strings from their
>> internal representation for display in diagnostic messages
>> 2. Whether implementations correctly handle situations in which
>> the literal encoding and the encoding required for displaying
>> diagnostic messages are different.
>>
>> I am strongly opposed to a solution that restricts
>> static_assert() messages to the basic source character set.
>>
> I would also be strongly opposed to a solution that prohibits
> characters outside of the basic source character set, but I think
> it would be reasonable to specify that characters outside the
> basic source character set may be subject to substitution
> (potentially lossy), presentation in non-glyph form (as a UCN), or
> perhaps even dropped (mildly opposed). Is that a view point that
> you could support?
>
>
> I don't think we should specify what happens to the string. Rather we
> should specify what kind of string literals are accepted (and I'd
> accept any valid string literal).
>
> First, what happens to diagnostics is outside the abstract machine, we
> don't legislate that. Second, it's not the source character set nor is
> it the execution one. What I mean by this is that the source character
> set is what the compiler consumes and my editor shows, but diagnostics
> are what my shell shows (that's not the compiler, nor is it the
> editor), but it can be in an IDE. Imagine that I run clang in my
> favorite PDP-11 shell emulator... clang might be nice to check if
> Unicode is supported and then escape what's not supported, but does
> the Standard need to say anything? Now imagine I pipe stderr to
> /dev/null, have I now made my compiler non-conformant? What if I pipe
> it to a file? I can't see it unless I open the file... is it still
> conformant? What about diagnostics in an IDE, where I only see
> diagnostics for the code currently open in the IDE, the others are
> "hidden". Say the IDE colors the diagnostics, is it still conformant?
> etc. The Standard doesn't care about any of this, it's not useful for
> us to care, let's not say anything. Trying to say something is
> legislating away implementation freedom, let's just trust that
> implementations aren't adversarial and they're actually trying to help
> users.
I agree with all of this, but there does seem to be some genuine
interest in improving something here. The questions I've posed have been
intended to probe those interests.
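To make the "escape what's not supported" option concrete, here is a
minimal sketch of presenting message text in non-glyph (UCN) form. The
helper name escape_as_ucn is hypothetical and not taken from any
compiler; it assumes the message is available as UTF-32 and that the
diagnostic sink can only be trusted with ASCII.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (an assumption for illustration, not a real compiler
// API): renders a UTF-32 message for a sink assumed to handle only ASCII.
// Code points below 0x80 pass through; everything else becomes a UCN.
inline std::string escape_as_ucn(std::u32string_view text) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char32_t c : text) {
        if (c < 0x80) {
            // Assumes an ASCII-compatible narrow encoding on the host.
            out += static_cast<char>(c);
            continue;
        }
        // \uXXXX for code points up to U+FFFF, \UXXXXXXXX beyond that.
        const int digits = (c <= 0xFFFF) ? 4 : 8;
        out += (digits == 4) ? "\\u" : "\\U";
        for (int shift = (digits - 1) * 4; shift >= 0; shift -= 4)
            out += hex[(c >> shift) & 0xF];
    }
    return out;
}
```

Whether an implementation does anything like this is exactly the
quality-of-implementation question above; the sketch only shows that
the lossless fallback is cheap.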
For reference, since I don't think it has been pointed out yet, here is
the relevant wording from the C++ standard. What changes would make
people happier than the status quo?
[intro.compliance]p2.2 <http://eel.is/c++draft/intro.compliance#2.2>:
If a program contains a violation of any diagnosable rule or an
occurrence of a construct described in this document as
“conditionally-supported” when the implementation does not support
that construct, *a conforming implementation shall issue at least
one diagnostic message*.
[dcl.pre]p6 <http://eel.is/c++draft/dcl.dcl#dcl.pre-6>:
In a static_assert-declaration, the constant-expression shall be a
contextually converted constant expression of type bool. If the
value of the expression when so converted is true, the declaration
has no effect. *Otherwise, the program is ill-formed, and the
resulting diagnostic message (**[intro.compliance]
<http://eel.is/c++draft/intro.compliance>**) shall include the text
of the string-literal, if one is supplied, except that characters
not in the basic source character set are not required to appear in
the diagnostic message*. [Example:
static_assert(sizeof(int) == sizeof(void*), "wrong pointer size");
— end example]
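As an illustration of which messages that exception leaves fully
covered by the "shall", here is a small sketch that checks a message
against the 96 characters of the basic source character set as listed
in C++20 [lex.charset]. The helper name only_basic_source_chars is
hypothetical; it needs C++17 for constexpr std::string_view.

```cpp
#include <string_view>

// Hypothetical helper (illustration only): true if every character of msg
// is in the basic source character set of C++20 [lex.charset].
constexpr bool only_basic_source_chars(std::string_view msg) {
    constexpr std::string_view basic =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'"
        " \t\v\f\n";
    for (char c : msg)
        if (basic.find(c) == std::string_view::npos)
            return false;
    return true;
}

// The message from the standard's example is entirely basic characters.
static_assert(only_basic_source_chars("wrong pointer size"));
// '$' is outside the set, so this text is not required to appear in full.
static_assert(!only_basic_source_chars("costs $5"));
```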
[dcl.attr.deprecated]p1 <http://eel.is/c++draft/dcl.attr.deprecated#1>:
The attribute-token deprecated can be used to mark names and
entities whose use is still allowed, but is discouraged for some
reason. [Note: In particular, deprecated is appropriate for names
and entities that are deemed obsolescent or unsafe. — end note] It
shall appear at most once in each attribute-list. An
attribute-argument-clause may be present and, if present, it shall
have the form:
( string-literal )
*[Note: The string-literal in the attribute-argument-clause could be
used to explain the rationale for deprecation and/or to suggest a
replacing entity. — end note]*
[dcl.attr.deprecated]p4 <http://eel.is/c++draft/dcl.attr.deprecated#4>:
/Recommended practice:/ Implementations should use the deprecated
attribute to produce a diagnostic message in case the program refers
to a name or entity other than to declare it, after a declaration
that specifies the attribute. *The diagnostic message should include
the text provided within the attribute-argument-clause of any
**deprecated** attribute applied to the name or entity*.
[dcl.attr.nodiscard]p4 <http://eel.is/c++draft/dcl.attr.nodiscard#4>:
/Recommended practice:/ Appearance of a nodiscard call as a
potentially-evaluated discarded-value expression ([expr.prop]
<http://eel.is/c++draft/expr.prop>) is discouraged unless explicitly
cast to void. *Implementations should issue a warning in such
cases.* [Note: This is typically because discarding the return
value of a nodiscard call has surprising consequences. — end note]
*The string-literal in a nodiscard attribute-argument-clause should
be used in the message of the warning as the rationale for why the
result should not be discarded*.
[cpp.error]p1 <http://eel.is/c++draft/cpp.error#1>:
A preprocessing directive of the form
# error pp-tokens_opt new-line
*causes the implementation to produce a diagnostic message that
includes the specified sequence of preprocessing tokens*, and
renders the program ill-formed.
At a minimum, there do appear to be opportunities to improve consistency
in wording here. We have an interesting mix of "shall", "should",
"could", and ... "causes".
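For anyone who wants to compare how their implementation treats each
of these, the four mechanisms can be put side by side in one
translation unit. This is only a sketch: old_api, new_api, and the
SKETCH_TRIGGER_ERROR macro are hypothetical names, and the reason
string on nodiscard assumes C++20.

```cpp
// "shall": if this fails, the diagnostic must include the message text,
// except for characters outside the basic source character set.
static_assert(sizeof(char) == 1, "char must be one byte");

// "should" (recommended practice): a use of old_api() is expected to
// produce a warning that includes this text.
[[deprecated("use new_api() instead")]] inline int old_api() { return 1; }

// "should" (recommended practice): discarding the result is expected to
// produce a warning that uses this text as the rationale.
[[nodiscard("the status code reports failure")]] inline int new_api() { return 2; }

// "causes": the diagnostic includes the preprocessing tokens. The guard
// macro is hypothetical and undefined by default so the file compiles.
#ifdef SKETCH_TRIGGER_ERROR
#error this text would appear in the diagnostic
#endif
```

Putting non-basic characters into each of the four strings and
comparing the output is a quick way to see how consistent a given
compiler is in practice.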
Tom.
>
> Tom.
>
>> Best regards,
>>
>> Peter
>>
>> *From:*SG16 <sg16-bounces_at_[hidden]>
>> <mailto:sg16-bounces_at_[hidden]> *On Behalf Of *Martinho
>> Fernandes via SG16
>> *Sent:* 01 September 2020 18:16
>> *To:* sg16_at_[hidden] <mailto:sg16_at_[hidden]>
>> *Cc:* Martinho Fernandes <rmf_at_[hidden]> <mailto:rmf_at_[hidden]>
>> *Subject:* Re: [SG16] On the character encoding of diagnostic text
>>
>> On Tue, Sep 1, 2020 at 7:05 PM Aaron Ballman via SG16
>> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
>>
>> On Tue, Sep 1, 2020 at 12:08 PM Alisdair Meredith via SG16
>> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> wrote:
>> >
>> > For a cross compiler, the basic execution character set
>> should correspond to the target platform, but the diagnostics
>> character set should be for the host?
>>
>> That matches my understanding.
>>
>> I suppose a question I could add is whether anyone would like
>> to see a
>> new character set introduced for diagnostics. My intuition is
>> that it
>> would be a pretty heavy hammer to bring to bear and that the
>> basic
>> source character set is probably Good Enough (tm).
>>
>> Wouldn't these diagnostics be the place people are more likely to
>> use non-basic source characters, though? When it comes to
>> identifiers people will sometimes compromise and restrict
>> themselves and e.g. avoid diacritics, but in error messages I
>> feel like it makes a lot more sense to want to write with the
>> full expression of their native script.
>>
>>
>
> --
> SG16 mailing list
> SG16_at_[hidden] <mailto:SG16_at_[hidden]>
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2020-09-03 21:45:20