sg16: Re: [SG16] On the character encoding of diagnostic text

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 3 Sep 2020 18:36:21 -0400

On 9/2/20 5:33 AM, Peter Brett via SG16 wrote:
>
> Hi all,
>
> We allow Unicode identifiers (if/when P1949 is adopted, UAX #31
> identifiers). Implementations will therefore need a mechanism for
> communicating those identifiers to the user via their diagnostics.
> Let us assume that such a mechanism exists as a necessary
> implementation detail of any reasonable C++ implementation.
>
> By the time static_assert() is evaluated, its string argument will
> already have been converted to the implementation-defined literal
> encoding. For some implementations, this conversion may be lossy.
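To make the lossy case concrete, here is a minimal C++ sketch. It assumes, purely for illustration, that the message starts out as UTF-8 and that the target literal encoding is ISO-8859-1; decode_utf8 and to_latin1_lossy are hypothetical helpers, not part of any standard library or compiler, and '?' stands in for whatever substitution character an implementation might choose.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: decodes one UTF-8 sequence starting at s[i] and
// advances i past it. Assumes well-formed UTF-8 (no checks for overlong
// forms or surrogates); a sequence truncated by the end of input yields U+FFFD.
char32_t decode_utf8(std::string_view s, std::size_t& i) {
    assert(i < s.size());
    unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    if (i + len > s.size()) {
        i = s.size();
        return U'\uFFFD';
    }
    char32_t cp = len == 1 ? lead
                : len == 2 ? (lead & 0x1F)
                : len == 3 ? (lead & 0x0F)
                           : (lead & 0x07);
    for (std::size_t k = 1; k < len; ++k)
        cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    i += len;
    return cp;
}

// Hypothetical target literal encoding: ISO-8859-1. Code points above
// U+00FF have no representation there and are replaced by '?'; once that
// happens, no later stage (such as diagnostic output) can recover them.
std::string to_latin1_lossy(std::string_view utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp = decode_utf8(utf8, i);
        out += cp <= 0xFF ? static_cast<char>(cp) : '?';
    }
    return out;
}
```

For example, to_latin1_lossy("caf\xC3\xA9 \xE6\x97\xA5") keeps the é (U+00E9 exists in ISO-8859-1) but turns 日 into '?', so a diagnostic printed from the converted string can no longer show the original character.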
>
> I am wary of mandating specific handling of this in the standard,
> because how diagnostics are communicated to the user really seems
> like a quality-of-implementation issue.
>
> If we were to adjust the standard, then the adjustment should not
> preclude constexpr computation of the static_assert message, in
> anticipation of reflection making it possible to format type names
> into it with constexpr std::format.
>
> static_assert(std::is_base_of_v<MyBase, Arg>,
>     std::format("Cannot my_cast to {} because {} is not derived from MyBase",
>                 /* reflection expressions here */));
>
> We must not confuse 2 separate concerns:
>
> 1. Whether implementations correctly process strings from their
> internal representation for display in diagnostic messages
> 2. Whether implementations correctly handle situations in which the
> literal encoding differs from the encoding required for displaying
> diagnostic messages.
>
> I am strongly opposed to a solution that restricts static_assert()
> messages to the basic source character set.
>
I would also be strongly opposed to a solution that prohibits characters
outside of the basic source character set, but I think it would be
reasonable to specify that such characters may be subject to
substitution (potentially lossy), presentation in non-glyph form (as a
UCN), or perhaps even omission (to which I am mildly opposed). Is that
a viewpoint you could support?

Tom.
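One way the UCN presentation described above might look, as a rough C++ sketch: it assumes the diagnostic engine receives the message as UTF-8, keeps the 96 members of the C++20 basic source character set as-is, and spells every other code point as a universal-character-name. decode_utf8, in_basic_source_set, and render_with_ucns are hypothetical names chosen for this illustration, not an existing compiler interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

// Minimal UTF-8 decoder (hypothetical helper): reads one sequence at s[i]
// and advances i. Assumes well-formed input; a sequence truncated by the
// end of the string yields U+FFFD.
char32_t decode_utf8(std::string_view s, std::size_t& i) {
    assert(i < s.size());
    unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    if (i + len > s.size()) {
        i = s.size();
        return U'\uFFFD';
    }
    char32_t cp = len == 1 ? lead
                : len == 2 ? (lead & 0x1F)
                : len == 3 ? (lead & 0x0F)
                           : (lead & 0x07);
    for (std::size_t k = 1; k < len; ++k)
        cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    i += len;
    return cp;
}

// True for the 96 members of the C++20 basic source character set
// ([lex.charset]/1): space, HT, VT, FF, new-line, and 91 graphic characters.
bool in_basic_source_set(char32_t cp) {
    constexpr std::string_view graphic =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    if (cp == U' ' || cp == U'\t' || cp == U'\v' || cp == U'\f' || cp == U'\n')
        return true;
    return cp < 0x80 &&
           graphic.find(static_cast<char>(cp)) != std::string_view::npos;
}

// Hypothetical diagnostic renderer: basic source characters pass through;
// everything else becomes \uXXXX (BMP) or \UXXXXXXXX (supplementary planes).
std::string render_with_ucns(std::string_view utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp = decode_utf8(utf8, i);
        if (in_basic_source_set(cp)) {
            out += static_cast<char>(cp);
            continue;
        }
        char buf[11];  // "\U" + 8 hex digits + terminating NUL
        if (cp <= 0xFFFF)
            std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(cp));
        else
            std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(cp));
        out += buf;
    }
    return out;
}
```

Applied to "caf\xC3\xA9 \xF0\x9F\x98\x80", it produces caf\u00E9 \U0001F600: nothing is lost, but the text is much harder to read than the original glyphs, which is the cost of this option compared with displaying the characters directly.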

> Best regards,
>
> Peter
>
> *From:* SG16 <sg16-bounces_at_[hidden]> *On Behalf Of* Martinho
> Fernandes via SG16
> *Sent:* 01 September 2020 18:16
> *To:* sg16_at_[hidden]
> *Cc:* Martinho Fernandes <rmf_at_[hidden]>
> *Subject:* Re: [SG16] On the character encoding of diagnostic text
>
> On Tue, Sep 1, 2020 at 7:05 PM Aaron Ballman via SG16
> <sg16_at_[hidden]> wrote:
>
> > On Tue, Sep 1, 2020 at 12:08 PM Alisdair Meredith via SG16
> > <sg16_at_[hidden]> wrote:
> > >
> > > For a cross compiler, the basic execution character set should
> > > correspond to the target platform, but the diagnostics character
> > > set should be for the host?
> >
> > That matches my understanding.
> >
> > I suppose a question I could add is whether anyone would like to see a
> > new character set introduced for diagnostics. My intuition is that it
> > would be a pretty heavy hammer to bring to bear and that the basic
> > source character set is probably Good Enough (tm).
>
> Wouldn't these diagnostics be the place where people are more likely
> to use characters outside the basic source character set, though? When
> it comes to identifiers, people will sometimes compromise and restrict
> themselves, e.g. by avoiding diacritics, but in error messages I feel
> it makes a lot more sense to want to write with the full expression of
> their native script.
>
>

Received on 2020-09-03 17:39:51