Re: [SG16] On the character encoding of diagnostic text

From: Aaron Ballman <aaron_at_[hidden]>
Date: Thu, 3 Sep 2020 11:22:12 -0400

On Wed, Sep 2, 2020 at 11:15 AM Steve Downey <sdowney_at_[hidden]> wrote:
>
> We're working on restricting identifiers to UAX31 for many of the same
> reasons that these strings are problematic. I don't think we can apply
> UAX31 here, but I think it's reasonable to let compiler writers know
> that just catting these strings as encoded to stderr is not mandated.
> Some "wonderful" examples from JF, in the above Compiler Explorer link
> ( https://godbolt.org/z/vGPdha )
>
> static_assert(false, "RTL‮RTL");
>
> (yes that's well formed, at least in the original source)
>
> static_assert(false, "\033[0;31mRED!!!\033[0mnot red :(");
> static_assert(false,
> u8"Cť̺̹͕͎hųlù");
>
>
> I think we actually made a mistake making these things
> string-literals. They aren't. They're sort of delimited pp-tokens, and
> in that form are much safer, but expressing that in the grammar is
> difficult and mixes levels. That static_assert is a declaration was a
> little surprising to me, rather than an expression of some sort, but
> an expression wouldn't have really helped. It's not part of the
> pre-processor, and the same is true for attributes where we've
> annotated them with strings.
>
> My feeling is that the goal is to give C and C++ implementors leeway
> to do better, while setting a floor on minimal viable product. The
> rest is QoI. Current Quality of Implementation is terrible, though.
> That CE link is horrifying.

This matches my belief -- I'm trying to set a floor on the minimum
functional requirements, and I personally think the basic source
character set is sufficient for that. However, if there's a strong
sentiment in either committee to add a diagnostic character set so
that we can give different guarantees using it, I'm okay with that
approach.
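
As a rough illustration of such a floor, an implementation could escape every byte that is not a printable member of the basic source character set before writing the message to stderr. The sketch below is hypothetical: the names is_plain_basic_source_char and escape_for_diagnostic are invented for illustration, the \xNN output format is an arbitrary choice, and none of it is taken from an existing compiler.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: true if c is one of the 91 graphical characters of
// the C++20 basic source character set, or a space. Tabs, new-lines and
// other control characters are deliberately excluded.
bool is_plain_basic_source_char(unsigned char c) {
    constexpr std::string_view basic =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"' ";
    return basic.find(static_cast<char>(c)) != std::string_view::npos;
}

// Hypothetical helper: copies message, replacing every byte that is not a
// plain basic source character with \xNN. The backslash itself is escaped
// too, so the output can be read back unambiguously.
std::string escape_for_diagnostic(std::string_view message) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : message) {
        if (c != '\\' && is_plain_basic_source_char(c)) {
            out += static_cast<char>(c);
        } else {
            out += "\\x";
            out += hex[c >> 4];
            out += hex[c & 0xF];
        }
    }
    return out;
}
```

With this approach the terminal escape sequences in the second example above would reach the user as visible \x1B text rather than changing the terminal colour. Non-basic characters would also be shown as escaped bytes, which is the cost of setting the floor this low; anything friendlier would remain QoI.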

~Aaron

>
>
> On Wed, Sep 2, 2020 at 5:33 AM Peter Brett via SG16
> <sg16_at_[hidden]> wrote:
> >
> > Hi all,
> >
> >
> >
> > We allow Unicode identifiers (if/when P1949 is adopted, UAX31 identifiers). Implementations will therefore need to have a mechanism for communicating those identifiers to the user via their diagnostics. Let us assume that such a mechanism exists as a necessary implementation detail of any reasonable C++ implementation.
> >
> >
> >
> > By the point at which static_assert() is evaluated, its string argument will have already been converted to its associated implementation-defined literal encoding. For some implementations, this may be a lossy conversion.
> >
> >
> >
> > I am wary of mandating specific handling of this in the standard because the way in which diagnostics are communicated to the user seems to be something that really should be a quality of implementation issue.
> >
> >
> >
> > If we were to adjust the standard, then the adjustment should not preclude constexpr computation of the static_assert message, in anticipation of reflection making it possible to format type names into it with constexpr std::format.
> >
> >
> >
> > static_assert(std::is_base_of_v<MyBase, Arg>,
> > std::format("Cannot my_cast to {} because {} is not derived from MyBase",
> > /* reflection expressions here */));
> >
> >
> >
> > We must not confuse two separate concerns:
> >
> >
> >
> > 1. Whether implementations correctly process strings from their internal representation for display in diagnostic messages.
> > 2. Whether implementations correctly handle situations in which the literal encoding and the encoding required for displaying diagnostic messages are different.
> >
> >
> >
> > I am strongly opposed to a solution that restricts static_assert() messages to the basic source character set.
> >
> >
> >
> > Best regards,
> >
> >
> >
> > Peter
> >
> >
> >
> >
> >
> > From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Martinho Fernandes via SG16
> > Sent: 01 September 2020 18:16
> > To: sg16_at_[hidden]
> > Cc: Martinho Fernandes <rmf_at_[hidden]>
> > Subject: Re: [SG16] On the character encoding of diagnostic text
> >
> >
> >
> > On Tue, Sep 1, 2020 at 7:05 PM Aaron Ballman via SG16 <sg16_at_[hidden]> wrote:
> >
> > On Tue, Sep 1, 2020 at 12:08 PM Alisdair Meredith via SG16
> > <sg16_at_[hidden]> wrote:
> > >
> > > For a cross compiler, the basic execution character set should correspond to the target platform, but the diagnostics character set should be for the host?
> >
> > That matches my understanding.
> >
> > I suppose a question I could add is whether anyone would like to see a
> > new character set introduced for diagnostics. My intuition is that it
> > would be a pretty heavy hammer to bring to bear and that the basic
> > source character set is probably Good Enough (tm).
> >
> >
> >
> > Wouldn't these diagnostics be the place people are more likely to use non-basic source characters, though? When it comes to identifiers people will sometimes compromise and restrict themselves and e.g. avoid diacritics, but in error messages I feel like it makes a lot more sense to want to write with the full expression of their native script.
> >
> > --
> > SG16 mailing list
> > SG16_at_[hidden]
> > https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2020-09-03 10:25:59