Re: [SG16] On the character encoding of diagnostic text

From: JF Bastien <cxx_at_[hidden]>
Date: Thu, 3 Sep 2020 08:25:51 -0700
On Thu, Sep 3, 2020 at 8:22 AM Aaron Ballman via SG16 <sg16_at_[hidden]>
wrote:

> On Wed, Sep 2, 2020 at 11:15 AM Steve Downey <sdowney_at_[hidden]> wrote:
> >
> > We're working on restricting identifiers to UAX31 for many of the same
> > reasons that these strings are problematic. I don't think we can apply
> > UAX31 here, but I think it's reasonable to let compiler writers know
> > that just catting these strings as encoded to stderr is not mandated.
> > Some "wonderful" examples from JF, in the above Compiler Explorer link
> > ( https://godbolt.org/z/vGPdha )
> >
> > static_assert(false, "RTL‮RTL");
> >
> > (yes that's well formed, at least in the original source)
> >
> > static_assert(false, "\033[0;31mRED!!!\033[0mnot red :(");
> > static_assert(false,
> > u8"Cť̺̹͕͎hųlù");
> >
> > I think we actually made a mistake making these things
> > string-literals. They aren't. They're sort of delimited pp-tokens, and
> > in that form are much safer, but expressing that in the grammar is
> > difficult and mixes levels. That static_assert is a declaration was a
> > little surprising to me, rather than an expression of some sort, but
> > an expression wouldn't have really helped. It's not part of the
> > pre-processor, and the same is true for attributes where we've
> > annotated them with strings.
> >
> > My feeling is that the goal is to give C and C++ implementors leeway
> > to do better, while setting a floor on minimal viable product. The
> > rest is QoI. Current Quality of Implementation is terrible, though.
> > That CE link is horrifying.
>
> This matches my belief -- I'm trying to set a floor on the minimum
> functional requirements, and I personally think the basic source
> character set is sufficient for that. However, if there's a strong
> sentiment in either committee to add a diagnostic character set so
> that we can give different guarantees using it, I'm okay with that
> approach.


Honestly, I don’t think things are particularly bad, just inconsistent. IMO
we’re better off escaping things as clang does, except perhaps newlines,
but that’s purely an implementation choice and doesn’t need to be
specified. That would mean that we allow any character literal in
diagnostics, and compilers choose how to display them. I would find it odd
to restrict them to the basic character set.
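
To make "compilers choose how to display them" concrete, here is a minimal sketch of one possible display policy. The helper name escape_for_diagnostic and the \xNN notation are assumptions made for illustration; this is not clang's actual algorithm or output format. It rewrites ASCII control bytes (including ESC and newline) as visible escapes and passes every other byte through unchanged.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not taken from any real compiler.
// Replaces C0 control bytes (0x00-0x1F) and DEL (0x7F) with a visible
// "\xNN" escape so they cannot drive the terminal. All other bytes,
// including bytes >= 0x80, are copied through unchanged.
std::string escape_for_diagnostic(std::string_view msg) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(msg.size());
    for (unsigned char c : msg) {
        if (c < 0x20 || c == 0x7F) {
            out += "\\x";
            out += hex[c >> 4];   // high nibble
            out += hex[c & 0xF];  // low nibble
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

Applied to the second example quoted above, escape_for_diagnostic("\033[0;31mRED!!!\033[0mnot red :(") returns the text \x1B[0;31mRED!!!\x1B[0mnot red :(, so the terminal never sees the colour-changing sequence. A byte-level filter like this does nothing about the right-to-left override or the stacked combining marks in the other examples; handling those would require decoding the message and deciding per code point, which is again an implementation choice.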





>
> ~Aaron
>
> >
> > On Wed, Sep 2, 2020 at 5:33 AM Peter Brett via SG16
> > <sg16_at_[hidden]> wrote:
> > >
> > > Hi all,
> > >
> > > We allow Unicode identifiers (if/when P1949 is adopted, UAX31
> > > identifiers). Implementations will therefore need to have a mechanism for
> > > communicating those identifiers to the user via their diagnostics. Let us
> > > assume that such a mechanism exists as a necessary implementation detail of
> > > any reasonable C++ implementation.
> > >
> > > By the point at which static_assert() is evaluated, its string
> > > argument will have already been converted to its associated
> > > implementation-defined literal encoding. For some implementations, this may
> > > be a lossy conversion.
> > >
> > > I am wary of mandating specific handling of this in the standard
> > > because the way in which diagnostics are communicated to the user seems to
> > > be something that really should be a quality of implementation issue.
> > >
> > > If we were to adjust the standard, then the adjustment should not
> > > preclude constexpr computation of the static_assert message, in
> > > anticipation of reflection making it possible to format type names into it
> > > with constexpr std::format.
> > >
> > > static_assert(std::is_base_of_v<MyBase, Arg>,
> > >     std::format("Cannot my_cast to {} because {} is not derived from MyBase",
> > >     /* reflection expressions here */));
> > >
> > > We must not confuse 2 separate concerns:
> > >
> > > 1. Whether implementations correctly process strings from their internal
> > >    representation for display in diagnostic messages.
> > > 2. Whether implementations correctly handle situations in which the
> > >    literal encoding and the encoding required for displaying diagnostic
> > >    messages are different.
> > >
> > > I am strongly opposed to a solution that restricts static_assert()
> > > messages to the basic source character set.
> > >
> > > Best regards,
> > >
> > > Peter
> > >
> > > From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Martinho Fernandes via SG16
> > > Sent: 01 September 2020 18:16
> > > To: sg16_at_[hidden]
> > > Cc: Martinho Fernandes <rmf_at_[hidden]>
> > > Subject: Re: [SG16] On the character encoding of diagnostic text
> > >
> > > On Tue, Sep 1, 2020 at 7:05 PM Aaron Ballman via SG16 <sg16_at_[hidden]> wrote:
> > >
> > > On Tue, Sep 1, 2020 at 12:08 PM Alisdair Meredith via SG16
> > > <sg16_at_[hidden]> wrote:
> > > >
> > > > For a cross compiler, the basic execution character set should
> > > > correspond to the target platform, but the diagnostics character set should
> > > > be for the host?
> > >
> > > That matches my understanding.
> > >
> > > I suppose a question I could add is whether anyone would like to see a
> > > new character set introduced for diagnostics. My intuition is that it
> > > would be a pretty heavy hammer to bring to bear and that the basic
> > > source character set is probably Good Enough (tm).
> > >
> > > Wouldn't these diagnostics be the place people are more likely to use
> > > non-basic source characters, though? When it comes to identifiers people
> > > will sometimes compromise and restrict themselves and e.g. avoid
> > > diacritics, but in error messages I feel like it makes a lot more sense to
> > > want to write with the full expression of their native script.
> > >
> > > --
> > > SG16 mailing list
> > > SG16_at_[hidden]
> > > https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2020-09-03 10:29:32