sg16: Re: [SG16] On the character encoding of diagnostic text

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 2 Sep 2020 09:02:54 +0200

On Wed, Sep 2, 2020, 06:08 Tom Honermann via SG16 <sg16_at_[hidden]>
wrote:

> On 9/1/20 2:14 PM, Aaron Ballman via SG16 wrote:
> > On Tue, Sep 1, 2020 at 1:16 PM Martinho Fernandes via SG16
> > <sg16_at_[hidden]> wrote:
> >>
> >> On Tue, Sep 1, 2020 at 7:05 PM Aaron Ballman via SG16 <sg16_at_[hidden]> wrote:
> >>> On Tue, Sep 1, 2020 at 12:08 PM Alisdair Meredith via SG16
> >>> <sg16_at_[hidden]> wrote:
> >>>> For a cross compiler, the basic execution character set should
> >>>> correspond to the target platform, but the diagnostics character
> >>>> set should be for the host?
> >>> That matches my understanding.
> >>>
> >>> I suppose a question I could add is whether anyone would like to see a
> >>> new character set introduced for diagnostics. My intuition is that it
> >>> would be a pretty heavy hammer to bring to bear and that the basic
> >>> source character set is probably Good Enough (tm).
> >>>
> >> Wouldn't these diagnostics be the place people are more likely to use
> >> non-basic source characters, though? When it comes to identifiers people
> >> will sometimes compromise and restrict themselves and e.g. avoid
> >> diacritics, but in error messages I feel like it makes a lot more sense to
> >> want to write with the full expression of their native script.
> > I think that's likely a valid point, but I'm struggling to find any
> > data that I can point to in the paper. I don't suppose you (or anyone
> > in SG16) have insights into how often this comes up in practice? e.g.,
> > does someone have evidence that this comes up enough to warrant making
> > a diagnostic character set? Does anyone think that would be a worse
> > approach than limiting to the basic source character set?
>
> I don't have any data regarding use in practice but my intuition matches
> Martinho's.
>
> Assuming a diagnostic character set were specified and that it contained
> characters beyond the basic source character set, what guarantees could
> be offered? Presentation of the diagnostic may be limited by factors
> outside the implementation's control; for example, terminal/console
> capabilities. We could state that diagnostic messages must reflect all
> message characters outside the basic source character set in some way,
> whether via an escape mechanism, substitution of a replacement
> character, or some other method.
>
> Perhaps it is useful to think about this more abstractly. Translation
> phase 1 doesn't place any restrictions on the program source. Likewise,
> diagnostic messages need not have an associated concrete character set;
> think of asking your smart speaker to compile a program and how it might
> present diagnostics. Perhaps diagnostic generation should be defined as
> the inverse of translation phase 1; the input is in the internal
> character set (insert reversion of translation phase 5 as necessary) and
> the output is analogous to "physical source file characters"
>

The encoding for diagnostics would have to be implementation-defined.
We probably can't make guarantees about it besides "superset of the basic
character set".
It would still be valuable to decouple it from the execution character set.
And yes, I believe people might not want to restrict themselves to Latin-1.
They might get replacement characters; that seems reasonable.
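
To make the replacement-character idea concrete, here is a rough sketch of
what an implementation might do when its diagnostic output encoding cannot
represent a character in the message. Everything in it is an assumption for
illustration only: the names Fallback, representable and render_diagnostic
are hypothetical, the output repertoire is arbitrarily taken to be printable
ASCII, and the \u{...} escape spelling is just one possible choice. No
compiler is claimed to work this way.

```cpp
#include <string>

// Hypothetical policy: what to do with a character the output cannot show.
enum class Fallback { Replace, Escape };

// Assumption: the diagnostic output encoding only covers printable ASCII.
inline bool representable(char32_t c) {
    return c >= 0x20 && c <= 0x7E;
}

// Render a message, given as code points, into the assumed output encoding.
std::string render_diagnostic(const std::u32string& message, Fallback fallback) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char32_t c : message) {
        if (representable(c)) {
            out += static_cast<char>(c);
        } else if (fallback == Fallback::Replace) {
            out += '?';  // substitute a replacement character
        } else {
            // Escape as \u{X...}: hex digits, most significant first,
            // leading zeros dropped.
            out += "\\u{";
            bool started = false;
            for (int shift = 28; shift >= 0; shift -= 4) {
                unsigned digit = (c >> shift) & 0xF;
                if (digit != 0 || started || shift == 0) {
                    out += hex[digit];
                    started = true;
                }
            }
            out += '}';
        }
    }
    return out;
}
```

Either policy keeps the message usable on a limited terminal; the escape
form additionally preserves which character was intended.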

>

> Tom.
>
> >
> > Thanks!
> >
> > ~Aaron
> >
> >> --
> >> SG16 mailing list
> >> SG16_at_[hidden]
> >> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
>

Received on 2020-09-02 02:06:37