sg16: Re: [SG16] On the character encoding of diagnostic text

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 2 Sep 2020 00:13:53 -0400

On 9/1/20 12:59 PM, Aaron Ballman via SG16 wrote:
> On Tue, Sep 1, 2020 at 11:01 AM Steve Downey <sdowney_at_[hidden]> wrote:
>> I'm not sure about C, but for C++ the grammar for static_assert is:
>>   static_assert-declaration:
>>     static_assert ( constant-expression ) ;
>>     static_assert ( constant-expression , string-literal ) ;
>> So the value for the string is going to be encoded in the target
>> environment, rather than the source. While all characters from the
>> basic source character set and the execution character set are going
>> to be available, the values for those characters may not actually be
>> suitable for emitting. No idea what compilers do today. String-literal
>> also allows for encoding prefix, so the associated encoding might be
>> Unicode. Or even a wide string.
>> Do we need to have an 'original spelling' rule for static_assert?
> C has the same grammar for static assertions, and [[deprecated]],
> [[nodiscard]], and static_assert all use a string-literal (with no
> mention of encoding prefixes). Personally, I would like to see
> encoding prefixes be ill-formed in these cases, but I was going for
> incremental improvement on the question of what to do with strings
> (without a prefix) with characters outside the basic character set. Do
> we have an "original spelling" rule somewhere else already in C++?

With regard to making encoding prefixes ill-formed, we did provide such
direction for _Pragma in the last SG16 telecon. See the discussion and
later polls for P2178R1 proposal 10 at
https://github.com/sg16-unicode/sg16-meetings#august-26th-2020.
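
For reference, here is a minimal sketch of the constructs under discussion, assuming a C++14 or later compiler. The names legacy and replacement are hypothetical and exist only for illustration. The prefixed form is left in a comment because its status is exactly what is being debated; nothing here is a claim about how a particular compiler treats it.

```cpp
#include <cassert>  // only needed for the usage check further down

// Diagnostic text with no encoding prefix, limited to characters from the
// basic source character set.
static_assert(sizeof(char) == 1, "char is one byte");

// The prefixed form discussed in the thread (shown as a comment only):
//   static_assert(sizeof(char) == 1, u8"char is one byte");

// [[deprecated]] also carries user-provided text in a string-literal.
// (Hypothetical function names.)
[[deprecated("use replacement() instead")]]
int legacy() { return 1; }

int replacement() { return 2; }

// Usage check: calls only the non-deprecated function, so no deprecation
// diagnostic is requested here.
inline void check_replacement() { assert(replacement() == 2); }
```

The message text in both constructs only ever reaches the user through a compiler diagnostic, which is why the encoding of that output matters more than the encoding of any run-time string object.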

The only thoughts I had regarding "original spelling" were the same ones
that Jens already stated.
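
As an illustration of the restriction described in Aaron's summary quoted below, here is a rough sketch of a check for whether diagnostic text stays within the basic source character set. It assumes the 96-character set as listed in C++17 [lex.charset]; the helper name is_basic_source_text is hypothetical and is not taken from the paper.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns true if every character of `text` belongs to
// the basic source character set (assumed here to be the C++17 list: space,
// horizontal tab, vertical tab, form feed, new-line, letters, digits, and
// 29 punctuation characters).
bool is_basic_source_text(const std::string& text) {
    static const std::string allowed =
        " \t\v\f\n"
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    return text.find_first_not_of(allowed) == std::string::npos;
}

// Usage check for the helper above.
inline void check_basic_source_text() {
    assert(is_basic_source_text("int must be at least 16 bits"));
    assert(!is_basic_source_text("caf\xC3\xA9"));  // bytes outside the set
    assert(!is_basic_source_text("line\rbreak"));  // \r is not in the set
}
```

Note that carriage return fails this check while new-line passes, which is the asymmetry behind the question of restricting against the basic execution character set instead.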

Tom.

>
> ~Aaron
>
>> On Tue, Sep 1, 2020 at 10:13 AM Aaron Ballman via SG16
>> <sg16_at_[hidden]> wrote:
>>> I've been working on a paper (attached) for WG14 that I also intend to
>>> submit to WG21 (with C++-specific wording) and am looking for some
>>> early feedback from SG16 as the topic relates to text encoding of
>>> diagnostic messages.
>>>
>>> The basic thrust of the paper is that both languages have constructs
>>> which can produce diagnostics including user-defined text and don't
>>> have a "compiler diagnostic character set" defined to convert the text
>>> into. So the paper tries to limit the damage by allowing characters
>>> outside of the basic source character set to be handled as a matter of
>>> QoI. The paper also asks a question that Tom H posed to me, which is
>>> whether we want the restriction to be against the basic execution
>>> character set for slightly easier handling of \r and \n.
>>>
>>> Any feedback you feel like providing is appreciated. Thanks!
>>>
>>> ~Aaron
>>> --
>>> SG16 mailing list
>>> SG16_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2020-09-01 23:17:22