sg16: Re: [SG16] Wide characters with multiple c-chars

From: Corentin <corentin.jabot_at_[hidden]>
Date: Thu, 2 Jul 2020 19:05:29 +0200

On Thu, Jul 2, 2020, 18:59 Tom Honermann <tom_at_[hidden]> wrote:

> On 7/2/20 12:32 PM, Corentin via SG16 wrote:
>
>
>
> On Thu, 2 Jul 2020 at 17:50, Tom Honermann <tom_at_[hidden]> wrote:
>
>> On 7/2/20 3:31 AM, Corentin via SG16 wrote:
>>
>>
>>
>> On Thu, 2 Jul 2020 at 09:09, Corentin <corentin.jabot_at_[hidden]> wrote:
>>
>>> Hello,
>>> As part of https://wg21.link/p2178r0,
>>>
>>> I would like to make *wide* character literals with multiple c-chars
>>> (i.e., L'abc') ill-formed
>>> https://compiler-explorer.com/z/MHExrk
>>>
>>
>> I forgot to mention that it's extra fun with combining characters
>> https://compiler-explorer.com/z/ndyyAD (the value of b is equivalent to
>> L'e\u0301')
>>
>> In both of the compiler-explorer examples, the options to the MSVC
>> compiler are incorrect; they should be '/O2' and '/utf-8' respectively
>> (though I don't think it affects the results for these cases).
>>
>>
>>
>>>
>>> All compilers but MSVC emit a warning by default; some
>>> implementations pick the first c-char, others pick the last.
>>> There is no use (no occurrence in any of the packages in vcpkg) or use
>>> case for this feature.
>>>
>> Did you check for cases like _TEXT('xx'), _T('xx'), and __T('xx') in
>> addition to L'xx'?
>>
>> Regardless, the packages in the vcpkg ecosystem can prove that a feature
>> is used, but cannot prove that a feature is not used.
>>
> Sure, but that is not achievable. It's nevertheless a reasonable proxy.
>
> I agree it isn't achievable. But we should be cautious about false
> negatives.
>

Sure?

>
>
>
>>
>>> So why do this?
>>> - Someone unfamiliar with C++ might do auto str = L'abc' instead of
>>> L"abc"
>>> - Things that are not useful should not linger for 40 more years in the
>>> standard; Tom and I talked too much about how to word this "feature" as
>>> part of P2029, so it's not free.
>>> - It's part of a wider goal: "bogus conversions in phase 5" should be
>>> ill-formed rather than compilers doing their best to compile *something*
>>>
>> P2029 makes these explicitly conditionally-supported in order to, as I
>> understand it, match the C standard (in the C standard, my understanding is
>> that implementation-defined semantics can include rejecting the code, but
>> in the C++ standard, we use conditionally-supported for the same
>> allowance). Therefore, implementations are not (will not be) required to
>> accept them with current direction.
>>
>
> I know. I don't think that's sufficient.
>
> For what reason? We've lived with this for a very long time. Why is
> there urgency now to spend time on this? (When embarking on P2029, I had
> no intention of doing anything with these, but the direction to basically
> rewrite [lex.ccon] and [lex.string] made that unavoidable).
>

There is no _urgency_, but there is also no time like the present.
It doesn't have to take a lot of committee time. Either we think it is a
useful feature or we don't.

>
>
>>
>>> Please note that I am not proposing to make (narrow) multi character
>>> literals ill-formed or deprecated at this point, there are some uses, and
>>> these uses are intended.
>>>
>>> I would really like your opinion so we can propose that change to EWG
>>> and make the change without taking too much of anyone's time (the process
>>> really isn't tuned for very small changes, which is why that is part of a
>>> larger paper)
>>>
>> I am weakly against for three reasons:
>>
>> 1. I think there are more valuable efforts for us to spend our time
>> on. I don't find the motivation above compelling. These are legacy
>> features that, as far as I am aware, don't actively cause problems. If
>> evidence can be found to demonstrate that they do actively cause problems,
>> then I might change my mind.
>>
>> I don't think that's ever a compelling reason not to do something
>
> One of the first questions asked in the incubator groups is whether to
> devote more time to something. In my opinion, this doesn't reach the bar
> for spending more time on it.
>

Do we need to spend time on it?
If we don't remove it now, we will deal with it in 40 years.

>
>> 2. Existing compilers are unlikely to change their behavior. Clang,
>> gcc, and icc already issue warnings for use of multicharacter literals
>> (both ordinary and wide) and I suspect they would continue to accept these
>> as extensions (even in their strict compliance modes). The cumulative
>> effect of the proposed change seems to be to require MSVC to issue a
>> warning.
>>
>> Why do you think that?
>
> As you've mentioned elsewhere, making them ill-formed only requires
> emitting a warning; which most implementations appear to already do. I see
> little motivation for them to do otherwise. Users can already elevate that
> warning to an error via -Werror.
>

You assume compilers care about this extension?
Do you think MSVC should not have to emit a diagnostic?
The question is rather simple. Do we believe it is a useful feature?
If we assume that compilers won't implement any paper, there is no point
in standardizing anything.
What do you think the correct behavior is?

>
>
>>
>> 3. Requiring a diagnostic creates an unnecessary divergence from C.
>>
>> I am sure C would consider aligning.
>
> I'm sure they would consider it, but I'm not at all sure that they would
> choose to do so. WG14 tends to be more conservative than WG21.
>
> Tom.
>
>
>

Received on 2020-07-02 12:08:57