Re: [SG16] Wide characters with multiple c-chars

From: Corentin <corentin.jabot_at_[hidden]>
Date: Thu, 2 Jul 2020 18:32:26 +0200
On Thu, 2 Jul 2020 at 17:50, Tom Honermann <tom_at_[hidden]> wrote:

> On 7/2/20 3:31 AM, Corentin via SG16 wrote:
>
>
>
> On Thu, 2 Jul 2020 at 09:09, Corentin <corentin.jabot_at_[hidden]> wrote:
>
>> Hello,
>> As part of https://wg21.link/p2178r0,
>>
>> I would like to make *wide* character literals with multiple c-chars (i.e.,
>> L'abc') ill-formed
>> https://compiler-explorer.com/z/MHExrk
>>
>
> I forgot to mention that it's extra fun with combining characters
> https://compiler-explorer.com/z/ndyyAD (the value of b is equivalent to
> L'e\u0301')
>
> In both of the compiler-explorer examples, the options to the MSVC
> compiler are incorrect; they should be '/O2' and '/utf-8' respectively
> (though I don't think it affects the results for these cases).
>
>
>
>>
>> All compilers but MSVC emit a warning by default; some
>> implementations pick the first c-char, others pick the last.
>> There is no use case for this feature, and no occurrence of it in any of
>> the packages in vcpkg.
>>
> Did you check for cases like _TEXT('xx'), _T('xx'), and __T('xx') in
> addition to L'xx'?
>
> Regardless, the packages in the vcpkg ecosystem can prove that a feature
> is used, but cannot prove that a feature is not used.
>
Sure, but proving that a feature is unused is not achievable. vcpkg is
nevertheless a reasonable proxy.



>
>> So why do this?
>> - Someone unfamiliar with C++ might do auto str = L'abc' instead of L"abc"
>> - Things that are not useful should not linger for 40 more years in the
>> standard; Tom and I talked too much about how to word this "feature" as
>> part of P2029, so it's not free.
>> - It's part of a wider principle: "bogus conversions in phase 5" should be
>> ill-formed, rather than compilers doing their best to compile *something*
>>
> P2029 makes these explicitly conditionally-supported in order to, as I
> understand it, match the C standard (in the C standard, my understanding is
> that implementation-defined semantics can include rejecting the code, but
> in the C++ standard, we use conditionally-supported for the same
> allowance). Therefore, implementations are not (will not be) required to
> accept them with current direction.
>

I know. I don't think that's sufficient.


>
>> Please note that I am not proposing to make (narrow) multi character
>> literals ill-formed or deprecated at this point, there are some uses, and
>> these uses are intended.
>>
>> I would really like your opinion so we can propose that change to EWG and
>> make the change without taking too much of anyone's time (the process
>> really isn't tuned for very small changes, which is why that is part of a
>> larger paper)
>>
> I am weakly against for three reasons:
>
> 1. I think there are more valuable efforts for us to spend our time
> on. I don't find the motivation above compelling. These are legacy
> features that, as far as I am aware, don't actively cause problems. If
> evidence can be found to demonstrate that they do actively cause problems,
> then I might change my mind.
>
I don't think that's ever a compelling reason not to do something.

>
> 2. Existing compilers are unlikely to change their behavior. Clang,
> gcc, and icc already issue warnings for use of multicharacter literals
> (both ordinary and wide) and I suspect they would continue to accept these
> as extensions (even in their strict compliance modes). The cumulative
> effect of the proposed change seems to be to require MSVC to issue a
> warning.
>
Why do you think that?


>
> 3. Requiring a diagnostic creates an unnecessary divergence from C.
>
I am sure C would consider aligning.


>
>
> Tom.
>
>
>> Thanks a lot,
>> Corentin
>>
>>
>>
>>
>
>
>
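
For readers following the thread, here is a minimal sketch of the first motivation quoted above: writing L'abc' where L"abc" was meant. It only contrasts the types involved and assumes a C++17 compiler. The helper names wide_length and run_checks are hypothetical and exist only for this illustration. The multi-c-char literal itself is left in a comment, because, as the thread notes, whether it is accepted and which value it takes vary between implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A wide character literal with a single c-char has type wchar_t.
static_assert(std::is_same_v<decltype(L'a'), wchar_t>);

// A wide string literal is an lvalue array of const wchar_t,
// with one extra element for the terminating null.
static_assert(std::is_same_v<decltype(L"abc"), const wchar_t (&)[4]>);

// Hypothetical helper: counts wchar_t elements before the terminator.
constexpr std::size_t wide_length(const wchar_t* s) {
    std::size_t n = 0;
    while (s[n] != L'\0') {
        ++n;
    }
    return n;
}

// With auto, the string form decays to a pointer,
// while the character form is one wchar_t.
constexpr auto str = L"abc";
constexpr auto ch = L'a';
static_assert(std::is_same_v<decltype(str), const wchar_t* const>);
static_assert(std::is_same_v<decltype(ch), const wchar_t>);
static_assert(wide_length(str) == 3);

// auto oops = L'abc';
// Deliberately not compiled: where accepted, this would deduce a single
// wchar_t, not a string, and the thread reports that implementations
// disagree on which c-char supplies the value.

// Hypothetical runtime check mirroring the compile-time assertions.
inline void run_checks() {
    assert(wide_length(str) == 3);
    assert(ch == L'a');
}
```

The point of the sketch is only that the two spellings differ by a pair of quote characters yet deduce unrelated types, which is the kind of mistake the proposal would turn into a hard error.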

Received on 2020-07-02 11:35:52