sg16: Re: [SG16] Wide characters with multiple c-chars

From: Peter Brett <pbrett_at_[hidden]>
Date: Thu, 2 Jul 2020 16:10:39 +0000

Is there scope for a non-normative note with a recommendation that implementations do not implement the feature?

                        Peter

From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Tom Honermann via SG16
Sent: 02 July 2020 16:50
To: sg16_at_[hidden]
Cc: Tom Honermann <tom_at_[hidden]>; Corentin <corentin.jabot_at_[hidden]>
Subject: Re: [SG16] Wide characters with multiple c-chars

On 7/2/20 3:31 AM, Corentin via SG16 wrote:

On Thu, 2 Jul 2020 at 09:09, Corentin <corentin.jabot_at_[hidden]> wrote:
Hello,
As part of https://wg21.link/p2178r0, I would like to make *wide* character literals with multiple c-chars (i.e., L'abc') ill-formed:
https://compiler-explorer.com/z/MHExrk

I forgot to mention that it's extra fun with combining characters: https://compiler-explorer.com/z/ndyyAD (the value of b is equivalent to L'e\u0301').
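To see why a decomposed "é" ends up as a multi-c-char literal, it can help to count code units in the equivalent string literal. A minimal C++17 sketch, assuming wchar_t holds UTF-16 (Windows) or UTF-32 (most other platforms); U+0065 and U+0301 are both in the BMP, so each occupies one wchar_t either way. The array names are illustrative only.

```cpp
#include <iterator>  // std::size

// Decomposed "é": LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT.
// Written as a character literal, L'e\u0301' therefore contains two c-chars.
constexpr wchar_t decomposed[] = L"e\u0301";

// Precomposed "é": a single code point, U+00E9, so L'\u00E9' is one c-char.
constexpr wchar_t precomposed[] = L"\u00E9";

// Element counts include the terminating null. Both code points are in the
// BMP, so these hold for UTF-16 and UTF-32 wchar_t alike (an assumption about
// the wide execution encoding, not something the standard guarantees).
static_assert(std::size(decomposed) == 3, "two code units plus the terminator");
static_assert(std::size(precomposed) == 2, "one code unit plus the terminator");
```

Compiling the translation unit is the check; nothing needs to run.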
In both of the compiler-explorer examples, the options to the MSVC compiler are incorrect; they should be '/O2' and '/utf-8' respectively (though I don't think it affects the results for these cases).

All compilers but MSVC emit a warning by default; some implementations pick the first c-char, others pick the last.
There is no use for this feature, and no usage of it either (no occurrence in any of the packages in vcpkg).

Did you check for cases like _TEXT('xx'), _T('xx'), and __T('xx') in addition to L'xx'?

Regardless, the packages in the vcpkg ecosystem can prove that a feature is used, but cannot prove that a feature is not used.

So why do this?
- Someone unfamiliar with C++ might write auto str = L'abc' instead of L"abc".
- Things that are not useful should not linger in the standard for 40 more years; Tom and I spent too much time on how to word this "feature" as part of P2029, so it's not free.
- It's part of a wider position that bogus conversions in phase 5 should be ill-formed rather than compilers doing their best to compile *something*.
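The first bullet can be illustrated with a well-formed single-character literal standing in for L'abc' (whose value is implementation-defined and draws warnings, so it is left out here). A minimal C++17 sketch: a wide character literal has type wchar_t, so auto deduces a single code unit, whereas a wide string literal is an array of const wchar_t that auto decays to a pointer. The variable names are illustrative only.

```cpp
#include <type_traits>

// What a newcomer might write when they mean a string: auto deduces wchar_t,
// one wide code unit, not a string.
auto ch = L'a';

// What they meant: L"abc" is a const wchar_t[4] ("abc" plus the null),
// which auto deduces as a pointer to its first element.
auto str = L"abc";

static_assert(std::is_same_v<decltype(ch), wchar_t>, "character literal");
static_assert(std::is_same_v<decltype(str), const wchar_t*>, "decayed string literal");
// A string literal is an lvalue, so decltype yields a reference to the array.
static_assert(std::is_same_v<decltype(L"abc"), const wchar_t (&)[4]>, "array lvalue");
```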
P2029 makes these explicitly conditionally-supported in order to, as I understand it, match the C standard (in the C standard, my understanding is that implementation-defined semantics can include rejecting the code, but in the C++ standard, we use conditionally-supported for the same allowance). Therefore, under the current direction, implementations are not (will not be) required to accept them.

Please note that I am not proposing to make (narrow) multicharacter literals ill-formed or deprecated at this point; there are some uses, and these uses are intended.
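For contrast, the narrow uses usually cited are four-character codes such as 'RIFF'. Because the value of a narrow multicharacter literal is itself implementation-defined, a portable spelling packs the characters explicitly. The helper below is hypothetical; its layout (first character in the most significant byte) is an assumption modeled on what GCC documents for multicharacter constants, and the check assumes an ASCII-compatible execution character set.

```cpp
#include <cstdint>

// Hypothetical helper: pack four narrow characters into a 32-bit code,
// first character in the most significant byte.
constexpr std::uint32_t make_fourcc(char a, char b, char c, char d) {
    // Go through unsigned char so characters with the high bit set
    // are not sign-extended before shifting.
    return (std::uint32_t{static_cast<unsigned char>(a)} << 24) |
           (std::uint32_t{static_cast<unsigned char>(b)} << 16) |
           (std::uint32_t{static_cast<unsigned char>(c)} << 8) |
           std::uint32_t{static_cast<unsigned char>(d)};
}

// With ASCII, 'R' = 0x52, 'I' = 0x49, 'F' = 0x46.
static_assert(make_fourcc('R', 'I', 'F', 'F') == 0x52494646u, "RIFF");
```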

I would really like your opinion so that we can propose this change to EWG and make it without taking too much of anyone's time (the process really isn't tuned for very small changes, which is why this is part of a larger paper).

I am weakly against for three reasons:

  1. I think there are more valuable efforts for us to spend our time on. I don't find the motivation above compelling. These are legacy features that, as far as I am aware, don't actively cause problems. If evidence can be found to demonstrate that they do actively cause problems, then I might change my mind.
  2. Existing compilers are unlikely to change their behavior. Clang, gcc, and icc already issue warnings for use of multicharacter literals (both ordinary and wide) and I suspect they would continue to accept these as extensions (even in their strict compliance modes). The cumulative effect of the proposed change seems to be to require MSVC to issue a warning.
  3. Requiring a diagnostic creates an unnecessary divergence from C.

Tom.

Thanks a lot,
Corentin

Received on 2020-07-02 11:14:01