SG16

Subject: Re: Wide characters with multiple c-chars
From: Tom Honermann (tom_at_[hidden])
Date: 2020-07-02 10:50:17


On 7/2/20 3:31 AM, Corentin via SG16 wrote:
>
>
> On Thu, 2 Jul 2020 at 09:09, Corentin <corentin.jabot_at_[hidden]> wrote:
>
> Hello,
> As part of https://wg21.link/p2178r0,
>
> I would like to make *wide* character literals with multiple
> c-chars (i.e., L'abc') ill-formed
> https://compiler-explorer.com/z/MHExrk
>
>
> I forgot to mention that it's extra fun with combining characters
> https://compiler-explorer.com/z/ndyyAD (the value of b is equivalent
> to L'e\u0301')
In both of the compiler-explorer examples, the options to the MSVC
compiler are incorrect; they should be '/O2' and '/utf-8' respectively
(though I don't think it affects the results for these cases).
>
>
> All compilers but MSVC emit a warning by default, some
> implementations pick the first c-char, others pick the last.
> There is no use case for this feature, and no occurrence of it in
> any of the packages in vcpkg.
>
Did you check for cases like _TEXT('xx'), _T('xx'), and __T('xx') in
addition to L'xx'?

Regardless, the packages in the vcpkg ecosystem can prove that a feature
is used, but cannot prove that a feature is not used.
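
To illustrate why a search for the L'xx' spelling alone could miss uses, here is a minimal sketch. MY_T and MY_T_IMPL are hypothetical stand-ins for the <tchar.h> macros in a Unicode build, which, as far as I know, use the same two-level token-pasting pattern. The literal is only stringized, never compiled as a character literal, so the sketch does not depend on how an implementation treats wide multicharacter literals.

```cpp
#include <cstring>

// Hypothetical stand-ins for _T/__T from <tchar.h> in a Unicode build
// (assumption: the real macros paste an L prefix in the same way).
#define MY_T_IMPL(x) L##x
#define MY_T(x) MY_T_IMPL(x)

// Stringize after full macro expansion to expose the resulting token.
#define STRINGIZE_IMPL(x) #x
#define STRINGIZE(x) STRINGIZE_IMPL(x)

// Spelling of the token the compiler would see for MY_T('xx').
inline const char* expanded_spelling() {
    return STRINGIZE(MY_T('xx'));
}

// True when the source spelling MY_T('xx') became a wide literal token.
inline bool expands_to_wide_literal() {
    return std::strcmp(expanded_spelling(), "L'xx'") == 0;
}
```

The source text contains MY_T('xx'), with no L prefix in sight, yet the token handed to later translation phases is L'xx'. A textual survey would therefore need to look for the macro spellings as well.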

>
> So why do this?
> - Someone unfamiliar with C++ might do auto str = L'abc' instead
> of L"abc"
> - Things that are not useful should not linger for 40 more years
> in the standard;  Tom and I talked too much about how to word this
> "feature" as part of P2029, so it's not free.
> - It's part of a wider position: bogus conversions in phase 5 should
> be ill-formed rather than compilers doing their best to compile
> *something*
>
P2029 makes these explicitly conditionally-supported in order to, as I
understand it, match the C standard (in the C standard, my understanding
is that implementation-defined semantics can include rejecting the code,
but in the C++ standard, we use conditionally-supported for the same
allowance).  Therefore, implementations are not (and will not be)
required to accept them under the current direction.
>
>
> Please note that I am not proposing to make (narrow) multi
> character literals ill-formed or deprecated at this point, there
> are some uses, and these uses are intended.
>
> I would really like your opinion so we can propose that change to
> EWG and make the change without taking too much of anyone's time
> (the process really isn't tuned for very small changes, which is
> why that is part of a larger paper)
>
I am weakly against this for three reasons:

 1. I think there are more valuable efforts for us to spend our time
    on.  I don't find the motivation above compelling.  These are legacy
    features that, as far as I am aware, don't actively cause problems. 
    If evidence can be found to demonstrate that they do actively cause
    problems, then I might change my mind.
 2. Existing compilers are unlikely to change their behavior. Clang,
    gcc, and icc already issue warnings for use of multicharacter
    literals (both ordinary and wide) and I suspect they would continue
    to accept these as extensions (even in their strict compliance
    modes).  The cumulative effect of the proposed change seems to be to
    require MSVC to issue a warning.
 3. Requiring a diagnostic creates an unnecessary divergence from C.

Tom.

>
> Thanks a lot,
> Corentin
>
>
>



SG16 list run by sg16-owner@lists.isocpp.org