
Re: [SG16] Wide characters with multiple c-chars

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 2 Jul 2020 11:50:17 -0400
On 7/2/20 3:31 AM, Corentin via SG16 wrote:
> On Thu, 2 Jul 2020 at 09:09, Corentin <corentin.jabot_at_[hidden]> wrote:
> Hello,
> As part of https://wg21.link/p2178r0,
> I would like to make *wide* character literals with multiple
> c-chars (i.e., L'abc') ill-formed
> https://compiler-explorer.com/z/MHExrk
> I forgot to mention that it's extra fun with combining characters
> https://compiler-explorer.com/z/ndyyAD (the value of b is equivalent
> to L'e\u0301')
In both of the compiler-explorer examples, the options to the MSVC
compiler are incorrect; they should be '/O2' and '/utf-8' respectively
(though I don't think it affects the results for these cases).
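
To make the combining-character point concrete, here is a minimal sketch (not taken from the linked examples; the helper name code_units is hypothetical) that counts code units in the two spellings of "é" using wide string literals, which are well-formed either way. It assumes wchar_t is at least 16 bits wide.

```cpp
#include <cstddef>

// Hypothetical helper: number of wchar_t code units in a wide string
// literal, excluding the terminating null. Deduced from the array type,
// so the result is a compile-time constant.
template <std::size_t N>
constexpr std::size_t code_units(const wchar_t (&)[N]) { return N - 1; }

// Precomposed form: U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
constexpr std::size_t precomposed = code_units(L"\u00e9");

// Decomposed form: 'e' followed by U+0301 (COMBINING ACUTE ACCENT).
constexpr std::size_t decomposed = code_units(L"e\u0301");

static_assert(precomposed == 1, "precomposed form is a single code unit");
static_assert(decomposed == 2, "decomposed form is two code units");
```

Written as a character literal, the decomposed form is L'e\u0301': two c-chars, even though it typically renders as a single glyph in an editor.
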
> All compilers but MSVC emit a warning by default, some
> implementations pick the first c-char, others pick the last.
> There is no use case for this feature, and no usage of it (no
> occurrence in any of the packages in vcpkg).
Did you check for cases like _TEXT('xx'), _T('xx'), and __T('xx') in
addition to L'xx'?
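
As a rough sketch of why those spellings matter: the macros below are hypothetical stand-ins (MY_T and MY_T_IMPL are invented names), assuming the common pattern in which a generic-text macro pastes an L prefix onto its argument when building for wide characters. The real <tchar.h> definitions depend on whether _UNICODE is defined.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the generic-text macros. This sketch assumes
// the wide-character configuration, where the macro pastes an L prefix.
#define MY_T_IMPL(x) L##x
#define MY_T(x) MY_T_IMPL(x)

// The prefix is pasted onto whatever literal is passed in, whether it is
// a character literal or a string literal.
static_assert(std::is_same<decltype(MY_T('x')), wchar_t>::value,
              "MY_T('x') expands to L'x'");
static_assert(std::is_same<decltype(MY_T("xx")), const wchar_t (&)[3]>::value,
              "MY_T(\"xx\") expands to L\"xx\"");

// MY_T('xx') would expand to L'xx', a wide literal with two c-chars.
// It is left in a comment here because such literals are only
// conditionally-supported and compilers commonly warn about them.
```

A textual search for L'xx' alone would not find a use written as MY_T('xx'), since the L prefix only appears after preprocessing.
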

Regardless, the packages in the vcpkg ecosystem can prove that a feature
is used, but cannot prove that a feature is not used.

> So why do this?
> - Someone unfamiliar with C++ might do auto str = L'abc' instead
> of L"abc"
> - Things that are not useful should not linger for 40 more years
> in the standard; Tom and I talked too much about how to word this
> "feature" as part of P2029, so it's not free.
> - It's part of a wider effort: "Bogus conversions in phase 5" should
> be ill-formed rather than doing their best to compile *something*
P2029 makes these explicitly conditionally-supported in order to, as I
understand it, match the C standard (in the C standard, my understanding
is that implementation-defined semantics can include rejecting the code,
but in the C++ standard, we use conditionally-supported for the same
allowance). Therefore, under the current direction, implementations are
not (and will not be) required to accept them.
> Please note that I am not proposing to make (narrow) multicharacter
> literals ill-formed or deprecated at this point; there
> are some uses, and these uses are intended.
> I would really like your opinion so we can propose that change to
> EWG and make the change without taking too much of anyone's time
> (the process really isn't tuned for very small changes, which is
> why that is part of a larger paper)
I am weakly against this for three reasons:

 1. I think there are more valuable efforts for us to spend our time
    on. I don't find the motivation above compelling. These are legacy
    features that, as far as I am aware, don't actively cause problems.
    If evidence can be found to demonstrate that they do actively cause
    problems, then I might change my mind.
 2. Existing compilers are unlikely to change their behavior. Clang,
    gcc, and icc already issue warnings for use of multicharacter
    literals (both ordinary and wide) and I suspect they would continue
    to accept these as extensions (even in their strict compliance
    modes). The cumulative effect of the proposed change seems to be to
    require MSVC to issue a warning.
 3. Requiring a diagnostic creates an unnecessary divergence from C.


> Thanks a lot,
> Corentin

Received on 2020-07-02 10:54:07