Subject: Re: Wide characters with multiple c-chars
From: Tom Honermann (tom_at_[hidden])
Date: 2020-07-02 11:59:19
On 7/2/20 12:32 PM, Corentin via SG16 wrote:
> On Thu, 2 Jul 2020 at 17:50, Tom Honermann <tom_at_[hidden]
> <mailto:tom_at_[hidden]>> wrote:
> On 7/2/20 3:31 AM, Corentin via SG16 wrote:
>> On Thu, 2 Jul 2020 at 09:09, Corentin <corentin.jabot_at_[hidden]
>> <mailto:corentin.jabot_at_[hidden]>> wrote:
>> As part of https://wg21.link/p2178r0,
>> I would like to make *wide* character literals with multiple
>> c-chars (i.e., L'abc') ill-formed
>> I forgot to mention that it's extra fun with combining characters
>> https://compiler-explorer.com/z/ndyyAD (the value of b is
>> equivalent to L'e\u0301')
> In both of the compiler-explorer examples, the options to the MSVC
> compiler are incorrect; they should be '/O2' and '/utf-8'
> respectively (though I don't think it affects the results for
> these cases).
>> All compilers but MSVC emit a warning by default, some
>> implementations pick the first c-char, others pick the last.
>> There is no use (no occurrence in any of the packages in
>> vcpkg) or usage for this feature.
> Did you check for cases like _TEXT('xx'), _T('xx'), and __T('xx')
> in addition to L'xx'?
> Regardless, the packages in the vcpkg ecosystem can prove that a
> feature is used, but cannot prove that a feature is not used.
> Sure, but that is not achievable. It's nevertheless a reasonable proxy.
I agree it isn't achievable. But we should be cautious about false
>> So why do this?
>> - Someone unfamiliar with C++ might do auto str = L'abc'
>> instead of L"abc"
>> - Things that are not useful should not linger for 40 more
>> years in the standard; Tom and I talked too much about how
>> to word this "feature" as part of P2029, so it's not free.
>> - It's part of a wider "Bogus conversions in phase 5" should
>> be ill-formed rather than doing their best to compile *something*
> P2029 makes these explicitly conditionally-supported in order to,
> as I understand it, match the C standard (in the C standard, my
> understanding is that implementation-defined semantics can include
> rejecting the code, but in the C++ standard, we use
> conditionally-supported for the same allowance). Therefore,
> implementations are not (will not be) required to accept them with
> current direction.
> I know. I don't think that's sufficient.
For what reason? We've lived with this for a very long time. Why is
there urgency now to spend time on this? (When embarking on P2029, I
had no intention of doing anything with these, but the direction to
basically rewrite [lex.ccon] and [lex.string] made that unavoidable).
>> Please note that I am not proposing to make (narrow) multi
>> character literals ill-formed or deprecated at this point,
>> there are some uses, and these uses are intended.
>> I would really like your opinion so we can propose that
>> change to EWG and make the change without taking too much of
>> anyone's time (the process really isn't tuned for very small
>> changes, which is why that is part of a larger paper)
> I am weakly against for three reasons:
> 1. I think there are more valuable efforts for us to spend our
> time on. I don't find the motivation above compelling. These
> are legacy features that, as far as I am aware, don't actively
> cause problems. If evidence can be found to demonstrate that
> they do actively cause problems, then I might change my mind.
> I don't think that's ever a compelling reason not to do something
One of the first questions asked in the incubator groups is whether to
devote more time to something. In my opinion, this doesn't reach the
bar for spending more time on it.
> 2. Existing compilers are unlikely to change their behavior.
> Clang, gcc, and icc already issue warnings for use of
> multicharacter literals (both ordinary and wide) and I suspect
> they would continue to accept these as extensions (even in
> their strict compliance modes). The cumulative effect of the
> proposed change seems to be to require MSVC to issue a warning.
> Why do you think that?
As you've mentioned elsewhere, making them ill-formed only requires
emitting a warning, which most implementations appear to already do. I
see little motivation for them to do otherwise. Users can already
elevate that warning to an error via -Werror.
> 3. Requiring a diagnostic creates an unnecessary divergence from C.
> I am sure C would consider aligning.
I'm sure they would consider it, but I'm not at all sure that they would
choose to do so. WG14 tends to be more conservative than WG21.
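
For readers following the thread, here is a minimal sketch of the L'abc' versus L"abc" mix-up mentioned in the quoted motivation. It is illustrative only: the names one_char, text, and wide_length are hypothetical and appear nowhere in P2029 or P2178. The multi-c-char literal is left in a comment so the sketch compiles without warnings, since its value differs between implementations.

```cpp
#include <cstddef>
#include <type_traits>

// A wide character literal with a single c-char has type wchar_t.
constexpr auto one_char = L'a';
static_assert(std::is_same<decltype(one_char), const wchar_t>::value,
              "L'a' is a wchar_t");

// A wide string literal is an array of const wchar_t; `auto` decays it
// to a pointer to its first element.
constexpr auto text = L"abc";
static_assert(std::is_same<decltype(text), const wchar_t* const>::value,
              "L\"abc\" decays to const wchar_t*");

// Hypothetical helper: counts the wchar_t elements before the terminator.
constexpr std::size_t wide_length(const wchar_t* s) {
    std::size_t n = 0;
    while (s[n] != L'\0') {
        ++n;
    }
    return n;
}
static_assert(wide_length(text) == 3, "three characters before the terminator");

// The construct under discussion, deliberately not compiled here:
//   auto str = L'abc';  // a single wchar_t, not a string; which c-char
//                       // supplies the value varies by implementation
```

The static_asserts only confirm the types involved; they say nothing about what a given compiler does with L'abc' itself.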