On 7/8/20 3:15 PM, Jens Maurer wrote:
On 08/07/2020 13.09, Alisdair Meredith via SG16 wrote:
After taking another look over P2029 resolving a few core issues,
I am further concerned by [lex.string]p11, which states (among
other things) that concatenation of Unicode string literals with
different encoding-prefixes is conditionally supported with
implementation-defined behavior.  That seems a little too free for
my tastes.

I can buy conditionally supported, although I see no harm in
requiring it for any combination of Unicode encoding prefixes.
I am concerned about the implementation-defined behavior:
the end result should be the result of concatenating the
transcoded representation of each of the strings into a common
encoding, corresponding to one of the involved encoding
prefixes.
That's not how it works.  You first pick a common
encoding-prefix for the concatenation (whatever it is),
and then you encode the entire (concatenated) string
using that encoding-prefix.

I think Jens' description matches the intent expressed in [lex.string]p11, though I have long struggled with the intent of the note there:

> [ Note: This concatenation is an interpretation, not a conversion.  Because the interpretation happens in translation phase 6 (after each character from a string-literal has been translated into a value from the appropriate character set), a string-literal's initial rawness has no effect on the interpretation or well-formedness of the concatenation. — end note ]

However, Alisdair's description appears to match the Visual C++ implementation as previously discussed in https://lists.isocpp.org/sg16/2020/07/1699.php and as exhibited at https://msvc.godbolt.org/z/7KcMs5 (including for wide literals, and including the same bug where the wrong encoding is used for the second conversion).

In cases where character conversion is non-lossy through the various encodings, the difference is unobservable.

I prefer the design that Jens described since it avoids the additional conversions.
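
For reference, here is a minimal sketch of the cases that are not in dispute, where only one encoding-prefix is involved and the whole concatenated string is encoded with it.  The function names are made up for illustration, and the mixed-prefix line is left commented out because it is the kind of concatenation this thread reports compilers rejecting:

```cpp
#include <cassert>
#include <string>

// A prefixed literal next to an unprefixed one: the concatenation takes the
// prefix, so this is expected to behave like u"caf\u00E9" (UTF-16).
inline std::u16string prefixed_plus_plain() {
    return u"caf" "\u00E9";
}

// Both literals carry the same prefix: equivalent to U"caf\u00E9" (UTF-32).
inline std::u32string same_prefix() {
    return U"caf" U"\u00E9";
}

// The array bound reflects the concatenated string: 4 code units plus the
// terminating null, each the size of a char16_t.
static_assert(sizeof(u"ab" "cd") == 5 * sizeof(char16_t),
              "concatenation is encoded as a single UTF-16 string");

// Differing Unicode prefixes, the case under discussion.  Not compiled here:
// auto mixed = u"caf" U"\u00E9";

// Hypothetical helper that exercises the two well-defined cases.
inline void run_checks() {
    assert(prefixed_plus_plain() == std::u16string(u"caf\u00E9"));
    assert(same_prefix() == std::u32string(U"caf\u00E9"));
}
```

Neither of these cases distinguishes the two models, since no transcoding between encodings is ever needed; only the mixed-prefix case could.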


I am happy to defer to implementations to choose
between UTF-8/16/32, or we could define a canonical preferred
ordering among those choices.
Since all four well-known C++ implementations appear to
produce an error for the test cases at
https://compiler-explorer.com/z/4NDo-4
I'm fine with specifying these as ill-formed.

I'm fine with that as well.

Jens, would you consider such a change as evolutionary given that we don't know of any implementations (so far) that actually support these concatenations?  Would it be reasonable to take this issue straight to core (with JF's blessing of course)?  The only arguments I can see against making this change are 1) Not a great use of our time to excise a weird conditionally-supported feature that is not implemented anywhere, and 2) additional drift from C.

JeanHeyd has already reached out to WG14 to ask for their input on making these ill-formed.


There is no (technical) need to support these cases,
and nobody has written code like that (because
no compiler accepts it), so let's nix it.

From a procedural standpoint, P2029 produces enough
churn in the general area that I'd like to see P2029
hit the working draft before future papers in that
area are processed.

Me too.

Tom.