On 08/07/2020 13.09, Alisdair Meredith via SG16 wrote:
After taking another look over P2029 resolving a few core issues, I am further concerned by [lex.string]p11, which states (among other things) that concatenation of Unicode string literals with different encoding-prefixes is conditionally-supported with implementation-defined behavior. That seems a little too free for my tastes. I can buy conditionally supported, although I see no harm in requiring it for any combination of Unicode encoding prefixes. I am concerned about the implementation-defined behavior: the end result should be the result of concatenating the transcoded representation of each of the strings into a common encoding, corresponding to one of the involved encoding prefixes.

That's not how it works. You first pick a common encoding-prefix for the concatenation (whatever it is), and then you encode the entire (concatenated) string using that encoding-prefix.
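A minimal sketch of the model Jens describes, assuming a C++17 compiler: when one operand has no encoding-prefix it takes the prefix of the other, and the whole concatenated string is then encoded once in that encoding. The variable names are illustrative only.

```cpp
#include <string_view>

// The unprefixed piece adopts the u prefix, so the entire string is encoded
// as UTF-16: 5 code units for "caf\u00E9 " plus a surrogate pair for U+1F600.
constexpr std::u16string_view s16 = u"caf\u00E9 " "\U0001F600";
static_assert(s16.size() == 7);

// Same input with the U prefix: one UTF-32 code unit per character.
constexpr std::u32string_view s32 = "caf\u00E9 " U"\U0001F600";
static_assert(s32.size() == 6);

// Mixing two different prefixes is the case under discussion: it is
// conditionally-supported today and rejected by the major compilers.
// constexpr auto mixed = u"a" U"b";
```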
I think Jens' description matches the intent expressed in [lex.string]p11, though I have long struggled with the intent of the note there:
> [ Note: This concatenation is an interpretation, not a conversion. Because the interpretation happens in translation phase 6 (after each character from a string-literal has been translated into a value from the appropriate character set), a string-literal's initial rawness has no effect on the interpretation or well-formedness of the concatenation. — end note ]
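For instance, assuming C++17, a raw and a non-raw literal can be concatenated, and each piece keeps the meaning it had on its own: the raw piece's backslash stays a literal backslash, while the other piece's `\n` becomes a newline. The variable name is illustrative.

```cpp
#include <string_view>

// R"(a\n)" contributes 'a', '\\', 'n'; "b\n" contributes 'b' and a newline.
constexpr std::string_view mixed = R"(a\n)" "b\n";
static_assert(mixed.size() == 5);
static_assert(mixed == "a\\nb\n");
```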
However, Alisdair's description appears to match the Visual C++ implementation as previously discussed in https://lists.isocpp.org/sg16/2020/07/1699.php and as exhibited at https://msvc.godbolt.org/z/7KcMs5 (including for wide literals, and including the same bug where the wrong encoding is used for the second conversion).
In cases where character conversion through the various encodings is lossless, the difference is unobservable.
I prefer the design that Jens described since it avoids the additional conversions.
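To make the "unobservable" point concrete, here is a sketch that models the two designs at compile time. It assumes C++17 and uses hand-written, hypothetical helpers (`Buf`, `put_utf8`, `put_utf16`, `encode_once`, `transcode_then_concat`) rather than any standard facility; inputs are assumed to be valid Unicode scalar values. Design 1 (Jens's description) interprets both pieces and encodes the concatenation once as UTF-8; design 2 (the behavior Alisdair described, and roughly what MSVC exhibits) first encodes one piece as UTF-16 and then transcodes it to UTF-8 before appending the other.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical fixed-capacity buffer (64 code units) so everything is constexpr.
template <typename CharT>
struct Buf {
    std::array<CharT, 64> data{};
    std::size_t size = 0;
    constexpr void push(CharT c) { data[size++] = c; }
};

template <typename CharT>
constexpr bool same(const Buf<CharT>& a, const Buf<CharT>& b) {
    if (a.size != b.size) return false;
    for (std::size_t i = 0; i < a.size; ++i)
        if (a.data[i] != b.data[i]) return false;
    return true;
}

// Append the UTF-8 encoding of one code point.
constexpr void put_utf8(Buf<unsigned char>& out, char32_t cp) {
    using U = unsigned char;
    if (cp < 0x80) {
        out.push(U(cp));
    } else if (cp < 0x800) {
        out.push(U(0xC0 | (cp >> 6)));
        out.push(U(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push(U(0xE0 | (cp >> 12)));
        out.push(U(0x80 | ((cp >> 6) & 0x3F)));
        out.push(U(0x80 | (cp & 0x3F)));
    } else {
        out.push(U(0xF0 | (cp >> 18)));
        out.push(U(0x80 | ((cp >> 12) & 0x3F)));
        out.push(U(0x80 | ((cp >> 6) & 0x3F)));
        out.push(U(0x80 | (cp & 0x3F)));
    }
}

// Append the UTF-16 encoding of one code point (surrogate pair above the BMP).
constexpr void put_utf16(Buf<char16_t>& out, char32_t cp) {
    if (cp < 0x10000) {
        out.push(char16_t(cp));
    } else {
        cp -= 0x10000;
        out.push(char16_t(0xD800 + (cp >> 10)));
        out.push(char16_t(0xDC00 + (cp & 0x3FF)));
    }
}

// Design 1: concatenate the interpreted code points, then encode once.
constexpr Buf<unsigned char> encode_once(std::u32string_view a, std::u32string_view b) {
    Buf<unsigned char> out;
    for (char32_t cp : a) put_utf8(out, cp);
    for (char32_t cp : b) put_utf8(out, cp);
    return out;
}

// Design 2: encode `a` as UTF-16, transcode that to UTF-8, then append `b`.
constexpr Buf<unsigned char> transcode_then_concat(std::u32string_view a, std::u32string_view b) {
    Buf<char16_t> a16;
    for (char32_t cp : a) put_utf16(a16, cp);
    Buf<unsigned char> out;
    for (std::size_t i = 0; i < a16.size; ++i) {
        char32_t u = a16.data[i];
        if (u >= 0xD800 && u <= 0xDBFF) {  // high surrogate; input assumed well-formed
            char32_t lo = a16.data[++i];
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        }
        put_utf8(out, u);
    }
    for (char32_t cp : b) put_utf8(out, cp);
    return out;
}

constexpr std::u32string_view piece1 = U"caf\u00E9 ";
constexpr std::u32string_view piece2 = U"\U0001F600";

// UTF-16 -> UTF-8 transcoding is lossless, so both designs yield the same bytes.
static_assert(same(encode_once(piece1, piece2), transcode_then_concat(piece1, piece2)));
```

The two designs could only diverge if an intermediate encoding were unable to represent some character, which is not the case among UTF-8, UTF-16, and UTF-32.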
I am happy to defer to implementations to choose between UTF-8/16/32, or we could define a canonical preferred ordering among those choices.

Since all four well-known C++ implementations appear to produce an error for the test cases at https://compiler-explorer.com/z/4NDo-4, I'm fine with specifying these as ill-formed.
I'm fine with that as well.
Jens, would you consider such a change evolutionary, given that we don't know of any implementations (so far) that actually support these concatenations? Would it be reasonable to take this issue straight to Core (with JF's blessing, of course)? The only arguments I can see against making this change are 1) it is not a great use of our time to excise a weird conditionally-supported feature that is not implemented anywhere, and 2) it causes additional drift from C.
JeanHeyd has already reached out to WG14 to ask for their input on making these ill-formed.
There is no (technical) need to support these cases, and nobody has written code like that (because no compiler accepts it), so let's nix it.

From a procedural standpoint, P2029 produces enough churn in the general area that I'd like to see P2029 hit the working draft before future papers in that area are processed.
Me too.
Tom.