I would like all of the combinations that do not involve L"" to have a well-defined value, whether or not we specify the encoding.

Well-defined or perhaps ill-formed?

This is another case where I think the feature makes little or no sense, but, unless shown otherwise, doesn't cause problems in practice and should therefore be treated as a low priority issue relative to other things we could be working on.

I do not care to further constrain the L"" forms at this point.

+1.

Tom.

AlisdairM

On Jul 8, 2020, at 12:35, Corentin Jabot <corentinjabot@gmail.com> wrote:

On Wed, 8 Jul 2020 at 13:09, Alisdair Meredith via SG16 <sg16@lists.isocpp.org> wrote:

After taking another look over P2029 resolving a few core issues,
I am further concerned by [lex.string]p11, which states (among
other things) that concatenation of unicode string literals with
different encoding-prefixes is conditionally supported with
implementation-defined behavior. That seems a little too free for
my tastes.

+1

I can buy conditionally supported, although see no harm in
requiring it for any combination of unicode encoding prefixes.
I am concerned about the implementation-defined behavior:
the end result should be the result of concatenating the
transcoded representation of each of the strings into a common
encoding, corresponding to one of the involved encoding
prefixes. I am happy to defer to implementations to choose
between UTF-8/16/32, or we could define a canonical preferred
ordering among those choices.
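
As a rough illustration of the semantics described above (transcode each piece into one common encoding, then concatenate), here is a minimal C++ sketch. The names utf8_to_utf32, utf16_to_utf32, concat_as_utf32 and example_usage are hypothetical, the choice of UTF-32 as the common encoding is an assumption made only for this example, and the decoders assume well-formed input and perform no validation. It models at run time what a compiler would do during translation; it is not a description of how any implementation actually works.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 bytes to UTF-32. Assumes well-formed input (no validation).
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        // Sequence length is determined by the lead byte.
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
        // Each continuation byte contributes six payload bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Decode UTF-16 code units to UTF-32. Assumes well-formed surrogate pairs.
std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            char32_t low = in[++i];  // trailing surrogate
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        }
        out.push_back(cp);
    }
    return out;
}

// Model of "transcode each piece to a common encoding, then concatenate".
// UTF-32 is picked here purely for illustration.
std::u32string concat_as_utf32(const std::string& utf8_piece,
                               const std::u16string& utf16_piece) {
    return utf8_to_utf32(utf8_piece) + utf16_to_utf32(utf16_piece);
}

// Usage: a UTF-8 piece (U+00E9 as two bytes) followed by a UTF-16 piece
// (U+1F600 as a surrogate pair) yields two UTF-32 code units.
void example_usage() {
    std::u32string s = concat_as_utf32("\xC3\xA9", u"\U0001F600");
    assert(s == U"\u00e9\U0001F600");
}
```

Under this model the result depends only on the code points of each piece and on the encoding chosen for the whole, which is the property asked for above.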

Does this seem worth calling out (yet another SG16 paper) or
better left alone, as we already have way too much busy work
on this group's plate, and implementations will most likely do the
right thing anyway?

It's called out in P2178

From a user perspective, we allow too many things for any mental model to work.

u8"" "" and "" u8"" are both UTF-8 strings.

What is

u8"" L""

supposed to be? UTF-8? Wide? How can I tell? And it's not portable.

There are only two models that make sense, IMO. Either:

- Only the first string can have a prefix, or

- Only one of the strings can have a prefix

Note that, despite the terribly misleading wording, the strings that are concatenated do *not* have an associated encoding before concatenation; the compiler chooses a prefix after concatenation (this is yet another thing that needs fixing; I think Tom started a thread about it a few days back).

u8"" L"" is not utf-8 + wide (what would that mean?) but a string interpreted as either utf-8 or wide depending on the whims of the implementation.
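
To make the "prefix is chosen after concatenation" point concrete, here is a small C++ sketch restricted to the uncontroversial single-prefix cases. The helper names code_units, prefix_on_first, prefix_on_second and check_single_prefix_concatenation are hypothetical. The sketch assumes an implementation that encodes the whole concatenated literal as UTF-8, which is my understanding of what current compilers do. U+00E9 takes two UTF-8 code units, so two copies plus the terminating null are expected to give five code units whichever piece carries the u8 prefix.

```cpp
#include <cassert>
#include <cstddef>

// Number of code units in a string literal, including the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N; }

// Prefix on the first piece only: the unprefixed piece is treated as u8 too.
std::size_t prefix_on_first() { return code_units(u8"\u00e9" "\u00e9"); }

// Prefix on the second piece only: same result.
std::size_t prefix_on_second() { return code_units("\u00e9" u8"\u00e9"); }

// The mixed form u8"" L"" is deliberately left out: it is only conditionally
// supported, so it may not compile, and its meaning is not portable.
void check_single_prefix_concatenation() {
    assert(prefix_on_first() == 5);   // 2 + 2 code units + null
    assert(prefix_on_second() == 5);
}
```

The unprefixed piece does not keep a separate narrow encoding in either case; the whole literal is a single UTF-8 string.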

I think it needs fixing indeed :)

(I am not overly concerned about specifying concatenation for
narrow/wide string literals with unicode string literals, which
can remain conditionally supported with implementation-defined
values.)

AlisdairM
--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16