On Wed, 8 Jul 2020 at 13:56, Alisdair Meredith <alisdairm@me.com> wrote:

Well, u8"" L"" is portably ill-formed, and should be diagnosed :)

The corner cases that are conditionally supported with an
implementation defined value are:

u8"" u""
u8"" U""
u"" u8""
u"" U""
u"" L""
U"" u8""
U"" u""
U"" L""
L"" u""
L"" U""
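The rule being discussed can be restated as a small decision function. The sketch below assumes the C++20 wording as summarized in this thread: the same prefix on both sides, or one unprefixed operand, is fine; u8 adjacent to L is ill-formed in either order; every other mixed pair is conditionally supported. The names `Prefix`, `Concat`, `classify`, and `count_conditional` are hypothetical and exist only for illustration. The final `static_assert` cross-checks that exactly ten ordered pairs land in the conditionally supported bucket, matching the list above.

```cpp
// Hypothetical model of the C++20 [lex.string] concatenation rule (illustration only).
enum class Prefix { none, u8, u, U, L };
enum class Concat { well_formed, conditionally_supported, ill_formed };

constexpr Concat classify(Prefix a, Prefix b) {
    // Same prefix, or one side unprefixed: the unprefixed piece adopts the other's prefix.
    if (a == b || a == Prefix::none || b == Prefix::none)
        return Concat::well_formed;
    // UTF-8 adjacent to wide is ill-formed, in either order.
    if ((a == Prefix::u8 && b == Prefix::L) || (a == Prefix::L && b == Prefix::u8))
        return Concat::ill_formed;
    // Every other mixed pair is conditionally supported with implementation-defined behavior.
    return Concat::conditionally_supported;
}

// Count the conditionally supported ordered pairs among the four prefixes.
constexpr int count_conditional() {
    constexpr Prefix all[] = {Prefix::u8, Prefix::u, Prefix::U, Prefix::L};
    int n = 0;
    for (Prefix a : all)
        for (Prefix b : all)
            if (classify(a, b) == Concat::conditionally_supported)
                ++n;
    return n;
}

static_assert(count_conditional() == 10, "ten mixed pairs are conditionally supported");
```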

Let's entertain the idea that this should, for some reason, not be ill-formed.

What should it do?

Can you find a set of rules that people will not trip over?

Is it like rock-paper-scissors, but with u8, u, and U instead?

I understand that we can make _anything_ well-formed or conditionally supported.

We should ask ourselves: why?

Is there a reason to allow that which outweighs the cost of the added complexity?

We can (and I believe we should) make that ill-formed.

(Implementations sensibly do not support these things anyway.)

I would like all of the combinations that do not involve L"" to have a
well-defined value, whether or not we specify the encoding.

I do not care to further constrain the L"" forms at this point.

AlisdairM

On Jul 8, 2020, at 12:35, Corentin Jabot <corentinjabot@gmail.com> wrote:

On Wed, 8 Jul 2020 at 13:09, Alisdair Meredith via SG16 <sg16@lists.isocpp.org> wrote:
After taking another look over P2029 resolving a few core issues,
I am further concerned by [lex.string]p11, which states (among
other things) that concatenation of Unicode string literals with
different encoding-prefixes is conditionally supported with
implementation-defined behavior. That seems a little too free for
my tastes.

+1

I can buy conditionally supported, although I see no harm in
requiring it for any combination of Unicode encoding prefixes.
I am concerned about the implementation-defined behavior:
the end result should be the result of concatenating the
transcoded representation of each of the strings into a common
encoding, corresponding to one of the involved encoding
prefixes. I am happy to defer to implementations to choose
between UTF-8/16/32, or we could define a canonical preferred
ordering among those choices.
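One reading of the proposed semantics, sketched at run time rather than during translation: decode each piece to code points, then emit them in a single common encoding. This assumes UTF-32 as the common encoding (one possible canonical choice; the message leaves the choice open) and assumes well-formed input, as compiler-produced literals would be. The function names `append_utf8`, `append_utf16`, and `concat_utf8_utf16` are hypothetical. UTF-8 input is taken as `std::string_view` so the sketch also builds before C++20's `char8_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Decode UTF-8 bytes into code points, appending to out (assumes well-formed input).
void append_utf8(std::u32string& out, std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        assert(i + len <= s.size() && "truncated UTF-8 sequence");
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
}

// Decode UTF-16 code units into code points, combining surrogate pairs.
void append_utf16(std::u32string& out, std::u16string_view s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cu = s[i];
        if (cu >= 0xD800 && cu <= 0xDBFF) {
            assert(i + 1 < s.size() && "unpaired high surrogate");
            const char32_t lo = s[i + 1];
            assert(lo >= 0xDC00 && lo <= 0xDFFF && "expected low surrogate");
            cu = 0x10000 + ((cu - 0xD800) << 10) + (lo - 0xDC00);
            ++i;
        }
        out.push_back(cu);
    }
}

// Concatenate a UTF-8 piece and a UTF-16 piece, transcoded into UTF-32.
std::u32string concat_utf8_utf16(std::string_view a, std::u16string_view b) {
    std::u32string out;
    append_utf8(out, a);
    append_utf16(out, b);
    return out;
}
```

For example, `concat_utf8_utf16("\xC3\xA9", u"\U0001F600")` yields `U"\u00E9\U0001F600"`: the two-byte UTF-8 sequence and the UTF-16 surrogate pair each become a single code point.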

Does this seem worth calling out (yet another SG16 paper), or is it
better left alone, as we already have way too much busy work
on this group's plate, and implementations will most likely do the
right thing anyway?

It's called out in P2178

From a user perspective, we allow too many things for any mental model to work.

u8"" "" and "" u8"" are both UTF-8 strings.
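The existing rule for an unprefixed operand can be checked at compile time: the unprefixed piece takes on the prefix of the other piece, whichever side it is on. A minimal sketch, assuming C++17 for `std::is_same_v`; the `__cpp_char8_t` branch accounts for `u8` literals having element type `char8_t` only from C++20 on.

```cpp
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to a const array.
// The unprefixed piece adopts the prefix of the other piece, on either side.
static_assert(std::is_same_v<decltype(u"ab" "cd"), const char16_t(&)[5]>);
static_assert(std::is_same_v<decltype("ab" u"cd"), const char16_t(&)[5]>);

#if defined(__cpp_char8_t)
// C++20 (or an explicit char8_t mode): u8 literals have element type char8_t.
static_assert(std::is_same_v<decltype(u8"ab" "cd"), const char8_t(&)[5]>);
static_assert(std::is_same_v<decltype("ab" u8"cd"), const char8_t(&)[5]>);
#else
// Before C++20, u8 literals have element type char but are still UTF-8 encoded.
static_assert(std::is_same_v<decltype(u8"ab" "cd"), const char(&)[5]>);
static_assert(std::is_same_v<decltype("ab" u8"cd"), const char(&)[5]>);
#endif
```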

What

u8"" L""

is supposed to be? UTF-8? Wide? How can I tell? And it's not portable?

There are only two models that make sense, IMO. Either:
- Only the first string can have a prefix, or
- Only one of the strings can have a prefix.
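The two candidate models can be written as predicates over the sequence of prefixes in a concatenation. This is a hypothetical sketch (`Prefix`, `first_only`, and `at_most_one` are illustrative names, not proposed wording). As written, both models also reject repeating the same prefix, as in u8"a" u8"b"; whether that case should stay valid is a separate decision.

```cpp
#include <initializer_list>

// Hypothetical encoding-prefix tag for each adjacent string-literal piece.
enum class Prefix { none, u8, u, U, L };

// Model A: only the first piece may carry a prefix.
constexpr bool first_only(std::initializer_list<Prefix> pieces) {
    bool first = true;
    for (Prefix p : pieces) {
        if (!first && p != Prefix::none)
            return false;
        first = false;
    }
    return true;
}

// Model B: at most one piece, in any position, may carry a prefix.
constexpr bool at_most_one(std::initializer_list<Prefix> pieces) {
    int prefixed = 0;
    for (Prefix p : pieces)
        if (p != Prefix::none)
            ++prefixed;
    return prefixed <= 1;
}
```

Under either model, u8"" L"" is simply rejected, so the question of which encoding it has never arises.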

Note that, despite the terribly misleading wording, the strings that are concatenated do *not* have an associated encoding before concatenation;
the compiler chooses a prefix after concatenation (this is yet another thing needing fixing; I think Tom started a thread about it a few days ago).

u8"" L"" is not UTF-8 + wide (what would that mean?) but a single string interpreted as either UTF-8 or wide, depending on the whims of the implementation.

I think it needs fixing indeed :)

(I am not overly concerned about specifying concatenation of
narrow/wide string literals with Unicode string literals, which
can remain conditionally supported with implementation-defined
values.)

AlisdairM
--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16