On Wed, 8 Jul 2020 at 18:07, Tom Honermann <tom@honermann.net> wrote:
On 7/8/20 8:45 AM, Corentin Jabot wrote:
On Wed, 8 Jul 2020 at 13:56, Alisdair Meredith <alisdairm@me.com> wrote:
Well, u8"" L"" is portably ill-formed, and should be diagnosed :)
The corner cases that are conditionally supported with an implementation-defined value are:
u8"" u""
u8"" U""
u"" u8""
u"" U""
u"" L""
U"" u8""
U"" u""
U"" L""
L"" u""
L"" U""
Let's entertain that this should, for some reason, not be ill-formed. What should it do? Can you find a set of rules that people will not trip over? Is it like rock paper scissors but with u8, u and U instead?
I can sort-of kind-of see a use case for allowing one of u"" or U"" to be concatenated with L"" as a conditionally-supported feature when the wide execution encoding is a match (UTF-16 for u"" or UTF-32 for U""). This might be useful in strange situations where string concatenation is desired and one of the components is provided by a macro expansion. The question then is what the type of the string literal is. The only model I can see working there is to adopt the type from the first component and ignore the encoding-prefix from the remaining ones.
I would not be heartbroken over breaking code that does this.
I understand that we can make _anything_ well-formed or conditionally supported. We should ask ourselves why? Is there a reason to allow that, which surpasses the cost of the added complexity? We can (and I believe we should) make that ill-formed. (Implementations sensibly do not support these things anyway.)
Can you summarize what implementations do today? I haven't researched.
They do not support the combinations not mandated by the standard.
Which implementations did you check? Clang, gcc, MSVC, and icc?
Tom.
I would like all of the combinations that do not involve L"" to have a well-defined value, whether or not we specify the encoding.
Well-defined or perhaps ill-formed?
This is another case where I think the feature makes little or no sense, but, unless shown otherwise, doesn't cause problems in practice and should therefore be treated as a low priority issue relative to other things we could be working on.
I do not care to further constrain the L"" forms at this point.
+1.
Tom.
AlisdairM
On Jul 8, 2020, at 12:35, Corentin Jabot <corentinjabot@gmail.com> wrote:
On Wed, 8 Jul 2020 at 13:09, Alisdair Meredith via SG16 <sg16@lists.isocpp.org> wrote:
After taking another look over P2029 resolving a few core issues,
I am further concerned by [lex.string]p11, which states (among
other things) that concatenation of unicode string literals with
different encoding-prefixes is conditionally supported with
implementation-defined behavior. That seems a little too free for
my tastes.
+1
I can buy conditionally supported, although I see no harm in
requiring it for any combination of unicode encoding prefixes.
I am concerned about the implementation-defined behavior:
the end result should be the result of concatenating the
transcoded representation of each of the strings into a common
encoding, corresponding to one of the involved encoding
prefixes. I am happy to defer to implementations to choose
between UTF-8/16/32, or we could define a canonical preferred
ordering among those choices.
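
To make the transcoding point concrete, here is a minimal C++ sketch (not from the thread) that counts code units for the same characters under each Unicode prefix. The helper name `code_units` is made up for illustration, and the characters are written as universal-character-names so the result does not depend on the source file encoding. The mixed-prefix case is left as a comment because it is only conditionally supported.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00E9 needs two code units in UTF-8, one in UTF-16 and UTF-32.
static_assert(code_units(u8"\u00E9") == 2, "UTF-8: two code units");
static_assert(code_units(u"\u00E9") == 1, "UTF-16: one code unit");
static_assert(code_units(U"\u00E9") == 1, "UTF-32: one code unit");

// U+1F600 is outside the BMP: four, two (surrogate pair), and one.
static_assert(code_units(u8"\U0001F600") == 4, "UTF-8: four code units");
static_assert(code_units(u"\U0001F600") == 2, "UTF-16: surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "UTF-32: one code unit");

// Same prefix on both literals is well-formed today.
static_assert(code_units(u8"\u00E9" u8"\U0001F600") == 6, "2 + 4");
static_assert(code_units(u"\u00E9" u"\U0001F600") == 3, "1 + 2");

// Mixed prefixes are only conditionally supported, so this stays a comment:
//   code_units(u8"\u00E9" u"\U0001F600")
```

Under the behavior suggested above, the commented-out mixed case would come out as 6 code units if the implementation picked UTF-8 and 3 if it picked UTF-16, which is why the choice of common encoding is observable.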
Does this seem worth calling out (yet another SG16 paper) or
better left alone, as we already have way too much busy work
on this group's plate, and implementations will most likely do the
right thing anyway?
It's called out in P2178
From a user perspective, we allow too many things for any mental model to work.
u8"" "" and "" u8"" are both UTF-8 strings.
What is
u8"" L""
supposed to be? UTF-8? Wide? How can I tell? And it's not portable?
There are only 2 models that make sense imo. Either:
- Only the first string can have a prefix
- Only one of the strings can have a prefix
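
The two models can be written down as predicates over a sequence of prefixes. This is a rough sketch, assuming C++14 or later. The `Prefix` enum and the function names are invented here for illustration and are not taken from any compiler or from the standard text. The message does not say whether repeated identical prefixes such as u8"" u8"" would stay allowed, so this sketch simply counts every prefixed literal.

```cpp
#include <cstddef>

// Hypothetical model of an encoding-prefix on one string literal.
enum class Prefix { none, u8, u, U, L };

// Model 1: only the first literal in the sequence may carry a prefix.
template <std::size_t N>
constexpr bool first_only(const Prefix (&seq)[N]) {
    for (std::size_t i = 1; i < N; ++i)
        if (seq[i] != Prefix::none) return false;
    return true;
}

// Model 2: at most one literal in the sequence may carry a prefix.
template <std::size_t N>
constexpr bool at_most_one(const Prefix (&seq)[N]) {
    std::size_t prefixed = 0;
    for (std::size_t i = 0; i < N; ++i)
        if (seq[i] != Prefix::none) ++prefixed;
    return prefixed <= 1;
}

constexpr Prefix prefix_first[] = {Prefix::u8, Prefix::none};  // u8"" ""
constexpr Prefix prefix_last[]  = {Prefix::none, Prefix::u8};  // "" u8""
constexpr Prefix mixed[]        = {Prefix::u8, Prefix::L};     // u8"" L""

static_assert(first_only(prefix_first) && at_most_one(prefix_first), "accepted by both");
static_assert(!first_only(prefix_last) && at_most_one(prefix_last), "accepted by model 2 only");
static_assert(!first_only(mixed) && !at_most_one(mixed), "rejected by both");
```

Either predicate rejects every mixed-prefix sequence, so under both models the result type never depends on a choice made by the implementation.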
Note that, despite the terribly misleading wording, the strings that are concatenated do *not* have an associated encoding before concatenation; the compiler chooses a prefix after concatenation (this is yet another thing needing fixing; I think Tom started a thread a few days back).
u8"" L"" is not utf-8 + wide (what would that mean?) but a string interpreted as either utf-8 or wide depending on the whims of the implementation.
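
For the combinations the standard does mandate, the effect of one prefix applying to the whole concatenated literal can be checked at compile time. This is a minimal sketch, assuming a C++11 or later compiler. Comparing against `decltype(u8"ab")` keeps it independent of whether u8 literals use char or char8_t.

```cpp
#include <type_traits>

// A prefixed literal next to an unprefixed one: the result has the
// prefixed type, whichever side the prefix appears on.
static_assert(std::is_same<decltype(u8"a" "b"), decltype(u8"ab")>::value,
              "prefix on the first literal");
static_assert(std::is_same<decltype("a" u8"b"), decltype(u8"ab")>::value,
              "prefix on the second literal");

// The same holds for the other prefixes.
static_assert(std::is_same<decltype(u"a" "b"), const char16_t (&)[3]>::value,
              "u prefix gives char16_t");
static_assert(std::is_same<decltype("a" L"b"), const wchar_t (&)[3]>::value,
              "L prefix gives wchar_t");

// Left as a comment because, as noted earlier in the thread, it is
// ill-formed and should be diagnosed:
//   auto bad = u8"a" L"b";
```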
I think it needs fixing indeed :)
(I am not overly concerned about specifying concatenation for
narrow/wide string literals with unicode string literals, which
can remain conditionally supported with implementation-defined
values.)
AlisdairM
--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16