SG16

Subject: Re: Concatenating unicode string literals
From: Corentin Jabot (corentinjabot_at_[hidden])
Date: 2020-07-08 06:35:34


On Wed, 8 Jul 2020 at 13:09, Alisdair Meredith via SG16 <
sg16_at_[hidden]> wrote:

> After taking another look over P2029 resolving a few core issues,
> I am further concerned by [lex.string]p11, which states (among
> other things) that concatenation of unicode string literals with
> different encoding-prefixes is conditionally supported with
> implementation-defined behavior. That seems a little too free for
> my tastes.
>

+1

> I can buy conditionally supported, although see no harm in
> requiring it for any combination of unicode encoding prefixes.
> I am concerned about the implementation-defined behavior:
> the end result should be the result of concatenating the
> transcoded representation of each of the strings into a common
> encoding, corresponding to one of the involved encoding
> prefixes. I am happy to defer to implementations to choose
> between UTF8/16/32, or we could define a canonical preferred
> ordering among those choices.
>
> Does this seem worth calling out (yet another SG16 paper) or
> better left alone, as we already have way too much busy work
> on this group's plate, and implementations will most likely do the
> right thing anyway?
>

It's called out in P2178

From a user perspective, we allow too many things for any mental model to work.

u8"" "" and "" u8"" are both utf-8 strings
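
A minimal sketch of that single-prefix case, for illustration only: the names below (prefix_first_matches, joined_size, and so on) are made up for this example and do not come from P2029 or P2178. It checks that both spellings have the same type and contents as one u8 literal. Comparing against decltype(u8"abcd") avoids assuming whether the element type is char (before C++20) or char8_t (C++20 and later).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Prefix on the first piece only: same array type as a single u8 literal.
constexpr bool prefix_first_matches =
    std::is_same<decltype(u8"ab" "cd"), decltype(u8"abcd")>::value;

// Prefix on the second piece only: still the same array type.
constexpr bool prefix_second_matches =
    std::is_same<decltype("ab" u8"cd"), decltype(u8"abcd")>::value;

// 'a' is one UTF-8 code unit, U+00E9 is two, plus the terminating NUL.
constexpr std::size_t joined_size = sizeof("a" u8"\u00e9");

static_assert(prefix_first_matches, "u8 prefix on the first piece applies to the whole literal");
static_assert(prefix_second_matches, "u8 prefix on the second piece applies to the whole literal");
static_assert(joined_size == 4, "the unprefixed piece is encoded as UTF-8 too");

// Hypothetical helper: runtime comparison of the stored code units.
inline void check_utf8_concatenation() {
    assert(std::memcmp(u8"ab" "cd", u8"abcd", sizeof(u8"abcd")) == 0);
    assert(std::memcmp("ab" u8"cd", u8"abcd", sizeof(u8"abcd")) == 0);
}
```

The static_asserts make the point at compile time. The helper repeats it on the stored code units at run time.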

But what is

u8"" L""

supposed to be? UTF-8? Wide? How can I tell? And it's not portable.

There are only two models that make sense, IMO. Either:
- Only the first string can have a prefix
- Only one of the strings can have a prefix

Note that, despite the terribly misleading wording, the strings that are
concatenated do *not* have an associated encoding before concatenation;
the compiler chooses a prefix after concatenation (this is yet another thing
that needs fixing; I think Tom started a thread a few days ago).

u8"" L"" is not utf-8 + wide (what would that mean?) but a string
interpreted as either utf-8 or wide depending on the whims of the
implementation.
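
To illustrate that the prefix is applied to the concatenated whole rather than to each piece, here is a rough sketch. The helper has_element_type and the other names are hypothetical, introduced only for this example. With a single prefix, the unprefixed piece ends up in the prefixed encoding. The mixed-prefix case is left commented out, since it is only conditionally supported and an implementation may reject it.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <type_traits>

// Hypothetical helper: true if the literal's element type is Expected.
template <typename Expected, typename CharT, std::size_t N>
constexpr bool has_element_type(const CharT (&)[N]) {
    return std::is_same<Expected, CharT>::value;
}

// One prefix anywhere decides the element type of the whole literal.
static_assert(has_element_type<wchar_t>("abc" L"def"), "whole literal is wide");
static_assert(has_element_type<char16_t>(u"abc" "def"), "whole literal is UTF-16");

// The unprefixed "abc" is stored as wide characters, not as narrow ones:
// six characters plus the terminating NUL, each the size of wchar_t.
constexpr bool unprefixed_part_is_widened =
    sizeof("abc" L"def") == 7 * sizeof(wchar_t);
static_assert(unprefixed_part_is_widened, "no narrow piece survives concatenation");

// Mixed prefixes: conditionally supported, implementation-defined meaning.
// Kept as a comment because a given compiler may refuse to compile it.
// auto mixed = u8"abc" L"def";

// Hypothetical helper: runtime comparison against a single wide literal.
inline void check_wide_concatenation() {
    assert(std::wcscmp("abc" L"def", L"abcdef") == 0);
}
```

Nothing here depends on which encoding the implementation picks for wchar_t. The checks only look at the element type and the number of code units.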

I think it needs fixing indeed :)

>
> (I am not overly concerned about specifying concatenation for
> narrow/wide string literals with unicode string literals, which
> can remain conditionally supported with implementation-defined
> values.)
>
> AlisdairM
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>



SG16 list run by sg16-owner@lists.isocpp.org