
SG16


Subject: Re: Concatenating unicode string literals
From: Alisdair Meredith (alisdairm_at_[hidden])
Date: 2020-07-08 06:55:54


Well, u8"" L"" is portably ill-formed, and should be diagnosed :)

The corner cases that are conditionally supported with an
implementation-defined value are:

  u8"" u""
  u8"" U""
  u"" u8""
  u"" U""
  u"" L""
  U"" u8""
  U"" u""
  U"" L""
  L"" u""
  L"" U""
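
For contrast, the combinations that are not in this list, where only one
of the two literals carries an encoding-prefix, already have a portable
meaning: the unprefixed literal is treated as if it had the other
operand's prefix. A minimal sketch, assuming a C++17 or later compiler;
has_elem and code_units are hypothetical helper names, not standard
facilities. The mixed-prefix forms listed above are deliberately left out
because an implementation is allowed to reject them.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: does the literal's element type match Expected?
template <class Expected, class T, std::size_t N>
constexpr bool has_elem(const T (&)[N]) {
    return std::is_same_v<Expected, T>;
}

// Hypothetical helper: number of code units, excluding the terminator.
template <class T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) {
    return N - 1;
}

// One prefixed literal next to an unprefixed one: the prefix wins,
// regardless of which side it appears on.
static_assert(has_elem<char16_t>(u"a" "b"));
static_assert(has_elem<char32_t>("a" U"b"));
static_assert(has_elem<wchar_t>(L"a" "b"));

// U+00E9 takes two UTF-8 code units, 'x' takes one. The element type of
// a u8 literal is char in C++17 and char8_t in C++20, so only the
// length is checked here.
static_assert(code_units(u8"\u00e9" "x") == 3);
static_assert(code_units("x" u8"\u00e9") == 3);
```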

I would like all of the combinations that do not involve L"" to have a
well defined value, whether or not we specify the encoding.

I do not care to further constrain the L"" forms at this point.
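
To make "a well defined value" concrete, here is a rough run-time model
of the rule suggested in the quoted message below: transcode each piece
into one common encoding, then concatenate. It is only a sketch under
stated assumptions: the input is well-formed, UTF-8 is picked arbitrarily
as the common encoding, and append_utf8, to_utf8 and concat_as_utf8 are
hypothetical names. It does not describe what any compiler does today.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Append one Unicode scalar value as UTF-8 (assumes cp is valid).
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// UTF-16 to UTF-8; assumes surrogates are correctly paired.
inline std::string to_utf8(std::u16string_view s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size()) {
            // Combine a high and a low surrogate into one scalar value.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        }
        append_utf8(out, cp);
    }
    return out;
}

// UTF-32 to UTF-8.
inline std::string to_utf8(std::u32string_view s) {
    std::string out;
    for (char32_t cp : s) append_utf8(out, cp);
    return out;
}

// Model of u"..." U"..." under the "transcode, then concatenate" rule.
inline std::string concat_as_utf8(std::u16string_view a,
                                  std::u32string_view b) {
    return to_utf8(a) + to_utf8(b);
}
```

Under this model the result does not depend on which prefix each piece
was written with, only on the characters it contains and on the chosen
target encoding.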

AlisdairM

> On Jul 8, 2020, at 12:35, Corentin Jabot <corentinjabot_at_[hidden]> wrote:
>
>
>
> On Wed, 8 Jul 2020 at 13:09, Alisdair Meredith via SG16 <sg16_at_[hidden]> wrote:
> After taking another look over P2029 resolving a few core issues,
> I am further concerned by [lex.string]p11, which states (among
> other things) that concatenation of unicode string literals with
> different encoding-prefixes is conditionally supported with
> implementation-defined behavior. That seems a little too free for
> my tastes.
>
> +1
>
> I can buy conditionally supported, although see no harm in
> requiring it for any combination of unicode encoding prefixes.
> I am concerned about the implementation-defined behavior:
> the end result should be the result of concatenating the
> transcoded representation of each of the strings into a common
> encoding, corresponding to one of the involved encoding
> prefixes. I am happy to defer to implementations to choose
> between UTF-8/16/32, or we could define a canonical preferred
> ordering among those choices.
>
> Does this seem worth calling out (yet another SG16 paper) or
> better left alone, as we already have way too much busy work
> on this group's plate, and implementations will most likely do the
> right thing anyway?
>
> It's called out in P2178
>
> From a user perspective, we allow too many things for any mental model to work.
>
> u8"" "" and "" u8"" are both utf-8 strings
>
> What is
>
> u8"" L""
>
> supposed to be? UTF-8? Wide? How can I tell? And it's not portable?
>
> There are only 2 models that make sense imo. Either:
> - Only the first string can have a prefix, or
> - Only one of the strings can have a prefix
>
> Note that, despite the terribly misleading wording, the strings that are concatenated do *not* have an associated encoding before concatenation;
> the compiler chooses a prefix after concatenation (this is yet another thing that needs fixing; I think Tom started a thread a few days back).
>
> u8"" L"" is not utf-8 + wide (what would that mean?) but a string interpreted as either utf-8 or wide depending on the whims of the implementation.
>
> I think it needs fixing indeed :)
>
>
> (I am not overly concerned about specifying concatenation for
> narrow/wide string literals with unicode string literals, which
> can remain conditionally supported with implementation-defined
> values.)
>
> AlisdairM
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16



SG16 list run by sg16-owner@lists.isocpp.org