
Re: [SG16] Concatenating unicode string literals

From: Alisdair Meredith <alisdairm_at_[hidden]>
Date: Wed, 8 Jul 2020 12:55:54 +0100
Well, u8"" L"" is portably ill-formed, and should be diagnosed :)

The corner cases that are conditionally supported with an
implementation-defined value are:

  u8"" u""
  u8"" U""
  u"" u8""
  u"" U""
  u"" L""
  U"" u8""
  U"" u""
  U"" L""
  L"" u""
  L"" U""

I would like all of the combinations that do not involve L"" to have a
well-defined value, whether or not we specify the encoding.

I do not care to further constrain the L"" forms at this point.


> On Jul 8, 2020, at 12:35, Corentin Jabot <corentinjabot_at_[hidden]> wrote:
> On Wed, 8 Jul 2020 at 13:09, Alisdair Meredith via SG16 <sg16_at_[hidden]> wrote:
>> After taking another look over P2029 resolving a few core issues,
>> I am further concerned by [lex.string]p11, which states (among
>> other things) that concatenation of Unicode string literals with
>> different encoding-prefixes is conditionally supported with
>> implementation-defined behavior. That seems a little too free for
>> my tastes.
> +1
>> I can buy conditionally supported, although I see no harm in
>> requiring it for any combination of Unicode encoding prefixes.
>> I am concerned about the implementation-defined behavior:
>> the end result should be the result of concatenating the
>> transcoded representation of each of the strings into a common
>> encoding, corresponding to one of the involved encoding
>> prefixes. I am happy to defer to implementations to choose
>> between UTF-8/16/32, or we could define a canonical preferred
>> ordering among those choices.
>> Does this seem worth calling out (yet another SG16 paper), or is it
>> better left alone, as we already have way too much busy work
>> on this group's plate, and implementations will most likely do the
>> right thing anyway?
> It's called out in P2178
> From a user perspective, we allow too many things for any mental model to work.
> u8"" "" and "" u8"" are both UTF-8 strings.
> What is
> u8"" L""
> supposed to be? UTF-8? Wide? How can I tell? And it's not portable?
> There are only 2 models that make sense, IMO.
> Either:
> - Only the first string can have a prefix, or
> - Only one of the strings can have a prefix.
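The two models can be made concrete with a small C++20 sketch. It reads both rules literally and models each adjacent literal only by its encoding-prefix. The `Prefix` enum and both predicate names are hypothetical; they are not proposed wording.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model: each adjacent literal piece is reduced to its prefix.
enum class Prefix { none, u8, u, U, L };

// Model 1 (literal reading): only the first piece may carry a prefix.
template <std::size_t N>
constexpr bool only_first_prefixed(const std::array<Prefix, N>& pieces) {
    for (std::size_t i = 1; i < N; ++i) {
        if (pieces[i] != Prefix::none) return false;
    }
    return true;
}

// Model 2 (literal reading): at most one piece, in any position, has a prefix.
template <std::size_t N>
constexpr bool at_most_one_prefixed(const std::array<Prefix, N>& pieces) {
    std::size_t count = 0;
    for (Prefix p : pieces) {
        if (p != Prefix::none) ++count;
    }
    return count <= 1;
}
```

Both predicates reject u8"" L"" and accept u8"" "". The form "" u8"" passes only the second. Read this literally, both also reject u8"a" u8"b", which C++20 accepts, so the models as stated may be meant more loosely.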
> Note that, despite the terribly misleading wording, the strings that are concatenated do *not* have an associated encoding before concatenation;
> the compiler chooses a prefix after concatenation (this is yet another thing needing fixing; I think Tom started a thread a few days ago).
> u8"" L"" is not UTF-8 + wide (what would that mean?) but a string interpreted as either UTF-8 or wide depending on the whims of the implementation.
> I think it needs fixing indeed :)
>> (I am not overly concerned about specifying concatenation for
>> narrow/wide string literals with Unicode string literals, which
>> can remain conditionally supported with implementation-defined
>> values.)
>> AlisdairM
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2020-07-08 06:59:15