sg16: Re: [SG16] [ WG14 ] Mixed Wide String Literals

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 9 Dec 2020 20:22:02 +0100

On Wed, Dec 9, 2020, 05:52 Tom Honermann via SG16 <sg16_at_[hidden]>
wrote:

> On 12/8/20 11:21 AM, JeanHeyd Meneide via SG16 wrote:
>
> Dear Tom,
>
> On Sun, Dec 6, 2020 at 5:41 PM Tom Honermann <tom_at_[hidden]> wrote:
>
>> On 12/4/20 10:48 AM, JeanHeyd Meneide via SG16 wrote:
>>
>> BUT!
>>
>> WG14 also wants to look into -- after removing it -- spending
>> some time coming up with a well-defined combination mechanism. We would
>> like to have ill-formedness (C++) / constraint violations (C) if the
>> conversion to the final chosen encoding does not work, to give people
>> good guarantees about e.g. synthesizing Unicode into a wide string
>> literal or synthesizing wide string data into Unicode literals.
>>
>> Was any particular use case discussed? Or just a general preference to,
>> given something like this:
>>
>> #define NAME u8"foo"
>> #define VERSION "5"
>>
>> void emit(const char16_t*);
>>
>> to be able to do something like this:
>>
>> emit(u"" NAME "-" VERSION);
>>
>>
> This, and the ability to take "implementation defined" text in either
> L"" or "" strings (e.g., from external headers you don't control or
> generated code) and force them to be a Unicode encoding by doing something
> similar to the above -- prefixing with a u"". I think there's some utility to be had
> here when it comes to interfacing with legacy code and wanting to make sure
> macro expansions and code generation can be done well. This probably
> applies a lot more for C than C++, where references and `constexpr`
> arrays/string_views are generally used over macros and direct string
> literals.
>
> My best thought so far is that "first prefix is the chosen encoding".
> This also means you can write Unicode String literals and attempt to merge
> them into the desired narrow character set or wide character set (for
> compile-time). This has the advantage that if you know the data
> is going to go into a function taking a `char*` or a `wchar_t*`, you can get
> a string literal into that encoding. Because we're also making the behavior
> well-defined, we can specify it as a constraint violation / ill-formed if
> any code point in any of the trailing string tokens is not representable
> in the first-specified encoding. (The usual "escaped from safety" rules for
> "\x" sequences would apply here too.)
>
> Got it, thanks. I agree there are use cases for the behavior you describe
> of effectively ignoring any prefixes after the first one. One complication
> is that "" L"" is equivalent to L"" L"" (and L"" "") today, even after
> P2201 and N2594. That means prefixing an unadorned string literal won't
> have the effect of forcing an encoding. That would be inconsistent, but
> maybe that is ok.
>

How frequent is that pattern in actual production code?
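
To make sure we are talking about the same pattern, here is a minimal
sketch of what I understand the current rule to be: an unprefixed literal
adopts the prefix of its neighbor, whichever comes first. The helper names
below are hypothetical and only there for illustration; the sketch
deliberately avoids mixing two different prefixes (e.g. u8"" with u""),
since that is the part being removed.

```cpp
#include <string>
#include <type_traits>

// An unprefixed literal takes on the prefix of the literal it is
// concatenated with, regardless of which one comes first.
static_assert(std::is_same<decltype("a" L"b"), const wchar_t (&)[3]>::value,
              "\"a\" L\"b\" is a wide string literal");
static_assert(std::is_same<decltype(L"a" "b"), const wchar_t (&)[3]>::value,
              "L\"a\" \"b\" is a wide string literal");

// So a leading "" does not force the narrow encoding: the trailing L""
// still decides the type of the whole literal.
static_assert(std::is_same<decltype("" L"x"), const wchar_t (&)[2]>::value,
              "\"\" L\"x\" is still wide");

// Prefixing unadorned literals with u"" does give UTF-16 today.
// NAME and VERSION are assumed to be plain narrow literals here.
#define NAME "foo"
#define VERSION "5"

// Hypothetical helpers, for illustration only.
inline std::u16string forced_utf16() { return u"" NAME "-" VERSION; }
inline std::wstring narrow_then_wide() { return "foo" L"-bar"; }
```

The open question is only what happens once the pieces carry different
prefixes of their own.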

> Tom.
>

Received on 2020-12-09 13:22:16