Re: [wg14/wg21 liaison] Type of u8"" array elements are 'char', but u8'' is 'unsigned char' ?

From: Steve Downey <sdowney_at_[hidden]>
Date: Tue, 1 Mar 2022 08:50:48 -0500
Sorry, yes, we have a wording bug in using the XID Unicode properties,
because the source character set is not necessarily Unicode. In C, the
source character set is an encoded character set of its own.
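Steve's suggestion further down the thread is to reuse the multibyte-to-char32_t (`mbrtoc32`) machinery so that source characters are mapped to code points before an XID property lookup. A rough sketch follows. It assumes the source text is available as a null-terminated multibyte string in the current locale's encoding. `decode_to_c32` and `is_xid_start_ascii_subset` are hypothetical names, and the second function is only a placeholder for a real XID_Start table. The resulting `char32_t` values are Unicode code points only when the implementation defines `__STDC_UTF_32__`.

```c
#include <string.h> /* strlen, memset */
#include <uchar.h>  /* char32_t, mbstate_t, mbrtoc32 */

/* Hypothetical helper (not from the thread): convert a multibyte string in
   the current locale's encoding into char32_t values using mbrtoc32.
   Returns the number of values stored, or (size_t)-1 on an invalid or
   truncated multibyte sequence. */
static size_t decode_to_c32(const char *src, char32_t *out, size_t out_cap)
{
    mbstate_t state;
    memset(&state, 0, sizeof state); /* start in the initial conversion state */
    size_t len = strlen(src);
    size_t n = 0;

    while (len > 0 && n < out_cap) {
        char32_t c;
        size_t rc = mbrtoc32(&c, src, len, &state);
        if (rc == (size_t)-1 || rc == (size_t)-2)
            return (size_t)-1; /* invalid or incomplete sequence */
        if (rc == 0)
            break;             /* decoded a null character; cannot occur within strlen bytes */
        out[n++] = c;
        if (rc == (size_t)-3)
            continue;          /* value from an earlier sequence; no bytes consumed */
        src += rc;             /* advance past the bytes of this character */
        len -= rc;
    }
    return n;
}

/* Hypothetical stand-in for a real XID_Start property lookup:
   only the ASCII letters are covered here. */
static int is_xid_start_ascii_subset(char32_t c)
{
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z');
}
```

Calling `setlocale(LC_CTYPE, "")` from `<locale.h>` before decoding would make the conversion follow the environment's encoding instead of the default "C" locale, which is closer to how a translator would interpret its source character set.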

On Tue, Mar 1, 2022, 00:17 Tom Honermann <tom_at_[hidden]> wrote:

> On 2/28/22 11:29 PM, Steve Downey wrote:
>
> I'm ecstatic it's been taken care of. But I've also learned not to assume
> anything is already fixed.
> Thanks for the hard work; it means I don't have to figure out how to file
> an issue.
>
> If necessary, though, we could reuse the mbrtoc32 (multibyte to char32_t)
> machinery to find the code points for property lookup.
>
> I'm missing some context (or it's just late and my brain has already
> headed to bed). Could you elaborate on what you are thinking about here?
>
> Tom.
>
>
> On Mon, Feb 28, 2022, 20:49 Tom Honermann via Liaison <
> liaison_at_[hidden]> wrote:
>
>> Steve, yes, this inconsistency was deliberate; see N2418
>> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2418.pdf> (the paper
>> that proposed u8 character literals for C2X). The choice was made
>> anticipating a char8_t type as then proposed by N2231
>> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2231.htm>. As Robert
>> already noted, the adoption of N2653
>> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm> at the last
>> WG14 meeting resolves the inconsistency.
>>
>> Tom.
>> On 2/28/22 3:39 PM, Robert Seacord via Liaison wrote:
>>
>> If I remember correctly, we voted this paper
>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm into C23 at
>> the last meeting, so you probably want to refer to it, as it changes
>> things.
>>
>> Thanks,
>> rCs
>>
>> On Mon, Feb 28, 2022 at 3:29 PM Steve Downey via Liaison <
>> liaison_at_[hidden]> wrote:
>>
>>> Looking at N2731
>>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf to see how
>>> conversion from source encoding to u8 literal is handled, I notice that
>>> 6.4.5 String Literals para 6 says "For UTF–8 string literals, the array
>>> elements have type char", although in 6.4.4.4 Character Constants para 12
>>> it says "A UTF–8 character constant has type unsigned char."
>>>
>>> Is the inconsistency a bug or deliberate?
>>> _______________________________________________
>>> Liaison mailing list
>>> Liaison_at_[hidden]
>>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/liaison
>>> Link to this post: http://lists.isocpp.org/liaison/2022/02/1009.php
>>>
>
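The type difference Steve asked about, and its resolution by N2653, can be observed with C11 `_Generic`. This is a sketch, not text from the thread: `ELEM_KIND`, `u8_string_elem_kind`, and `u8_char_const_kind` are hypothetical names. It assumes a compiler that implements N2653 in C23 mode, where UTF-8 string literal elements have type `char8_t` (a typedef for `unsigned char` in `<uchar.h>`). Before C23 those elements have type `char`, and UTF-8 character constants do not exist at all, so that helper is only compiled in C23 mode.

```c
/* Classify the type of an expression with C11 _Generic.
   char, signed char and unsigned char are three distinct types,
   so plain char is reported separately regardless of its signedness. */
enum elem_kind { KIND_CHAR, KIND_SIGNED_CHAR, KIND_UNSIGNED_CHAR, KIND_OTHER };

#define ELEM_KIND(x) _Generic((x),     \
    char:          KIND_CHAR,          \
    signed char:   KIND_SIGNED_CHAR,   \
    unsigned char: KIND_UNSIGNED_CHAR, \
    default:       KIND_OTHER)

/* Element type of a UTF-8 string literal:
   char in C11/C17; char8_t (i.e. unsigned char) once N2653 applies. */
static enum elem_kind u8_string_elem_kind(void)
{
    return ELEM_KIND(u8"a"[0]);
}

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* UTF-8 character constants (added by N2418) have type unsigned char. */
static enum elem_kind u8_char_const_kind(void)
{
    return ELEM_KIND(u8'a');
}
#endif
```

Compiled as C17, `u8_string_elem_kind()` reports `KIND_CHAR`, which is the mismatch with `u8''` described in the original question; with a C23 compiler that implements N2653, both helpers report `KIND_UNSIGNED_CHAR`.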

Received on 2022-03-01 13:51:01