Re: [wg14/wg21 liaison] Fwd: (SC22WG14.19436) [SG16] Draft WG14 N2653: char8_t: A type for UTF-8 characters and strings (Revision 1)

From: Tom Honermann <tom_at_[hidden]>
Date: Tue, 8 Jun 2021 13:33:37 -0400
On 6/7/21 3:59 AM, Peter Sommerlad (C++) via Liaison wrote:
> I didn't follow, but AFAIK char8_t is said to be distinct, not an
> alias; it was never suggested to make uint8_t a distinct type from
> unsigned char.

char8_t is a distinct type in C++, but the proposal for C is that it be
a typedef of unsigned char (for consistency with wchar_t, char16_t, and
char32_t).

Tom.

>
> Peter
>
> Aaron Peter Bachmann via Liaison wrote on 07.06.21 09:45:
>>
>>
>> Hello!
>> At least for C, it would be very undesirable if uint8_t were not an
>> alias for char, signed char, or unsigned char.
>> [u]intxy_t are just typedefs.
>> Making uint8_t a special case makes C less regular and breaks
>> existing code.
>> For performance we have restrict.
>>
>> Regards, Aaron Peter Bachmann
>>
>> On 6/6/21 5:49 PM, Tom Honermann wrote:
>>> On 6/5/21 4:27 PM, Victor Yodaiken wrote:
>>>>
>>>>
>>>> On Sat, Jun 5, 2021 at 9:51 AM Tom Honermann <tom_at_[hidden]
>>>> <mailto:tom_at_[hidden]>> wrote
>>>>
>>>>>
>>>>> Why is it not desirable?
>>>>
>>>> I mentioned that there is a tradeoff between code efficiency and
>>>> safety in the updates made to the paper.
>>>>
>>>> Maybe I'm looking at the wrong version, but I don't see any
>>>> discussion of that.
>>>
>>> From here
>>> <https://rawgit.com/sg16-unicode/sg16/master/papers/n2653.html#do_char8_t_type>.
>>> Start reading at the paragraph that begins with "Additional
>>> motivation for distinct integer types is the ability to specify them
>>> as non-aliasing types".
>>>
>>> > The following example code would be well-formed in C regardless of
>>> whether char8_t is specified as a new integer type or as a typedef
>>> name of an existing character type. If char8_t is specified as a
>>> typedef name of an existing character type, then the example also
>>> works as expected because it does not violate aliasing rules.
>>> However, if char8_t is specified as a new integer type, then the
>>> example would exhibit undefined behavior because an object of type
>>> char is accessed using the char8_t type (assuming no new special
>>> provisions added to C17 6.5, Expressions, paragraph 7). *Thus, there
>>> is a trade-off between code efficiency and safety inherent in how
>>> char8_t is defined*.
>>>
>>>>>
>>>>> Thank you, good suggestion.
>>>>>
>>>>> I updated the "typedef name vs a new integer type"
>>>>> <https://rawgit.com/sg16-unicode/sg16/master/papers/n2653.html#do_char8_t_type>
>>>>>
>>>>> section and have now submitted the paper to
>>>>>
>>>>>
>>>>> That seems more useful. What is the purpose of creating more
>>>>> type restrictions?
>>>>
>>>> The usual reasons; a coherent object model enables type based
>>>> analysis for improved code generation and other forms of static
>>>> analysis.
>>>>
>>>>
>>>> What makes it coherent to have unnecessary type UB?
>>>> What evidence shows that TBAA (type-based alias analysis) improves
>>>> performance on real programs?
>>>> My concern is that there is too much UB in the standard already.
>>>
>>> I'm not going to debate this here as other forums are better suited
>>> for discussing the pros and cons of the C object model. I'm
>>> confident WG14 is well positioned to make a decision with regard to
>>> this paper.
>>>
>>> Tom.
>>>
>>
>> _______________________________________________
>> Liaison mailing list
>> Liaison_at_[hidden]
>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/liaison
>> Link to this post: http://lists.isocpp.org/liaison/2021/06/0606.php
>
>

Received on 2021-06-08 12:33:40