sg12

Re: [ub] Aliasing char16_t with int_least16_t, etc.

From: Jeffrey Yasskin <jyasskin_at_[hidden]>
Date: Wed, 30 Oct 2013 16:02:36 -0700
On Wed, Oct 30, 2013 at 3:06 PM, Lawrence Crowl <Lawrence_at_[hidden]> wrote:
> On 10/30/13, Jeffrey Yasskin <jyasskin_at_[hidden]> wrote:
>> I was sent a code review today that wanted to pass an array of wchar_t
>> (sizeof(wchar_t)==2 on Windows) to a function taking const uint16_t*
>> (https://code.google.com/p/chromium/codesearch/#chromium/src/third_party/harfbuzz-ng/src/hb-buffer.cc&l=982).
>> The proposed code did this with "reinterpret_cast<const
>> uint16_t*>(the_wchar_t_pointer)", but I had to point out that this
>> violates [basic.lval]p10. The workarounds seem to involve either
>> copying the array or adding overloads to the function that pass
>> through to a template.
>>
>> Can we make this sort of aliasing defined instead? With 2-3 ways to
>> represent a utf-16 array, we're likely to see more undefined casting
>> as users try to avoid extra copies or perceived code bloat.
>
> The undefined behavior permits better alias analysis. I do not know
> how large the effect is.

There's some evidence that strict aliasing as a whole has roughly no
effect on performance in two large codebases:
https://groups.google.com/a/chromium.org/d/topic/chromium-dev/dUebWSEpAR8/discussion
and https://bugzilla.mozilla.org/show_bug.cgi?id=657806. This small
relaxation of the rules would have even less effect.

However, this list tends to worry about semantic effects even when
performance isn't an issue.
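
For reference, the "overloads that pass through to a template" workaround
mentioned in the original message might look roughly like the sketch below.
The function name count_non_surrogates and its body are hypothetical
stand-ins, not the HarfBuzz API; the point is only that each pointer is read
through its own type, so no reinterpret_cast is needed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical UTF-16 consumer: counts code units that are not surrogates.
// The body is written once and instantiated per 16-bit code unit type.
template <typename CodeUnit>
std::size_t count_non_surrogates_impl(const CodeUnit* text, std::size_t length) {
    static_assert(sizeof(CodeUnit) == 2, "expected a 16-bit code unit type");
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i) {
        // Each element is read through its own type, then widened by value.
        const std::uint32_t unit = static_cast<std::uint16_t>(text[i]);
        if (unit < 0xD800u || unit > 0xDFFFu) ++count;
    }
    return count;
}

// Thin overloads forwarding to the template. A wchar_t overload could be added
// on platforms where sizeof(wchar_t) == 2.
inline std::size_t count_non_surrogates(const std::uint16_t* text, std::size_t length) {
    return count_non_surrogates_impl(text, length);
}
inline std::size_t count_non_surrogates(const char16_t* text, std::size_t length) {
    return count_non_surrogates_impl(text, length);
}
```

The cost is one instantiation per code unit type, which is the "perceived
code bloat" that tempts people toward the undefined cast.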

>> I think the change would be to add some bullets in [basic.lval]p10:
>> * [a type that is] the (possibly cv-qualified) underlying type of the
>> dynamic type of the object,
>> * [a type that is] the (possibly cv-qualified) signed or unsigned type
>> corresponding to the underlying type of the dynamic type of the
>> object,
>>
>> Would we want to go the other way too? That is, do we want to force
>> everyone writing a flexible utf-16 function to take uint16_t, or could
>> they accept char16_t too? If we want to let them take char16_t, we'd
>> need to add:
>> * a (possibly cv-qualified) type whose underlying type is the dynamic
>> type of the object,
>> * a (possibly cv-qualified) type whose underlying type is the signed or
>> unsigned type corresponding to the dynamic type of the object
>
> Going even further, we could consider saying that we can access any
> integral type with the same size and alignment. But of course, that makes
> alias analysis even less effective than what you suggest. I do not know
> how important that is.

I wouldn't object to that either.

> If we do allow such accesses, I suggest either adding another form of
> cast or loosening the static_cast requirements, to allow checkable
> code like the following.
>
> extern void foo( char16_t* p );
> void bar( int_least16_t* q ) { foo( static_cast<char16_t*>(q) ); }
>
> If int_least16_t is not 16 bits, the compilation should fail with
> a helpful message.

Modulo your exact choice of types, I like the guideline that
static_cast<T*> should work when we have defined behavior for
dereferencing the result.
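
Until something like that exists, the compile-time check can be approximated
in a library. The following is only a sketch: same_layout_cast and last_seen
are hypothetical names, and the helper checks size and alignment but does
nothing to make dereferencing the result defined under the current
[basic.lval]p10.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: converts between pointers to integral types only when
// size and alignment match; otherwise compilation fails with a readable message.
template <typename To, typename From>
To* same_layout_cast(From* p) {
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "same_layout_cast: both types must be integral");
    static_assert(sizeof(To) == sizeof(From),
                  "same_layout_cast: types differ in size");
    static_assert(alignof(To) == alignof(From),
                  "same_layout_cast: types differ in alignment");
    // Only the conversion is checked. Reading through the result is still
    // undefined today unless To is already allowed to alias From.
    return reinterpret_cast<To*>(p);
}

// Stand-ins for foo and bar from the quoted example. foo only records the
// pointer so the sketch is observable without dereferencing anything.
char16_t* last_seen = nullptr;
void foo(char16_t* p) { last_seen = p; }
void bar(std::int_least16_t* q) { foo(same_layout_cast<char16_t>(q)); }
```

On a platform where int_least16_t is wider than 16 bits, bar fails to compile
at the size assertion rather than silently misreading the buffer.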

Received on 2013-10-31 00:02:57