Re: [SG16] Reinterpreting pointers of character types

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Fri, 29 Jan 2021 09:39:27 +0100
On 29/01/2021 08.20, Tom Honermann wrote:
> On 1/28/21 1:57 PM, Jens Maurer via SG16 wrote:
>> On 28/01/2021 19.37, Corentin via SG16 wrote:
>>> On Thu, Jan 28, 2021 at 7:22 PM Peter Brett <pbrett_at_[hidden]> wrote:
>>>
>>> I think the big problem here is trying to make it a template.
>>>
>>> Make it named. It’s literally not possible to use this correctly in generic code.
>>>
>>>
>>> The question then is whether we want to solve the issue for wchar_t.
>>> Having the name of the encoding in the function kind of precludes that, since sizeof(wchar_t) is platform-dependent.
>> You only get away with char* -> char8_t* because "char" has special
>> aliasing exceptions.
>>
>> You'll get the full set of aliasing concerns for
>> wchar_t* -> char16_t* or char32_t*
>
> I think what we're looking for is a portable solution for this ICU hack <https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/char16ptr.h#L30-L36> (generalized to make it work for [unsigned] char* conversion to char8_t*); the goal being to enable some form of explicit restricted pointer interconvertibility between same sized/aligned types.

Yes.

> I don't understand the ICU hack sufficiently well to relate it to a memory or object model. I'm also not sure that it actually works (though it may suffice for the scenarios that are encountered in practice).
>
> Perhaps something like this would suffice.
>
> template<typename To, typename From>
>   requires requires {
>     requires std::is_trivial_v<To>;
>     requires std::is_trivial_v<From>;
>     requires sizeof(To) == sizeof(From);
>     requires alignof(To) == alignof(From);
>   }
> To* alias_barrier_cast(From *p) {
>   asm volatile("" : : "rm"(p) : "memory");
>   return reinterpret_cast<To*>(p);
> }

From a C++ memory model perspective, there is no difference between,
say, char16_t and short or int: each forms its own aliasing domain.
Converting the pointer with reinterpret_cast or a similar cast is NOT
the problem; the problem is accessing the data as one type before the
conversion and as another type after it.

There have been papers in the past that attempted to bless
regions of memory with a different data type (e.g. to deal with
mmapped file data); I think such a direction might be worthwhile
to investigate.

I certainly don't want to deal with a "solution" that covers
char8_t / char16_t / char32_t only, if the underlying concerns
are also applicable elsewhere.

Jens

Received on 2021-01-29 02:39:38