sg16: Re: [SG16] Reinterpreting pointers of character types

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Fri, 29 Jan 2021 19:26:00 +0100

On 29/01/2021 10.27, Corentin wrote:
>
>
> On Fri, Jan 29, 2021 at 9:39 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
> On 29/01/2021 08.20, Tom Honermann wrote:

> > I don't understand the ICU hack sufficiently well to relate it to a memory or object model. I'm also not sure that it actually works (though it may suffice for the scenarios that are encountered in practice).
> >
> > Perhaps something like this would suffice.
> >
> > template<typename To, typename From>
> > requires requires {
> > requires std::is_trivial_v<To>;
> > requires std::is_trivial_v<From>;
> > requires sizeof(To) == sizeof(From);
> > requires alignof(To) == alignof(From);
> > }
> > To* alias_barrier_cast(From *p) {
> > asm volatile("" : : "rm"(p) : "memory");
> > return reinterpret_cast<To*>(p);
> > }
>
> From a C++ memory model perspective, there is no difference between,
> say, char16_t and short or int: They form their own aliasing domain.
> Converting the pointer with reinterpret_cast or something is NOT
> the problem; the problem is accessing the data before and after.
>
>
> The solution I'm thinking of is that the data can only be
> accessed through the returned pointer after the function call.
> Do you think that is more workable?

That's certainly a good idea, and fits nicely with the
situation with placement new.
(You should use the pointer returned by placement new, not
the one you stuck in, even though both may have the same
bit pattern.)
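The placement-new analogy can be sketched as follows. This is a minimal C++17 example; `Widget` and `construct_and_read` are made-up names for illustration, not part of any proposal. It shows that the pointer returned by placement new is the one to use, and that a pointer cast from the original buffer, despite having the same address, needs `std::launder` before it can be used to reach the new object.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical trivial type, used only for illustration.
struct Widget {
    int value;
};

// Creates a Widget in local storage and reads it back through the
// pointer that placement new returned.
int construct_and_read(int v) {
    alignas(Widget) std::byte storage[sizeof(Widget)];

    // Use this pointer: it points to the newly created Widget object.
    Widget* w = ::new (static_cast<void*>(storage)) Widget{v};

    // It has the same address (same bit pattern) as the storage passed in...
    assert(static_cast<void*>(w) == static_cast<void*>(storage));

    // ...but reinterpret_cast<Widget*>(storage) alone still points to the
    // std::byte element, not to the Widget; std::launder is needed to obtain
    // a usable pointer to the Widget from it.
    Widget* w2 = std::launder(reinterpret_cast<Widget*>(storage));
    assert(w2 == w);

    // Widget is trivially destructible, so no explicit destructor call is needed.
    return w->value;
}
```

With this sketch, `construct_and_read(42)` returns 42.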

> There have been papers in the past that attempted to bless
> regions of memory with a different data type (e.g. to deal with
> mmapped file data); I think such a direction might be worthwhile
> to investigate.
>
>
> I certainly don't want to deal with a "solution" that covers
> char8_t / char16_t / char32_t only, if the underlying concerns
> are also applicable elsewhere.
>
>
> Even if there were a use case for a generalized solution, we would need to do something specific here, as we have an additional
> precondition, namely that the input is a well-formed sequence of UTF code units.

I'm not seeing how "well-formed sequence of UTF code units" could
possibly be a consideration for memory model issues.
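As a sketch of why the content of the data does not enter into it (assuming C++17 and that `std::uint16_t` exists; `to_uint16` is a hypothetical helper): copying with `std::memcpy` is well-defined even when the input is not well-formed UTF-16, whereas reading the `char16_t` objects through a `reinterpret_cast`ed `std::uint16_t` pointer would be undefined behavior no matter what values they hold.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies code units held as char16_t into
// std::uint16_t storage. std::memcpy copies the object representation,
// so all later reads go through std::uint16_t objects that really exist.
// By contrast, reading src via reinterpret_cast<const std::uint16_t*>(src)
// would access char16_t objects through a different type (UB); the cast
// itself is not the problem, the access is.
std::vector<std::uint16_t> to_uint16(const char16_t* src, std::size_t n) {
    static_assert(sizeof(char16_t) == sizeof(std::uint16_t));
    std::vector<std::uint16_t> out(n);
    if (n != 0) {
        std::memcpy(out.data(), src, n * sizeof(char16_t));
    }
    return out;
}
```

For example, the input `{u'A', static_cast<char16_t>(0xD800)}` (a lone high surrogate, so ill-formed UTF-16) is copied to `{0x41, 0xD800}` all the same.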

Jens

Received on 2021-01-29 12:26:11