
std-discussion


Re: reading values as bytes without memcpy, from enum unsigned char?

From: Jason McKesson <jmckesson_at_[hidden]>
Date: Fri, 7 Aug 2020 14:30:03 -0400
On Fri, Aug 7, 2020 at 1:14 PM <language.lawyer_at_[hidden]> wrote:
>
> On 07/08/2020 20:03, Jason McKesson via Std-Discussion wrote:
> > On Fri, Aug 7, 2020 at 11:43 AM <language.lawyer_at_[hidden]> wrote:
> >>
> >> On 07/08/2020 18:16, Jason McKesson via Std-Discussion wrote:
> >>> On Fri, Aug 7, 2020 at 10:06 AM <language.lawyer_at_[hidden]> wrote:
> >>>>
> >>>> On 07/08/2020 16:51, Jason McKesson via Std-Discussion wrote:
> >>>>> The cast is legal. Accessing the `unsigned char` at that address is
> >>>>> legal.
> >>>>
> >>>> You can't access `unsigned char` objects, `reinterpret_cast<unsigned char*>(buf)` points to the object of `mybyte` type.
> >>>
> >>> According to [basic.lval]/8.8, I can:
> >>>
> >>>> If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
> >>>> ...
> >>>> a char, unsigned char, or std::byte type.
> >>>
> >>> So it doesn't matter what `T1` is; you can access that byte as an
> >>> `unsigned char`. And indeed, if you could get a pointer to *any* byte
> >>> within `T1`, you can access it as an `unsigned char`.
> >>>
> >>> It's getting a pointer to those bytes that is the problem.
> >>
> >> That is what was meant by "you can't access `unsigned char` objects". You can't get a pointer to them.
> >
> > You can't get a pointer to any bytes *other than* the first. That's
> > why I said "those bytes".
> >
> >>>> But accessing an object of `mybyte` type through a glvalue of `unsigned char` type is indeed legal, because an enumeration with a fixed underlying type has the same values as the underlying type.
> >>>
> >>> Actually, that's not true. I mean, it is true that the value
> >>> representations are the same, but that doesn't mean you can access
> >>> them through different types. [basic.lval] has no special provisions
> >>> for an enum and its underlying type.
> >>
> >> I was speaking exactly about the case of `enum mybyte : unsigned char`.
> >> If there were no rule explicitly saying
> >>> For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type.
> >>
> >> accessing an object of `mybyte` type through an `unsigned char` glvalue would be UB because of [expr.pre]/4, even though [basic.lval] "allows" such access.
> >
> > [expr.pre] doesn't apply. Remember: I was talking *specifically* about
> > having a pointer of type `unsigned char` that's pointing to a
> > `mybyte`. If you have such a pointer that's valid (i.e., you *didn't* do
> > the pointer arithmetic, and thus didn't invoke UB), then the act of
> > *accessing* it is valid. But it's only valid because you used
> > `unsigned char`; it would not be valid if the underlying type were not
> > a bytewise type and you cast it to that type.
> >
> > Just to spell it out, this is fine:
> >
> > ```
> > auto ptr = new AnyType(...);
> > auto byte_ptr = reinterpret_cast<unsigned char*>(ptr);
> > *byte_ptr;
> > ```
>
> *byte_ptr; doesn't access anything. lvalue-to-rvalue conversion is not applied to a discarded-value expression of non-volatile type.
> But anyway, assuming we have access there, that code can haz UB. More specific example:
> ```
> int i = UCHAR_MAX + 1;
> auto uc = *reinterpret_cast<unsigned char*>(&i); // UB because of [expr.pre]/4
> ```

I don't see how [expr.pre]/4 applies here. You're accessing an `unsigned
char` in a place where there is no `unsigned char` object. Nothing about
that access produces a result that is "not mathematically defined" or
"not in the range of representable values for its type": every value an
`unsigned char` glvalue can yield is representable as `unsigned char`. So
it's not UB by [expr.pre]/4.

The issue is that the value is not well-specified by the standard.
That is, the standard says that it's OK to access it, but it doesn't
say what accessing it *actually does* or what the actual value being
extracted means. That's not undefined behavior per se; it's a hole in
the standard.

Received on 2020-08-07 13:33:35