std-discussion

Re: reading values as bytes without memcpy, from enum unsigned char?

From: language.lawyer_at <language.lawyer_at_[hidden]>
Date: Fri, 7 Aug 2020 20:14:26 +0300
On 07/08/2020 20:03, Jason McKesson via Std-Discussion wrote:
> On Fri, Aug 7, 2020 at 11:43 AM <language.lawyer_at_[hidden]> wrote:
>>
>> On 07/08/2020 18:16, Jason McKesson via Std-Discussion wrote:
>>> On Fri, Aug 7, 2020 at 10:06 AM <language.lawyer_at_[hidden]> wrote:
>>>>
>>>> On 07/08/2020 16:51, Jason McKesson via Std-Discussion wrote:
>>>>> The cast is legal. Accessing the `unsigned char` at that address is
>>>>> legal.
>>>>
>>>> You can't access `unsigned char` objects, `reinterpret_cast<unsigned char*>(buf)` points to the object of `mybyte` type.
>>>
>>> According to [basic.lval]/8.8, I can:
>>>
>>>> If a program attempts to access the stored value of an object through a glvalue of other than one of the
>>> following types the behavior is undefined:
>>>> ...
>>>> a char, unsigned char, or std::byte type.
>>>
>>> So it doesn't matter what `T1` is; you can access that byte as an
>>> `unsigned char`. And indeed, if you could get a pointer to *any* byte
>>> within `T1`, you can access it as an `unsigned char`.
>>>
>>> It's getting a pointer to those bytes that is the problem.
>>
>> That is what was meant by "you can't access `unsigned char` objects". You can't get a pointer to them.
>
> You can't get a pointer to any bytes *other than* the first. That's
> why I said "those bytes".
>
>>>> But accessing an object of `mybyte` type through a glvalue of `unsigned char` type is indeed legal, because an enumeration with a fixed underlying type has the same values as the underlying type.
>>>
>>> Actually, that's not true. I mean, it is true that the value
>>> representations are the same, but that doesn't mean you can access
>>> them through different types. [basic.lval] has no special provisions
>>> for an enum and its underlying type.
>>
>> I was speaking exactly about the case of `enum mybyte : unsigned char`.
>> If there were no rule explicitly saying
>>> For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type.
>>
>> accessing an object of `mybyte` type through an `unsigned char` glvalue would be UB because of [expr.pre]/4, even though [basic.lval] "allows" such access.
>
> [expr.pre] doesn't apply. Remember: I was talking *specifically* about
> having a pointer of type `unsigned char` that's pointing to a
> `mybyte`. If you have such a pointer that's valid (ie: you *didn't* do
> the pointer arithmetic, and thus didn't invoke UB), then the act of
> *accessing* it is valid. But it's only valid because you used
> `unsigned char`; it would not be valid if the underlying type were not
> a bytewise type and you casted it to that type.
>
> Just to spell it out, this is fine:
>
> ```
> auto ptr = new AnyType(...);
> auto byte_ptr = reinterpret_cast<unsigned char*>(ptr);
> *byte_ptr;
> ```

`*byte_ptr;` doesn't access anything: the lvalue-to-rvalue conversion is not applied to a discarded-value expression of non-volatile type.
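To make the distinction concrete, here is a minimal sketch. The helper names are hypothetical, invented for illustration only, and it deliberately uses a real `unsigned char` object so that no aliasing question arises:
```cpp
#include <cassert>

// Hypothetical helpers, not from the thread; they only illustrate
// when the lvalue-to-rvalue conversion (i.e. a read) is applied.

// The operand of a cast to void is a discarded-value expression, just like
// the expression statement `*p;`. The glvalue is non-volatile, so no
// lvalue-to-rvalue conversion is applied and the pointee is not read.
inline void evaluate_without_access(unsigned char* p) {
    assert(p != nullptr);
    static_cast<void>(*p);
}

// Initializing an object from `*p` needs a prvalue, so the
// lvalue-to-rvalue conversion is applied and the pointee is read.
inline unsigned char evaluate_with_access(unsigned char* p) {
    assert(p != nullptr);
    unsigned char value = *p;
    return value;
}

// For a volatile-qualified glvalue, the discarded-value form `*p;`
// does apply the lvalue-to-rvalue conversion.
inline void evaluate_volatile(volatile unsigned char* p) {
    assert(p != nullptr);
    *p;
}
```
So to talk about an access at all, the snippet would have to use the value, e.g. `auto b = *byte_ptr;`.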
But anyway, assuming there is an access, that code can have UB. A more specific example:
```
#include <climits> // UCHAR_MAX
int i = UCHAR_MAX + 1; // a value that is not representable in unsigned char
auto uc = *reinterpret_cast<unsigned char*>(&i); // UB because of [expr.pre]/4
```
This has been discussed on std-discussion, maybe more than once.

> What is not fine in C++ as it currently stands is trying to do pointer
> arithmetic on `byte_ptr`.
>
>> Your
>>> The cast is legal. Accessing the `unsigned char` at that address is legal. But without P1839, actually doing the pointer arithmetic needed to access more than one such byte is not.
>>
>> sounds like that now, without P1839, there is some "access to one byte", which is not the case.
>
> [basic.lval]/8.8 says otherwise.

[basic.lval] only says when you will definitely have UB. But it doesn't guarantee that you won't have UB even when you use "the right" types.

Received on 2020-08-07 12:17:51