std-discussion: Re: reading values as bytes without memcpy, from enum unsigned char?

From: Thiago Macieira <thiago_at_[hidden]>
Date: Mon, 10 Aug 2020 09:17:25 -0700

On Friday, 7 August 2020 12:12:22 PDT language.lawyer--- via Std-Discussion
wrote:
> >> ```
> >> int i = UCHAR_MAX + 1;
> >> auto uc = *reinterpret_cast<unsigned char*>(&i); // UB because of [expr.pre]/4
> >> ```
> >
> > I don't see how [expr]/4 applies here. You're accessing an `unsigned
> > char` in a place where there is no `unsigned char` object. There's no
> > expectation that this value won't be "mathematically defined" or
> > otherwise "not in the range of representable values for its type". So
> > it's not UB by [expr]/4.
>
> See my analysis of a similar code
> https://lists.isocpp.org/sg12/2019/06/0802.php

I'm not sure it's the same thing. In that example, by attempting to bind a
reference to something of the wrong type, the language could say that there's
an implicit conversion and that you bind to the temporary resulting from it,
which would make your code ill-formed because you can't bind a non-const
lvalue reference to a prvalue.

Since that isn't the case (it is valid code), we have to see it the same way
as:

unsigned u = *reinterpret_cast<unsigned *>(&i);

And that's still different from the above because of the type. One is unsigned
char (a byte) which is allowed by the standard to alias everything. The other
isn't.
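
For comparison, here is a minimal sketch of the two kinds of read. The helper
names (first_byte, bytes_as_unsigned, example) are made up for illustration
and do not come from the thread. The memcpy form sidesteps the aliasing
question altogether, at the cost of using the function the subject line
wants to avoid.

```cpp
#include <cassert>
#include <climits>
#include <cstring>

// Reads the first byte of the object representation of i through an
// unsigned char lvalue, the same expression as in the quoted example.
inline unsigned char first_byte(const int &i)
{
    return *reinterpret_cast<const unsigned char *>(&i);
}

// Copies the bytes of i into an unsigned object instead of reading the
// int object through an unsigned lvalue.
inline unsigned bytes_as_unsigned(const int &i)
{
    static_assert(sizeof(unsigned) == sizeof(int), "corresponding types have the same size");
    unsigned u;
    std::memcpy(&u, &i, sizeof u);
    return u;
}

// Hypothetical usage, for illustration only.
inline void example()
{
    int i = UCHAR_MAX + 1;
    // Non-negative values have the same representation in int and unsigned.
    assert(bytes_as_unsigned(i) == static_cast<unsigned>(i));
    // Reading twice gives the same byte; which value it has is machine-dependent.
    assert(first_byte(i) == first_byte(i));
}
```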

> > The issue is that the value is not well-specified by the standard.
>
> The value is 100% specified by the standard. It is the value of `i`, which
> is `UCHAR_MAX + 1`.

If you meant that the value of i is the value of i, then it's tautological.

If you meant that the value of a uchar resulting from this conversion should
be defined by the standard, then it's impossible. The value read is not
specified, and today it need not equal uchar(UCHAR_MAX + 1) == 0: on a good
number of machines, the first byte of i holds its most significant bits, not
the least significant ones.

The result of reading a byte should be unspecified: a well-defined,
consistent, machine-dependent value that cannot be an invalid value, but one
that the standard does not specify.
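
To make the distinction concrete, here is a small sketch that separates the
value conversion from the byte read. The function names (converted,
nonzero_bytes) are hypothetical, and the code assumes that int is wider than
one byte and has no padding bits, which holds on common implementations but
is not required by the standard.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>

// Value conversion: the result is i reduced modulo UCHAR_MAX + 1, so it is
// 0 for i == UCHAR_MAX + 1 on every implementation.
inline unsigned char converted(int i)
{
    return static_cast<unsigned char>(i);
}

// Byte read: counts the nonzero bytes in the object representation of i.
// The count is stable, but which position holds the nonzero byte is not
// something the standard specifies.
inline std::size_t nonzero_bytes(const int &i)
{
    unsigned char bytes[sizeof i];
    std::memcpy(bytes, &i, sizeof i);
    std::size_t count = 0;
    for (std::size_t n = 0; n < sizeof i; ++n)
        if (bytes[n] != 0)
            ++count;
    assert(count <= sizeof i);
    return count;
}
```

With i == UCHAR_MAX + 1, converted(i) is 0, while exactly one byte of i is
nonzero under the assumptions above. Whether a read of the first byte hits
that byte depends on the size of int and on the byte order.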

At most, if we want to move the standard, we could specify how unsigned
integer types are organised in terms of endianness. Specifying signed integer
ones requires another step first: specifying how they are stored in memory at
all (the standard requires that they behave as two's complement, but not that
they be stored as such). This requires a paper or two.
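
As an illustration of what is left to the implementation today, the sketch
below finds at run time which byte of an unsigned holds its least significant
bits. The name low_byte_index is an assumption made for this example, and
the code assumes unsigned has no padding bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns the index of the byte that holds the least significant bits of an
// unsigned: 0 on little-endian machines, sizeof(unsigned) - 1 on big-endian
// ones. Returns sizeof(unsigned) if no byte holds the value 1.
inline std::size_t low_byte_index()
{
    unsigned u = 1;
    unsigned char bytes[sizeof u];
    std::memcpy(bytes, &u, sizeof u);
    for (std::size_t n = 0; n < sizeof u; ++n)
        if (bytes[n] == 1)
            return n;
    assert(false); // not expected when unsigned has no padding bits
    return sizeof u;
}
```

If the standard fixed the byte order of unsigned types, this function would
have a single permitted result instead of a machine-dependent one.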

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel DPG Cloud Engineering

Received on 2020-08-10 11:20:51