sg12: Re: [ub] [c++std-ext-14592] Re: Re: Sized integer types and char bits

From: Lawrence Crowl <Lawrence_at_[hidden]>
Date: Sun, 27 Oct 2013 17:07:13 -0700

On 10/26/13, Ion Gaztañaga <igaztanaga_at_[hidden]> wrote:
> On 26/10/2013 22:51, John Regehr wrote:
>>> ... there is no
>>> representation change when converting a signed int value to unsigned
>>> int or when converting an unsigned int value to signed int.
>>
>> Wow-- anyone care to guess what fraction of existing C programs run
>> correctly under these conditions?
>
> If you think that's strange, you've got to read this: Unisys' servers with
> sign-magnitude 48 bit integers with 8 padding bits and 8 bit two's
> complement character types ("ClearPath MCP 15.0 operating system for
> their MCP Servers",
> http://public.support.unisys.com/search/DocumentationSearch.aspx?ID=750&pla=ps&nav=ps)
>
> They support many programming languages and even partial POSIX support.
> See "Application Development" here:
> http://public.support.unisys.com/aseries/docs/clearpath-mcp-15.0/pdf/70118328-104.pdf
>
> I'm sorry to copy-paste so many lines from the C manual, but I think
> it's interesting to read that really unusual C implementations are still
> produced and maintained.
>
> -------------------------------------------------
> -------------------------------------------------
> BEGINNING OF C COMPILER MANUAL QUOTES
> -------------------------------------------------
> -------------------------------------------------
>
> C compiler manual:
>
> http://public.support.unisys.com/aseries/docs/clearpath-mcp-15.0/pdf/86002268-205.pdf
>
> --------------------------------------------------
>
> Type Specifier / Description
>
> char, unsigned char / Represents an unsigned
> whole number in 8 bits (1 byte).
>
> signed char / Represents a signed whole number in 8 bits (1 byte).
>
> float / Represents a real number in 48 bits (1 word).
>
> double / Represents a real number in 48 bits (1 word) or a real
> number in 96 bits (2 words), depending on the value of the
> DBTOSNGL compiler control option. The default is float.
>
> long double / Represents a real number in 96 bits (2 words).

Note that there are decent arguments that these 48- and 96-bit
representations are better than the usual 32- and 64-bit ones.

>
> int, signed, signed int, short int, signed short int, long int,
> signed long int / Represents a signed whole number in 48 bits
> (1 word).
>
> unsigned, unsigned int, unsigned short int, unsigned long int /
> Represents an unsigned whole number in 48 bits (1 word).

Pretty classic for a word-addressed machine.

>
> ------------------------------------------------
>
> Char Types
>
> Characters are 8 bits wide. The plain char type is unsigned by
> default. This default can be changed to be signed by the
> $PORT (SIGNEDCHAR) option. This affects all variables of type
> char, even in arrays and structures.
>
> Signed characters are stored in two’s-complement format. The
> values 0 through 127 have the same bit pattern for signed and
> unsigned types. Unsigned characters are stored in two’s
> complement format if the $PORT (CHAR2) option is enabled.

How would the representation of an unsigned number change with this
option?

>
> ...
>
> The default character set used at run time is EBCDIC. This can be
> changed by the $ SET STRINGS=ASCII option. When ASCII is set, all
> characters are stored using the ASCII character set and all I/O is
> translated to or from ASCII if necessary.

This must imply some weird mappings depending on the EBCDIC code
page.

>
> Six characters are stored for each word instead of the usual four
> or two. It is not valid to compare multiple characters at once by
> casting a character pointer into an integer pointer and doing
> integer comparisons. This comparison results in a run-time error
> if the BOUNDS(ALIGNMENT) compiler control option is set; otherwise
> undefined behavior is likely to occur.

Again, common with word-addressed machines.

>
> ------------------------------------------------
>
> Integer Types
>
> Integer type representation differs between A Series C and C
> language on most other machines. A Series C uses a signed-magnitude
> representation for integers instead of two’s-complement
> representation. Furthermore, A Series C integers use only 40 of the
> 48 bits in the word: a separate sign bit and the low order 39 bits
> for the absolute value.

I believe the bits are unused because the machine has a unified
integer and floating-point number format, and the remaining bits hold
the exponent of floating-point numbers.

> Unsigned types in A Series C use the same representation as signed
> types, except that the sign bit is always zero. Negative values,
> when cast to an unsigned type, are added to (INT_MAX+1), producing
> a value within the signed integer range.

I think they meant to say "within the unsigned integer range".
This behavior is correct.

> This value does not change when cast back to a signed type.

Right, because the number of significant bits is the same in both
signed and unsigned.

> The types short, int, and long are all the same size.

> Bit operations (bitwise AND, OR, exclusive OR, and NOT) on signed
> values affect only the 40 bits used by integers. Bit operations on
> unsigned values conform to the mathematical definitions given in the
> ANSI C standard. Because the sign bit is not adjacent to the other
> bits, it is not possible to shift into or out of the sign bit.

Which, I think, conflicts with the recent change we made to allow
shifting into the sign bit. I was never comfortable with that change
because it broke the model of shift being isomorphic to multiplication
by a power of two.

Note also that because you cannot shift out of the sign bit, the
right shift rounds towards zero rather than towards negative infinity
(as it does on most current machines). I actually think this difference
in behavior is the hardest one to accommodate.

> Operations on unsigned integer types are more expensive than on
> signed types. The $RESET PORT (UNSIGNED) option makes unsigned
> equivalent to signed types and should be used on programs that do
> not depend upon the wraparound or bit operation properties of
> unsigned types.

In other words, they have implemented the "overflow undefined
unsigned integer" type that I have been wanting.

Porting a program with a global option like that is likely not
trivial. We really need two separate types.

> By default, bit fields in structures or unions that are of type
> plain int are unsigned. The default can be changed to signed by the
> $PORT (SIGNEDFIELD) option.

We have been changing things here too.

-- 
Lawrence Crowl

Received on 2013-10-28 01:07:16