sg12: Re: [ub] [c++std-ext-14592] Re: Re: Sized integer types and char bits

From: Ion Gaztañaga <igaztanaga_at_[hidden]>
Date: Sat, 26 Oct 2013 23:50:13 +0200

On 26/10/2013 22:51, John Regehr wrote:
>> ... there is no
>> representation change when converting a signed int value to unsigned int
>> or when converting an unsigned int value to signed int.
>
> Wow-- anyone care to guess what fraction of existing C programs run
> correctly under these conditions?

If you think that's strange, you've got to read this: Unisys' servers with
sign-magnitude 48-bit integers with 8 padding bits and 8-bit two's
complement character types ("ClearPath MCP 15.0 operating system for
their MCP Servers",
http://public.support.unisys.com/search/DocumentationSearch.aspx?ID=750&pla=ps&nav=ps)

They support many programming languages and even offer partial POSIX support.
See "Application Development" here:
http://public.support.unisys.com/aseries/docs/clearpath-mcp-15.0/pdf/70118328-104.pdf

I'm sorry to copy-paste so many lines from the C manual, but I think
it's interesting to read that really unusual C implementations are still
produced and maintained.

-------------------------------------------------
-------------------------------------------------
BEGINNING OF C COMPILER MANUAL QUOTES
-------------------------------------------------
-------------------------------------------------

C compiler manual:

http://public.support.unisys.com/aseries/docs/clearpath-mcp-15.0/pdf/86002268-205.pdf

--------------------------------------------------

Type Specifier / Description

char, unsigned char / Represents an unsigned whole number in 8 bits
(1 byte).

signed char / Represents a signed whole number in 8 bits (1 byte).

float / Represents a real number in 48 bits (1 word).

double / Represents a real number in 48 bits (1 word) or a real number
in 96 bits (2 words), depending on the value of the DBLTOSNGL compiler
control option. The default is float.

long double / Represents a real number in 96 bits (2 words).

int, signed, signed int, short int, signed short int, long int, signed
long int / Represents a signed whole number in 48 bits (1 word).

unsigned, unsigned int, unsigned short int, unsigned long int /
Represents an unsigned whole number in 48 bits (1 word).
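
(Not from the manual.) A minimal sketch of how a program could check at
run time whether unsigned int has padding bits, which the table above
implies for this platform. The helper names value_bits and
unsigned_padding_bits are made up for illustration; only CHAR_BIT and
UINT_MAX come from <limits.h>.

```c
#include <assert.h>
#include <limits.h>

/* Counts how many bits are needed to represent max (0 for max == 0). */
static int value_bits(unsigned int max)
{
    int bits = 0;
    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

/* Storage bits of unsigned int minus its value bits.
   A result of 0 means no padding bits. */
static int unsigned_padding_bits(void)
{
    int storage = (int)(sizeof(unsigned int) * CHAR_BIT);
    int value = value_bits(UINT_MAX);
    assert(value <= storage);
    return storage - value;
}
```

On mainstream two's complement platforms unsigned_padding_bits() should
return 0; a nonzero result is a hint that code relying on
sizeof(x) * CHAR_BIT as the width of a type needs a second look.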

------------------------------------------------

Char Types

Characters are 8 bits wide. The plain char type is unsigned by default.
This default can be changed to be signed by the $PORT (SIGNEDCHAR)
option. This affects all variables of type char, even in arrays and
structures.

Signed characters are stored in two’s-complement format. The values 0
through 127 have the same bit pattern for signed and unsigned types.
Unsigned characters are stored in two’s complement format if the $PORT
(CHAR2) option is enabled.

...

The default character set used at run time is EBCDIC. This can be
changed by the $SET STRINGS=ASCII option. When ASCII is set, all
characters are stored using the ASCII character set and all I/O is
translated to or from ASCII if necessary.

Six characters are stored for each word instead of the usual four or
two. It is not valid to compare multiple characters at once by casting a
character pointer into an integer pointer and doing integer comparisons.
This comparison results in a run-time error if the BOUNDS(ALIGNMENT)
compiler control option is set; otherwise undefined behavior is likely
to occur.
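
(Not from the manual.) A minimal sketch of the portable alternative to
the integer-pointer trick warned about above: compare the bytes with
memcmp and let the implementation pick the fastest safe strategy. The
helper name prefix_equal is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first n characters of a and b are equal, else 0.
   No assumption is made about how many characters fit in a word or
   about the alignment of a and b. */
static int prefix_equal(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, n) == 0;
}
```

For example, prefix_equal("abcdef", "abcxyz", 3) is true and
prefix_equal("abcdef", "abcxyz", 4) is false, regardless of word size.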

------------------------------------------------

Integer Types

Integer type representation differs between A Series C and C language on
most other machines. A Series C uses a signed-magnitude representation
for integers instead of two’s-complement representation. Furthermore, A
Series C integers use only 40 of the 48 bits in the word: a separate
sign bit and the low order 39 bits for the absolute value.
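
(Not from the manual.) A minimal sketch that models the representation
described above on an ordinary machine, to show how the bit patterns
differ from two's complement. The names sm_encode, sm_decode and
tc_encode40 are made up, and placing the sign bit directly above the 39
magnitude bits is an assumption for illustration only; the manual says
the real sign bit is not adjacent to the other bits.

```c
#include <assert.h>
#include <stdint.h>

#define MAG_BITS 39
#define MAG_MASK ((UINT64_C(1) << MAG_BITS) - 1)
#define SIGN_BIT (UINT64_C(1) << MAG_BITS) /* position is illustrative */

/* Signed-magnitude pattern: one sign bit plus a 39-bit absolute value. */
static uint64_t sm_encode(int64_t v)
{
    assert(v >= -(int64_t)MAG_MASK && v <= (int64_t)MAG_MASK);
    uint64_t mag = (uint64_t)(v < 0 ? -v : v);
    return (v < 0 ? SIGN_BIT : 0) | mag;
}

/* Inverse of sm_encode; a set sign bit with zero magnitude decodes to 0. */
static int64_t sm_decode(uint64_t bits)
{
    int64_t mag = (int64_t)(bits & MAG_MASK);
    return (bits & SIGN_BIT) ? -mag : mag;
}

/* 40-bit two's complement pattern of the same value, for comparison. */
static uint64_t tc_encode40(int64_t v)
{
    return (uint64_t)v & ((UINT64_C(1) << 40) - 1);
}
```

Non-negative values get the same pattern in both encodings, while -5
becomes "sign bit plus 5" in the first and "2^40 - 5" in the second,
which is why code that inspects the bits of negative numbers breaks.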

Unsigned types in A Series C use the same representation as signed
types, except that the sign bit is always zero. Negative values, when
cast to an unsigned type, are added to (INT_MAX+1), producing a value
within the signed integer range. This value does not change when cast
back to a signed type.
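
(Not from the manual.) A minimal sketch of the conversion rule just
quoted, simulated with 64-bit integers on an ordinary machine.
MODEL_INT_MAX and model_to_unsigned are hypothetical names, and the
39-bit magnitude is taken from the manual's description of integers.

```c
#include <assert.h>
#include <stdint.h>

/* INT_MAX of the modelled platform: 39 magnitude bits. */
#define MODEL_INT_MAX ((INT64_C(1) << 39) - 1)

/* Signed -> unsigned as the manual describes it: negative values have
   (INT_MAX + 1) added, non-negative values are unchanged. */
static int64_t model_to_unsigned(int64_t v)
{
    assert(v >= -MODEL_INT_MAX && v <= MODEL_INT_MAX);
    return v < 0 ? v + (MODEL_INT_MAX + 1) : v;
}
```

In this model -1 converts to 549755813887, which equals MODEL_INT_MAX,
so the result always fits in the signed range and converting it back to
signed leaves it unchanged. Compare that with a typical 32-bit two's
complement int, where (unsigned)-1 is UINT_MAX and the round trip back
to int yields -1 again.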

The types short, int, and long are all the same size.

Bit operations (bitwise AND, OR, exclusive OR, and NOT) on signed values
affect only the 40 bits used by integers. Bit operations on unsigned
values conform to the mathematical definitions given in the ANSI C
standard. Because the sign bit is not adjacent to the other bits, it is
not possible to shift into or out of the sign bit.

Operations on unsigned integer types are more expensive than on signed
types. The $RESET PORT (UNSIGNED) option makes unsigned equivalent to
signed types and should be used on programs that do not depend upon the
wraparound or bit operation properties of unsigned types.

By default, bit fields in structures or unions that are of type plain
int are unsigned. The default can be changed to signed by the $PORT
(SIGNEDFIELD) option.

------------------------------------------------

Floating Types

By default, double type is the same size and range as float type. Note
that the A Series float type has about 11 digits of precision. The
default can be changed to be the same size and range as long double type
by the $RESET DBLTOSNGL option (Double to Single).

------------------------------------------------

Pointer Types

Pointers are internally stored as integer values. A pointer to a char is
the number of bytes from the start of addressable memory (the C heap),
not the machine memory. A pointer to an int or a float is the number of
words, and a pointer to a long double is the number of double words,
from the start of addressable memory. Implicit and explicit casts
between pointers of different types adjust the value. Casts that are
invisible to the compiler must be avoided, such as an invisible cast
declaring a prototype to an external procedure as taking a char*
parameter, but defining the procedure in another compilation unit as
taking an int* parameter. See “Pointer Alignment” in this section for
the implication of implicit and explicit casts between pointers of
differing types.
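
(Not from the manual.) A minimal sketch of why the pointer value has to
be adjusted on a cast, modelling pointers as plain indices. The names
are made up; CHARS_PER_WORD is the six characters per word mentioned
earlier, and treating a misaligned char-to-int conversion as an error
is an assumption, loosely mirroring what $BOUNDS(ALIGNMENT) is said to
detect.

```c
#include <assert.h>
#include <stddef.h>

#define CHARS_PER_WORD 6 /* from the manual: six characters per word */

/* Models an int* -> char* cast: a word index becomes a byte index. */
static size_t word_to_char_index(size_t word_index)
{
    return word_index * CHARS_PER_WORD;
}

/* Models a char* -> int* cast: only meaningful on a word boundary. */
static size_t char_to_word_index(size_t char_index)
{
    assert(char_index % CHARS_PER_WORD == 0);
    return char_index / CHARS_PER_WORD;
}
```

If a caller believes it is passing a char* (byte index 18) and the
callee treats the same number as an int* (word index 18), the two sides
disagree about the address by a factor of six. That is the "invisible
cast" hazard of mismatched prototypes across compilation units.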

Any pointer that is itself pointed at or any pointer stored in an array,
structure, or union is always stored as if a void* cast were done. These
pointers may be cast safely, either visibly or invisibly.

Pointer arguments to old-style functions are always passed as if a void*
cast were done.

The allocation of objects is not necessarily consecutive. Bumping a
pointer beyond the end of an object does not cause the pointer to point
to the next object declared. This is especially true for function
arguments.

Problems with implicit and explicit casts between pointers of different
types can possibly be avoided through use of the $BYTEADDRESS compiler
control option.

------------------------------------------------

Common Portation Problems

You might encounter the following problems when porting C code to an
enterprise server:

1. Pointer alignment — Many programs assume that all pointers can be
treated in the same manner. You can detect these problems at run-time by
setting the $BOUNDS(ALIGNMENT) compiler control option.

2. Signed magnitude representation — Encryption (and other) algorithms
might assume that integers are stored in a particular format.

3. Unsigned types — Since the enterprise server stores the sign bit
separately from the data, different behavior can occur when casting
between signed and unsigned types.

When you are porting an application for the first time, it is strongly
recommended that you set the BOUNDS(ALIGNMENT) compiler control option.
When the application is fully tested, the option can be reset to remove
the performance penalty associated with its use.

------------------------------------------------

Two’s Complement Arithmetic

Two’s complement arithmetic is used on many platforms. On A Series
systems, arithmetic is performed on data in signed-magnitude form. This
can cause discrepancies in algorithms that depend on the two’s
complement representation. For example, C applications that use
encryption algorithms to match data, such as passwords, between client
and server must perform the encryption the same way on the client-end
and the server-end. The differences between the two’s complement and
signed-magnitude representation may result in different values when
fragments of data are extracted, encrypted, and reinserted into the data.

To obtain matching results, you can define macros that return arithmetic
results in two’s complement form. The following example illustrates
macros for two’s complement addition and subtraction:

#define tc_add(arg1, arg2) (((arg1) + (arg2)) & 0xFF)
#define tc_sub(arg1, arg2) (((arg1) + (0x100 - (arg2))) & 0xFF)
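
(Not from the manual.) A minimal usage sketch for the two macros above,
assuming the operands are 8-bit quantities in the range 0..255. The
wrappers add8 and sub8 are hypothetical and only add a range check.

```c
#include <assert.h>

#define tc_add(arg1, arg2) (((arg1) + (arg2)) & 0xFF)
#define tc_sub(arg1, arg2) (((arg1) + (0x100 - (arg2))) & 0xFF)

/* 8-bit wraparound addition; operands must already be in 0..255. */
static int add8(int a, int b)
{
    assert(a >= 0 && a <= 0xFF && b >= 0 && b <= 0xFF);
    return tc_add(a, b);
}

/* 8-bit wraparound subtraction; adding 0x100 keeps the intermediate
   value non-negative, so the result never depends on how negative
   numbers are represented. */
static int sub8(int a, int b)
{
    assert(a >= 0 && a <= 0xFF && b >= 0 && b <= 0xFF);
    return tc_sub(a, b);
}
```

For instance, add8(200, 100) gives 44 and sub8(5, 10) gives 251, the
same results an 8-bit two's complement machine would produce.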

-------------------------------------------------
-------------------------------------------------
END OF C COMPILER MANUAL QUOTES
-------------------------------------------------
-------------------------------------------------

Happy compiler porting ;-)

Ion

Received on 2013-10-26 23:50:35