sg12: Re: [ub] [c++std-ext-14592] Re: Re: Sized integer types and char bits

From: Lawrence Crowl <Lawrence_at_[hidden]>
Date: Sun, 27 Oct 2013 18:32:24 -0700

On Oct 27, 2013 3:46 PM, "Ion Gaztañaga" <igaztanaga_at_[hidden]> wrote:
> I think we have two separate issues here. We might have a different
> answer to each question:
>
> 1) One's complement, sign-magnitude / padding bits
>
> Non-two's-complement machines (with or without padding bits) seem to
> be old architectures that have survived until now in some sectors
> (government, financial, health) where backwards compatibility with
> old mainframes is important.
>
> Unisys was formed through a merger of mainframe corporations
> Sperry-Univac and Burroughs, so Clearpath systems are available in
> two variants: a UNISYS 2200-based system (Sperry, one's complement
> 36 bit machines) or an MCP-based system (Burroughs, sign-magnitude
> 48/8 bit machines). According to their website, new Intel based
> mainframes are being designed (so they'd need to emulate 1's
> complement / sign-magnitude behaviour through the compiler). They
> have no C++ compiler and Java is executed with additional 2's
> complement emulation code. They are migrating from custom ASICs to
> Intel processors
> (http://www.theregister.co.uk/2011/05/10/unisys_clearpath_mainframe/)
> so 2's complement will be faster in newer mainframes than 1's
> complement.

The value to us in examining these systems is not so much in
supporting their programmers, but in ensuring that we have a general
notion of integers and avoid group-thinking our way into definitions
that limit our future choices. IEEE floating-point has cleared out
inferior approaches, but also prevented superior approaches.

> I think requiring 2's complement in the long term would be a
> good idea, even in C, as no new architecture uses any other
> representation and this simplifies teaching and programming in
> C/C++.

Why do you think intN_t is not sufficient? The interpretation is
that int means "representation is not relevant, add traps if you
want" while (e.g.) int32_t means "I care about representation".
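A minimal C++11 sketch of what "I care about representation" can look like in code. The exact-width types are optional, and <cstdint> defines INT32_MAX exactly when std::int32_t is provided, so the type's presence can be tested at compile time. When it exists, the C rules for exact-width types require 32 bits, no padding bits and two's complement; the static_asserts merely restate that. The constant name kHasInt32 is illustrative only.

```cpp
#include <climits>
#include <cstdint>
#include <limits>

// <cstdint> defines INT32_MAX only when std::int32_t exists, so
// representation-sensitive code can test for the type at compile time.
#if defined(INT32_MAX)
// Exact-width signed types have no padding bits and use two's complement;
// these checks restate that guarantee.
static_assert(std::numeric_limits<std::int32_t>::digits == 31,
              "31 value bits plus a sign bit");
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32,
              "no padding bits (holds even when CHAR_BIT is 32)");
static_assert(std::numeric_limits<std::int32_t>::min() == -2147483647 - 1,
              "two's complement range");
constexpr bool kHasInt32 = true;   // illustrative name
#else
constexpr bool kHasInt32 = false;  // e.g. a 36-bit one's complement target
#endif
```

Code that must also build on the 36-bit machines mentioned above can branch on kHasInt32 and fall back to std::int_least32_t, which always exists but carries no representation guarantee.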

> We could start by having an ISO C macro (for C compatibility) to
> detect 2's complement at compile time and deprecate 1's complement
> and sign-magnitude representations for C++. If no one objects, then
> only 2's complement could be allowed for the next standard.

WG21 cannot add a macro to ISO C. Did you mean liaison work or did
you mean simply adding a macro?
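Independently of who would standardize such a macro, the property itself can already be detected in C++11 with a constant expression, because -1 has a distinct low-order bit pattern in each of the three representations C and C++ permit. A minimal sketch; the enumeration and function names are hypothetical and belong to neither standard.

```cpp
// Hypothetical names: neither ISO C nor ISO C++ defines these.
enum class signed_repr { twos_complement, ones_complement, sign_magnitude };

// The two low-order bits of -1 differ between the representations:
//   two's complement  ...11111111  -> (-1 & 3) == 3
//   one's complement  ...11111110  -> (-1 & 3) == 2
//   sign-magnitude    10...000001  -> (-1 & 3) == 1
constexpr signed_repr detect_signed_repr() {
    return (-1 & 3) == 3 ? signed_repr::twos_complement
         : (-1 & 3) == 2 ? signed_repr::ones_complement
                         : signed_repr::sign_magnitude;
}

// Usage: a translation unit written only for two's complement can refuse
// to compile elsewhere instead of silently misbehaving.
static_assert(detect_signed_repr() == signed_repr::twos_complement,
              "this translation unit assumes two's complement int");
```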

> It would be interesting to have more guarantees on 2's complement
> systems (say no padding bits, other than in bool), but I don't know
> if that would be possible as I think there are Cray machines with
> padding bits in short/int pointer types:
> http://docs.cray.com/books/004-2179-001/html-004-2179-001/rvc5mrwh.html#QEARLRWH

It would certainly be possible, but it may require the systems to
define all types to be as wide as their minimum bounding alignment.
definition might be incompatible with existing practice. I am
frankly curious as to why Cray did not take this approach.

But note that there are additional implementation implications. If
you want an atomic<short> to be value-compatible with a short, but
the machine only supports atomic operations on full words, the atomic
needs to be implemented with compare-and-exchange on the containing
word. Hence, compare-and-exchange becomes required hardware.
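A minimal sketch of that technique, assuming hardware whose only atomic primitive is a full-word (here 32-bit) compare-and-exchange: two 16-bit values share one std::atomic<std::uint32_t>, and updating either half means recomputing the whole word and retrying until no other thread changed it in between. The class name and its interface are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical: two 16-bit atomic values packed into one 32-bit word,
// for a machine that can only operate atomically on whole words.
class packed_atomic_u16_pair {
public:
    explicit packed_atomic_u16_pair(std::uint32_t init = 0) : word_(init) {}

    // half == 0 selects the low 16 bits, anything else the high 16 bits.
    std::uint16_t load(int half) const {
        return static_cast<std::uint16_t>(word_.load() >> shift(half));
    }

    // Atomically adds `delta` (modulo 2^16) to one half without disturbing
    // the other, by compare-and-exchange on the containing word. Returns the
    // previous value of that half.
    std::uint16_t fetch_add(int half, std::uint16_t delta) {
        const std::uint32_t mask = std::uint32_t{0xFFFFu} << shift(half);
        std::uint32_t expected = word_.load();
        std::uint32_t desired;
        do {
            const std::uint16_t old_half =
                static_cast<std::uint16_t>(expected >> shift(half));
            const std::uint16_t new_half =
                static_cast<std::uint16_t>(old_half + delta);
            desired = (expected & ~mask) |
                      (static_cast<std::uint32_t>(new_half) << shift(half));
            // On failure, `expected` is reloaded with the current word.
        } while (!word_.compare_exchange_weak(expected, desired));
        return static_cast<std::uint16_t>(expected >> shift(half));
    }

private:
    static int shift(int half) { return half == 0 ? 0 : 16; }
    std::atomic<std::uint32_t> word_;
};
```

A real atomic<short> built this way would hide the containing word and the lane selection behind the ordinary interface, but the retry loop is the part that makes compare-and-exchange mandatory.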

> At least it would be interesting to have a simple way to detect
> types with padding bits.

It would be helpful, as types with padding bits cause issues when
used in atomics. In particular, the compare-exchange operation
compares the padding bits, and a loop may be needed to converge on
a representation.
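Both points can be sketched in C++11 under stated assumptions. The first helper treats an integer type as having padding bits when its value bits plus its sign bit do not fill sizeof(T) * CHAR_BIT; it reports true for bool, whose single value bit sits in a wider object. (C++17 later added std::has_unique_object_representations in <type_traits>, which answers a related, more general question.) The second is a compare-exchange loop that reports failure only when the values differ, and otherwise retries with the representation the failed attempt loaded. Both function names and the value_equal predicate are hypothetical.

```cpp
#include <atomic>
#include <climits>
#include <limits>

// Hypothetical helper: true when the value bits (plus the sign bit for
// signed types) do not occupy every bit of the object representation.
template <class T>
constexpr bool integer_has_padding_bits() {
    static_assert(std::numeric_limits<T>::is_integer, "integer types only");
    return (std::numeric_limits<T>::digits +
            (std::numeric_limits<T>::is_signed ? 1 : 0)) !=
           static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Hypothetical helper: compare_exchange compares object representations,
// so two equal values with different padding bits can make it fail. Report
// failure only when the values differ; otherwise retry with the
// representation that the failed compare_exchange stored in `observed`.
template <class T, class ValueEq>
bool compare_exchange_by_value(std::atomic<T>& obj, const T& expected_value,
                               const T& desired, ValueEq value_equal) {
    T observed = obj.load();
    for (;;) {
        if (!value_equal(observed, expected_value)) {
            return false;  // a genuine mismatch in value
        }
        if (obj.compare_exchange_strong(observed, desired)) {
            return true;   // representations matched; desired is stored
        }
        // Another thread stored a new value, or only the padding differed;
        // `observed` now holds the current representation, so check again.
    }
}
```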

> 2) CHAR_BIT > 8
>
> Architectures with CHAR_BIT > 8 are being designed these days and
> they have a very good reason to support only word-based (16-, 24- or
> 32-bit) types: performance. Word-multiple memory accesses and
> operands simplify the design, speed it up, and allow bigger caches
> and arithmetic units; they make it easier to fetch several
> instructions and operands in parallel, and they use every transistor
> to do what a DSP is supposed to do: very high-speed data processing.

Well, everyone wants very high-speed data processing. :-)

> These DSPs have modern C++ compilers (VisualDSP++ 5.0 C/C++
> Compiler Manual for SHARC Processors,
> http://www.analog.com/static/imported-files/software_manuals/50_21k_cc_man.rev1.1.pdf).
>
> "Analog Devices does not support data sizes smaller than the
> addressable unit size on the processor. For the ADSP-21xxx
> processors, this means that both short and char have the same size
> as int. Although 32-bit chars are unusual, they do conform to the
> standard"
>
> "All the standard features of C++ are accepted in the default mode
> except exception handling and run-time type identification because
> these impose a run-time overhead that is not desirable for all
> embedded programs. Support for these features can be enabled with
> the -eh and -rtti switches."
>
> In DSPs that can be configured in byte-addressing mode (instead of
> the default word-addressing mode), stdint.h types are defined
> accordingly (int8_t and friends exist only in byte-addressing mode).
> Example: TigerSHARC DSPs (VisualDSP++ for TigerSHARC processors:
> http://www.analog.com/static/imported-files/software_manuals/50_ts_cc_man.4.1.pdf).
> Even pointer implementations are optimized for word addressing
> (taken from the C compiler manual):
>
> "Pointers
>
> The pointer representation uses the low-order 30 bits to address
> the word and the high-order two bits to address the byte within the
> word. Due to the pointer implementation, the address range in
> byte-addressing mode is 0x00000000 to 0x3FFFFFFF.
>
> The main advantage of using the high-order bits to address the
> bytes within the word as opposed to using the low-order bits is
> that all pointers that address word boundaries are compatible with
> existing code. This choice means there is no performance loss when
> accessing 32-bit items.
>
> A minor disadvantage with this representation is that address
> arithmetic is slower than using low-order bits to address the bytes
> within a word when the computation might involve part-word offsets."

This was true of historical word-addressed machines as well.
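To make the quoted trade-off concrete, here is a hypothetical C++ model of that 32-bit byte-pointer layout. The function names are invented, and it models only the bit layout, not the code the compiler actually generates. A word-aligned pointer is numerically equal to its word address, which is why it stays compatible with word-addressed code; adding a byte offset, however, needs an unpack and repack, because a carry out of the byte field cannot propagate into the word field.

```cpp
#include <cstdint>

// Hypothetical model of the quoted layout: bits 0..29 hold the word
// address, bits 30..31 hold the byte index within the 32-bit word.
using byte_ptr = std::uint32_t;
constexpr std::uint32_t kWordMask = 0x3FFFFFFFu;

constexpr byte_ptr make_byte_ptr(std::uint32_t word, std::uint32_t byte) {
    return (word & kWordMask) | ((byte & 3u) << 30);
}
constexpr std::uint32_t word_of(byte_ptr p) { return p & kWordMask; }
constexpr std::uint32_t byte_of(byte_ptr p) { return p >> 30; }

// Pointer + n bytes: convert to a linear byte address, add, and split it
// again. With the byte index in the low-order bits this would be one add.
// Assumes the result stays inside the 0x00000000..0x3FFFFFFF word range.
constexpr byte_ptr advance(byte_ptr p, std::uint32_t n) {
    return make_byte_ptr((word_of(p) * 4u + byte_of(p) + n) >> 2,
                         (word_of(p) * 4u + byte_of(p) + n) & 3u);
}
```

For example, advancing the pointer to byte 3 of word 10 by one byte yields byte 0 of word 11, which is the plain value 11 that word-addressed code expects.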

> I think banning or deprecating systems with CHAR_BIT != 8 would be
> a very bad idea as C++ is a natural choice for high-performance
> data/signal processors.

Agreed. Also, it turns out that UTF-12 and UTF-24 are pretty good at
representing Unicode.

On 10/27/13, Jeffrey Yasskin <jyasskin_at_[hidden]> wrote:
> I think I agree with everything in Ion's email, especially that the
> ability to detect padding bits is useful, and that banning
> CHAR_BIT>8 is probably a bad idea.
>
> One wrinkle in the goal of standardizing 2's complement is the
> ability to reinterpret the bytes of a negative integer as an array
> of chars:

unsigned chars?

> that seems harder to emulate, and possibly less needed to
> allow programs to have portable behavior. If we have functions to
> serialize and deserialize as 2's-complement byte arrays, we may not
> need the ability to memcpy them. 2's-complement behavior in
> conversions and bitwise operations may be enough.

Unless we are going to make the radical step of making the built-in
integer types non-trivial, we get memcpy for free.
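The serialize/deserialize functions mentioned above can be sketched portably in C++11. Converting a signed value to an unsigned type is defined as reduction modulo 2^N, so masking to 32 bits yields the two's complement encoding whatever the native representation, and masking each element with 0xFF keeps the format to octets even when CHAR_BIT > 8. The function names and the least-significant-octet-first order are assumptions chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical helpers: write a 32-bit value as four octets of two's
// complement, least significant octet first. Each unsigned char holds one
// octet, so the format is the same on CHAR_BIT == 8 and CHAR_BIT > 8 targets.
inline void serialize_i32(std::int_least32_t v, unsigned char out[4]) {
    // Signed-to-unsigned conversion is defined modulo 2^N, so masking to 32
    // bits yields the two's complement encoding regardless of how the
    // implementation represents negative numbers.
    const std::uint_least32_t u =
        static_cast<std::uint_least32_t>(v) & 0xFFFFFFFFu;
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<unsigned char>((u >> (8 * i)) & 0xFFu);
    }
}

// Inverse of serialize_i32. Avoids converting an out-of-range unsigned value
// to a signed type (implementation-defined in C++11) by computing negative
// values arithmetically. Assumes the value is representable, which excludes
// -2^31 on a 32-bit one's complement or sign-magnitude type.
inline std::int_least32_t deserialize_i32(const unsigned char in[4]) {
    std::uint_least32_t u = 0;
    for (int i = 0; i < 4; ++i) {
        u |= static_cast<std::uint_least32_t>(in[i] & 0xFFu) << (8 * i);
    }
    if (u <= 0x7FFFFFFFu) {
        return static_cast<std::int_least32_t>(u);
    }
    return -static_cast<std::int_least32_t>(0xFFFFFFFFu - u) - 1;
}
```

Because the built-in integers stay trivially copyable, memcpy into an unsigned char array remains available alongside helpers like these; it simply exposes the native representation and byte order rather than a fixed wire format.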

-- 
Lawrence Crowl

Received on 2013-10-28 02:32:26