std-discussion: Re: Why is size_type in std::array an alias for size

From: Thiago Macieira <thiago_at_[hidden]>
Date: Fri, 15 Nov 2019 07:48:05 -0800

On Thursday, 14 November 2019 11:11:40 PST Wilhelm Meier via Std-Discussion
wrote:
> >> std::array<A, 5>::size_type y{}; // uint8_t
> >
> > Why would you want that?
>
> Why not? Why should the type be bigger than necessary?

Because it might be more efficient.

> > Don't answer "microcontrollers" and "performance", since uint8_t will get
> > promoted to int when used and return types are returned in register-sized
> > chunks.
>
> Consider this very simple example:
>
> #include <limits>
> #include <array>
> #include <cstdint>
>
> std::array<uint8_t, 10> a {1, 2, 3};
>
> volatile uint8_t r;
>
> int main() {
>     for (decltype(a)::size_type i{0}; i < a.size(); ++i) {
>         r; // volatile read: prevents the loop from being optimized away
>     }
>
>     for (uint16_t i{0}; i < a.size(); ++i) {
>         r; // ditto
>     }
> }

I would expect to see identical code generation, and indeed that is what we see:
https://gcc.godbolt.org/z/Xjccf_

The Clang code is identical in all cases (it unrolled the loop, so your test
is no good). MSVC used rax (64-bit) throughout. ICC used %eax (32-bit)
throughout, except for the f<uint64_t>() case, where it used %al (8-bit). GCC
is the only one out of the three to actually do what you expected and use
different-sized registers, and even then note how it initialised the %eax
register in all cases: movl, not movb, movw or movq.
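
For readers who cannot open the link, a test of this kind might look roughly
like the sketch below. It is a reconstruction under assumptions, not the code
behind the Godbolt URL: only the name f<T>() comes from the discussion above,
and returning the final counter value is my addition so the result can be
checked.

```cpp
#include <array>
#include <cstdint>

std::array<std::uint8_t, 10> a{1, 2, 3};

volatile std::uint8_t r;

// Assumed shape of the test: the quoted loop, with the counter type
// turned into a template parameter. The return value exists only so
// the loop result is observable; it is not part of the original test.
template <typename Index>
Index f() {
    Index i{0};
    for (; i < a.size(); ++i) {
        (void)r;  // volatile read keeps the loop body from being removed
    }
    return i;
}

// Explicit instantiations so every variant appears in the assembly output.
template std::uint8_t f<std::uint8_t>();
template std::uint16_t f<std::uint16_t>();
template std::uint32_t f<std::uint32_t>();
template std::uint64_t f<std::uint64_t>();
```

Compiling this with different compilers and comparing the register names in
each instantiation is the kind of comparison described above.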

Let's see what GCC does on architectures where registers don't change sizes:
https://gcc.godbolt.org/z/4KBTgz

In the ARM 32-bit case, it used r3 (32-bit) in all cases. For ARM 64-bit, it
used w0 in all cases except f<uint64_t>(), in which case it used x0 (64-bit).
On MIPS64, it used $2 throughout, with daddiu (64-bit) for the subtraction in
f<uint64_t>() and addiu (32-bit) in the other three. Finally, on MSP430, it
used register R12, with MOV.B and ADD.B only in the uint8_t case. Still, it
used CMP.W in all cases.
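
The word-sized compare is consistent with what the language requires: in
i < a.size() the narrow counter is first promoted and then converted to the
type of a.size(). A small sketch of those rules follows. The variable template
compared_as_size_t is a made-up name for illustration, and the second
assertion assumes size_t is at least as wide as unsigned int, which I believe
holds on the targets discussed here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Arithmetic on two uint8_t operands is carried out in int.
static_assert(std::is_same_v<decltype(std::uint8_t{} + std::uint8_t{}), int>);

// Mixed with size_t, the usual arithmetic conversions pick size_t
// (assumption: size_t has at least the conversion rank of unsigned int).
static_assert(
    std::is_same_v<decltype(std::uint8_t{} + std::size_t{}), std::size_t>);

// std::array's size_type is size_t no matter how small N is.
static_assert(
    std::is_same_v<std::array<std::uint8_t, 10>::size_type, std::size_t>);

// Hypothetical helper: true when a Counter compared against a.size()
// ends up being compared as size_t.
template <typename Counter>
constexpr bool compared_as_size_t =
    std::is_same_v<std::common_type_t<Counter, std::size_t>, std::size_t>;
```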

> The first loop is with 1-byte unsigned arithmetic and the second loop is
> with 2-byte unsigned arithmetic (g++ -Os for AVR) as shown below:

Sorry, AVR isn't working on Godbolt, or I'd have tried it too.

> 000000de <main>:
> de: 8a e0 ldi r24, 0x0A ; 10
> e0: 90 91 0a 28 lds r25, 0x280A ; 0x80280a <__data_end>
> e4: 81 50 subi r24, 0x01 ; 1
> e6: e1 f7 brne .-8 ; 0xe0 <main+0x2>
> e8: 8a e0 ldi r24, 0x0A ; 10
> ea: 90 e0 ldi r25, 0x00 ; 0
> ec: 20 91 0a 28 lds r18, 0x280A ; 0x80280a <__data_end>
> f0: 01 97 sbiw r24, 0x01 ; 1
> f2: e1 f7 brne .-8 ; 0xec <main+0xe>
> f4: 90 e0 ldi r25, 0x00 ; 0
> f6: 80 e0 ldi r24, 0x00 ; 0
> f8: 08 95 ret

The code size appears to be identical in both loops. The only difference is
subi vs sbiw and, as I've shown above, GCC is the only one of the major three
compilers to obey your type. Since you care about code size and the code size
is identical for both, your own example shows that the choice doesn't matter
anyway.

> >> In highly templated code the size_type "propagates" through the
> >> templates and produces optimal code by choosing the "right" type in all
> >> places.
> >
> > Using a register-sized type is "highly optimised".
>
> In microcontroller applications I want to choose the type as small as
> possible and as big as necessary to give the compiler the best
> information to optimize the code. And in my experience this is extremely
> valuable.

Then do so. We don't need std::array to cater for your specific use-case.
Which you've not given an example of.
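
For what it's worth, doing so outside std::array takes only a few lines. The
following is a rough sketch, not a recommendation: smallest_index_t and
sum_elements are made-up names, not standard library facilities, and whether
the narrow counter buys anything depends on the compiler and target, as the
listings above suggest.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical alias: the narrowest unsigned type that can hold every
// value in [0, N], so a counter of this type can reach the end value N.
template <std::size_t N>
using smallest_index_t = std::conditional_t<
    (N <= std::numeric_limits<std::uint8_t>::max()), std::uint8_t,
    std::conditional_t<
        (N <= std::numeric_limits<std::uint16_t>::max()), std::uint16_t,
        std::conditional_t<
            (N <= std::numeric_limits<std::uint32_t>::max()), std::uint32_t,
            std::uint64_t>>>;

// Hypothetical helper: sums an array using the narrow counter type.
template <typename T, std::size_t N>
constexpr unsigned long sum_elements(const std::array<T, N>& values) {
    unsigned long total = 0;
    for (smallest_index_t<N> i = 0; i < N; ++i) {
        total += values[i];  // i converts to size_t for operator[]
    }
    return total;
}
```

With this, smallest_index_t<5> is uint8_t, which is what the
std::array<A, 5>::size_type line at the top of the thread asked for, without
changing std::array itself.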

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

Received on 2019-11-15 09:50:27