Date: Wed, 12 Apr 2023 20:57:15 +0200
For example, with ARM Neon one has to choose in advance, via the opcode, between
- fast data access with alignment hints, which raises a Data Abort exception on unaligned accesses
- consistently not-so-fast data access without hints, which tolerates unaligned accesses
https://developer.arm.com/documentation/den0018/a/NEON-Instruction-Set-Architecture/Alignment
Most programmers who specifically demand SIMD accesses from the compiler would probably also strive for the faster data path. Otherwise one could either rely on the optimizer or specify the operations more abstractly (e.g. not for a fixed SIMD width like float4).
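To make the two styles concrete, here is a rough sketch in portable C++, not taken from any proposal. The names is_aligned, add_arrays and add4_aligned are hypothetical, and the assert only stands in for the hardware fault that a hinted access would raise. Whether a compiler actually emits SIMD instructions for either loop is up to the optimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Abstract form: no SIMD width and no alignment is promised, so the
// optimizer is free to pick whatever load instructions are safe.
inline void add_arrays(const float* a, const float* b, float* out,
                       std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Fixed-width form (float4) with a 16-byte alignment contract.
// The assert models the "fast path, but faults when unaligned" choice;
// it is an assumption for illustration, not a hardware guarantee.
inline void add4_aligned(const float* a, const float* b, float* out) {
    assert(is_aligned(a, 16) && is_aligned(b, 16) && is_aligned(out, 16));
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

With arrays declared alignas(16), add4_aligned(a, b, out) satisfies the contract, while a + 1 is only 4-byte aligned and has to go through add_arrays.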
-----Original Message-----
From: Thiago Macieira via Std-Proposals <std-proposals_at_[hidden]>
Sent: Wed 12.04.2023 20:42
Subject: Re: [std-proposals] SIMD by just operating on 2 arrays
To: std-proposals_at_[hidden];
CC: Thiago Macieira <thiago_at_[hidden]>;
On Wednesday, 12 April 2023 13:15:46 -03 samuel ammonius via Std-Proposals
wrote:
> I think a vectorized CPU would still be able to copy the elements of an
> unaligned array with the same speed though, right?
That depends on the CPU architecture, micro-architecture and generation. The
original SSE (which supported float[4]) took twice as long to perform an
unaligned load as an aligned one.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel DCAI Cloud Engineering
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
Received on 2023-04-12 18:57:18