Ryan Houdek 67f2ff2e18 [AArch64] Move the 64-bit floating-point instructions to scalar.
Instead of doing vector operations and throwing away the top 64 bits of each result, let's use scalar operations.
On Cortex-A57 this saves three cycles per vector operation changed to scalar, which works out to 3-9 cycles per emulated instruction.
It also puts one fewer micro-op into the vector pipeline there.
On the Nvidia Denver I couldn't see any noticeable performance difference, but it's a quirky architecture, so it may be noticing that we throw away
the top bits anyway and optimizing for that. The world may never know what's truly happening there.
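
The equivalence this change relies on can be sketched in portable C++. This is only an illustrative model, not the emitter code touched by the commit: Vec2d, vector_add, scalar_add, low_lane and demo are hypothetical names, and the register is assumed to be modelled as two double lanes. The one architectural fact used is that an AArch64 scalar write to a D register zeroes the upper 64 bits of the underlying vector register.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of a 128-bit vector register holding two doubles.
using Vec2d = std::array<double, 2>;

// Models a vector add (FADD Vd.2D, Vn.2D, Vm.2D): both lanes are computed,
// even though the caller only wants lane 0.
Vec2d vector_add(const Vec2d& a, const Vec2d& b) {
    return {a[0] + b[0], a[1] + b[1]};
}

// Models a scalar add (FADD Dd, Dn, Dm): only lane 0 is computed, and the
// upper 64 bits of the destination are zeroed.
Vec2d scalar_add(const Vec2d& a, const Vec2d& b) {
    return {a[0] + b[0], 0.0};
}

// The only part of the result that is kept for a 64-bit operation.
double low_lane(const Vec2d& v) {
    return v[0];
}

// Both forms agree on the lane that is kept, so the scalar form can
// replace the vector form without changing the emulated result.
void demo() {
    const Vec2d a{1.5, 99.0};
    const Vec2d b{2.25, -7.0};
    assert(low_lane(vector_add(a, b)) == low_lane(scalar_add(a, b)));
}
```

The model says nothing about timing; the cycle counts quoted above are measurements on real hardware and cannot be reproduced by this sketch.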
2015-01-20 16:35:08 -06:00