Ryan Houdek 67f2ff2e18 [AArch64] Move the 64bit floating point instructions to scalar.
Instead of doing vector operations and throwing away the top 64bits of each operation, let's instead use scalar operations.
On Cortex-A57 this saves us three cycles per vector operation changed to scalar, so this saves 3-9cycles per instruction emulated.
Also puts one less micro-op in to the vector pipeline there.
On the Nvidia Denver I couldn't see any noticeable performance difference, but it's a quirky architecture so it may be noticing we are throwing away
the top bits anyway and optimizing it. The world may never know what's truly happening there.
2015-01-20 16:35:08 -06:00
..
2014-12-10 20:45:45 +11:00