Instead of doing vector operations and throwing away the top 64bits of each operation, let's instead use scalar operations.
On Cortex-A57 this saves us three cycles per vector operation changed to scalar, so this saves 3-9cycles per instruction emulated.
Also puts one less micro-op in to the vector pipeline there.
On the Nvidia Denver I couldn't see any noticeable performance difference, but it's a quirky architecture so it may be noticing we are throwing away
the top bits anyway and optimizing it. The world may never know what's truly happening there.
We can compile with haptic support, and then not initialize due to haptics not being available.
So if we are compiling with haptics, test initializing with haptics and if that fails attempt to initialize without haptics before bailing out.
Allows the UI to easily check the current exclusive mode state.
This simplifies a few checks and prevents the user from ever getting stuck in fullscreen.
The maths appears to give crazy impossible answers without this fix, but the cause is all the ints being "promoted" to unsigned because of the single unsigned division at the end.
Don't change the texID depending on the tlut_hash for paletted textures that are efb copies and don't have an entry in the cache for texID ^ tlut_hash. This makes those textures less broken when using efb to texture.
This is not really fixing those textures, but it's a step forward. The mini map in Twilight Princess for example is in grayscales with this and is more or less usable.
For offsets that fit in the instruction encoding then we should just put it in the instruction encoding.
Saves an instruction in a large amount of loadstores.
Someone thought it would be a good idea to have the location as the first argument on the instruction.
Changed it to how it is supposed to be disassembled.
There are a couple things in this PR.
Fixes a bug where if we hit an invalid instruction we would infinite loop.
Fixes an issue where on AArch64 it would show invalid instructions for all NEON instructions.
This was due to asimd and crc being optional extensions and LLVM not enabling them by default.
So we have to specify a CPU which has the feature. LLVM 3.6 will let us select by features instead of CPUs, but we don't have a release of that quite
yet.
If we are on an architecture that has a known instruction size, we will continue onward after hitting the invalid instruction. If we don't have a
known instruction size like on x86, we will instead just dump the rest of the block.