18406 Commits

Author SHA1 Message Date
Tillmann Karras
6ec4bdf862 CoreTiming: remove unused functions 2015-08-26 15:40:15 +02:00
Tillmann Karras
0f4861cac2 CoreTiming: make loops easier to read 2015-08-26 14:53:58 +02:00
Markus Wick
a36b135473 Merge pull request #2911 from Sonicadvance1/aarch64_regcache_optimization
[AArch64] Optimize paired registers being used in double operations.
2015-08-26 13:10:04 +02:00
Ryan Houdek
ca51f1a4f6 [AArch64] Optimize paired registers being used in double operations.
In particular this optimizes the case where a 32bit float is loaded via lfs, and then used in double operations.
This happens very often in Gekko based code because the best way to load a 32bit value as a double is lfs since it automatically turns in to a double value.

There are a few other implications of this in practice as well. Like if both of the paired registers are loaded via psq_l and then used in double
operations it would be improved.
Also if we implement a double register we've got to be careful to make sure we understand if it is in "lower" register or the full 128bit register.
2015-08-26 05:50:04 -05:00
Markus Wick
5716d18d10 Merge pull request #2910 from Sonicadvance1/aarch64_regcache_fix
[AArch64] Fix a bug in the register cache.
2015-08-26 08:31:24 +02:00
Ryan Houdek
4f5f29a0fb [AArch64] Fix a bug in the register cache.
If the register was only a lower pair and it needed the full register, then we need to load the high 64bits.
Which we weren't doing before.
2015-08-26 01:21:43 -05:00
Markus Wick
43d17cb360 Merge pull request #2904 from Sonicadvance1/aarch64_more_inst
[AArch64] Implement fdivx/fdivsx/mfcr/mtcrf.
2015-08-26 07:48:24 +02:00
Tillmann Karras
ee4a12ffe2 Jit64: some byte-swapping changes 2015-08-26 05:41:18 +02:00
flacs
6015e2d812 Merge pull request #2900 from aroulin/x64emitter-rcp
x64Emitter: add RCPPS and RCPSS SSE instructions
2015-08-26 05:05:53 +02:00
Ryan Houdek
6729a36d8d [AArch64] Set BindToRegister's to_load correctly for double FP ops. 2015-08-25 21:29:27 -05:00
flacs
9ba7497a3f Merge pull request #2907 from lioncash/log
GCMemcard: Clean up memcard logging messages.
2015-08-26 04:03:07 +02:00
Lioncash
db4f692482 GCMemcard: Clean up memcard logging messages. 2015-08-25 21:55:52 -04:00
Tillmann Karras
ee50a2ef28 Jit64: fix bugs in the FPSCR instructions 2015-08-25 23:48:14 +02:00
Markus Wick
bd08c1b01a Merge pull request #2901 from Sonicadvance1/aarch64_stfiwx
[AArch64] Implement stfiwx
2015-08-25 22:47:39 +02:00
Markus Wick
24cb650078 Merge pull request #2663 from degasus/dcbx
Jit64: dcbf + dcbi
2015-08-25 12:16:56 +02:00
Ryan Houdek
0666c0750b [AArch64] Implement fdivx/fdivsx/mfcr/mtcrf.
Gets the povray bench to better times than the Wii.
2015-08-24 15:32:19 -05:00
Ryan Houdek
d96be9250c Merge pull request #2899 from Sonicadvance1/aarch64_fctiwzx
[AArch64] Implement fctiwzx
2015-08-24 13:22:27 -05:00
Ryan Houdek
cd03b8baf6 Merge pull request #2895 from Sonicadvance1/qualcomm_workaround_gles31
Disable OpenGL ES 3.1 on all Qualcomm Adreno devices.
2015-08-24 13:22:12 -05:00
degasus
0d92c8fb89 Jit64: Optimize dcbx 2015-08-24 18:33:23 +02:00
Tillmann Karras
ac84d6d0fa Jit64: some cache flush changes
- dynamically allocate third scratch register instead of forcing ECX
- use LEA as 3 operand add if possible
- use BT,JC instead of SHR,TEST,JNZ
- merge MOV,TEST
- use appropriate ABI function (no asm change)
2015-08-24 18:33:23 +02:00
degasus
6f34b27323 Jit64: implement dcbf + dcbi 2015-08-24 18:33:19 +02:00
Markus Wick
0ad6fa8f62 Merge pull request #2903 from lioncash/cast
Memmap: Remove pointer casts
2015-08-24 15:42:56 +02:00
Lioncash
abd3b124be Memmap: Remove pointer casts 2015-08-24 09:07:09 -04:00
flacs
4baf3e10c6 Merge pull request #2902 from Tilka/fpscr
Jit64: quickfix for mtfsfx
2015-08-24 13:19:26 +02:00
Tillmann Karras
33eefc2d86 Jit64: quickfix for mtfsfx 2015-08-24 12:12:31 +02:00
Ryan Houdek
d3176fe22a [AArch64] Implement stfiwx
Improves povray performance by ~4%
2015-08-24 01:10:55 -05:00
Ryan Houdek
80fa9af9b1 Merge pull request #2898 from degasus/linking
JitArm64: Faster linking of continuous blocks
2015-08-23 18:09:02 -05:00
Markus Wick
8bc311ab3c Merge pull request #2897 from degasus/arm
JitArm64: Implement subfex, subfcx, addex, subfic, divwux, srwx
2015-08-23 23:52:35 +02:00
degasus
7320d519b4 JitArm64: Implement srwx 2015-08-23 23:29:48 +02:00
degasus
4722a69fd0 JitArm64: Implement divwux 2015-08-23 23:29:18 +02:00
degasus
9e4366963c JitArm64: Implement subfic 2015-08-23 23:29:07 +02:00
degasus
95be17772f JitArm64: Implement addex 2015-08-23 23:29:02 +02:00
degasus
025e7c835a JitArm64: Implement subfcx 2015-08-23 23:28:28 +02:00
degasus
550a90e691 JitArm64: Implement subfex 2015-08-23 23:28:24 +02:00
Ryan Houdek
561744819e [AArch64] Implement fctiwzx
Improves the povray benchmark time by 5.6%
2015-08-23 15:35:18 -05:00
Ryan Houdek
4fa23abbe1 [AArch64] Implement MOVI and ORR(imm) in the NEON emitter. 2015-08-23 15:34:53 -05:00
aroulin
0a0e012fab x64Emitter: add RCPPS and RCPSS SSE instructions 2015-08-23 16:59:27 +02:00
degasus
77a6798094 JitArm64: Faster linking of continuous blocks 2015-08-23 14:44:23 +02:00
Markus Wick
73067b1ef1 Merge pull request #2888 from degasus/jit64
Jit64: Faster linking of continuous blocks
2015-08-23 13:24:15 +02:00
Lioncash
2a1abf8dd6 Merge pull request #2896 from lioncash/using
Core: Minor CPU core typedef cleanup
2015-08-22 19:00:23 -04:00
Ryan Houdek
cc3fb7e7b4 Merge pull request #2883 from degasus/master
Profiler: Sort output by total time
2015-08-22 17:52:54 -05:00
Markus Wick
8b881a6c34 Merge pull request #2891 from Sonicadvance1/aarch64_implement_crxxx
[AArch64] Implement the cr instructions
2015-08-23 00:44:47 +02:00
Lioncash
fdafa5d063 Core: Move includes out of instruction table headers
These aren't necessary (and cause unnecessary indirect inclusions).
2015-08-22 14:15:02 -04:00
Lioncash
a248a4d2ce Jit64/JitIL: Relocate instruction typedefs 2015-08-22 14:15:00 -04:00
Lioncash
c56717e058 Core: Shorten the _interpreterInstruction typedef
The class itself already acts as a namespace trailer, so '_interpreter'
isn't necessary. This also gets rid of a duplicate typedef in the
Interpreter_Tables.
2015-08-22 14:14:49 -04:00
Ryan Houdek
b4e4a4cef4 Disable OpenGL ES 3.1 on all Qualcomm Adreno devices.
Their new driver that supports GLES3.1 + AEP has issues with it.
At the very least they don't implement all of the geometry shader features fully which causes shader linker issues when we attempt to use them.
I don't have a device so I can't fully test, so until I do I'm going to blanket disable the whole thing.
2015-08-22 09:12:19 -05:00
Markus Wick
a39c0910c4 Merge pull request #2893 from Sonicadvance1/aarch64_memory_base_register
[AArch64] Use a register as a constant for the memory base.
2015-08-22 15:41:57 +02:00
Ryan Houdek
dba579c52f [AArch64] Use a register as a constant for the memory base.
Removes a /lot/ of redundant movk operations in fastmem loadstores.
Improves performance of the povray bench by ~5%
2015-08-22 08:36:34 -05:00
Markus Wick
3f5ff98c1b Merge pull request #2890 from lioncash/ptr
x64Emitter: Remove pointer casts from Write{8,16,32,64} functions
2015-08-22 10:09:28 +02:00
Markus Wick
2d505bc2a6 Merge pull request #2894 from Sonicadvance1/no_more_eaten_canary
Fix the shader overrunning our max shader size.
2015-08-22 10:08:14 +02:00