1194 Commits

Author SHA1 Message Date
shuffle2
2cb5c41fed HostGetString: Actually fill a string with data 2015-09-28 05:40:52 +02:00
Lioncash
19ac565e0d Common: Move asserts to their own header 2015-09-26 18:51:27 -04:00
Lioncash
bddcdd9d94 Jit_Util: Replace two MDisp usages with MatR
Same thing, less to read.
2015-09-21 08:20:35 -04:00
Lioncash
9f389fdccb Gekko: Make sign-extension functions constexpr 2015-09-18 11:14:45 -04:00
Lioncash
00ffc47751 Jit_Util: Mark a class function as const 2015-09-17 00:21:50 -04:00
Lioncash
c6ea9eb7c3 JitCache: Remove unused define 2015-09-16 19:15:47 -04:00
flacs
29a0a2b626 Merge pull request #3043 from lioncash/jitalign
JitCache: Get rid of pointer casts
2015-09-16 21:22:01 +02:00
Lioncash
8aac59418b JitCache: Get rid of pointer casts
Silences more ubsan runtime asserts
2015-09-16 06:25:48 -04:00
degasus
3ae466a33c JitArm64: Fix lmw + stmw 2015-09-16 08:11:18 +02:00
Scott Mansell
c0a89c3bf4 Merge pull request #3009 from phire/depth_tested_pokes
Add some logging for depth tested efb color pokes.
2015-09-12 22:20:59 +12:00
Scott Mansell
44456bec0f Add some logging for depth tested efb color pokes. 2015-09-12 22:19:59 +12:00
Lioncash
3f4852a03d MathUtil: Convert IsPow2 into a constexpr function 2015-09-12 01:26:05 -04:00
Lioncash
b9e360df7b MathUtil: Convert Clamp into a constexpr function 2015-09-12 01:18:28 -04:00
flacs
c5685ba53a Merge pull request #2972 from lioncash/align
General: Replace GC_ALIGN macros with alignas
2015-09-11 17:00:13 +00:00
Lioncash
19459e827f Partially revert "General: Toss out PRI macro usage" 2015-09-11 09:49:00 -04:00
flacs
48031eaff7 Merge pull request #2974 from Tilka/fprf
Jit64: fix errors in FPRF calculation
2015-09-08 18:59:22 +00:00
Ryan Houdek
5d7f834cde Add run count to the JIT profile information 2015-09-08 11:09:52 -05:00
Ryan Houdek
a9a339a00c Merge pull request #2962 from Sonicadvance1/aarch64_integer_gatherpipe
[AArch64] Implement integer gatherpipe writes.
2015-09-07 06:20:01 -05:00
Lioncash
8ce04f9a65 General: Replace GC_ALIGN macros with alignas
Standard supported alignment -> out with compiler-specific.
2015-09-06 12:53:51 -04:00
Scott Mansell
be4caa3dc0 Merge pull request #2961 from lioncash/printf
General: Toss out PRI macro usage
2015-09-06 21:02:22 +12:00
Lioncash
4fc71e9708 Common: Remove StdMakeUnique.h 2015-09-06 04:09:53 -04:00
comex
96e42dff52 Merge pull request #2977 from lioncash/unused
General: Remove unimplemented function prototypes
2015-09-05 22:20:47 -04:00
Lioncash
633be0387d General: Remove unimplemented function prototypes 2015-09-05 22:01:07 -04:00
Lioncash
8fdb013d54 General: Toss out PRI macro usage
Now that VS supports more printf specifiers, these aren't necessary
2015-09-05 16:02:35 -04:00
Tillmann Karras
8f777cd839 Jit64: fix errors in FPRF calculation 2015-09-05 20:17:53 +02:00
Ryan Houdek
de80b9e988 Merge pull request #2971 from degasus/arm
JitArm64: fix smaller issues
2015-09-05 08:43:44 -05:00
degasus
24fec3ebca JitArm64: Fix float load & store 2015-09-05 13:48:29 +02:00
degasus
36902c58eb JitArm64: Fix lwbrx and lhbrx 2015-09-05 13:48:29 +02:00
degasus
696f95d5f9 JitArm64: Fix subfic 2015-09-05 13:48:29 +02:00
degasus
baa28e13f4 JitArm64: Remove FLUSH_INTERPRETER
It seems to be broken for some instructions, and there is no need for it any more.
2015-09-05 13:48:29 +02:00
Tillmann Karras
405554e327 Jit64: remove unnecessary indirection 2015-09-05 12:40:14 +02:00
Tillmann Karras
72eed1aa82 JitCache: drop unused method 2015-09-05 12:40:14 +02:00
Ryan Houdek
de051dac71 [AArch64] Implement integer gatherpipe writes. 2015-09-04 19:52:25 -05:00
Ryan Houdek
791c7d5a84 [AArch64] Clean up bogus vector FCVT{N,L} instruction usage.
Replace the instruction with the scalar variant FCVT instruction.
FCVT{N,L} 8 cycles latency on the Cortex A57
FCVT has five cycle latency and slightly higher throughput

On the A72 all three of these instructions will have three cycle latency,
While FCVT{N,L} will have half the throughput.
2015-09-04 19:41:54 -05:00
Ryan Houdek
2c68f6bfc5 [AArch64] Implement Fiora's preemptive paired loadstore optimization.
This provides a decent speed up in pretty much everything that touches pair loadstores because in most cases they are just regular non-quantizing
float loadstores that happen.
2015-09-04 19:20:33 -05:00
degasus
5797111ef0 JitArm64: Optimize fpr.R() 2015-09-02 22:46:14 +02:00
degasus
dfd44730c8 JitArm64: simplify fpr call 2015-09-02 22:46:14 +02:00
Ryan Houdek
ae0a06a018 [AArch64] Implement dcbz instruction 2015-08-31 15:39:47 -05:00
Ryan Houdek
0f54aa48b4 Merge pull request #2928 from Sonicadvance1/aarch64_improved_singles
[AArch64] Improve floating point single instructions.
2015-08-31 12:00:08 -05:00
Ryan Houdek
bcde1aa8ff [AArch64] Improve floating point single instructions.
Instead of having an "INS" instruction after every single instruction to duplicate the bottom 64bits in to the top 64bits of the register,
create a new FPR register cache type to track when a register's lower 64bits is supposed to be duplicated in to the high 64bits.
Not necessarily actually having the lower bits duplicated in the host side register. This removes inefficient INS instructions from sequential single
float instructions.
In particular a very heavy single heavy block in Animal Crossing went from 712 instructions down to 520 instructions(~37% less instructions!)
2015-08-31 11:09:17 -05:00
Ryan Houdek
d003934b8a Merge pull request #2929 from Sonicadvance1/aarch64_optimize_gpr_flush
Aarch64 optimize gpr flush
2015-08-31 10:55:45 -05:00
Ryan Houdek
8bf332cf08 [AArch64] Optimize GPR cache flushing.
If we are flushing multiple sequential guest GPRs then we can store two in a single STP instruction.
Ikaruga does this quite a bit in their blocks where they do an lmw at the very end and then we have to flush them all.
Typically cuts 16 STR instructions down to 8 STP instructions there.
2015-08-30 23:07:12 -05:00
Ryan Houdek
b907576510 [AArch64] Support profiling by cycle counters if they are available to EL0 2015-08-30 10:25:16 -05:00
Ryan Houdek
5110574c1f Merge pull request #2921 from Sonicadvance1/aarch64_optimize_lmw
[AArch64] Optimize lmw.
2015-08-30 10:23:57 -05:00
Lioncash
df19f11cb9 Jit_Util: Add missing override specifiers 2015-08-29 00:30:18 -04:00
Ryan Houdek
8d61706440 [AArch64] Optimize lmw.
This instruction is fairly heavily used by Ikaruga to load a bunch of registers from the stack.
In particular at the start of the second stage is a block that takes up ~20% CPU time that includes a usage of lmw to load half of the guest
registers.

Basic thing optimized here is changing from a single 32bit LDR to potentially a single 128bit LDR.
a single 32bit LDR is fairly slow, so we can optimize a few ways.
If we have four or more registers to load, do a 64bit LDP in to two host registers, byteswap, and then move the high 32bits of the host registers in
to the correct mapped guest register locations.
If we have two registers to load then do a 32bit LDP which will load two guest registers in a single instruction.
and then if we have only one register left to load, load it as before.

This saves quite a bit of cycles since the Cortex-A57 and A72's LDR instruction takes a few cycles.

Each 32bit LDR takes 4 cycles latency, plus 1 cycle for post-index(which typically happens in parallel.
Both the 32bit and 64bit LDP take the same amount of latency.

So we are improving latencies and reducing code bloat here.
2015-08-28 14:40:30 -05:00
Ryan Houdek
2c3fa8da28 [AArch64] Fix a bug in the register caches.
This is a bug that crops if BindToRegister() is called multiple times in a row without a R() function call between them.
How to reproduce the bug:
1) Have a completely filled cache with no host register remaining
2) Call BindToRegister() with different guest registers
3) Don't call R() between the BindToRegister() calls.

This issue typically wouldn't be seen for a couple of reasons. Typically we have /plenty/ of registers in the cache, and in most cases we only call
BindToRegister() once per instruction. In the off chance that it is called multiple times, it wouldn't update the last used counts and would flush the
same register as the previous call to it.
2015-08-28 14:36:14 -05:00
degasus
e516d4ef59 JitArm64: Implement rlwnmx 2015-08-26 21:59:10 +02:00
flacs
99e88a7af7 Merge pull request #2887 from Tilka/swap
Jit64: some byte-swapping changes
2015-08-26 16:43:45 +02:00
flacs
eb6ac641be Merge pull request #2906 from Tilka/fpscr
Jit64: fix bugs in the FPSCR instructions
2015-08-26 16:43:28 +02:00