And switch to a register order that consistently prefers callee-save to
caller-save. phire suggested putting rdi/rsi first, even though they're
caller-save, to save code space; this is more conservative and I can do
that later.
Rather than using a variety of registers including RSI, ABI_PARAM1
(either RCX or RDI), RCX, and RDX, the rule is:
- RDI and RSI are never used. This allows them to be allocated on Unix,
bringing parity with Windows.
- RDX is a permanent temporary register along with RAX (and is thus not
FlushLocked). It's used frequently enough that allocating it would
probably be a bad idea, as it would constantly get flushed.
- RCX is allocatable, but is flushed in two situations:
- Non-immediate shifts (rlwnm), because x86 requires RCX to be used.
- Paired single loads and stores, because they require three
temporary registers: the helper functions take two integer
arguments, and another register is used as an index to get the
function address.
These should be relatively rare.
While we're at it, in stores, use the registers directly where possible
rather than always using temporaries (by making SafeWriteRegToReg
clobber less). The address doesn't need to be clobbered in the usual
case, and on CPUs with MOVBE, neither does the value.
Oh, and get rid of a useless MEMCHECK.
This commit does not actually add new registers to the allocation order;
it is intended to test for any performance or correctness issues
separately.
The special case is where the registers are actually to be swapped (i.e.
func(ABI_PARAM2, ABI_PARAM1); this was previously impossible but would
be ugly not to handle anyway.
In two cases, my old code was using a temporary register but not saving
it properly; it basically worked by accident (an otherwise useless
FlushLock was causing CallerSavedRegistersInUse to think it was in use
by the GPR cache, even though it was actually a temporary).
I'm going to modify this in the next commit to use RDX, but I didn't
want to leave a broken revision in the middle.
The register is RBP, previously in the GPR allocation order. The next
commit will investigate whether there are too few GPRs (now or before),
but for now there is no replacement.
Previously, it was accessed RIP relatively; using RBP, anything in the
first 0x100 bytes of ppcState (including all the GPRs) can be accessed
with three fewer bytes. Code to access ppcState is generated constantly
(mostly by register save/load), so in principle, this should improve
instruction cache footprint significantly. It seems that this makes a
significant performance difference in practice.
The vast majority of this commit is mechanically replacing
M(&PowerPC::ppcState.x) with a new macro PPCSTATE(x).
Version 2: gets most of the cases which were using the register access
macros.
GetOpInfo was returning null pointers for invalid ops in subtables
instead of asserting an error. This was causing segfaults when the
jit tried to jit invalid code.
Previously it would fall through to the mmu code path, and raise a dsi
exception, which it would never check for, so it would continue
executing code silently.
Added the option to handle whether the user wants to iterate through the
assignment of button mappings or assign them one at a time.
fixed formatting issues and code style.
I excluded this option from the config file. This stopped the check box value and the boolean from becoming offset. Since the option should always start as false.
This still causes an issue with the Wiimote input, since the class variable that keeps the state will be wiped, but the check box value will stay the same after closing/reopening without closing the entire Wiimote configuration. I am looking for a way to resolve this.
I also reduced wait time to 2.5 seconds vs. the 5 seconds previously. Seemed to be a little long.
These changes apparently did not go through.
This should fix the Wiimote issue.
As far as I can tell, this has literally been here since the start of the git
history; maybe it was stubbed out because the author wasn't sure it was right?
It matches the PPC/Broadway manuals perfectly, though.
This value was "helpful" for debugging when the stack got corrupted.
Helpful that if gpr[1](Which is the stack pointer with PPC ABI) is zero then the interpreter would spam huge amounts of annoy text saying that we
managed to get in to a "corrupted" state.
This is incremented every instruction on the interpreter, or every block run on the JIT64....Only if debugging is enabled(JIT64 it is a const
variable)
The message is only outputted when interpreter is used and debugging is enabled.
The old method would always evict the first suitable register, i.e. the
same register every time once the cache got full. The cache doesn't get
terribly often, but the result is pathological...