Rather than *MemTools.cpp checking whether the address is in the
emulated range itself (which, as of the next commit, doesn't cover every
kind of access the JIT might want to intercept) and doing PC
replacement, they just pass the access address and context to
jit->HandleFault, which does the rest itself.
Because SContext is now in JitInterface, I wanted JitBackpatch.h (which
defines it) to be lightweight, so I moved TrampolineCache and associated
x64{Analyzer,Emitter} dependencies into its own file. I hate adding new
files in three places, two of which are MSVC...
While I'm at it, edit a misleading comment.
When executing a BL-type instruction, push the new LR onto the stack,
then CALL the dispatcher or linked block rather than JMPing to it. When
executing BLR, compare [rsp+8] to LR, and RET if it's right, which it
usually will be unless the thread was switched out. If it's not right,
reset RSP to avoid overflow.
This both saves a trip through the dispatcher and improves branch
prediction.
There is a small possibility of stack overflow anyway, which should
be handled... *yawn*
These calls are made outside of JIT blocks, and thus previously did not
read any protection - register use is taken into account and the outer
dispatcher stack frame is sufficient. However, if data is to be stored
on the stack, these calls must reserve stack shadow space on Windows to
avoid clobbering it.
Use some SSE4 instructions in on CPUs that support them.
Use float instructions instead of int where appropriate (it's a cycle faster
on CPUs with arithmetic unit forwarding penalties).
JitIL's fastmem was stubbed out when Sonicadvance1 merged JitARMIL
into the tree. Since JitARMIL has been deleted, I simply re-arrange
the inheritance to base JitIL on Jitx86Base, so it can inherit the
backpatch function.
Povray Benchmark: 1985 seconds to 1316 seconds.
Tries as hard as possible to push carry-using operations (like addc and adde)
next to each other. Refactor the instruction reordering to be more flexible
and allow multiple passes.
353 -> 192 x86 instructions on a carry-heavy code block in Pokemon Puzzle.
12% faster overall in Pokemon Puzzle; probably less in typical games (Virtual
Console games seem to be carry-heavy for some reason; maybe a different
compiler?)
Has been broken since the flags-opt merge. The idle skipping code in
JitIL was very brittle and depended on the IL of it's inputs not
changing in any way.
flags-opt changed the IR generated by the cmp instruction, which is part
of the idle loop, causing JitIL to break in really weird ways, which
were almost impossible to track down.
This fixes various wii games crashing/not booting and the Regspill
error on (all?) gamecube mmu games.