This fixes a regression from 592ba31. When `a` was a constant 0 and `b`
was a non-constant 0x80000000, the 32-bit negation operation would
overflow, causing an incorrect result. The sign extension needs to happen
before the negation to avoid overflow.
Note that I can't merge the SXTW and NEG into one instruction.
NEG is an alias for SUB with the first operand being set to ZR,
but "SUB (extended register)" treats register 31 as SP instead of ZR.
I've also changed the order for the case where `a` is a constant
0xFFFFFFFF. I don't think the order actually affects correctness here,
but let's use the same order for all the cases since it makes the code
easier to reason about.
This works around an Intel driver bug where, on D3D12 only, dual-source blending behaves incorrectly if the second source is unused on. This bug is visible in skyboxes in Super Mario Sunshine, which first draw clouds and sun flare in greyscale and then draw the sky afterwards with a source factor of 1 and a dest factor of 1-src_color (this results in the clouds being tinted blue). This process is done on an RGB888 framebuffer, so alpha update is disabled. (Color update is enabled; note that if you look at this in Dolphin's fifo analyzer, it won't be enabled because they use the BP mask functionality to only change the blending functions and not alpha/color update, for whatever reason.)
The previous code only updated the PLRU on cache misses, which made it so that the least recently inserted cache block was evicted, instead of the least recently used/hit one.
This regressed in 9d39647f9e3be1bdc2ee2cfc5ce6af2e560b4cc1 (part of #11183, but it was fine in e97d3804373b31724d0f092d8132e5ba23dec4a5), although beforehand it was only implemented for the instruction cache, and the instruction cache hit extremely infrequently when the JIT or cached interpreter is in use, which generally keeps it from behaving correctly (the pure interpreter behaves correctly with it).
I'm not aware of any games that are affected by this, though I did not do extensive testing.
Previously we would only backpatch overflowed address calculations
if the overflow was 0x1000 or less. Now we can handle the full 2 GiB
of overflow in both directions.
I'm also making equivalent changes to JitArm64's code. This isn't because
it needs it – JitArm64 address calculations should never overflow – but
because I wanted to get rid of the 0x100001000 inherited from Jit64 that
makes even less sense for JitArm64 than for Jit64.
This avoids a pseudo infinite loop where CodeWidget::UpdateCallstack
would lock the CPU in order to read the call stack, causing the CPU to
call Host_UpdateDisasmDialog because it's transitioning from running to
pausing, causing Host::UpdateDisasmDialog to be emitted, causing
CodeWidget::Update to be called, once again causing
CodeWidget::UpdateCallstack to be called, repeating the cycle.
Dolphin didn't go completely unresponsive during this, because
Host_UpdateDisasmDialog schedules the emitting of Host::UpdateDisasmDialog
to happen on another thread without blocking, but it was stopping certain
operations like exiting emulation from working.
This fixes a problem I was having where using frame advance with the
debugger open would frequently cause panic alerts about invalid addresses
due to the CPU thread changing MSR.DR while the host thread was trying
to access memory.
To aid in tracking down all the places where we weren't properly locking
the CPU, I've created a new type (in Core.h) that you have to pass as a
reference or pointer to functions that require running as the CPU thread.