Previous code from #7950 only clamps correctly when the efb copies
left and top coordinates are (0, 0)
Now we should handle all situations.
Spyro: A hero's tail is an example of a game that does an oversized
EFB copy with a non-zero origin.
If W0 is locked when fpr.RW is called, the indirectly called
ConvertSingleToDoubleLower may need to emit a push+pop, so it's
better for fresx/frsqrtex to call RW before locking W0 than after.
This way the address check will take up less icache (since it's
only emitted once for each routine rather than once for each
psq_st instruction), and we also get address checking for psq_l.
Matches Jit64's approach.
The disadvantage: In the slowmem case, the routines have to
push *every* caller-saved register onto the stack, even though
most callers probably don't need it. But at long as the slowmem
case isn't hit frequently, this is fine.
In the case of the JitAsm routines, we can't actually use
backpatching. Still, I would like to gather all the load and
store instructions in one place to make future changes easier.
This adjusts the NaN replacement logic introduced in #9928 to work around the HLSL compiler optimizing away calls to isnan, which caused that functionality to not work with ubershaders on D3D11 and D3D12 (it did work with specialized shaders, despite a warning being logged for both; that warning is also now gone). Note that the `D3DCOMPILE_IEEE_STRICTNESS` flag did not solve this issue, despite the warning suggesting that it might.
Suggested by @kayru and @jamiehayes.
This is a proper fix for the issue that 3071a1d was a workaround for.
It wasn't some kind of bug in the register cache that had laid dormant,
it was a simple mistake made in b24b79e.
Fixes a regression from ecf86bb.
The GPR allocation_order is initialized with only 28 elements,
so the 29th element ends up getting zero initialized.
Very sneaky bug...
Previously in Read_U64 and Write_U64 the value that was read or written
would be truncated to a 32-bit value before being passed off to the
memcheck handler, which can result in incorrect values being logged out.
Lets us simplify SDRUpdated() a little bit.
This also fixes the layout of UReg_SDR1. Turns out this struct has been
incorrect (from a little-endian perspective) the entire time and went
unnoticed, since the union was never used.
These are trivial to resolve.
Converting the structure member into a u32 results in no increase in
structure size, as it's making use of the three extra padding bits in
the structure.