Jit64: srwx - Optimize shift by constant

More efficient code can be generated if the shift amount is known at
compile time. Similar optimizations were present in JitArm64 already,
but were missing in Jit64.

- By using an 8-bit immediate we can eliminate the need for ECX as a
  scratch register, thereby reducing register pressure and occasionally
  eliminating a spill.

Before:
B9 18 00 00 00       mov         ecx,18h
45 8B C1             mov         r8d,r9d
49 D3 E8             shr         r8,cl

After:
45 8B C1             mov         r8d,r9d
41 C1 E8 18          shr         r8d,18h

- PowerPC's srw uses the low 6 bits of the shift amount, and any amount
  from 32 to 63 produces zero. This is emulated using 64-bit shifts,
  even though we only care about a 32-bit result. If the shift amount is
  known, we can handle the zero-result case separately and use a 32-bit
  shift instruction for all other amounts.

Before:
B9 F8 FF FF FF       mov         ecx,0FFFFFFF8h
45 8B C1             mov         r8d,r9d
49 D3 E8             shr         r8,cl

After:
No code is emitted; the register is set to the constant zero.
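
For illustration only, the masking rule can be modelled in plain C++.
This is a standalone sketch, not Dolphin code: srw_reference and
srw_via_64bit are hypothetical names, and the second function assumes
the 64-bit x86 shift masks its count to 6 bits, which is what lets the
generic path get away with a single SHR.

```cpp
#include <cstdint>

// Model of srw: bit 5 of the shift amount forces the result to zero,
// otherwise the low 5 bits give an ordinary 32-bit logical right shift.
constexpr std::uint32_t srw_reference(std::uint32_t rs, std::uint32_t rb)
{
  return (rb & 0x20) ? 0u : (rs >> (rb & 0x1f));
}

// Model of the generic path: zero-extend to 64 bits, shift by the low
// 6 bits of the amount, then truncate back to 32 bits.
constexpr std::uint32_t srw_via_64bit(std::uint32_t rs, std::uint32_t rb)
{
  return static_cast<std::uint32_t>(static_cast<std::uint64_t>(rs) >> (rb & 0x3f));
}

// The two models agree on the amounts used in the examples above.
static_assert(srw_reference(0x80000000u, 0x18u) == srw_via_64bit(0x80000000u, 0x18u), "shift by 24");
static_assert(srw_reference(0xFFFFFFFFu, 0xFFFFFFF8u) == srw_via_64bit(0xFFFFFFFFu, 0xFFFFFFF8u), "amount with bit 5 set");
```

Because the constant case can test bit 5 at compile time, only the
32-bit shift is left to emit.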

- A shift by zero becomes a simple MOV.

Before:
B9 00 00 00 00       mov         ecx,0
45 8B C1             mov         r8d,r9d
49 D3 E8             shr         r8,cl

After:
45 8B C1             mov         r8d,r9d
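
Taken together, the three cases above amount to a small decision made
at compile time. The sketch below restates it outside the JIT; SrwPlan
and PlanSrwConstant are hypothetical names used only for this
illustration, and a and s stand for the destination and source
register numbers.

```cpp
#include <cstdint>

// Hypothetical summary of what would be emitted; not a Dolphin type.
struct SrwPlan
{
  bool constant_zero;  // destination becomes the immediate 0, no code emitted
  bool needs_mov;      // a 32-bit MOV from source to destination is needed
  std::uint8_t shift;  // 8-bit immediate for SHR; 0 means no SHR is emitted
};

inline SrwPlan PlanSrwConstant(int a, int s, std::uint32_t amount)
{
  // Bit 5 set: the result is always zero, whatever the source holds.
  if (amount & 0x20)
    return SrwPlan{true, false, 0};

  // Otherwise copy the source if it lives in a different register and
  // shift by the low 5 bits, skipping the shift when they are zero.
  return SrwPlan{false, a != s, static_cast<std::uint8_t>(amount & 0x1f)};
}
```

For example, PlanSrwConstant(8, 9, 0x18) asks for a MOV followed by a
shift by 18h, while an amount of 0 leaves only the MOV.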
Sintendo 2020-11-16 23:00:52 +01:00
parent 2e4e2ad1ff
commit 17db359979


@@ -1795,6 +1795,27 @@ void Jit64::srwx(UGeckoInstruction inst)
    u32 amount = gpr.Imm32(b);
    gpr.SetImmediate32(a, (amount & 0x20) ? 0 : (gpr.Imm32(s) >> (amount & 0x1f)));
  }
  else if (gpr.IsImm(b))
  {
    u32 amount = gpr.Imm32(b);
    if (amount & 0x20)
    {
      gpr.SetImmediate32(a, 0);
    }
    else
    {
      RCX64Reg Ra = gpr.Bind(a, RCMode::Write);
      RCOpArg Rs = gpr.Use(s, RCMode::Read);
      RegCache::Realize(Ra, Rs);
      if (a != s)
        MOV(32, Ra, Rs);
      amount &= 0x1f;
      if (amount != 0)
        SHR(32, Ra, Imm8(amount));
    }
  }
  else
  {
    RCX64Reg ecx = gpr.Scratch(ECX);  // no register choice