This doesn't really add any new optimizations, but fixes an issue that
prevented the optimizations introduced in #8551 and #8755 from being
applied in specific cases. A similar issue was solved for subfx as part
of #9425.
Consider the case where the destination register is also an input
register and happens to hold an immediate value. This results in a set
of constraints that forces the RegCache to allocate a register and move
the immediate value into it for us. By the time we check for immediate
values in the JIT, we're too late.
We solve this by refactoring the code in such a way that we can check
for immediates before involving the RegCache.
- Example 1
Before:
41 BF 00 68 00 CC mov r15d,0CC006800h
44 03 FF add r15d,edi
After:
44 8D BF 00 68 00 CC lea r15d,[rdi-33FF9800h]
- Example 2
Before:
41 BE 00 00 00 00 mov r14d,0
44 03 F7 add r14d,edi
After:
44 8B F7 mov r14d,edi
- Example 3
Before:
41 BD 03 00 00 00 mov r13d,3
44 03 6D 8C add r13d,dword ptr [rbp-74h]
After:
44 8B 6D 8C mov r13d,dword ptr [rbp-74h]
41 83 C5 03 add r13d,3
For certain occurrences of nandx/norx, we declare a ReadWrite constraint
on the destination register, even though the value of the destination
register is irrelevant. This false dependency would force the RegCache
to generate a redundant MOV when the destination register wasn't already
assigned to a host register.
Example 1:
BF 00 00 00 00 mov edi,0
8B FE mov edi,esi
F7 D7 not edi
Example 2:
8B 7D 80 mov edi,dword ptr [rbp-80h]
8B FE mov edi,esi
F7 D7 not edi
FinalizeCarryOverflow didn't maintain XER[OV/SO] properly due to an
oversight. Here's the code it would generate:
0: 9c pushf
1: 80 65 3b fe and BYTE PTR [rbp+0x3b],0xfe
5: 71 04 jno b <jno>
7: c6 45 3b 03 mov BYTE PTR [rbp+0x3b],0x3
000000000000000b <jno>:
b: 9d popf
At first glance it seems reasonable. The host flags are carefully
preserved with PUSHF. The AND instruction clears XER[OV]. Next, an
conditional branch checks the host's overflow flag and, if needed, skips
over a MOV that sets XER[OV/SO]. Finally, host flags are restored with
POPF.
However, the AND instruction also clears the host's overflow flag. As a
result, the branch that follows it is always taken and the MOV is always
skipped. The end result is that XER[OV] is always cleared while XER[SO]
is left unchanged.
Putting POPF immediately after the AND would fix this, but we already
have GenerateOverflow doing it correctly (and without the PUSHF/POPF
shenanigans too). So let's just use that instead.