FinalizeCarryOverflow didn't maintain XER[OV/SO] properly due to an
oversight. Here's the code it would generate:
0: 9c pushf
1: 80 65 3b fe and BYTE PTR [rbp+0x3b],0xfe
5: 71 04 jno b <jno>
7: c6 45 3b 03 mov BYTE PTR [rbp+0x3b],0x3
000000000000000b <jno>:
b: 9d popf
At first glance it seems reasonable. The host flags are carefully
preserved with PUSHF. The AND instruction clears XER[OV]. Next, an
conditional branch checks the host's overflow flag and, if needed, skips
over a MOV that sets XER[OV/SO]. Finally, host flags are restored with
POPF.
However, the AND instruction also clears the host's overflow flag. As a
result, the branch that follows it is always taken and the MOV is always
skipped. The end result is that XER[OV] is always cleared while XER[SO]
is left unchanged.
Putting POPF immediately after the AND would fix this, but we already
have GenerateOverflow doing it correctly (and without the PUSHF/POPF
shenanigans too). So let's just use that instead.
Happens in Super Mario Sunshine. You could probably do something similar
for b == -1 (like we do for subfic), but I couldn't find any titles that
do this.
- Case 1: d == a
Before:
41 8B C7 mov eax,r15d
41 BF 00 00 00 00 mov r15d,0
44 2B F8 sub r15d,eax
After:
41 F7 DF neg r15d
- Case 2: d != a
Before:
BF 00 00 00 00 mov edi,0
41 2B FD sub edi,r13d
After:
41 8B FD mov edi,r13d
F7 DF neg edi
Consider the case where d and a refer to the same PowerPC register,
which is known to hold an immediate value by the RegCache. We place a
ReadWrite constraint on this register and bind it to an x86 register.
The RegCache then allocates a new register, initializes it with the
immediate, and returns a RCX64Reg for both d and a.
At this point information about the immediate value becomes unreachable.
In the case of subfx, this generates suboptimal code:
Before 1:
BF 1E 00 00 00 mov edi,1Eh <- done by RegCache
8B C7 mov eax,edi
8B FE mov edi,esi
2B F8 sub edi,eax
Before 2:
BE 00 AC 3F 80 mov esi,803FAC00h <- done by RegCache
8B C6 mov eax,esi
8B 75 EC mov esi,dword ptr [rbp-14h]
2B F0 sub esi,eax
The solution is to explicitly handle the constant a case before having
the RegCache allocate registers for us.
After 1:
8D 7E E2 lea edi,[rsi-1Eh]
After 2:
8B 75 EC mov esi,dword ptr [rbp-14h]
81 EE 00 AC 3F 80 sub esi,803FAC00h
Also provides operator!= for logical symmetry.
We can also take the arguments by value, as the arguments are trivially
copyable enum values which fit nicely into registers already.
https://bugs.dolphin-emu.org/issues/6749
This change fixes the scratchy audio in Teenage Mutant Ninja Turtles (SX7E52/SX7P52). The game starts an audio interface DMA with an unaligned address, and because Dolphin was not masking off the low 5 bits of AUDIO_DMA_START_LO, all future AI DMAs were misaligned. To understand why, it is instructive to refer to AUDIO_InitDMA() in libogc, which behaves the same as the official SDK:
_dspReg[25] = (_dspReg[25]&~0xffe0)|(startaddr&0xffff);
The implementation does not mask off the low bits of the passed in value before it ORs them with low bits of the current register value. Therefore, if they are not masked off by the hardware itself, they become permanently stuck once set.
Adding a write mask for AUDIO_DMA_START_LO is enough to fix the bug in TMNT, but I decided to run some tests on GC and Wii to find the correct write masks for the surrounding registers, as only a couple were already being masked. Dolphin has gotten away with not masking the rest because many are already A) masked on read (or never read) by the SDK and/or B) masked on use (or never used) in Dolphin.
This leaves just three registers where the difference may be observable: AR_DMA_CNT_H and AUDIO_DMA_START_HI/LO.
operator[] performs a default construction if an object at the given key
doesn't exist before overwriting it with the one we provide in operator=
insert_or_assign performs optimal insertion by avoiding the default
construction if an entry doesn't exist.
Not a game changer, but it is essentially a "free" change.
Allows lookups to be done with std::string_view or any other string
type. This allows for non-allocating strings to be used with the name
lookup without needing to construct a std::string.