mirror of
https://github.com/dolphin-emu/dolphin.git
synced 2025-01-24 23:11:14 +01:00
ac28b89fa5
When I added the software FMA path in 2c38d64 and made us use it when determinism is enabled, I was assuming that either the performance impact of software FMA wouldn't be too large or CPUs that were too old to have FMA instructions were too slow to run Dolphin well anyway. This was wrong. To give an example, the netplay performance went from 60 FPS to 30 FPS in one case. This change makes netplay clients negotiate whether FMA should be used. If all clients use an x64 CPU that supports FMA, or AArch64, then FMA is enabled, and otherwise FMA is disabled. In other words, we sacrifice accuracy if needed to avoid massive slowdown, but not otherwise. When not using netplay, whether to enable FMA is simply based on whether the host CPU supports it. The only remaining case where the software FMA path gets used under normal circumstances is when an input recording is created on a CPU with FMA support and then played back on a CPU without. This is not an especially common scenario (though it can happen), and TASers are generally less picky about performance and more picky about accuracy than other users anyway. With this change, FMA desyncs are avoided between AArch64 and modern x64 CPUs (unlike before 2c38d64), but we do get FMA desyncs between AArch64 and old x64 CPUs (like before 2c38d64). This desync can be avoided by adding a non-FMA path to JitArm64 as an option, which I will wait with for another pull request so that we can get the performance regression fixed as quickly as possible. https://bugs.dolphin-emu.org/issues/12542