Fifo overflow fix
The idea behind separating the CPU and GPU thread path is to prevent two threads executing the same function (SetCPStatusFromCPU) at the same time. The CPU thread gets to that function via GetherPipeBursted and the GPU thread gets there via SetCPStatusFromGPU. I wrote the original (factored) version as I like to keep code duplication to a minimum. This worked most of the time but not all of the time. It was a good move to separate it to a GPU version and a CPU version, but then the GPU version called the CPU version at the tail end. Removing the call (as in this PR) prevents the FIFO overflow in Starfox Adventures.
The "Updating the CPStatus before FIFO events" change is simply there to keep the DC code path and the SC code path consistent in the !GPLinked scenario. The SC code path does the same thing when !GPLinked via RunGPU. Putting the SetCPStatus call before the !GPLinked state changes the DC code path to do the same thing. The more we keep the DC and SC code paths consistent, the better.
Forcing the exception check on interrupts is the change for The Last Story. It makes the emulator process interrupts more responsively on CP interrupts.
The HiWatermark check is something that old (pre-2010) versions of Dolphin used to do. Games like Battalion Wars 2 do not expect the GPU to run so slowly and do not have the code to react to that situation, so this change makes the emulator extra careful in those situations.
This wasn't too much of a concern since we normally don't care about this feature set, but it is nice when testing on new devices and they don't
support the higher feature sets but want to run under software renderer.
The Mesa softpipe and PowerVR 5xx drivers don't support higher GL versions, but they shouldn't exit out just because they couldn't get a GL3 function
pointer that isn't even going to be used at that point.
The ActionBar method of doing the tabular layout is deprecated on Android 5.0.
This method alleviates those deprecations while providing the same functionality.
This was interesting implementing.
Our generic QueryPerformanceCounter function on ARMv7 was so slow that profiling a block was impossible.
I waited about five minutes and I couldn't even get a single frame to output.
This instead uses ARMv7's PMU to get cycle counts, which are a relatively minor performance drop in my testing.
One disadvantage of this method is that the kernel can lock us out of using these co-processor registers, but it seems to work on my Jetson board.
Another disadvantage is that we aren't having block times in "real" time but cycles instead, not too big of a deal.
This also removes instruction run counts from profiling because that's just annoying and we don't expose an interface for even getting those results
from our UI.
Regression by 3fed975bac11956ffc4d3eae86c928c7e0c921db caused netplay to
crash on OS X. While I'm at it, fix the long-standing "unsafe i guess"
AddPendingEvent, since we depend on wx 3 now...
This implements a new system for fastmem backpatching on ARMv7 that is less of a mindfsck to deal with.
This also implements stfs under the default loadstore path as well, not sure why it was by itself in the first place.
I'll be moving the rest of the loadstore methods over to this new way in a few days.