Also, this avoids keeping the system awake if a game is not being
played.
Frankly, I don't know what the point of precisely tracking these things
is, but that's how the API works. Feel free to add analogous
functionality on other platforms.
Move the JITed function/basic-block registration logic out of the CPU
subsystem in order to add JIT registration to JITed DSP and
Video/VertexLoader code.
This necessary in order to add /tmp/perf-$pid.map support to other
JITed code as they need to write to the same file.
'perf' is the standard builtin tool for performance analysis on recent
Linux kernel. Its source code is shipped within the kernel repository.
'perf' has basic support for JIT. For each process, it can read a file
named /tmp/perf-$PID.map. This file contains mapping from address
range to function name in the format:
41187e2a 1a EmuCode_804a33fc
with the following entries:
1. beginning of the range (hexadecimal);
2. size of the range (hexadecimal);
3. name of the function.
We supply the PowerPC address of the basic block as function name.
Usage:
DOLPHIN_PERF_DIR=/tmp dolphin-emu &
perf record -F99 -p $(pgrep dolphin-emu) --call-graph dwarf
perf script | stackcollapse-perf.pl | grep EmuCode__ | flamegraph.pl > profile.svg
Issue: perf does not have support for region invalidation. It reads
the file in postprocessing. It probably does not work very well if a
JIT region is reused for another basic block: wrong results should be
expected in this case. Currently, nothing is done to prevent this.
This is a fairly lengthy change that can't be separated out to multiple commits well due to the nature of fastmem being a bit of an intertangled mess.
This makes my life easier for maintaining fastmem on ARMv7 because I now don't have to do any terrible instruction counting and NOP padding. Really
makes my brain stop hurting when working with it.
This enables fastmem for a whole bunch of new instructions, which basically means that all instructions now have fastmem working for them. This also
rewrites the floating point loadstores again because the last implementation was pretty crap when it comes to performance, even if they were the
cleanest implementation from my point of view.
This initially started with me rewriting the fastmem routines to work just like the previous/current implementation of floating loadstores. That was
when I noticed that the performance tanked and decided to rewrite all of it.
This also happens to implement gatherpipe optimizations alongside constant address optimization.
Overall this comment brings a fairly large speedboost when using fastmem.
* Added country flags for games from Netherlands and Spain
* Added separate category for Region Free games (Uses European flag as placeholder)
* Added missing country filter options in "show regions" menu
* Rearranged country filters for readability
* Incremented CACHE_REVISION
Also fixed various country filters not showing up as options in the "Show regions" menu.
There was a longstanding hack that defined ucontext_t manually to work
around the lack of this header on the Android NDK. However, it looks
like newer NDK versions now have it like good little POSIX boys, and my
recent header reshuffle broke the build on those versions, presumably
because the real and fake definitions of ucontext_t end up included in
the same file where they weren't under the old organization.
Rather than try to revert the conflict, this commit just removes the
hack. The buildbot's NDK will need to be upgraded.
Changes the read speed of GC discs from 3 MiB/s to 2-3.3 MiB/s,
depending on the location of the data. I also attempted to change the
speeds for Wii discs, but it has very little effect right now because
Wii games use IPC_HLE instead of DVDInterface. It does affect Wii
homebrew that reads Wii discs, though.
This was interesting implementing.
Our generic QueryPerformanceCounter function on ARMv7 was so slow that profiling a block was impossible.
I waited about five minutes and I couldn't even get a single frame to output.
This instead uses ARMv7's PMU to get cycle counts, which are a relatively minor performance drop in my testing.
One disadvantage of this method is that the kernel can lock us out of using these co-processor registers, but it seems to work on my Jetson board.
Another disadvantage is that we aren't having block times in "real" time but cycles instead, not too big of a deal.
This also removes instruction run counts from profiling because that's just annoying and we don't expose an interface for even getting those results
from our UI.
This implements a new system for fastmem backpatching on ARMv7 that is less of a mindfsck to deal with.
This also implements stfs under the default loadstore path as well, not sure why it was by itself in the first place.
I'll be moving the rest of the loadstore methods over to this new way in a few days.
These are causing issues in games. In particular you get pink on the screen in Animal Crossing.
Disable until fully investigated.
This also disables fastmem on floating point loadstore instructions which are horribly broken and won't actually backpatch when an invalid read/write
is encountered.