34 Commits

Author SHA1 Message Date
comex
65af90669b Add the 'desynced GPU thread' mode.
It's a relatively big commit (less big with -w), but it's hard to test
any of this separately...

The basic problem is that in netplay or movies, the state of the CPU
must be deterministic, including when the game receives notification
that the GPU has processed FIFO data.  Dual core mode notifies the game
whenever the GPU thread actually gets around to doing the work, so it
isn't deterministic.  Single core mode is because it notifies the game
'instantly' (after processing the data synchronously), but it's too slow
for many systems and games.

My old dc-netplay branch worked as follows: everything worked as normal
except the state of the CP registers was a lie, and the CPU thread only
delivered results when idle detection triggered (waiting for the GPU if
they weren't ready at that point).  Usually, a game is idle iff all the
work for the frame has been done, except for a small amount of work
depending on the GPU result, so neither the CPU or the GPU waiting on
the other affected performance much.  However, it's possible that the
game could be waiting for some earlier interrupt, and any of several
games which, for whatever reason, never went into a detectable idle
(even when I tried to improve the detection) would never receive results
at all.  (The current method should have better compatibility, but it
also has slightly higher overhead and breaks some other things, so I
want to reimplement this, hopefully with less impact on the code, in the
future.)

With this commit, the basic idea is that the CPU thread acts as if the
work has been done instantly, like single core mode, but actually hands
it off asynchronously to the GPU thread (after backing up some data that
the game might change in memory before it's actually done).  Since the
work isn't done, any feedback from the GPU to the CPU, such as real
XFB/EFB copies (virtual are OK), EFB pokes, performance queries, etc. is
broken; but most games work with these options disabled, and there is no
need to try to detect what the CPU thread is doing.

Technically: when the flag g_use_deterministic_gpu_thread (currently
stuck on) is on, the CPU thread calls RunGpu like in single core mode.
This function synchronously copies the data from the FIFO to the
internal video buffer and updates the CP registers, interrupts, etc.
However, instead of the regular ReadDataFromFifo followed by running the
opcode decoder, it runs ReadDataFromFifoOnCPU ->
OpcodeDecoder_Preprocess, which relatively quickly scans through the
FIFO data, detects SetFinish calls etc., which are immediately fired,
and saves certain associated data from memory (e.g. display lists) in
AuxBuffers (a parallel stream to the main FIFO, which is a bit slow at
the moment), before handing the data off to the GPU thread to actually
render.  That makes up the bulk of this commit.

In various circumstances, including the aforementioned EFB pokes and
performance queries as well as swap requests (i.e. the end of a frame -
we don't want the CPU potentially pumping out frames too quickly and the
GPU falling behind*), SyncGPU is called to wait for actual completion.

The overhead mainly comes from OpcodeDecoder_Preprocess (which is,
again, synchronous), as well as the actual copying.

Currently, display lists and such are escrowed from main memory even
though they usually won't change over the course of a frame, and
textures are not even though they might, resulting in a small chance of
graphical glitches.  When the texture locking (i.e. fault on write) code
lands, I can make this all correct and maybe a little faster.

* This suggests an alternate determinism method of just delaying results
until a short time before the end of each frame.  For all I know this
might mostly work - I haven't tried it - but if any significant work
hinges on the competion of render to texture etc., the frame will be
missed.
2014-09-28 21:34:29 -04:00
Tony Wasserka
1d23c2ca8b GPU: Only load the relevant color components upon writes to the tev color registers.
The other two components need not be valid upon write, hence loading them results in glitches.

Fixes issue 6783.
2014-09-21 10:38:22 +02:00
Rachel Bryk
f93aa7087c Kill Core::g_CoreStartupParameter. 2014-09-09 00:24:49 -04:00
Shawn Hoffman
4bf031c064 msvc: resolve all warnings in VideoCommon. 2014-08-19 22:33:46 -07:00
degasus
81ed17be53 avoid the extern keyword in .cpp files 2014-07-11 16:10:20 +02:00
Paul Olszewski
5d793881b0 Fix the capitalization of "GameCube" throughout the project. 2014-06-08 11:24:49 +09:00
Lioncash
776e36b10a Fix a typo in a BP register name (BPMEM_TX_SETLUT_4 -> BPMEM_TX_SETTLUT_4).
Also fixed the alignment of the register values.
2014-06-02 02:26:30 -04:00
Lioncash
12db989098 Add missing registers in GetBPRegInfo 2014-06-02 02:19:53 -04:00
Lioncash
c96407bd2a Make GetBPRegInfo just take two strings as parameters
Gets rid of the size parameters.
2014-05-29 19:44:14 -04:00
shuffle2
b58753bd69 Merge pull request #370 from Sonicadvance1/remove_specialized_memcmp
Removes ZeroFrog's "optimized" memcpy and memcmp functions.
2014-05-22 13:02:11 -07:00
Jasper St. Pierre
9d161b4170 BPStructs: Consistently put the two shared copy args first
And rename them so they make a bit more sense.
2014-05-20 11:28:15 -04:00
Jasper St. Pierre
1ae8edc1d0 BPStructs: Remove another function wrapper 2014-05-20 11:28:15 -04:00
Jasper St. Pierre
b1d3c5937a BPStructs: Move LoadBPReg here 2014-05-20 11:28:14 -04:00
Jasper St. Pierre
763ad77a1c BPStructs: Flatten out BPWritten 2014-05-20 11:28:14 -04:00
Jasper St. Pierre
07ab77d31c BPStructs: Reindent BPWritten 2014-05-20 11:28:08 -04:00
Jasper St. Pierre
c33a1b4b28 BPStructs: Document BPMEM_BP_MASK better 2014-05-20 11:26:31 -04:00
Jasper St. Pierre
2f122ea63c BPMemory: Fix "DISPLAYCOPYFILER" typo 2014-05-20 11:15:10 -04:00
Jasper St. Pierre
4e8e51b278 BPStructs: Remove calls to SetInterlacedMode when reloading state
SetInterlacedMode is a dummy no-op that does nothing.
2014-05-20 11:15:10 -04:00
Jasper St. Pierre
833b7ee584 BPFunctions: Remove the rest of GetConfig 2014-05-20 11:15:09 -04:00
Jasper St. Pierre
08611c3f36 PixelShaderManager: Fizzle out fog changes when disabled here
This lets us remove a use of GetConfig.
2014-05-20 11:15:09 -04:00
Jasper St. Pierre
fe645b888b BPFunctions: Remove use of a dumb method
GetPointer serves no purpose.
2014-05-20 11:15:08 -04:00
Ryan Houdek
a4bb0dafb4 Removes ZeroFrog's "optimized" memcpy and memcmp functions.
These were only compiled in on Windows and x86_32.
They provided "optimized" copies and compares based on blocksizes for the AMD Athlon and Duron CPU families.
The code was taken from something that AMD provides with a as-is license.
Just get rid of this crap.
2014-05-17 18:03:31 -05:00
Tony Wasserka
16d3dbc5ea BPMemory: Use BitField for the GenMode fields. 2014-04-21 22:34:23 +02:00
Tony Wasserka
16105db709 BPMemory: Make use of BitField in a number of structures. 2014-03-25 23:57:58 +01:00
Tony Wasserka
77a7bab5ae BPMemory: Use the new BitField class in two selected structures. 2014-03-25 23:57:57 +01:00
Matthew Parlane
31cfc73a09 Fixes spacing for "for", "while", "switch" and "if"
Also moved && and || to ends of lines instead of start.
Fixed misc vertical alignments and some { needed newlining.
2014-03-11 00:35:07 +13:00
Tillmann Karras
d802d39281 clang-modernize -use-nullptr
and s/\bNULL\b/nullptr/g for *.cpp/h/mm files not compiled on my machine
2014-03-09 21:14:26 +01:00
Lioncash
2afe215271 Convert all includes to relative paths. 2014-02-18 02:19:10 -05:00
degasus
1f4219b5b4 move perfquery enable checks into videocommon (caller side) 2014-02-15 11:33:43 +01:00
crudelios
cdfe58f7ed Rewrote bounding box algotithm. Fixes issues 5967, 6154, 6196, 6211.
Instead of being vertex-based, it is now primitive (point, line or dissected triangle) based, with proper clipping.
Also, screen position is now calculated based on viewport values, instead of "guesstimating".

This fixes many graphical glitches in Paper Mario: TTYD and Super Paper Mario.

Also, the new code allows Mickey's Magical Mirror and Disney's Hide & Sneak to work (mostly) bug-free. I changed their inis to use bbox.

These changes have a slight cost in performance when bbox is being used (rare), mostly due to the new clipping algorithm.

Please check for any regressions or crashes.
2014-01-25 15:36:23 +00:00
degasus
331af32038 fixup "Remove the ZTP speedup hack"
This fixes revision b49c09c36b67
2014-01-16 00:26:49 +01:00
Tony Wasserka
b49c09c36b Remove the ZTP speedup hack. Also remove useless debugging code, and a presumably outdated workaround (which was commented out).
Fixes issue 6875.
2014-01-16 00:11:12 +01:00
degasus
0f0a3cc509 ogl: clamp to edge for out of bound efb access
fixes issue 6898

OpenGL defaults are GL_REPEAT, which is even more unlikely than GL_CLAMP_TO_EDGE.
As I can't test the behavoir of the real hardware, I changed it to how it works before,
but I guess just clip the texture makes more sense.
2014-01-03 08:15:19 +01:00
Jasper St. Pierre
34692ab826 Remove unnecessary Src/ folders 2013-12-31 14:03:19 -05:00