This was causing a race condition where the "absurdly large aux buffer"
panic alert would be triggered in the last bit of fifo processing on the
CPU thread in deterministic mode (i.e. netplay). SyncGPU is supposed to
move the auxiliary queue data to the beginning of the containing buffer
so we don't have to deal with wraparound; if GpuRunningState is false,
however, it just returns, because it's set to false by another thread -
thus it doesn't know whether RunGpuLoop is still executing (in which
case it can't just reset the pointers, because it may still be using the
buffer) or not (in which case the condition variable it normally waits
for to avoid the previous problem will never be signaled). However,
SyncGPU's caller PushFifoAuxBuffer wasn't aware of this, so if the
buffer was filling at just the right time, it'd stay full and that
function would complain that it was about to overflow it. Similar
problem with ReadDataFromFifoOnCPU afaik. Fix this by returning early
from those as well; other callers of SyncGPU should be safe. A
*slightly* cleaner alternative would be giving the CPU thread a way to
tell when RunGpuLoop has actually exited, but whatever, this works.
This drops the "feature" to load level 0 from the custom texture
and all other levels from the native one if the size matches.
But in my opinion, when a custom texture only provide one level,
no more should be used at all.
A number of games make an EFB copy in I4/I8 format, then use it as a
texture in C4/C8 format. Detect when this happens, and decode the copy on
the GPU using the specified palette.
This has a few advantages: it allows using EFB2Tex for a few more games,
it, it preserves the resolution of scaled EFB copies, and it's probably a
bit faster.
D3D only at the moment, but porting to OpenGL should be straightforward..
The obvious question here is, why does it matter if we round or truncate?
The key is that GC/Wii does fixed-point interpolation, where PC GPUs do
floating-point interpolation. Discarding fractional bits makes the conversion
from floating-point to fixed point give more consistent results.
I'm not confident this is really the right fix, or that my explanation is
completely correct; ideally, we don't want to depend on floating-point
interpolation at all.