Being able to preserve the address register is useful for the
next commit, and W0 is the address register used for loads. Saving
the address register used for stores, W1, was already supported.
If a host register has been newly allocated for the destination
guest register, and the load triggers an exception, we must make
sure to not write the old value in the host register into ppcState.
This commit achieves this by not marking the register as dirty
until after the load is done.
This does this following things:
- Default to the runtime automatic number of threads for pre-compiling shaders
- Adds a distinct automatic thread count computation for pre-compilation (which has less other things going on
and should scale better beyond 4 cores)
- Removes the unused logical_core_count field from the CPU detection
- Changes the semantics of num_cores from maximaum addressable number of cores to actually available CPU cores
(which is also how it was actually used)
- Updates the computation of the HTT flag now that AMD no longer lies about it for its Zen processors
- Background shader compilation is *not* enabled by default
Removed useless locks to DeviceContainer::m_devices_mutex, as they were all already protected by m_devices_population_mutex.
We have no interest in blocking other threads that were potentially reading devices at the same time so this seems fine.
This simplifies the code, and I've adjusted a few comments which mentioned possible deadlock that should now be totally gone.
The deadlock could have happen if a thread directly called EmulatedController::UpdateReferences(), while another another thread also reached EmulatedController::UpdateReferences() within a call to ControllerInterface::UpdateDevices(), as the mentioned function locked both the DeviceContainer::m_devices_mutex and s_get_state_mutex at the same time.
The deadlock was frequent on game emulation startup on Android, due to the UpdateReferences() call in InputConfig::LoadConfig() and the UI thread triggering calls to ControllerInterface::UpdateDevices().
It could also have happened on Desktop if a user pressed "Refresh Devices" manually in the UI while the input config was loading.
Also brought some UpdateReferences() comments and thread safety fixes from https://github.com/dolphin-emu/dolphin/pull/9489
This commit changes the default value of Fast Texture Sampling to true, and also moves the setting that controls it to the experimental section of the advanced tab. This is its own commit so that it can be easily reverted when we want to default to Manual Texture Sampling.
Co-authored-by: JosJuice <josjuice@gmail.com>
Specifically, when using Manual Texture Sampling, if textures sizes don't match the size the game specifies, things previously broke. That can happen with custom textures, and also with scaled EFB copies at non-native IRs. It breaks most obviously by not scaling the texture coordinates (so only part of the texture shows up), but the hardware wrapping functionality also assumes texture sizes are a power of 2 (or else it will behave weirdly in a way that matches how hardware behaves weirdly). The fix is to provide alternative texture wrapping logic when custom texture sizes are possible.
Note that both GLSL and HLSL provide a fwidth (fragment width) function defined as `fwidth(p) = abs(dFdx(p)) + abs(dFdy(p))`. However, it's easy enough to implement this ourselves (and it makes the code a bit more obvious).
The benefit to exposing this over the raw BP state is that adjustments Dolphin makes, such as LOD biases from arbitrary mipmap detection, will work properly.
This was added in 0b9a72a62d38481ff08f742b2318d07f29ff5dff but became irrelevant in 70f9fc4e7526fc9cfc008a43cd33229f62be99b6 as the check is now self-explanatory due to a rejiggering of the bitfields.
These messages apply to the User directory regardless of
whether it's global or local, so we shouldn't specify "global".
Also changing "directory" to "folder", just for consistency
with "GC folder" in the same sentence.
We implement this by first rounding to nearest integer using the current
rouding mode, then converting this value from floating point to an integral
value.
Prefer using eax to isolate the sign bit. This saves a byte when the
destination ends up as r8-15, because those require a REX prefix.
Before:
41 8B C5 mov eax,r13d
41 C1 ED 1F shr r13d,1Fh
44 03 E8 add r13d,eax
41 D1 FD sar r13d,1
After:
41 8B C5 mov eax,r13d
C1 E8 1F shr eax,1Fh
44 03 E8 add r13d,eax
41 D1 FD sar r13d,1