27073 Commits

Author SHA1 Message Date
JosJuice
1c8ddcdda1 JitArm64: Implement memcheck for dcbz 2021-11-20 23:39:27 +01:00
JosJuice
4fe15e788f JitArm64: Implement memcheck for lmw/stmw 2021-11-20 23:39:27 +01:00
JosJuice
b4ffdce800 JitArm64: Implement memcheck for lXX/stX with update 2021-11-20 23:39:27 +01:00
JosJuice
662ae570a0 JitArm64: Make EmitBackpatchRoutine support saving W0
Being able to preserve the address register is useful for the
next commit, and W0 is the address register used for loads. Saving
the address register used for stores, W1, was already supported.
2021-11-20 23:39:27 +01:00
JosJuice
e316d0e94f JitArm64: Don't update dest reg when load triggers exception, part 2
If a host register has been newly allocated for the destination
guest register, and the load triggers an exception, we must make
sure to not write the old value in the host register into ppcState.
This commit achieves this by not marking the register as dirty
until after the load is done.
2021-11-20 23:39:26 +01:00
JosJuice
96190887ce JitArm64: Don't update dest reg when load triggers exception
Fixes a problem introduced in the previous commit.
2021-11-20 23:39:26 +01:00
JosJuice
ab1ceee16f JitArm64: Implement memcheck for lXX/stX without update 2021-11-20 23:39:26 +01:00
JosJuice
8c905e152a JitArm64: Make WriteConditionalExceptionExit more flexible
You can now specify an already allocated register for it to use as a
temporary register, and it also supports being called while in farcode.
2021-11-20 23:37:02 +01:00
JMC47
e5a4a86672
Merge pull request #10055 from JosJuice/jitarm64-reuse-memory
JitArm64: Codegen space reuse
2021-11-20 17:35:24 -05:00
DevJPM
61cfd8696e Fix CPU Core Count detection and Enable Parallel Shader Compilation
This does the following things:

- Default to the runtime automatic number of threads for pre-compiling shaders
- Adds a distinct automatic thread count computation for pre-compilation (which has fewer other things going on
and should scale better beyond 4 cores)
- Removes the unused logical_core_count field from the CPU detection
- Changes the semantics of num_cores from maximum addressable number of cores to actually available CPU cores
(which is also how it was actually used)
- Updates the computation of the HTT flag now that AMD no longer lies about it for its Zen processors
- Background shader compilation is *not* enabled by default
2021-11-20 16:08:10 +01:00
Filoppi
1badceb455 ControllerInterface: fix UpdateReferences() deadlock
Removed useless locks to DeviceContainer::m_devices_mutex, as they were all already protected by m_devices_population_mutex.
We have no interest in blocking other threads that were potentially reading devices at the same time so this seems fine.
This simplifies the code, and I've adjusted a few comments which mentioned possible deadlock that should now be totally gone.

The deadlock could have happened if a thread directly called EmulatedController::UpdateReferences() while another thread also reached EmulatedController::UpdateReferences() within a call to ControllerInterface::UpdateDevices(), as the mentioned function locked both the DeviceContainer::m_devices_mutex and s_get_state_mutex at the same time.

The deadlock was frequent on game emulation startup on Android, due to the UpdateReferences() call in InputConfig::LoadConfig() and the UI thread triggering calls to ControllerInterface::UpdateDevices().
It could also have happened on Desktop if a user pressed "Refresh Devices" manually in the UI while the input config was loading.

Also brought some UpdateReferences() comments and thread safety fixes from https://github.com/dolphin-emu/dolphin/pull/9489
2021-11-20 16:54:36 +02:00
Admiral H. Curtiss
0fc563ee2e
RiivolutionPatcher: Use case-insensitive filename comparison when searching for files in a folder patch. 2021-11-20 01:15:18 +01:00
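As a rough illustration of the kind of comparison this commit title describes, the sketch below compares two file names while ignoring ASCII case. `EqualsIgnoreCase` is a hypothetical helper written for this note, not the function Dolphin actually uses, and it makes no attempt at Unicode case folding.

```cpp
#include <algorithm>
#include <cctype>
#include <string_view>

// Hypothetical helper: ASCII-only, case-insensitive equality of two names.
bool EqualsIgnoreCase(std::string_view a, std::string_view b)
{
  return std::equal(a.begin(), a.end(), b.begin(), b.end(), [](char x, char y) {
    // Cast to unsigned char first; passing a negative value to std::tolower is undefined.
    return std::tolower(static_cast<unsigned char>(x)) ==
           std::tolower(static_cast<unsigned char>(y));
  });
}
```

With this helper, `EqualsIgnoreCase("Stage.ARC", "stage.arc")` is true, while names of different lengths never match.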
JMC47
dbaebdc585
Merge pull request #10222 from phire/fix-copy-filter-clamping
Fix copy filter clamping
2021-11-18 17:48:33 -05:00
Pokechu22
94ccf765af Add option for setting the PNG zlib compression level 2021-11-18 13:10:22 -08:00
Pokechu22
95b9941044 Use Fast Texture Sampling by default
This commit changes the default value of Fast Texture Sampling to true, and also moves the setting that controls it to the experimental section of the advanced tab.  This is its own commit so that it can be easily reverted when we want to default to Manual Texture Sampling.

Co-authored-by: JosJuice <josjuice@gmail.com>
2021-11-17 21:29:57 -08:00
Pokechu22
1adff1c467 VideoCommon: Skip textureQueryLevels if it doesn't exist 2021-11-17 21:28:39 -08:00
Pokechu22
bdcfb31187 VideoCommon: Handle custom texture sizes correctly
Specifically, when using Manual Texture Sampling, if texture sizes don't match the size the game specifies, things previously broke.  That can happen with custom textures, and also with scaled EFB copies at non-native IRs.  It breaks most obviously by not scaling the texture coordinates (so only part of the texture shows up), but the hardware wrapping functionality also assumes texture sizes are a power of 2 (or else it will behave weirdly in a way that matches how hardware behaves weirdly).  The fix is to provide alternative texture wrapping logic when custom texture sizes are possible.
2021-11-17 21:28:36 -08:00
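One way to see why a power-of-two assumption breaks for other sizes: a repeat wrap done with a bit mask only agrees with a true modulo when the size is a power of two. The sketch below illustrates that point under the assumption that the wrap is mask-based; both function names are hypothetical and neither is taken from Dolphin's shader code.

```cpp
// Repeat-wrap via bit mask. Only correct when size is a power of two.
int WrapRepeatMask(int coord, int size)
{
  return coord & (size - 1);
}

// Repeat-wrap via modulo. Works for any positive size, including
// custom texture sizes that are not powers of two.
int WrapRepeatModulo(int coord, int size)
{
  const int m = coord % size;
  return m < 0 ? m + size : m;  // keep the result in [0, size)
}
```

For a size of 8 both functions map coordinate 9 to 1. For a size of 6 the mask version maps 7 to 5, while the modulo version gives the expected 1.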
Pokechu22
93eea7cb13 VideoCommon: Add option to use old behavior (Fast Texture Sampling)
Co-authored-by: JosJuice <josjuice@gmail.com>
2021-11-17 21:27:32 -08:00
Pokechu22
ee80298ca4 VideoCommon: Implement diagonal LOD
Note that both GLSL and HLSL provide a fwidth (fragment width) function defined as `fwidth(p) = abs(dFdx(p)) + abs(dFdy(p))`.  However, it's easy enough to implement this ourselves (and it makes the code a bit more obvious).
2021-11-17 20:04:34 -08:00
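The `fwidth` definition quoted in this commit message can be written out by hand in a few lines. This is a plain C++ sketch of the formula only; in a shader the two inputs would come from `dFdx` and `dFdy`, and here they are simply passed in as parameters. `FWidth` is a hypothetical name, not a Dolphin function.

```cpp
#include <cmath>

// fwidth(p) = abs(dFdx(p)) + abs(dFdy(p)), with the derivatives supplied by the caller.
float FWidth(float ddx, float ddy)
{
  return std::fabs(ddx) + std::fabs(ddy);
}
```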
Pokechu22
b9288212a0 Software: Adjust diagonal LOD implementation
This produces behavior matching the behavior on hardware (see Wario's Gold Mine in Mario Kart Wii).
2021-11-17 20:04:34 -08:00
Pokechu22
51e3334526 VideoCommon: Use coarse derivatives for Manual Texture Sampling if possible 2021-11-17 20:04:34 -08:00
Pokechu22
ddf2691395 VideoCommon: Manually handle texture wrapping and sampling 2021-11-17 20:04:34 -08:00
Pokechu22
4a9b26de86 VideoCommon: Expose SamplerState to shaders
The benefit to exposing this over the raw BP state is that adjustments Dolphin makes, such as LOD biases from arbitrary mipmap detection, will work properly.
2021-11-17 20:04:34 -08:00
Pokechu22
9ef228503a VideoCommon: Provide raw texdims to shaders 2021-11-17 20:04:34 -08:00
Pokechu22
a273b65566 RenderState: Use operator== for operator!= and adjust constructors 2021-11-17 20:04:34 -08:00
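The pattern named in this commit title, defining `operator!=` in terms of `operator==`, keeps the two operators from drifting apart. A minimal sketch, assuming a hypothetical `ExampleState` struct rather than Dolphin's actual RenderState types:

```cpp
#include <cstdint>

// Hypothetical stand-in for a packed state struct.
struct ExampleState
{
  std::uint32_t hex = 0;

  bool operator==(const ExampleState& rhs) const { return hex == rhs.hex; }
  // Inequality is the negation of equality, so there is a single source of truth.
  bool operator!=(const ExampleState& rhs) const { return !operator==(rhs); }
};
```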
Pokechu22
6236a0d494 Eliminate SamplerCommon 2021-11-17 20:04:34 -08:00
Pokechu22
3096f77ba0 Eliminate SamplerCommon::AreBpTexMode0MipmapsEnabled
This was added in 0b9a72a62d38481ff08f742b2318d07f29ff5dff but became irrelevant in 70f9fc4e7526fc9cfc008a43cd33229f62be99b6 as the check is now self-explanatory due to a rejiggering of the bitfields.
2021-11-17 20:04:34 -08:00
Pokechu22
d2041b4c2a VideoCommon: Add signed version of BitfieldExtract 2021-11-17 20:04:33 -08:00
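A signed bitfield extract differs from the unsigned one only in that the top bit of the field is sign-extended. The sketch below shows that idea in C++ with two shifts; it is not the shader code from this commit, and `BitfieldExtractSigned` is a hypothetical name. It assumes `size >= 1` and `offset + size <= 32`.

```cpp
#include <cstdint>

// Extracts `size` bits starting at bit `offset` and sign-extends the result.
// Precondition (assumed, not checked): size >= 1 and offset + size <= 32.
std::int32_t BitfieldExtractSigned(std::uint32_t value, unsigned offset, unsigned size)
{
  // Move the field's top bit into bit 31...
  const std::uint32_t shifted = value << (32 - size - offset);
  // ...then shift back down. Right-shifting a negative signed value is an
  // arithmetic shift in C++20 and on mainstream compilers before that.
  return static_cast<std::int32_t>(shifted) >> (32 - size);
}
```

For example, a 4-bit field holding `0xF` is returned as -1, and one holding `0x7` is returned as 7.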
Pokechu22
555a93057c VideoCommon: Allow BitfieldExtract in specialized shaders 2021-11-17 20:04:33 -08:00
Scott Mansell
9bbc843542 VideoSoftware: Fix copy filter clamping 2021-11-17 09:29:16 +13:00
Scott Mansell
7128befb39 Fix copy filter clamping regression in Spyro
This fixes horizontal lines in the bloom effect of Spyro: A Hero's Tail,
which is a regression caused by PR #10204

Screenshot of regression:
https://user-images.githubusercontent.com/138484/142030503-90fcd8d5-63d3-4820-874a-72e9be0c4768.png

Fixed:
https://user-images.githubusercontent.com/138484/142031598-b85ff55c-1302-4e4d-bcb2-57848974056b.png

Spyro uses a 640x80 pixel sub-buffer within the EFB to calculate
its bloom effects, which it places below the main 640x448 buffer.

EFB layout:
https://user-images.githubusercontent.com/138484/142030573-e933b6ae-c37e-4be6-86d4-0bc779b92535.png
Note: Colors are wrong because the main color buffer uses RGBA6,
      while the bloom is calculated in RGB8

This allows it to do bloom without backing up part of the EFB to
main memory, as most games do.

But, since some of the sub-buffers used in the bloom effect are taller
than 80 pixels, they need to be sliced up into smaller sub-sub-buffers
which get combined later when copied to main memory.

At one point, a 320x224 buffer is broken up into 320x80, 320x64 and
320x80 slices. These are copied out with the copy filter set to a
vertical blur.

Because there was an off-by-one error in the clamping coordinates,
the bottom line of the color buffer would be blurred into
the top of each slice.

Final combined EFB copy:
https://user-images.githubusercontent.com/138484/142031360-2c076839-7c96-4b3b-a093-d899d0a2c7ae.png

Fixed version:
https://user-images.githubusercontent.com/138484/142031370-72e41a35-3b3e-4662-a483-79203e357ecc.png

Before #10204 the copy filter wasn't enabled for EFB copies, and most
other games don't do this type of slicing.

FIFO CI shows that a few other games are affected. It's always just a minor difference to the top line, where there was previously a slight hint of garbage.
2021-11-17 06:12:46 +13:00
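The effect of the off-by-one error described in this commit can be reproduced on a single column of pixels. The sketch below is a simplified model, not Dolphin's copy filter: `BlurRow` is a hypothetical function, the (1, 2, 1) / 4 weights are an assumption chosen for illustration, and pixels are plain integers.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical 3-tap vertical blur for row y of a slice that covers
// rows [top, top + height - 1] of a larger column of pixels.
int BlurRow(const std::vector<int>& column, int y, int top, int height)
{
  const int first = top;               // using top - 1 here would pull in the row above the slice
  const int last = top + height - 1;   // inclusive last row of the slice
  const int above = column[std::clamp(y - 1, first, last)];
  const int center = column[std::clamp(y, first, last)];
  const int below = column[std::clamp(y + 1, first, last)];
  return (above + 2 * center + below) / 4;
}
```

With the column `{80, 80, 0, 40, 0, 80}` and a slice covering rows 2 to 4, the top row of the slice blurs to 10. If the lower bound were off by one, the bright row above the slice would be read instead and the result would be 30.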
JMC47
26fdc19c8d
Merge pull request #10204 from Pokechu22/efb-copy-filter
VideoCommon: Use the copy filter for EFB copies as well as XFB copies
2021-11-15 17:04:27 -05:00
Shawn Hoffman
a69adafdd8 msbuild: use /external:anglebrackets
Revert usage of ExternalIncludePath, as that var is specifically
for external includes which do not get scanned for changes.
2021-11-15 00:33:51 -08:00
JosJuice
da0be24b2f DSP: Reword inappropriate references to Global User Directory
These messages apply to the User directory regardless of
whether it's global or local, so we shouldn't specify "global".

Also changing "directory" to "folder", just for consistency
with "GC folder" in the same sentence.
2021-11-14 22:55:50 +01:00
OatmealDome
2209dc0355 MoltenVK: Use an external project instead of a pre-compiled dylib
Also, update MoltenVK to match Vulkan SDK 1.2.189.
2021-11-13 11:43:23 -08:00
JosJuice
3e84279919
Merge pull request #10208 from thatSteveFan/patch-1
Minor comment fix in Matrix.cpp
2021-11-13 15:09:28 +01:00
Pokechu22
868de78f16 VideoCommon: Use the copy filter for EFB copies as well as XFB copies
This fixes the pink screens in EA Sports Active.  See https://gist.github.com/Pokechu22/49455f9094ed0ff017da64e3f7aa0404 for details.
2021-11-11 17:22:50 -08:00
Shawn Hoffman
58dc9e7049 msvc: fix compile warning on arm64 2021-11-11 08:01:39 -08:00
Shawn Hoffman
4008188654 msvc: update to vs2022 and windows sdk 10.0.22000 2021-11-11 08:01:26 -08:00
thatSteveFan
834a59d89b
Minor comment fix in Matrix.cpp
Fix comment to say (NxM times MxP).
2021-11-08 15:30:39 -08:00
Merry
850d281cb8 Jit_Integer: Fix pure rotation rlwimix case
We are writing to a.
2021-11-07 14:10:19 +00:00
Mai M
6c72e6814d
Merge pull request #10169 from leoetlino/fmt-localtime
Use fmt::localtime instead of thread-unsafe std::localtime
2021-11-07 00:08:14 -04:00
Mai M
c89226cbd4
Merge pull request #10203 from merryhime/ctx_lr
MachineContext: Correct CTX_LR for Apple ARM64
2021-11-07 00:06:24 -04:00
Merry
c385e7e57b MachineContext: Correct CTX_LR for Apple ARM64 2021-11-06 21:46:35 +00:00
Merry
9c75957319 JitArm64_FloatingPoint: Implement fctiwx in ARM64 JIT
We implement this by first rounding to nearest integer using the current
rounding mode, then converting this value from floating point to an integral
value.
2021-11-06 20:54:26 +00:00
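The two steps in this commit message (round using the current rounding mode, then convert to an integer) can be sketched in portable C++ with `std::nearbyint`, which honours the current floating-point rounding mode. The saturation and NaN handling below are assumptions added so the conversion is well defined in C++; `ConvertToIntegerWord` is a hypothetical name, and the code is not the JIT's emitted instruction sequence.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Step 1: round to an integral value using the current rounding mode.
// Step 2: convert that value to a 32-bit integer.
std::int32_t ConvertToIntegerWord(double value)
{
  constexpr std::int32_t kMin = std::numeric_limits<std::int32_t>::min();
  constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max();

  if (std::isnan(value))
    return kMin;  // assumption: NaN maps to the most negative value

  const double rounded = std::nearbyint(value);

  // Assumption: out-of-range results saturate instead of invoking undefined behavior.
  if (rounded >= 2147483648.0)
    return kMax;
  if (rounded < -2147483648.0)
    return kMin;
  return static_cast<std::int32_t>(rounded);
}
```

Under the default round-to-nearest-even mode, 2.5 converts to 2 and 3.5 converts to 4.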
Merry
7c2b09e156 Arm64Emitter: Add FRINTI instruction 2021-11-06 19:15:26 +00:00
Léo Lam
2c5d11cace
Merge pull request #9721 from linkmauve/fix-warnings
Fix some warnings found with gcc 11 and ffmpeg master
2021-11-06 14:38:07 +01:00
Sintendo
dfb32040bf Jit64: divwx - Micro-optimize division by 2
Prefer using eax to isolate the sign bit. This saves a byte when the
destination ends up as r8-15, because those require a REX prefix.

Before:
41 8B C5             mov         eax,r13d
41 C1 ED 1F          shr         r13d,1Fh
44 03 E8             add         r13d,eax
41 D1 FD             sar         r13d,1

After:
41 8B C5             mov         eax,r13d
C1 E8 1F             shr         eax,1Fh
44 03 E8             add         r13d,eax
41 D1 FD             sar         r13d,1
2021-11-03 21:01:41 +01:00
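The four-instruction sequence in this commit implements signed division by 2 that rounds toward zero: isolate the sign bit, add it to the value, then shift right arithmetically. The same arithmetic in C++, as a sketch with the hypothetical name `DivideByTwo`:

```cpp
#include <cstdint>

// Signed division by 2, rounding toward zero, without a divide instruction.
std::int32_t DivideByTwo(std::int32_t x)
{
  // shr 31: the sign bit as 0 or 1.
  const std::uint32_t sign = static_cast<std::uint32_t>(x) >> 31;
  // add: bias negative values by 1 so the shift rounds toward zero.
  const std::int32_t biased =
      static_cast<std::int32_t>(static_cast<std::uint32_t>(x) + sign);
  // sar 1: arithmetic shift (guaranteed in C++20, and what mainstream compilers do before that).
  return biased >> 1;
}
```

Without the bias, an arithmetic shift alone would round -7 down to -4; with it, the result is -3, matching `-7 / 2` in C++.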
Sintendo
f18f6cd0a2 Jit64: divwx - Improve comments 2021-11-02 21:50:28 +01:00
Emmanuel Gil Peyrot
5a1333026b VideoCommon: Add missing algorithm include for std::none_of
Otherwise this is an error on gcc/libstdc++, and there are no transitive
includes for this header.
2021-11-02 13:50:21 +01:00