9855 Commits

Author SHA1 Message Date
Wunk
1d4d421097
shader_jit_a64: Optimize MOVA dest-enable (#7122)
Rather than branching the 3 cases of dest-enablement, just emit a single
move-and-sign-extend instruction for each case.

From this review:
https://github.com/citra-emu/citra/pull/7002#discussion_r1381560584
2023-11-07 11:46:40 -08:00
JosJuice
3f4b57635e
android: Use case insensitivity in DocumentsTree (#7115)
* android: Unify DocumentNode's `key` and `name`

They're effectively the same data, just obtained in different ways.

* android: Remove getFilenameWithExtensions method

After the previous commit, there's only one remaining use of
getFilenameWithExtensions. Let's get rid of that one in favor of
DocumentFile.getName so we no longer need to do manual URI parsing.

* android: Use case insensitivity in DocumentsTree

External storage on Android is case insensitive. This is still the case
when accessing it through SAF. (Of course, SAF makes no guarantees about
whether the storage location picked by the user is backed by external
storage or whether it's case insensitive, but I'm just going to ignore
that for now because I am *so tired of SAF*)

Because the underlying file system is case insensitive, Citra's caching
layer that had to be implemented because SAF's performance is atrocious
also needs to be case insensitive. Otherwise, we get a problem in the
following scenario:

1. Citra wants to check if a particular folder exists in sdmc, and if
   not, create it.
2. The folder does exist, but it has a different capitalization than
   Citra expects, due to a mismatch between Citra's code and (typically)
   files dumped from a real 3DS using ThreeSD.
3. Citra tries to open the folder, but DocumentsTree fails to find it,
   because the case doesn't match.
4. Citra then tries to create the folder, but creating the folder fails,
   because the underlying filesystem considers the folder to exist.
5. The game fails to start.

(Sorry, did I say creating the folder fails? Actually, a new folder does
get created, with " (1)" appended to the end of the name. SAF makes no
guarantees whatsoever about what happens in this situation – it's all
determined by the storage provider!)

This commit makes the caching layer case insensitive so that the
described scenario will work better.
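
The idea of a case-insensitive caching layer can be sketched roughly as follows. This is an illustration only: the real `DocumentsTree` lives in the Android frontend and is not written in C++, and `DirectoryCache`, `Add`, and `Find` are hypothetical names. The sketch folds ASCII case only, which is a simplifying assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for one cached directory node; not Citra's real type.
class DirectoryCache {
public:
    // Store a child under its lower-cased name, remembering the on-disk name.
    void Add(const std::string& real_name) {
        children[ToLower(real_name)] = real_name;
    }

    // Look up a child regardless of capitalization; returns the on-disk name.
    std::optional<std::string> Find(const std::string& requested) const {
        const auto it = children.find(ToLower(requested));
        if (it == children.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    // ASCII-only case folding (an assumption made to keep the sketch short).
    static std::string ToLower(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        return s;
    }

    std::unordered_map<std::string, std::string> children;
};
```

With a cache like this, a lookup for a folder whose capitalization differs from the name on disk still finds the existing entry, so the "create it because it was not found" step in the scenario above is never reached.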
2023-11-07 11:46:25 -08:00
Steveice10
86566f1c14
build: Fortify non-MSVC builds. (#7120) 2023-11-06 17:55:41 -08:00
GPUCode
3f1f0aa7c2
arm: De-virtualize ThreadContext (#7119)
* arm: Move ARM_Interface to core namespace

* arm: De-virtualize ThreadContext
2023-11-06 17:55:30 -08:00
Wunk
8fe147b8f9
video_core: Use binary memory-literals for memory-sizes (#7127)
Replaces `... * 1024 * 1024` with `_MiB`/`_GiB` literals.
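
A minimal sketch of how such literals can be defined with C++ user-defined literals. The definitions below are hypothetical; the literals actually used in the codebase may differ in namespace and return type.

```cpp
#include <cstdint>

// Hypothetical definitions of binary size literals.
constexpr std::uint64_t operator""_KiB(unsigned long long n) {
    return n * 1024ULL;
}
constexpr std::uint64_t operator""_MiB(unsigned long long n) {
    return n * 1024ULL * 1024ULL;
}
constexpr std::uint64_t operator""_GiB(unsigned long long n) {
    return n * 1024ULL * 1024ULL * 1024ULL;
}

// Before: 64 * 1024 * 1024. After: 64_MiB.
constexpr std::uint64_t kExampleBufferSize = 64_MiB;
static_assert(kExampleBufferSize == 64ULL * 1024 * 1024);
```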
2023-11-06 23:38:54 +02:00
Steveice10
5193a5d222
build: Remove need for system Python to download Qt on macOS. (#7125) 2023-11-06 12:26:50 -08:00
GPUCode
1f6393e7d5
video_core: Refactor GLSL fragment emitter (#7093)
* video_core: Refactor GLSL fragment emitter

* shader: Add back custom normal maps
2023-11-06 12:26:28 -08:00
Steveice10
9b2a5926a6
frontend: Use inverted use_gles as a fallback for GL initialization. (#7117) 2023-11-05 17:23:54 -08:00
Wunk
e13735b624
video_core: Implement an arm64 shader-jit backend (#7002)
* externals: Add oaksim submodule

Used for emitting ARM64 assembly

* common: Implement aarch64 ABI

Utilize oaknut to implement a stack frame.

* tests: Allow shader-jit tests for x64 and a64

Run the shader-jit tests for both x86_64 and arm64 targets

* video_core: Initialize arm64 shader-jit backend

Passes all current unit tests!

* shader_jit_a64: protect/unprotect memory when jit-ing

Required on macOS. Memory needs to be fully unprotected and then
re-protected when writing or there will be memory access errors on
macOS.

* shader_jit_a64: Fix ARM64-Imm overflow

These conditionals were throwing exceptions since the immediate values
were overflowing the available space in the `EOR` instructions. Instead
they are generated from `MOV` and then `EOR`-ed after.

* shader_jit_a64: Fix Geometry shader conditional

* shader_jit_a64: Replace `ADRL` with `MOVP2R`

Fixes some immediate-generation exceptions.

* common/aarch64: Fix CallFarFunction

* shader_jit_a64: Optimize `SanitizedMul`

Co-authored-by: merryhime <merryhime@users.noreply.github.com>

* shader_jit_a64: Fix address register offset behavior

Based on https://github.com/citra-emu/citra/pull/6942
Passes unit tests.

* shader_jit_a64: Fix `RET` address offset

A64 stack is 16-byte aligned rather than 8. So a direct port of the x64
code won't work. Fixes weird branches into invalid memory for any
shaders with subroutines.

* shader_jit_a64: Increase max program size

Tuned for A64 program size.

* shader_jit_a64: Use `UBFX` for extracting loop-state

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Optimize `SUB+CMP` to `SUBS`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Optimize `CMP+B` to `CBNZ`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Use `FMOV` for `ONE` vector

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Remove x86-specific documentation

* shader_jit_a64: Use `UBFX` to extract exponent

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
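
For readers unfamiliar with `UBFX` (unsigned bitfield extract), the operation can be modelled in C++ as a shift followed by a mask. This is a software sketch of the instruction's effect, not the JIT code itself; `Ubfx` and `BiasedExponent` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Software model of UBFX: extract `width` bits starting at bit `lsb`,
// zero-extended. Restricted to 1 <= width <= 31 to keep the sketch short.
constexpr std::uint32_t Ubfx(std::uint32_t value, unsigned lsb, unsigned width) {
    return (value >> lsb) & ((1u << width) - 1u);
}

// Biased exponent of an IEEE-754 binary32 value: bits [30:23].
inline std::uint32_t BiasedExponent(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits)); // well-defined type punning
    return Ubfx(bits, 23, 8);
}
```

A single bitfield extract replaces the separate shift and mask that a more literal translation would emit.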

* shader_jit_a64: Remove redundant MIN/MAX `SRC2`-NaN check

Special handling only needs to check SRC1 for NaN, not SRC2.
It would work as follows in the four possible cases:

No NaN: No special handling needed.
Only SRC1 is NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Only SRC2 is NaN: FMAX automatically picks SRC2 because it always picks the NaN if there is one.
Both SRC1 and SRC2 are NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
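
The four cases can be checked with a small C++ model. This is a sketch under stated assumptions: `FmaxModel` only models the property described above (a NaN input yields a NaN result), and `MaxReference` encodes the target behaviour implied by the case list, namely that SRC2 is the result whenever a NaN is involved. All function names are hypothetical.

```cpp
#include <cmath>

// Model of FMAX as described above: if either input is NaN, a NaN is returned.
inline float FmaxModel(float a, float b) {
    if (std::isnan(a)) return a;
    if (std::isnan(b)) return b;
    return a > b ? a : b;
}

// Assumed target behaviour: any comparison with NaN is false, so SRC2 wins.
inline float MaxReference(float src1, float src2) {
    return src1 > src2 ? src1 : src2;
}

// Special handling that checks only SRC1 for NaN.
inline float MaxWithSrc1Check(float src1, float src2) {
    if (std::isnan(src1)) {
        return src2; // SRC1 is NaN (alone or together with SRC2): pick SRC2
    }
    // SRC1 is a number; if SRC2 is NaN, FmaxModel already returns it.
    return FmaxModel(src1, src2);
}
```

In every NaN case `MaxWithSrc1Check` agrees with `MaxReference`, which is why the extra SRC2 check is redundant in this model.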

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit/tests: Add catch-stringifier for vec2f/vec3f

* shader_jit/tests: Add Dest Mask unit test

* shader_jit_a64: Fix Dest-Mask `BSL` operand order

Passes the dest-mask unit tests now.

* shader_jit_a64: Use `MOVI` for DestEnable mask

Accelerate certain cases of masking with MOVI as well

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit/tests: Add source-swizzle unit test

This is not exhaustive. Generating all `4^4` cases seems to make Catch2
crash. So I've added some component-masking(non-reordering) tests based
on the Dest-Mask unit-test and some additional ones to test
broadcasts/splats and component re-ordering.

* shader_jit_a64: Fix swizzle index generation

This was still generating `SHUFPS` indices and not the ones that we wanted for the `TBL` instruction. Passes all unit tests now.
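
The difference between the two index formats can be sketched as follows. `SHUFPS` takes an 8-bit immediate with two bits per destination lane, whereas `TBL` takes a vector of byte indices, so each 32-bit lane selector expands to four consecutive byte indices. The helper names and the lane ordering (lane 0 in the low bits) are assumptions for illustration; the JIT's own encoding may order components differently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Selector for each destination lane: 0 = x, 1 = y, 2 = z, 3 = w.
using Swizzle = std::array<std::uint8_t, 4>;

// SHUFPS-style immediate: two bits per destination lane, lane 0 in the low bits.
inline std::uint8_t ShufpsImmediate(const Swizzle& sel) {
    return static_cast<std::uint8_t>(sel[0] | (sel[1] << 2) | (sel[2] << 4) |
                                     (sel[3] << 6));
}

// TBL-style index vector: one byte index per destination byte.
inline std::array<std::uint8_t, 16> TblIndices(const Swizzle& sel) {
    std::array<std::uint8_t, 16> out{};
    for (std::size_t lane = 0; lane < 4; ++lane) {
        for (std::size_t byte = 0; byte < 4; ++byte) {
            // Source lane `sel[lane]` starts at byte offset sel[lane] * 4.
            out[lane * 4 + byte] = static_cast<std::uint8_t>(sel[lane] * 4 + byte);
        }
    }
    return out;
}
```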

* shader_jit/tests: Add `ShaderSetup` constructor to `ShaderTest`

Rather than using the direct output of `CompileShaderSetup`, allow a
`ShaderSetup` object to be passed in directly. This makes it possible to
emit assembly that is not directly supported by nihstro.

* shader_jit/tests: Add `CALL` unit-test

Tests nested `CALL` instructions to eventually reach an `EX2`
instruction.

EX2 is picked in particular since it is implemented as an even deeper
dispatch and ensures subroutines are properly implemented between `CALL`
instructions and implementation-calls.

* shader_jit_a64: Fix nested `BL` subroutines

`lr` was getting written over by nested calls to `BL`, causing undefined
behavior with mixtures of `CALL`, `EX2`, and `LG2` instructions.

Each usage of `BL` is now protected with a stack push/pop to preserve
and restore the `lr` register to allow nested subroutines to work
properly.

* shader_jit/tests: Allocate generated tests on heap

Each of these generated shader-test objects were causing the stack to
overflow.  Allocate each of the generated tests on the heap and use
unique_ptr so they only exist within the life-time of the `REQUIRE`
statement.
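
A minimal sketch of the pattern, assuming a large test object. `BigTest` and `RunOnHeap` are hypothetical stand-ins, not the real test types.

```cpp
#include <array>
#include <memory>

// Hypothetical stand-in for a generated shader-test object with a large footprint.
struct BigTest {
    std::array<float, 1 << 20> scratch{}; // about 4 MiB, too big for many stacks
    float Run(float x) const {
        return x + scratch[0];
    }
};

// The temporary unique_ptr lives only until the end of the full expression
// that uses it, so the heap storage is released right after the call.
inline float RunOnHeap(float x) {
    return std::make_unique<BigTest>()->Run(x);
}
```

In a Catch2 test the same expression would sit inside the `REQUIRE(...)`, so each generated object exists only for the duration of that one check.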

* shader_jit_a64: Preserve `lr` register from external function calls

`EMIT` makes an external function call, and should be preserving `lr`

* shader_jit/tests: Add `MAD` unit-test

The Inline Asm version requires an upstream fix:
https://github.com/neobrain/nihstro/issues/68

Instead, the program code is manually configured and added.

* shader_jit/tests: Fix uninitialized instructions

These `union`-type instruction-types were uninitialized, causing tests
to fail intermittently.

* shader_jit_a64: Remove unneeded `MOV`

Residue from the direct-port of x64 code.

* shader_jit_a64: Use `std::array` for `instr_table`

Add some type-safety and const-correctness around this type as well.

* shader_jit_a64: Avoid c-style offset casting

Add some more const-correctness to this function as well.

* video_core: Add arch preprocessor comments

* common/aarch64: Use X16 as the veneer register

https://developer.arm.com/documentation/102374/0101/Procedure-Call-Standard

* shader_jit/tests: Add uniform reading unit-test

Particularly to ensure that addresses are being properly truncated

* common/aarch64: Use `X0` as `ABI_RETURN`

`X8` is used as the indirect return result value in the case that the
result is bigger than 128-bits. Principally `X0` is the general-case
return register though.

* common/aarch64: Add veneer register note

`LR` is generally overwritten by `BLR` anyways, and would also be a safe
veneer to utilize for far-calls.

* shader_jit_a64: Remove unneeded scratch register from `SanitizedMul`

* shader_jit_a64: Fix CALLU condition

Should be `EQ` not `NE`. Fixes the regression on Kid Icarus.
No known regressions anymore!

---------

Co-authored-by: merryhime <merryhime@users.noreply.github.com>
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
2023-11-05 21:40:31 +01:00
Wunk
3218af38d0
renderer_vulkan: Add scissor and viewport to dynamic pipeline state (#7114)
Adds the current viewport and scissor to the dynamic pipeline state to
reduce redundant viewport/scissor assignments in the command buffer.
This greatly reduces the number of API calls to `vkCmdSetViewport` and
`vkCmdSetScissor` by only emitting the API call when the state actually
changes.
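
The "only emit when the state changes" idea can be sketched without any Vulkan dependency. `Viewport` and `ViewportTracker` are hypothetical types; in the real renderer the point where `Set` returns true is where `vkCmdSetViewport` would be recorded.

```cpp
#include <optional>

// Hypothetical viewport description; the real code works with Vulkan structs.
struct Viewport {
    float x, y, width, height;
    bool operator==(const Viewport& o) const {
        return x == o.x && y == o.y && width == o.width && height == o.height;
    }
};

// Tracks the last viewport recorded and skips redundant updates.
class ViewportTracker {
public:
    // Returns true when a set-viewport command would actually be recorded.
    bool Set(const Viewport& next) {
        if (current && *current == next) {
            return false; // unchanged: nothing to emit
        }
        current = next;
        ++emitted;
        return true;
    }

    // Forget the tracked state, e.g. when a new command buffer begins
    // (an assumption about when the state must be considered unknown).
    void Reset() {
        current.reset();
    }

    int Emitted() const {
        return emitted;
    }

private:
    std::optional<Viewport> current;
    int emitted = 0;
};
```

The same pattern applies to the scissor rectangle.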
2023-11-05 12:26:09 -08:00
Wunk
1cf64ffaef
vk_stream_buf: Allow dedicated allocations (#7103)
* vk_stream_buf: Avoid protected memory heaps

* Add an "Exclude" argument when finding a memory-type that avoids
  `VK_MEMORY_PROPERTY_PROTECTED_BIT` by default

* vk_stream_buf: Utilize dedicated allocations when preferred by driver

`VK_KHR_dedicated_allocation` is part of the core Vulkan 1.1
specification and should be utilized when `prefersDedicatedAllocation`
is set.
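
A rough sketch of a memory-type search with an exclusion mask, written against plain integers so that it needs no Vulkan headers. The flag values mirror the `VkMemoryPropertyFlagBits` enumerants, but `FindMemoryType` and its signature are hypothetical and not taken from the PR.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Values mirror the corresponding VK_MEMORY_PROPERTY_*_BIT enumerants.
constexpr std::uint32_t kDeviceLocal = 0x01;
constexpr std::uint32_t kHostVisible = 0x02;
constexpr std::uint32_t kHostCoherent = 0x04;
constexpr std::uint32_t kProtected = 0x20;

// Returns the first memory type that is allowed by `allowed_type_bits`, has
// all `wanted` flags, and has none of the `excluded` flags.
inline std::optional<std::uint32_t> FindMemoryType(
    const std::vector<std::uint32_t>& type_flags, std::uint32_t allowed_type_bits,
    std::uint32_t wanted, std::uint32_t excluded = kProtected) {
    // Vulkan exposes at most 32 memory types, so the mask fits in 32 bits.
    for (std::uint32_t i = 0; i < type_flags.size() && i < 32; ++i) {
        const bool allowed = ((allowed_type_bits >> i) & 1u) != 0;
        const bool has_wanted = (type_flags[i] & wanted) == wanted;
        const bool has_excluded = (type_flags[i] & excluded) != 0;
        if (allowed && has_wanted && !has_excluded) {
            return i;
        }
    }
    return std::nullopt;
}
```

Defaulting `excluded` to the protected bit means callers skip protected heaps unless they opt in explicitly.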
2023-11-05 12:25:59 -08:00
GPUCode
998b9a9525
kernel: Add ticks to low priority threads that arbitrate zero threads (#7096) 2023-11-05 00:20:55 +02:00
Steveice10
27bad3a699
audio_core: Replace AAC decoders with single FAAD2-based decoder. (#7098) 2023-11-04 14:56:13 -07:00
Tobias
1570aeffcb
game_list: Treat demos as applications (#7097)
* game_list: Treat demos as applications

Allows the dumping of RomFS from demos.

* game_list: Add TODO about using bitmasks for title ID high checks.

---------

Co-authored-by: Steveice10 <1269164+Steveice10@users.noreply.github.com>
2023-11-04 12:15:21 -07:00
Steveice10
09ee80f590
file_sys: Replace commented log lines from previous PR with trace logs. (#7109) 2023-11-04 20:37:55 +05:30
Wunk
b10f3d96f5
command_processor: Fix out-of-bounds float-uniform access (#7111)
Addresses:
https://github.com/citra-emu/citra/issues/6696
https://github.com/citra-emu/citra/issues/6871
2023-11-03 03:35:52 -07:00
Steveice10
b5d744bcae
ci: Work around macOS GitHub runner pip install failures. (#7110) 2023-11-03 03:35:32 -07:00
Castor215
89d5d4a2b6
externals: allow user to use system cubeb (#7107) 2023-11-02 17:33:40 -07:00
PabloMK7
4284893044
Implement RomFS cache and async reads. (#7089)
* Implement RomFS cache and async reads.

* Suggestions and fix compilation.

* Apply suggestions
2023-11-02 17:19:00 -07:00
Steveice10
79ea06b226
qt: Update to 6.6.0 (#7099) 2023-11-01 17:58:02 -07:00
Castor215
8d811913a5
externals: allow user to use system cryptopp (#7105) 2023-11-01 17:57:10 -07:00
Wunk
ac9d72a95c
vk_texture_runtime: Fix debug scope label lambda-capture (#7102)
`DebugScope` was capturing a `string_view` in a lambda which is only
valid during the scope of this ctor. When the lambda gets invoked at a
later time, it will read undefined garbage. The lambda needs to make a
deep copy of this `string_view` into a `string` so that it is valid by
the time the scheduler invokes this lambda.
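
The fix can be sketched as follows. `MakeLabelTask` is a hypothetical helper standing in for the code that hands a lambda to the scheduler; only the capture pattern is the point.

```cpp
#include <functional>
#include <string>
#include <string_view>

// Builds a deferred task that needs the label text later.
inline std::function<std::string()> MakeLabelTask(std::string_view label) {
    // Copy the characters now. Capturing `label` itself would leave the
    // lambda pointing at storage that may be gone when it finally runs.
    return [owned = std::string{label}] { return owned; };
}
```

Because the lambda owns its own `std::string`, it remains valid even if the buffer behind the original `string_view` is destroyed before the task is invoked.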
2023-11-01 21:30:54 +01:00
Castor215
d3ce43782d
externals: allow users to use system libenet (#7100) 2023-10-31 14:01:50 -07:00
PabloMK7
597a2e8ead
Add missing FS:USER functions (#7051) 2023-10-31 14:01:25 -07:00
TGP17
b231a22ea5
Switch compiler to clang on Linux (#7077) 2023-10-25 11:00:52 -07:00
Steveice10
1110c01657
ci: Install Vulkan SPIRV-Tools on Windows to fix glslang install error. (#7092) 2023-10-25 11:00:10 -07:00
Steveice10
45ef11654a
audio_core: Clear time stretcher after flushing to avoid sample bleed. (#7081) 2023-10-24 17:22:10 -07:00
Dominik Kreutzer
259dbf17dc
citra-qt: ensure image interface is registered before starting game (#7090) 2023-10-24 17:21:38 -07:00
Castor215
ec55807669
build: fix build failure when not using precompiled headers (#7087)
Co-authored-by: vitor-k <vitor-kiguchi@hotmail.com>
2023-10-23 17:21:35 -03:00
GPUCode
36146459f8
renderer_vulkan: Fix screenshots under NVIDIA vulkan (#7082) 2023-10-22 22:53:14 +03:00
Wunk
597297ffb4
tests: Fix out-of-bounds access (#7085) 2023-10-22 11:07:06 -07:00
Castor215
4ac10c4a9d
externals: allow users to use system Zstandard (#7083) 2023-10-21 16:10:02 -07:00
Castor215
2416258117
externals: add overarching USE_SYSTEM_LIBS variable (#7078) 2023-10-20 17:02:20 -07:00
Steveice10
6d4e462e42
build: Ensure we default to Release build type. (#7080) 2023-10-19 15:02:49 -07:00
GPUCode
ef43776c7b
shader: Fix address register offset behavior in x64 Jit (#6942)
* shader: Fix address register offset behavior in x64 Jit

* shader: Remove redundant jump

* tests: Add address register tests

* shader: Remove additional pre-multiplications by 16

* tests: Add catch-stringifier for vec4f

* tests: Format
2023-10-18 19:41:36 +03:00
Steveice10
1caf569f16
externals: Update cryptopp-cmake and cryptopp. (#7041) 2023-10-17 11:03:03 -07:00
Castor215
2d83fff581
externals: allow user to use system glslang (#7075) 2023-10-17 11:02:50 -07:00
Steveice10
e49b3c75bd
externals: Make system dynamic library headers flags instead of auto-detect. (#7065) 2023-10-16 19:32:58 -07:00
Castor215
956b0868fd
externals: allow user to use system inih (#7073) 2023-10-16 19:31:56 -07:00
Steveice10
4c59443ed2
common: Add more robust ZSTD handling. (#7071) 2023-10-15 14:08:29 -07:00
Steveice10
07839fb3ce
qt: Add option to uninstall a game. (#7064)
* qt: Add option to uninstall a game.

* Address review comments.
2023-10-14 18:11:59 -07:00
Castor215
3d55270de6
externals: allow users to use system xbyak (#7068) 2023-10-13 15:03:50 -07:00
GPUCode
40ba5226c6
vk_instance: Perform vulkan logging as early as possible (#7058) 2023-10-11 15:11:43 -07:00
Castor215
775a25b492
externals: allow system cpp-httplib to be used with both meson and cpp-httplib builds (#7062)
Co-authored-by: Violet Purcell <vimproved@inventati.org>
2023-10-11 14:43:36 -07:00
PabloMK7
897d1fa957
Implement more HTTP:C functionality (#7035)
* Implement missing http:c functionality.

* More implementation details and cleanup.

* Organize code

* Disable treating warnings as errors for httplib

* Fix defines

* Remove pragmas that do nothing and mark as SYSTEM

* Make httplib system

* Try to fix issue from httplib

* Apply suggestions

* Fix header ordering

* Fix compilation issue

* Create and use ctx.CommandID()

* Add and use Common::TruncateString

* Apply more suggestions

* Apply suggestions

* Fix compilation

* Apply suggestions

* Fix format

* Revert SplitURL to previous version

* Apply suggestions
2023-10-11 10:09:16 -07:00
SachinVin
1acb03b579
dsp_dsp.cpp: fix registering Interrupt handler on loading savestates (#7055) 2023-10-10 12:52:57 -07:00
Steveice10
4220f69c06
renderer/opengl: Deduce GLES from actual driver in use. (#7056) 2023-10-10 12:52:47 -07:00
PabloMK7
6264b6d43c
Use RunAsync in multiple socket operations (#7053)
* Use RunAsync in multiple socket operations

* EOF newline

* Fix linux compilation

* Fix compilation on macos
2023-10-09 14:59:08 -07:00
Steveice10
6cfd00e42d
qt: Partially fix Wayland support. (#7054)
* qt: Partially fix Wayland on NVIDIA.

* qt: Fix Vulkan under Wayland.

Showing and hiding the window here messes up the surface,
causing an instant crash on load.

* qt: Properly set up GLES context when requested.
2023-10-09 14:58:38 -07:00
Steveice10
6244f9e3fd
ci: Support Android x86_64 and optimize build caching. (#7045)
* android: Support x86_64 devices.

* ci: Improve ccache hits and stats.

* ci: Compress Android artifacts.

* ci: Re-enable PCH and set ccache sloppiness appropriately.
2023-10-08 23:56:01 -07:00