Commit Graph

913 Commits

Author SHA1 Message Date
PixelyIon
a6599c30b4
Correct IntervalMap insertion end calculation
The `end` pointer for `interval` was incorrectly calculated as `interval.data() + interval.size_bytes()` which would be incorrect when the interval span type is not `u8` as the pointer derived from `interval.data()` would be a pointer to the span type rather than a byte pointer and be subject to arithmetic of that object's size rather than in terms of a byte.
2022-08-06 22:18:42 +05:30
PixelyIon
b0910e7b1a
Avoid locking Texture/Buffer in trap handler
We generally don't need to lock the `Texture`/`Buffer` in the trap handler, this is particularly problematic now as we hold the lock for the duration of a submission of any workloads. This leads to a large amount of contention for the lock and stalling in the signal handler when the resource may be `Clean` and can simply be switched over to `CpuDirty` without locking and utilizing atomics which is what this commit addresses.
2022-08-06 22:18:42 +05:30
PixelyIon
a60d6ec58f
Replace host immutability FenceCycle with GPU usage tracking
We utilized a `FenceCycle` to keep track of if the buffer was mutable or not and introduced another cycle to track GPU-side requirements only on fulfillment of which could the buffer be utilized on the host but due to the recent change in the behavior this system ended up being unoptimal. 

This commit replaces the cycle with a boolean tracking if there are any usages of the resource on the GPU within the current context that may prevent it from being mutated on the CPU. The fence of the context is simply attached to the buffer based off this which was allowed as the new behavior of buffer fences matches all the requirements for this.
2022-08-06 22:18:42 +05:30
PixelyIon
217d484cba
Abstract TextureView/BufferDelegate locking into LockableSharedPtr
An atomic transactional loop was performed on the backing `std::shared_ptr` inside `BufferView`/`TextureView`'s `lock`/`LockWithTag`/`try_lock` functions, these locks utilized `std::atomic_load` for atomically loading the value from the `shared_ptr` recursively till it was the same value pre/post-locking. 

This commit abstracts the locking functionality of `TextureView`/`BufferDelegate` into `LockableSharedPtr` to avoid code duplication and removes the usage of `std::atomic_load` in either case as it is not necessary due to the implicit memory barrier provided by locking a mutex.
2022-08-06 22:18:42 +05:30
PixelyIon
2d08886e4e
Utilize TextureView rather than Texture for presentation
`PresentationEngine` and `GraphicBufferProducer` methods that utilized textures for the surface utilized the `Texture` type rather than the `TextureView` type, this was never correct but at the time of authoring this code `TextureView` was not finalized and in a major flux which is why it was not utilized and `Texture` was utilized instead. Now that is is far more stable, it has been replaced with `TextureView`.
2022-08-06 22:18:42 +05:30
PixelyIon
d7399e33c1
Avoid waiting on mutex in PresentationEngine::Present
We want to block on the host thread during presentation while the host surface isn't present to implicitly pause the game, this can end up being fairly costly as it involves locking the `PresentationEngine` mutex which can lead to a lot of contention with the presentation thread. This fixes the issue by polling if there is a surface and only if there isn't then doing the wait as it isn't mandatory to wait always, we'll eventually run into the guest thread stalling.
2022-08-06 22:18:42 +05:30
PixelyIon
30475ffc43
Fix queueBuffer GraphicBuffer Compatibility Check
Newer versions of the Deko3D homebrew were crashing due to this check and it was discovered that the check was incorrect and rather than comparing the `NvSurface` what had to be compared was the `GraphicBuffer` associated with the slot directly.

Co-authored-by: lynxnb <niccolo.betto@gmail.com>
2022-08-06 22:18:42 +05:30
PixelyIon
c2685d5f5c
Fix consistency issues with external project copyright headers
The copyright headers for external project such as yuzu/Ryujinx were inconsistent in ordering, Skyline should always be the first item in the list. In addition, they didn't always link to the project's GitHub which has also been fixed.
2022-08-06 22:18:42 +05:30
PixelyIon
0ac5f4ce27
Lock TextureManager/BufferManager during submission
Multiple threads concurrently accessing the `TextureManager`/`BufferManager` (Referred to as "resource managers") has a potential deadlock with a resource being locked while acquiring the resource manager lock while the thread owning it tries to acquire a lock on the resource resulting in a deadlock.

This has been fixed with locking of resource manager now being externally handled which ensures it can be locked prior to locking any resources, `CommandExecutor` provides accessors for retrieving the resource manager which automatically handles locking aside doing so on attachment of resources.
2022-08-06 22:18:42 +05:30
PixelyIon
1239907ce8
Rework Texture & Buffer for Context and FenceCycle Chaining
GPU resources have been designed with locking by fences in mind, fences were treated as implicit locks on a GPU, design paradigms such as `GraphicsContext` simply unlocking the texture mutex after attaching it which would set the fence cycle were considered fine prior but are unoptimal as it enforces that a `FenceCycle` effectively ensures exclusivity. This conflates the function of a mutex which is mutual exclusion and that of the fence which is to track GPU-side completion and led to tying if it was acceptable to use a GPU resource to GPU completion rather than simply if it was not currently being used by the CPU which is the function of the mutex.

This rework fixes this with the groundwork that has been laid with previous commits, as `Context` semantics are utilized to move back to using mutexes for locking of resources and tracking the usage on the GPU in a cleaner way rather than arbitrary fence comparisons. This also leads to cleaning up a lot of methods that involved usage of fences that no longer require it and therefore can be entirely removed, further cleaning up the codebase. It also opens the door for future improvements such as the removal of `hostImmutableCycle` and replacing them with better solutions, the implementation of which is broken at the moment regardless.

While moving to `Context`-based locking the question of multiple GPU workloads being in-flight while using overlapping resources came up which brought a fundamental limitation of `FenceCycle` to light which was that only one resource could be concurrently attached to a cycle and it could not adequately represent multi-cycle dependencies. `FenceCycle` chaining was designed to fix this inadequacy and allows for several different GPU workloads to be in-flight concurrently while utilizing the same resources as long as they can ensure GPU-GPU synchronization.
2022-08-06 22:18:42 +05:30
PixelyIon
07d45ee504
Introduce FenceCycle Chaining
If we want to allow submitting multiple pieces of work to the GPU at once while still requiring CPU synchronization, we'll need to track all past fence cycles associated with a resource alongside the current one. To solve this the concept of chaining fences has been introduced, fences from past usages can be chained to the latest fence which'll then recursively forward operations to chained fences.

This change also ends up mandating a move away from `FenceCycleDependency` as it would prevent fences from concurrently locking the same resources which is required for chaining to work as two fences being chained fundamentally means they're locking the same resources. The `AtomicForwardList` is therefore used as the new container.
2022-08-06 22:18:42 +05:30
PixelyIon
cf9e31c1eb
Implement Atomic Forward List
An implementation of a singly-linked list with atomic access to allow for lock-free access semantics, it eliminates the requirement for a mutex which can introduce additional consideration for synchronization.
2022-08-06 22:18:42 +05:30
PixelyIon
6b9269b88e
Introduce Context semantics to GPU resource locking
Resources on the GPU can be fairly convoluted and involve overlaps which can lead to the same GPU resources being utilized with different views, we previously utilized fences to lock resources to prevent concurrent access but this was overly harsh as it would block usage of resources till GPU completion of the commands associated with a resource.

Fences have now been replaced with locks but locks run into the issue of being per-view and therefore to add a common object for tracking usage the concept of "tags" was introduced to track a single context so locks can be skipped if they're from the same context. This is important to prevent a deadlock when locking a resource which has been already locked from the current context with a different view.
2022-08-06 22:18:42 +05:30
PixelyIon
3139889a09
Implement Asynchronous Presentation
We currently present all frames synchronously on the thread that calls into SurfaceFlinger functions, this is unoptimal as it doesn't match guest behavior which can lead to delaying the guest from working on the next frame. This commit queuing up frames to non-blocking and handles all waiting then presenting the frame on a dedicated thread.
2022-08-06 22:18:42 +05:30
PixelyIon
6e09dc5204
Fix thread name setting
We utilize `pthread_setname_np` to set the thread names but didn't check for any errors which resulted in the `Skyline-Choreographer` and `ChannelCmdFifo` not having proper names as they exceeded the 16 character limit on thread names for the pthread function.  This has now been fixed by changing the names and introducing error checking to invocations of this function.
2022-08-06 22:18:42 +05:30
PixelyIon
7a0cfb484c
Add NPOT AlignUp utility
All our normal alignment functions are designed to only handle power of 2 (`POT`) multiples as we only align or check alignment to `POT` multiples but there are cases where this is not possible and we deal with `NPOT` multiples which is why this function is required.
2022-08-06 22:18:42 +05:30
PixelyIon
662ea532d8
Skip waiting on host GPU after command buffer submission
We waited on the host GPU after `Execute` but this isn't optimal as it causes a major stall on the CPU which can lead to several adverse effects such as downclocking by the governor and losing the opportunity to work in parallel with the GPU.

This has now been fixed by splitting `Execute`'s functionality into two functions: `Submit` and `SubmitWithFlush` which both execute all nodes and submit the resulting command buffer to the GPU but flushing will wait on the GPU to complete while the non-flush variant will not wait and work ahead of the GPU.
2022-08-06 22:18:42 +05:30
PixelyIon
5129d2ae78
Add move-assignment semantics to ActiveCommandBuffer/MegaBuffer
We need move-assignment semantics to viably utilize these objects as class members, they cannot be replaced without move-assign (or copy-assign but that is undesirable here). This commit fixes that by introducing a move assignment operator to them while making the `slot` a pointer which has the necessary nullability semantics.
2022-08-06 22:18:42 +05:30
lynxnb
240e7033d7 Support loading a user-selected driver during vulkan initialization 2022-08-06 22:00:19 +05:30
lynxnb
1815199d2b Add utilities for reading and installing gpu driver packages 2022-08-06 22:00:19 +05:30
Billy Laws
f52ea7bddb Make deferred draw and constant buffer updates reentrant-safe
At some point we will call Submit within draws or constant buffer updates, to avoid any infinite recursion mark draw/cbuf pending as false before performing any operation
2022-07-29 20:07:14 +01:00
Billy Laws
dbb684835f Fix depthClampDisable register offset in Maxwell 3D 2022-07-29 20:07:14 +01:00
Billy Laws
7fd9d347e3 Use per-RT blend enable registers even when independent blend is disabled
The common blend enable register seems to be used for something else. This is required for blending to work correctly in OpenGL games
2022-07-29 20:07:14 +01:00
Billy Laws
048c2fdd29 Fix Vulkan framebuffer dimensions calculations
The framebuffer needs to be large enough to contain both the render area extent and offset
2022-07-29 20:07:14 +01:00
Billy Laws
0e1aa765fc Prevent CNTVCT_EL0 reads from being optimised out by the compiler
Without this the compiler will assume the read always produces the same value, causing issues when the register is used to time function execution
2022-07-29 20:07:14 +01:00
lynxnb
d183d14e2a Make accesses to setting values thread-safe 2022-07-26 20:16:24 +05:30
lynxnb
5aa2a4cd1c Rename SettingsValues to NativeSettings
The previous name was chosen as an afterthought and didn't clearly indicate what the purpose of the class is. We needed a separate, simple class without delegates members (like PreferenceSettings), so that its fields can be easily accessed via JNI to get settings values from native code.
2022-07-26 20:16:24 +05:30
lynxnb
f734c4d145 Make log level setting changes immediately active 2022-07-26 20:16:24 +05:30
lynxnb
2840a126dd Introduce AndroidSettings class and use inheritance
The `Settings` class now has a pure virtual `Update` method, and uses inheritance over template specialization for platform-specific behavior override.
2022-07-26 20:16:24 +05:30
lynxnb
3905728447 Make every setting observable individually
A `Setting` delegate class has been introduced, holding the raw value of the setting and adding support for registering callbacks to that setting. Callbacks will then be called when the value of that setting changes.
As a result of this, raw setting values have been made accessible through pointer dereference semantics.
2022-07-26 20:16:24 +05:30
lynxnb
3b27540250 Rename operationMode setting to isDocked 2022-07-26 20:16:24 +05:30
lynxnb
69cf25b1a7 Initial support for updating settings during emulation + observing settings changes 2022-07-26 20:16:24 +05:30
lynxnb
c5dde5953a Rework how settings are shared between Kotlin and native side
Settings are now shared to the native side by passing an instance of the Kotlin's `Settings` class. This way the C++ `Settings` class doesn't need to parse the SharedPreferences xml anymore.
2022-07-26 20:16:24 +05:30
lynxnb
cbc896c8f8 Fix waitForFences crash on Mali drivers
Mali GPU drivers utilize the `ppoll()` syscall inside `waitForFences` which isn't correctly restarted after a signal, which we can receive at any time on a guest thread. This commit fixes that by recursively calling the function on failure till it succeeds or returns an unexpected error.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-07-14 20:34:16 +02:00
MCredstoner2004
942e22f275 Write ApplicationErrorArg ErrorApplets to log
These applets are used by applications to display a custom error message to the user. Both the error message and the detailed error message are printed to the error log.


Co-authored-by: lynxnb <niccolo.betto@gmail.com>
2022-07-02 09:48:59 +05:30
MCredstoner2004
f9a0394577
Implement Software Keyboard applet
This implements the non-inline version of the Software Keyboard (swkbd) applet, which games use to get text input from the user.
2022-07-01 15:19:53 -05:00
Billy Laws
a0275418d6 Add a single-header linear allocator implementation
This conforms to the C++ 'Allocator' named requirement allowing it to be used with any STL type and allows drastically reducing allocation times in cases which are suited for linear allocation.
2022-06-28 21:33:04 +01:00
Billy Laws
e816256220 Add blend, scissor, viewport and vertex state to shader hash
These caused a ton of additional comparisons in Zelda Link's Awakening as many shaders would have the same hash.
2022-06-28 21:32:59 +01:00
lynxnb
e6cfdeb06a Fix non-indexed quad draws
Certain non-indexed quad draws would mistakenly take the indexed quad path because of the assumption that they would not have a bound index buffer. This resulted in a crash for most games using quads due to a faulty exception `Indexed quad conversion is not supported`, when in fact they were not using indexed quads.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-06-23 10:57:11 +02:00
lynxnb
8fc3bc75f4 Allow providing an index type to calculate quad conversion buffer size 2022-06-23 00:15:44 +02:00
Billy Laws
7709dc8cf6 Rewrite buffer megabuffering to be per view and more efficient
This commit implements several key optimisations in megabuffering that are all inherently interlinked.
- Megabuffering is moved from per-buffer to per-view copies, this makes megabuffering possible for small views into larger underlying buffers which is often the case with even the simplest of games,
- Megabuffering is no longer the default option, it is only enabled for buffer views that have had inline GPU writes applied to them in the past as that is the only case where they are beneficial. In any other case the cost of copying, even with a 128KiB limit can be significant.
- With both of these changes, there is now possibility for overlapping views where one uses megabuffering and one does not. In order to allow GPU inline writes to work consistently in such cases a system of 'host immutability' has been implemented, when a buffer is marked as host immutable for a given cycle, all writes to the buffer from that point to the point the cycle is signalled will be performed on the GPU, ensuring that the backing contents are correctly sequenced
2022-06-11 17:05:39 +05:30
MCredstoner2004
2e356b8f0b Use spans instead of ptr and size in kernel memory 2022-06-11 17:05:39 +05:30
Billy Laws
22039df301 Transition to std::unordered_set for buffer view tracking
Has the same guarantees of pointer stabilty while also being significantly faster in cases where a buffer has thousands of views. This is the case in RE4 and this change leads to an almost 1000% performance improvement in that game.
2022-06-09 23:52:13 +01:00
PixelyIon
a5ca370c36 Implement thread-safe MegaBuffer pool
We currently have a global `MegaBuffer` instance that is shared across all channels, this is very problematic as `MegaBuffer` fundamentally works like a state machine with allocations (especially resetting/freeing) and is thread-specific. Therefore, we now have a pool of several `MegaBuffer`s which is allocated from by the `CommandExecutor` and kept channel specific as a result which also limits its usage to a single thread, this allows for individually resetting or freeing any allocations.
2022-06-05 13:04:40 +05:30
PixelyIon
3e08494146 Minor CommandScheduler refactor
There was a lot of redundant code in the `CommandScheduler` when the same functionality could be achieved with much shorter and cleaner code which this commit fixes. This includes no changes to the user-facing API and does not require any changes on the user side as a result.
2022-06-05 13:04:40 +05:30
Billy Laws
bd99d79b51 OsFileSystem: Close directory after file listing is finished 2022-06-04 21:46:23 +01:00
Billy Laws
4888919515 Stub GetFriendInvitationStorageChannelEvent (0x8C) 2022-06-04 21:45:53 +01:00
Billy Laws
d9f6540831 Fix VFS CreateFile directory creation 2022-06-04 19:19:30 +01:00
Billy Laws
f5bcb40c41 Return number of audio outs in ListAudioOuts 2022-06-04 19:12:37 +01:00
Billy Laws
5d6902b3f8 Stub audin:u 2022-06-04 19:11:57 +01:00