1541 Commits

Author SHA1 Message Date
Billy Laws
bb3baa888d Add a hack to disable shader subgroup shuffles
These are about 100x more expensive on Adreno than on NVIDIA because Adreno lacks a dedicated instruction. Since some games work fine without them, add a hack to disable them.
2023-02-04 23:10:45 +00:00
Billy Laws
568306195f Prevent Vulkan guest crashes by avoiding intermediate syncpt event signal state
The Vulkan guest driver doesn't expect a 0xB return code from SyncptEventWait, even though this is valid when an event is being signalled. Ignore the intermediate state instead, as doing so avoids races without introducing new ones.
2023-02-04 23:10:45 +00:00
Billy Laws
fcb8f2a229 Apply texture shader compiler generated descriptor shifts
These were missed on a hades version upgrade.
2023-02-04 23:10:45 +00:00
Billy Laws
bbef006051 Simplify free descriptor set accounting and update ratios
Slightly reduces descriptor usage in Breath of the Wild.
2023-02-04 23:10:45 +00:00
Billy Laws
5e862cf5f7 Bail out early if the new pipeline key matches that of the current one
Prevents the transition cache of some pipelines from filling up with copies of the pipeline itself when an update happens redundantly.
2023-02-04 23:10:45 +00:00
Billy Laws
3e971d4043 Wait for pipeline compilation to finish before loading the guest
The excessive blocking caused by initial compilation running asynchronously to the guest caused issues in some cases. Now that we have a Vulkan pipeline cache to speed it up, we can wait for a full compile before launch without too many issues.
2023-02-04 23:10:45 +00:00
Billy Laws
be6f08cd97 Add debug pipeline statistics recording for finding redundant pipelines 2023-02-04 23:10:45 +00:00
Billy Laws
6333a92b53 Only include active RTs in pipeline state key
This was causing a buildup of many redundant pipelines in SMO as a depth-only shader was being called without previous RTs being unbound.
2023-02-04 23:10:45 +00:00
Billy Laws
9d3a9f63d5 Move graphics pipelines away from storing hades shader info struct
By only using what we need, and mirroring the descriptor structs to allow for much tighter packing (while keeping the same member names) we can reduce pipeline memory to about 1/3 of what it was before.
2023-02-04 23:10:45 +00:00
Billy Laws
dd92cb1536 Implement support for (de)serialising VkPipelineCaches to/from storage
Significantly improves launch times in games with many shader combinations, giving a 5x speedup in some cases.
2023-02-04 23:10:45 +00:00
PabloG02
35617930d5 Fix rebase 2023-01-28 11:57:19 +00:00
PabloG02
8b9d6f79ab Add option to enable/disable shader cache 2023-01-28 11:57:19 +00:00
PixelyIon
8bfda0d84d Fix hole punching in mappings with SVC UnmapPhysicalMemory
Certain titles such as Super Smash Bros Ultimate can use SVC `UnmapPhysicalMemory` to punch holes into physical memory mappings. This wasn't handled correctly, as we deleted the entire portion of the mapping after the hole. This has now been fixed, so titles that depend on this behavior now work.
2023-01-23 21:28:59 +00:00
Billy Laws
6b9be2edd4 Add note about circular queue append contiguity guarantees 2023-01-20 21:19:04 +00:00
PabloG02
d544ccf5ea Stub INotificationServicesForApplication 2023-01-20 21:08:12 +00:00
PabloG02
2fa5ea451e Stub IPrepoService::SaveReportWithUserOld2 2023-01-20 21:08:12 +00:00
PabloG02
7327cdbde9 Stub some functions in IDeliveryCacheStorageService 2023-01-20 21:08:12 +00:00
PabloG02
c53d99d393 Stub IDeliveryCacheFileService and IDeliveryCacheDirectoryService 2023-01-20 21:08:12 +00:00
PabloG02
299d11d86f Stub IApplicationFunctions::GetNotificationStorageChannelEvent 2023-01-20 21:08:12 +00:00
Billy Laws
7c623f8301 Use a spinlock for thread waiter mutex
Since the waiter mutex is only ever locked for a short amount of time, spinning in contention-heavy scenarios ends up being quite a bit more efficient than a kernel wait.
2023-01-20 21:07:59 +00:00
Billy Laws
e2463b7619 Adjust gpfifo WFI to only do a pipeline barrier 2023-01-20 21:07:59 +00:00
Billy Laws
2b282ece1a Add more fine-grained buffer recreation locking 2023-01-20 21:07:59 +00:00
Billy Laws
85a23e73ba Implement a shared spinlock and use it for GPU VMM 2023-01-20 21:07:59 +00:00
Billy Laws
fd5c141dbf Correct GetNpadIrCameraHandle return value 2023-01-20 21:07:59 +00:00
Billy Laws
a8b32c3cef Cleanup helper pipeline cache code 2023-01-20 21:07:59 +00:00
Billy Laws
1f99d63a80 Increase transition cache size 2023-01-20 21:07:59 +00:00
Billy Laws
262f92900d Ensure unmapped VMM ranges return an invalid span 2023-01-20 21:07:59 +00:00
Billy Laws
0a608fb4b2 Update to latest hades 2023-01-20 21:07:59 +00:00
Billy Laws
44f6aada18 Always set blend state for all colour attachments 2023-01-20 21:07:59 +00:00
Billy Laws
177925be93 Avoid OOB memory accesses when trying to read OOB TICs
Some games pass in invalid texture handles (0xffff) when they don't need the texture, so return the null texture in this case.
2023-01-20 21:07:59 +00:00
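As an illustration of the bounds check described above, here is a minimal C++ sketch. It assumes TIC entries live in a flat pool indexed by the texture handle; `TextureView`, `InvalidTicIndex` and `GetTexture` are hypothetical names, not Skyline's actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a host texture view.
struct TextureView {
    uint32_t id;
};

// Sentinel handle some games pass when they don't need a texture.
constexpr uint32_t InvalidTicIndex{0xffff};

// Returns the texture for a guest TIC index, or the shared null texture when the
// index is the invalid sentinel or lies outside the pool (avoiding an OOB read).
const TextureView &GetTexture(const std::vector<TextureView> &ticPool, uint32_t index, const TextureView &nullTexture) {
    if (index == InvalidTicIndex || index >= ticPool.size())
        return nullTexture;
    return ticPool[index];
}
```

Returning a reference to a single shared null texture keeps the fallback allocation-free on the hot path.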
Billy Laws
d8a4a2b08d Use a spinlock for GPU waiter thread 2023-01-20 21:07:59 +00:00
Billy Laws
f1aed86177 Add a workaround for split-mapping shaders
Some games split shaders across multiple mappings and *also* miss the end header, so read a suitably large amount and hope that's enough for now.
2023-01-20 21:07:59 +00:00
Billy Laws
704660bbeb Store render nodes in a linearly allocated linked list
This is much faster in reldebug builds than boost::stable_vector while still providing iterator stability
2023-01-20 21:07:59 +00:00
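A rough sketch of the idea, using the standard library's `std::pmr::monotonic_buffer_resource` as a stand-in for a linear allocator (an assumption; the commit doesn't say which allocator is used). `RenderNode` and `RenderNodeList` are hypothetical. Allocating from the arena is a pointer bump, and because the container is a linked list, iterators to existing nodes survive later insertions.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <memory_resource>
#include <string>
#include <utility>

// Hypothetical render node; a real executor stores far richer state.
struct RenderNode {
    int id;
    std::string name;
};

// Linked list whose nodes come from a monotonic (linear) arena: allocation is a
// pointer increment and individual frees are no-ops until the arena is destroyed.
class RenderNodeList {
  public:
    RenderNodeList() : arena(buffer, sizeof(buffer)), nodes(&arena) {}

    // Appends a node and returns an iterator that stays valid across later appends.
    std::pmr::list<RenderNode>::iterator Add(int id, std::string name) {
        nodes.push_back(RenderNode{id, std::move(name)});
        return std::prev(nodes.end());
    }

    std::pmr::list<RenderNode> &Nodes() {
        return nodes;
    }

  private:
    alignas(std::max_align_t) std::byte buffer[4096]; // initial arena storage
    std::pmr::monotonic_buffer_resource arena;         // grows via new/delete once the buffer is used up
    std::pmr::list<RenderNode> nodes;                  // destroyed before the arena (reverse declaration order)
};
```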
Billy Laws
326c05a5de Add guest shader replacement and dumping support 2023-01-20 21:07:59 +00:00
Billy Laws
2f6d27e8d7 Rework circular queue locking
This should now (hopefully) be race-free. Also switch to a spinlock to avoid any locking overhead.
2023-01-20 21:07:59 +00:00
lynxnb
5d527cb965 Add CNTFRQ_EL0 workaround value for Exynos 1280 2023-01-15 10:16:01 +00:00
PabloG02
ea0217de47 Add TIC format: 0x78D24952 2023-01-13 18:05:22 +00:00
Maccraft123
c3924e0f08 Stub out InlineKeyboard instead of throwing an error 2023-01-11 20:47:39 +00:00
lynxnb
950438bf58 Enable VK_KHR_image_format_list during device init
`VK_KHR_image_format_list` is a requirement for `VK_KHR_imageless_framebuffer`, which we use.
2023-01-11 23:38:57 +05:30
PixelyIon
d39112e9b9 Enable IApplicationDisplayService::ConvertScalingMode implementation
The implementation for this service function wasn't added to the service function table. Additionally, the type for the output `ScalingMode` was implicitly `int` as it was unspecified in the `enum class` which has now been corrected to `u64` as it should be.
2023-01-11 23:38:57 +05:30
PixelyIon
45d0558d00 Check for no Vulkan physical devices
Due to broken drivers, it's possible that no Vulkan physical devices are found, which previously led to a cryptic segfault. This now explicitly checks for that case and throws an exception, which is emitted to logcat and can thus be easily caught.
2023-01-11 23:38:57 +05:30
PixelyIon
f882b613bc Fix .hook section being allocated without any hooked symbols
Due to the trampoline and save/load context functions, `GetHookSectionSize` returned a non-zero size even when no hooked symbols were supplied to it. This is problematic, as the section isn't required in that case and hooking is currently not stable, so it can lead to crashes or freezes in certain titles.
2023-01-11 00:13:15 +05:30
PixelyIon
3fa314f6cb Always print thread IDs rather than handles for SVC logs
Handles are rather arbitrary and difficult to reference; as a result, we've moved to thread IDs across the board for logs.
2023-01-11 00:13:15 +05:30
PixelyIon
e192d4e5c1 Warn when RemoveThread is called on a non-inserted thread 2023-01-11 00:13:15 +05:30
PixelyIon
3a6f205e6f Clear insertThreadOnResume in RemoveThread
A thread can be paused while it is in a synchronization primitive, which will call `RemoveThread`. In this case, we need to clear `insertThreadOnResume` so the thread isn't incorrectly reinserted when it resumes.
2023-01-11 00:13:15 +05:30
PixelyIon
7fef849594 Make UpdateCore's locking coreMigrationMutex requirement explicit
`Scheduler::UpdateCore` implicitly depended on `KThread::coreMigrationMutex` being locked during calls to it, this requirement has now been made explicit to avoid confusion.
2023-01-11 00:13:15 +05:30
PixelyIon
c4b4532222 Check waitThread rather than waitMutex during condvar timeouts
When a timeout occurs in `ConditionVariableWait`, we used to check `waitMutex`, which is cleared by `MutexUnlock`; however, when we hit the CAS case in `ConditionVariableSignal`, `waitMutex` isn't cleared. It's far more reliable to check `waitThread` as an indication of whether the thread has already been unlocked, as it's cleared at the start of `ConditionVariableWait` and implicitly stays cleared in the CAS case, while being set in `MutexLock` and unset in `MutexUnlock`.
2023-01-11 00:13:15 +05:30
PixelyIon
2525bafe06 Consolidate thread yielding in Scheduler
There are multiple locations where a thread is yielded in the scheduler, and all of them repeat the code for checking `pendingYield` and signalling, with an optional optimization of checking whether the thread being yielded is the calling thread.

All this functionality has now been consolidated into `Scheduler::YieldThread` which checks for `pendingYield` and does the calling thread yield optimization. This should lead to better readability and better performance in cases where `UpdatePriority` would signal the calling thread.
2023-01-11 00:13:15 +05:30
PixelyIon
8b973a3de3 Always set forceYield for running threads in PauseThread
`forceYield` was incorrectly not set when pausing running threads if the thread already had `pendingYield` set. This could lead to cases where `Rotate` would later throw an exception due to it being unset.
2023-01-11 00:13:15 +05:30
PixelyIon
6645692288 Don't block while inserting paused threads
Blocking while inserting a paused thread can lead to deadlocks where the inserting thread later resumes the paused thread.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2023-01-11 00:13:15 +05:30
Billy Laws
643f4cf864 Ensure thread doesn't migrate during InsertThread
As we didn't hold `coreMigrationMutex`, the thread could migrate during `InsertThread`, which could lead to the thread never waking up, as it would have been inserted on a core it no longer resides on.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2023-01-11 00:13:15 +05:30
PixelyIon
7f7352ed59 Recalculate highest-priority waiters during cvar/address signaling
`SignalToAddress`/`ConditionVariableSignal` need to wake waiters in priority order. While threads are inserted in priority order, this doesn't remain the case, as priority updates don't reinsert the thread into `syncWaiters`.

It was determined that reinsertion into `syncWaiters` would be fairly complex due to locking the `syncWaitersMutex` with the thread's mutexes. To avoid this, this commit instead sorts waiters by priority at signal time to always wake threads in the right order.
2023-01-11 00:13:15 +05:30
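A minimal sketch of sorting at signal time, assuming waiters are kept in a plain vector and that, as on HOS, a numerically lower priority value means a higher-priority thread. `Waiter` and `SignalWaiters` are hypothetical names; the real code operates on `syncWaiters` under its own locking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical waiter record; a lower priority value means higher priority.
struct Waiter {
    int threadId;
    int priority;
};

// Wakes up to `count` waiters in priority order. Since a thread's priority may
// change after it was queued, the order is recomputed here rather than trusted
// from insertion. std::stable_sort keeps FIFO order among equal priorities.
// Returns the IDs of the woken threads and removes them from `waiters`.
std::vector<int> SignalWaiters(std::vector<Waiter> &waiters, std::size_t count) {
    std::stable_sort(waiters.begin(), waiters.end(), [](const Waiter &a, const Waiter &b) {
        return a.priority < b.priority;
    });

    std::size_t toWake{std::min(count, waiters.size())};
    std::vector<int> woken;
    woken.reserve(toWake);
    for (std::size_t i{}; i < toWake; i++)
        woken.push_back(waiters[i].threadId);

    waiters.erase(waiters.begin(), waiters.begin() + static_cast<std::ptrdiff_t>(toWake));
    return woken;
}
```

Sorting on every signal costs O(n log n) in the number of waiters, which trades some signal-time work for not having to reinsert under nested locks.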
PixelyIon
626008d8e2 Fix WaitForAddress timeout mutex deadlock
Calling `WaitSchedule` inside the block where `syncWaiterMutex` is locked causes a race with other threads which lock the core mutex and `syncWaiterMutex` together. This commit moves the `WaitSchedule` outside the block and instead sets a flag to wait later, similar to `ConditionVariableWait`'s timeout case.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2023-01-11 00:13:15 +05:30
PixelyIon
4df3c98225 Add double-insertion debug check to InsertThread
This is the cause of a large number of scheduler bugs, so we should check for it on debug builds, as it is a fairly easy way to catch issues at some performance cost.
2023-01-11 00:13:15 +05:30
PixelyIon
5694c9b34b Rename KThread::waitKey to KThread::waitMutex
It was determined that `waitKey` is too ambiguous when waiter members are used for both mutexes and condition variables.
2023-01-11 00:13:15 +05:30
PixelyIon
91bb8d231a Rename ConditionalVariable -> ConditionVariable
"Conditional Variable" is a typo which was propagated through the codebase, it has been corrected to "Condition Variable".
2023-01-11 00:13:15 +05:30
PixelyIon
f487d81769 Refactor Condition Variable Waiting/Signalling
The way we handled waking/timeouts of condition variables was fairly inaccurate to HOS as we moved locking of the mutex to the waker thread which could change the order of operations and would cause what were functionally spurious wakeups for all awoken threads.

This commit fixes it by doing all locks on the waker thread and only awakening the waiter thread once the condition variable was signalled and the mutex was unlocked. In addition, this fixes races between a timeout and a signal that could lead to double-insertion as a result of a refactor of how timeouts work in the new system.
2023-01-11 00:13:15 +05:30
PixelyIon
1eb4eec103 Allow locking external thread in MutexLock
We want the ability to lock mutexes on behalf of other threads to refactor condition variables to match HOS on waking behavior.
2023-01-11 00:13:15 +05:30
PixelyIon
6bbe9de881 Fix result returned by MutexLock
`MutexLock` incorrectly returned `InvalidCurrentMemory` for cases where the userspace value didn't match the expected value. It's been corrected to return no error in those cases while preserving the error code for usage in `ConditionalVariableWait`.
2023-01-11 00:13:15 +05:30
PixelyIon
08ef88b156 Add early-timeout path for WaitForAddress 2023-01-11 00:13:15 +05:30
PixelyIon
d0c56235f4 Read address atomically in WaitForAddress
We didn't read the values for arbitration atomically in all cases as we should have, this consolidates the reading of the value and uses the value across all cases.
2023-01-11 00:13:15 +05:30
PixelyIon
e8a1bd1aad Fix WaitForAddress timeout signal race
A race could occur when the timeout path in `WaitForAddress` runs at the same time as `SignalToAddress` is called, causing a deadlock due to double-insertion.
2023-01-11 00:13:15 +05:30
Billy Laws
31fb6d30eb Fake maxwell occlusion query results 2023-01-08 19:30:52 +00:00
Billy Laws
a92c26531e Keep holes in descriptors for unsupported bindings 2023-01-08 19:30:52 +00:00
Billy Laws
81d82008c7 Pre-signal suspend ticks event 2023-01-08 19:30:52 +00:00
Billy Laws
3e5992e366 Update hades 2023-01-08 19:30:52 +00:00
Billy Laws
45bbf3bb2a Fix indirect draws with direct buffers
We need to wait on the GPFIFO manually, as we won't hit the traps when accessing the indirect params with direct buffers as we usually would.
2023-01-08 19:30:52 +00:00
Billy Laws
68ad052cb1 Add geometry passthrough shader support for vertex layer writes 2023-01-08 19:30:52 +00:00
Billy Laws
ec519a7d52 Return null texture on encountering unmapped textures 2023-01-08 19:30:52 +00:00
Billy Laws
97e127153b Make shader trap mutex recursive
There are cases where we hit a shader trap within the GPU; by making the mutex recursive, we avoid deadlocking on reads within the GPU.
2023-01-08 19:30:52 +00:00
Billy Laws
1a6165f74d Fix GetReadOnlyBackingSpan for non-direct buffers
This was missed in the initial implementation
2023-01-08 19:30:52 +00:00
Billy Laws
4e5141f879 Fix missed attempt increment in spinlock
Should hog CPU slightly less and correctly yield now
2023-01-08 19:30:52 +00:00
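For context on the attempt counter, a minimal spinlock sketch in C++: it spins on a relaxed load and yields to the OS scheduler after a fixed number of failed attempts. The `SpinLock` class and the `SpinsBeforeYield` value are illustrative assumptions, not Skyline's implementation. If the counter is never incremented, the yield branch is unreachable and a waiter keeps burning CPU, which appears to be the kind of issue this commit addresses.

```cpp
#include <atomic>
#include <cassert>
#include <mutex> // std::lock_guard works with SpinLock since it meets Lockable
#include <thread>

class SpinLock {
  public:
    void lock() {
        unsigned attempts{};
        while (true) {
            if (!locked.exchange(true, std::memory_order_acquire))
                return;
            // Spin on a plain load to avoid hammering the cache line with writes.
            while (locked.load(std::memory_order_relaxed)) {
                // The increment is what eventually makes the waiter yield.
                if (++attempts >= SpinsBeforeYield) {
                    attempts = 0;
                    std::this_thread::yield();
                }
            }
        }
    }

    bool try_lock() {
        return !locked.load(std::memory_order_relaxed) && !locked.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        locked.store(false, std::memory_order_release);
    }

  private:
    static constexpr unsigned SpinsBeforeYield{32}; // hypothetical tuning value
    std::atomic<bool> locked{false};
};
```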
Billy Laws
35a46acbb1 Determine storage buffer alignment dynamically 2023-01-08 19:30:52 +00:00
Billy Laws
12d80fe6c2 Use a shared mutex for GPU VMM to avoid deadlocks
Two reads need to be able to occur simultaneously or deadlocks can occur (e.g. a read traps to wait on the GPU, but the GPU needs to read).
2023-01-08 19:30:52 +00:00
Billy Laws
28b2a7a8a1 Dynamically apply GPU turbo clocks only when GPU submissions are queued
Allows for the GPU to clock down in cases where it's idle for most of the time, while still forcing maximum clocks when we care.
2023-01-08 19:30:52 +00:00
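One way to express "turbo only while work is queued" is a count of in-flight submissions, where only the 0→1 and 1→0 transitions touch the clocks. This sketch assumes a hypothetical `setTurbo` callback in place of the real platform call; `TurboClockGuard` and its hooks are illustrative names only.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Tracks in-flight GPU submissions and only requests maximum clocks while at
// least one is queued. `setTurbo` is a hypothetical hook standing in for the
// platform-specific call that forces clocks up or releases them.
class TurboClockGuard {
  public:
    explicit TurboClockGuard(std::function<void(bool)> setTurbo) : setTurbo{std::move(setTurbo)} {}

    // Called when a submission is queued: the first one enables turbo.
    void OnSubmissionQueued() {
        std::scoped_lock lock{stateMutex};
        if (pending++ == 0)
            setTurbo(true);
    }

    // Called when a submission completes: the last one lets the GPU clock down.
    void OnSubmissionRetired() {
        std::scoped_lock lock{stateMutex};
        if (--pending == 0)
            setTurbo(false);
    }

  private:
    std::function<void(bool)> setTurbo;
    std::mutex stateMutex; // serialises transitions so enable/disable calls can't be reordered
    unsigned pending{};
};
```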
Billy Laws
81f3ff348c Transition memory handling from memfd to anonymous shared mappings
Memfd mappings are incompatible with KGSL user memory importing on older kernels; transition to shared anonymous mappings to avoid this.
2023-01-08 19:30:52 +00:00
Billy Laws
cc3c869b9f Attempt to signal the vsync event at present time if possible
Some games rely on the vsync event to schedule frames. By matching its timing with presentation, we can reduce needless waiting, as the game will immediately be able to queue the next frame after presentation.
2023-01-08 19:30:52 +00:00
Billy Laws
918a493a45 Implement wfi and setReference GPFIFO barriers 2023-01-08 19:30:52 +00:00
Billy Laws
7315ba04e6 Fixup optional flattenable binder obj structure 2023-01-08 19:30:52 +00:00
Billy Laws
90e21b0ca1 Split syncpoints into host-guest pairs
This allows the presentation engine to grab the presentation image early when direct buffers are in use: since it handles sync on its own using semaphores, it doesn't need to wait for GPU execution.
2023-01-08 19:30:52 +00:00
Billy Laws
966c31810a Return appropriate fences in surfaceflinger queue buffer 2023-01-08 19:30:52 +00:00
Billy Laws
afef6c5123 Always populate all colour attachments
This better follows the Vulkan spec, which doesn't mention anything about writes to OOB attachments, only those marked as unused.
2023-01-08 19:30:52 +00:00
Billy Laws
3571737392 Reset maxwell3d quick bind state before adding subpasses to executor
If a submission happens during the call to addsubpass, we could end up with invalid quick bind state, so move the reset to before the call to prevent that.
2023-01-08 19:30:52 +00:00
Billy Laws
3d31ade35f Implement an alternative buffer path using direct memory importing
By importing guest memory directly onto the host GPU, we can avoid many of the complexities that occur with memory tracking, as well as the heavy performance overhead in some situations. Since it's still desirable to support the traditional buffer method, as it's faster in some cases and more widely supported, most of the exposed buffer methods have been split into two variants with just a small amount of shared code.

While in most cases the code is simpler, one area with more complexity is handling CPU accesses that need to be sequenced, since there is no place on the GPFIFO thread where writes can easily be applied without also affecting the buffer on the GPU. To solve this, when the GPU is actively using a buffer's contents, an interval list is used to keep track of any GPFIFO-written regions on the CPU, and any CPU reads of those regions are instead directed to a shadow of the buffer with just those writes applied. Once the GPU has finished using the buffer contents, the shadow can be removed, as all writes will have been done by the GPU.

The main caveat is that this requires tying host sync to guest sync, which can reduce performance in games that double-buffer command buffers, as it prevents us from fully saturating the CPU with the GPFIFO thread.
2023-01-08 19:30:52 +00:00
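A simplified model of the shadow mechanism described above, under stated assumptions: the imported guest memory is modelled as a byte vector, the interval list as a `std::map` from range start to range end, and GPU completion as an explicit `OnGpuIdle()` call. `ShadowedBuffer` and its methods are hypothetical names, and all ranges are assumed to be in bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

// While the GPU uses `backing`, GPFIFO-sequenced writes go to a CPU-side shadow
// and their ranges are recorded; CPU reads of those ranges come from the shadow.
class ShadowedBuffer {
  public:
    explicit ShadowedBuffer(std::vector<uint8_t> &backing) : backing{backing} {}

    // Records a sequenced write (the GPU-side copy is assumed to be recorded elsewhere).
    void SequencedWrite(std::size_t offset, const std::vector<uint8_t> &data) {
        if (shadow.empty())
            shadow = backing; // lazily create the shadow from the current contents
        std::copy(data.begin(), data.end(), shadow.begin() + static_cast<std::ptrdiff_t>(offset));
        AddInterval(offset, offset + data.size());
    }

    // CPU read: untouched bytes come from backing, shadowed ranges from the shadow.
    std::vector<uint8_t> Read(std::size_t offset, std::size_t size) const {
        auto first{backing.begin() + static_cast<std::ptrdiff_t>(offset)};
        std::vector<uint8_t> out(first, first + static_cast<std::ptrdiff_t>(size));
        std::size_t end{offset + size};
        for (const auto &[begin, stop] : written) {
            std::size_t lo{std::max(begin, offset)}, hi{std::min(stop, end)};
            for (std::size_t i{lo}; i < hi; i++)
                out[i - offset] = shadow[i];
        }
        return out;
    }

    // Once the GPU has executed the sequenced writes, backing is up to date.
    void OnGpuIdle() {
        shadow.clear();
        written.clear();
    }

  private:
    // Inserts [begin, end), merging with any overlapping or adjacent ranges.
    void AddInterval(std::size_t begin, std::size_t end) {
        auto it{written.lower_bound(begin)};
        if (it != written.begin() && std::prev(it)->second >= begin)
            --it;
        while (it != written.end() && it->first <= end) {
            begin = std::min(begin, it->first);
            end = std::max(end, it->second);
            it = written.erase(it);
        }
        written.emplace(begin, end);
    }

    std::vector<uint8_t> &backing;
    std::vector<uint8_t> shadow;
    std::map<std::size_t, std::size_t> written; // start -> end of shadowed ranges
};
```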
Billy Laws
b3f7e990cc Allow for tying guest GPU sync operations to host GPU sync
This is necessary for the upcoming direct buffer support, as in order to use guest buffers directly without trapping we need to recreate any guest GPU sync on the host GPU. This avoids the guest thinking work is done that isn't and overwriting in-use buffer contents.
2023-01-08 19:30:52 +00:00
Billy Laws
89c6fab1cb Implement a way to check if the command record thread is idle
Useful for debugging and testing
2023-01-08 19:30:52 +00:00
Billy Laws
c67f27e914 Add a setting to control the maximum number of accumulated GPU cmds
This helps to keep the GPU fed when processing large command buffers which don't have any syncpoints to force a flush in between.
2023-01-08 19:30:52 +00:00
Billy Laws
77214a98dd Add a setting to force maximum GPU clocks on KGSL devices 2023-01-08 19:30:52 +00:00
Billy Laws
3ecaedd71e Add adrenotools direct mapping support 2023-01-08 19:30:52 +00:00
Pablo
8846a85d3a Stub some IPurchaseEventManager functions 2022-12-31 10:45:18 +00:00
PabloG02
80c0f8f04d Implement full profile picture support
Extends the profile picture stub into a full-fledged implementation, allowing users to set their profile picture in settings, with the Skyline icon as the default profile picture.
2022-12-27 22:53:41 +05:30
PixelyIon
7a3d2e4a26 Start KThread TID from 1 rather than 0
HOS's TIDs are one-based rather than zero-based. Certain titles such as Pokémon Arceus, Naruto Shippuden: Ultimate Ninja Storm 3, Splatoon 3, etc. use a TID of zero as a sentinel value, but since we previously assigned this ID to our first thread, this logic broke. This commit fixes it by matching HOS behavior.
2022-12-27 22:36:06 +05:30
Billy Laws
bab659587f Use e1 sample count for blits 2022-12-22 18:05:45 +00:00
Billy Laws
516ece6b04 Calculate renderarea from attachment min size 2022-12-22 18:05:45 +00:00
Billy Laws
4a3cd69257 Populate graphics pipeline manager from cache at launch-time 2022-12-22 18:05:45 +00:00
Billy Laws
e9bcdd06eb Introduce a pipeline cache manager for simple read/write cache accesses
All writes are done async into a staging file, which is then merged into the main pipeline cache file at the time of the next launch. Upon encountering file corruption the cache can be trimmed up to the last-known-good entry to avoid any excessive loss of data from just one error.
2022-12-22 18:05:45 +00:00
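The trim-to-last-known-good step can be sketched as a scan over length-prefixed, checksummed entries that stops at the first one that fails validation. The entry layout, the FNV-1a checksum and the function names below are assumptions made for illustration; Skyline's actual on-disk format is described in its source comments, not here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical entry layout: u32 payloadSize | u32 fnv1a(payload) | payload bytes,
// with entries appended back to back (host byte order).

uint32_t Fnv1a(const uint8_t *data, std::size_t size) {
    uint32_t hash{2166136261u};
    for (std::size_t i{}; i < size; i++) {
        hash ^= data[i];
        hash *= 16777619u;
    }
    return hash;
}

void AppendEntry(std::vector<uint8_t> &file, const std::vector<uint8_t> &payload) {
    uint32_t header[2]{static_cast<uint32_t>(payload.size()), Fnv1a(payload.data(), payload.size())};
    const auto *headerBytes{reinterpret_cast<const uint8_t *>(header)};
    file.insert(file.end(), headerBytes, headerBytes + sizeof(header));
    file.insert(file.end(), payload.begin(), payload.end());
}

// Returns the offset just past the last-known-good entry. Truncating the file to
// this length drops the first corrupt entry and everything after it, but keeps
// all earlier entries intact.
std::size_t ValidPrefixLength(const std::vector<uint8_t> &file) {
    std::size_t offset{};
    while (file.size() - offset >= 2 * sizeof(uint32_t)) {
        uint32_t header[2];
        std::memcpy(header, file.data() + offset, sizeof(header));
        std::size_t payloadOffset{offset + sizeof(header)};
        if (header[0] > file.size() - payloadOffset)
            break; // truncated entry
        if (Fnv1a(file.data() + payloadOffset, header[0]) != header[1])
            break; // checksum mismatch: corrupt entry
        offset = payloadOffset + header[0];
    }
    return offset;
}
```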
Billy Laws
06bf1b38af Introduce a pipeline state accessor that reads from a bundle 2022-12-22 18:05:45 +00:00
Billy Laws
7dd3a1db0f Avoid InterconnectContext use in graphics PipelineManager
We will soon move to a global pipeline manager instance, so it won't be possible to use InterconnectContext at pipeline-creation time anymore.
2022-12-22 18:05:45 +00:00
Billy Laws
ffe7263848 Add quirk for 615 drivers with broken multithreaded compilation 2022-12-22 18:05:45 +00:00
Billy Laws
755f7c75af Add pipeline (de)serialisation support to bundle
See comments in code for details on the on-disk format.
2022-12-22 18:05:45 +00:00