I missed that addSubpass was only called once per subpass, meaning that a new barrier requirement discovered several draws into the RP wouldn't be applied. Split barriers out into a separate function to avoid this.
Full pipeline barriers between every RP can be extremely expensive on HW. By analysing the inputs and outputs of a draw, it's possible to construct a much more optimal barrier that only syncs what is necessary.
Sometimes view pointers may change despite the underlying Vulkan image view not actually changing, so use vk::ImageViews for tracking to keep RP breaks to a minimum.
The layer stride provided by the depth register in Maxwell3D needs to be shifted left by 2. Not doing so caused the stride to be 1/4 of what it needed to be, resulting in OOB access.
When calculating mip-level dimensions in terms of GOBs, they need to be divided by 2 while rounding upwards rather than downwards. This fixes corrupted textures and OOB access on lower mip levels across a substantial number of titles, reducing arbitrary crashes as a result.
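A minimal sketch of the corrected stride calculation. `LayerStrideBytes` is a hypothetical helper, and the assumption that the register holds the stride in 4-byte units is inferred from the 1/4 factor above rather than taken from hardware documentation.

```cpp
#include <cstdint>

// Hypothetical helper: converts the raw register value into a byte stride.
// Assumes the register counts 4-byte units, hence the left shift by 2.
constexpr std::uint64_t LayerStrideBytes(std::uint32_t registerValue) {
    return static_cast<std::uint64_t>(registerValue) << 2;
}
```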
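As an illustration of the rounding difference, here is a small sketch. `NextMipDimensionGobs` is a hypothetical name, not the function used in the codebase.

```cpp
#include <cstddef>

// Halves a dimension (in GOBs) while rounding upwards.
// Plain integer division (dim / 2) would round downwards and
// under-size every level that has an odd dimension.
constexpr std::size_t NextMipDimensionGobs(std::size_t dim) {
    return (dim + 1) / 2;
}
```

With a dimension of 5 GOBs, rounding down yields 2 while rounding up yields 3, so the rounded-down level is one GOB too small and accesses to its final GOB land out of bounds.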
mapped
Since we align up when allocating, not doing so when deallocating would result in a gradual buildup of boundary pages that eventually fill the whole address space.
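A rough sketch of why both paths need the same alignment. `PageSize`, `AlignUp` and `PageCounter` are hypothetical names used only for illustration, and the 4 KiB page size is an assumption.

```cpp
#include <cstdint>

constexpr std::uint64_t PageSize{0x1000}; // assumed page size

// Rounds value up to the next multiple of alignment (must be a power of two)
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical accounting-only allocator that tracks how many bytes are in use
struct PageCounter {
    std::uint64_t used{};

    void Allocate(std::uint64_t size) {
        used += AlignUp(size, PageSize);
    }

    // Frees the same aligned size that Allocate reserved; subtracting the
    // unaligned size would leave the trailing boundary page reserved forever
    void Free(std::uint64_t size) {
        used -= AlignUp(size, PageSize);
    }
};
```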
These are about 100x more expensive on Adreno than on NVIDIA due to the lack of a dedicated instruction. Since some games work fine without them, add a hack to disable them.
The Vulkan guest driver doesn't expect a 0xB return code from SyncptEventWait, even though this is valid while an event is being signalled. Just ignore the intermediate state instead, as doing so avoids races without introducing any new ones.
The excessive blocking caused by initial compilation happening async to the guest caused issues in some cases. Now that we have a Vulkan pipeline cache to speed it up, we can wait for a full compile before launch without too many issues.
By only using what we need, and mirroring the descriptor structs to allow for much tighter packing (while keeping the same member names) we can reduce pipeline memory to about 1/3 of what it was before.
Certain titles such as Super Smash Bros Ultimate can use SVC `UnmapPhysicalMemory` to punch holes into physical memory mappings. This wasn't handled correctly as we completely deleted the portion after the hole. This has now been fixed, so titles which depend on this behavior now work.
Since the waitermutex is only ever locked for a short amount of time, spinning in contention-heavy scenarios ends up being quite a bit more efficient than a kernel wait.
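The intended behaviour can be sketched as splitting one mapping into the pieces on either side of the hole. `Range` and `PunchHole` are hypothetical names and do not reflect the actual memory manager's data structures.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <utility>

// Half-open address range [start, end)
struct Range {
    std::uint64_t start;
    std::uint64_t end;
};

// Returns the parts of mapping that survive before and after the hole.
// The bug described above corresponds to dropping the second element.
std::pair<std::optional<Range>, std::optional<Range>> PunchHole(Range mapping, Range hole) {
    std::optional<Range> before;
    std::optional<Range> after;
    if (hole.start > mapping.start)
        before = Range{mapping.start, std::min(hole.start, mapping.end)};
    if (hole.end < mapping.end)
        after = Range{std::max(hole.end, mapping.start), mapping.end};
    return {before, after};
}
```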
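For reference, a minimal spinning lock could look like the sketch below. `SpinLock` is a hypothetical class written for illustration and is not the lock used in the codebase; it assumes critical sections are short enough that busy-waiting beats sleeping in the kernel.

```cpp
#include <atomic>

// Minimal spin lock: waiters busy-loop in userspace instead of sleeping in the kernel
class SpinLock {
    std::atomic<bool> locked{false};

  public:
    void lock() {
        // exchange returns the previous value, so loop until we observe "unlocked"
        while (locked.exchange(true, std::memory_order_acquire)) {
        }
    }

    bool try_lock() {
        return !locked.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};
```

Since it provides `lock`, `try_lock` and `unlock`, it can be used with `std::scoped_lock` or `std::unique_lock` like any other lockable type.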
This removes the need to concatenate the variable multiple times, recycles the scaled bitmap after it has been stored, addresses the Android Studio complaint about that method name, and generates a preview of the current profile image as the preference icon.
The implementation for this service function wasn't added to the service function table. Additionally, the underlying type of the output `ScalingMode` was implicitly `int` as it was unspecified in the `enum class`; it has now been corrected to `u64` as it should be.
Due to broken drivers, it's possible to find no Vulkan physical devices, which can lead to a cryptic segfault. This explicitly checks for it instead and throws an exception, which is emitted into logcat and can therefore be easily caught.
Due to the trampoline and save/load context functions, `GetHookSectionSize` returned a non-zero size even when no hooked symbols were supplied to it. This is problematic as the section isn't required in that case and hooking is currently not stable, so it can lead to crashes or freezes in certain titles.
A thread can be paused while it is in a synchronization primitive, which will do `RemoveThread`. We need to update the state of `insertThreadOnResume` in this case by clearing it so the thread isn't incorrectly reinserted on resuming.
`Scheduler::UpdateCore` implicitly depended on `KThread::coreMigrationMutex` being locked during calls to it. This requirement has now been made explicit to avoid confusion.
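The size difference can be demonstrated with a short sketch. The enumerator names are placeholders, as the real values of `ScalingMode` aren't listed here, and `ImplicitMode` is a hypothetical enum that exists only for comparison.

```cpp
#include <cstdint>
#include <type_traits>

using u64 = std::uint64_t;

// Without an explicit base, an enum class uses int as its underlying type
enum class ImplicitMode { Placeholder };

// With an explicit base, the value is written out as a full 64-bit integer
enum class ScalingMode : u64 { Placeholder };

static_assert(std::is_same_v<std::underlying_type_t<ImplicitMode>, int>);
static_assert(std::is_same_v<std::underlying_type_t<ScalingMode>, u64>);
```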
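The check amounts to something like the sketch below, written generically so it doesn't depend on the Vulkan headers. `PickFirstDevice` and the exception message are hypothetical.

```cpp
#include <stdexcept>
#include <vector>

// Returns the first enumerated device, failing loudly when the list is empty
// instead of letting a later access to a non-existent element crash
template <typename Device>
const Device &PickFirstDevice(const std::vector<Device> &devices) {
    if (devices.empty())
        throw std::runtime_error("No Vulkan physical devices were found");
    return devices.front();
}
```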
When a timeout occurs in `ConditionVariableWait`, we used to check `waitMutex`, which is cleared by `MutexUnlock`, but when we hit the CAS case in `ConditionVariableSignal` we don't clear `waitMutex`. It's far more reliable to check `waitThread` as an indication of whether the thread has already been unlocked: it's cleared at the start of `ConditionVariableWait` and implicitly stays cleared in the CAS case, while it is set in `MutexLock` and unset in `MutexUnlock`.
There are multiple locations where a thread is yielded in the scheduler and all of them repeat the code of checking for `pendingYield` and signalling, with an optional optimization of checking if the thread being yielded is the calling thread.
All this functionality has now been consolidated into `Scheduler::YieldThread` which checks for `pendingYield` and does the calling thread yield optimization. This should lead to better readability and better performance in cases where `UpdatePriority` would signal the calling thread.
`forceYield` was incorrectly not set when pausing running threads if the thread already had `pendingYield` set. This could lead to cases where `Rotate` would later throw an exception due to it being unset.
Blocking while inserting a paused thread can lead to deadlocks where the inserting thread later resumes the paused thread.
Co-authored-by: Billy Laws <blaws05@gmail.com>
As we didn't hold `coreMigrationMutex`, the thread could simply migrate during `InsertThread` which would lead to the thread potentially never waking up as it's been inserted on a non-resident core.
Co-authored-by: PixelyIon <pixelyion@protonmail.com>
`SignalToAddress`/`ConditionVariableSignal` need to wake waiters in priority order. While threads are inserted in order, this doesn't remain the case as priority updates don't reinsert the thread into `syncWaiters`.
It was determined that reinsertion into `syncWaiters` would be fairly complex due to locking the `syncWaitersMutex` with the thread's mutexes. To avoid this, this commit instead sorts waiters by priority at signal time to always wake threads in the right order.
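Sorting at signal time could be sketched roughly as follows. `Waiter` and `TakeWaitersToWake` are hypothetical names, and the convention that a lower numeric value means a higher priority is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical waiter record; a lower priority value is assumed to mean higher priority
struct Waiter {
    int id;
    int priority;
};

// Removes and returns up to count waiters, highest priority first.
// std::stable_sort keeps insertion order between waiters of equal priority.
std::vector<Waiter> TakeWaitersToWake(std::vector<Waiter> &waiters, std::size_t count) {
    std::stable_sort(waiters.begin(), waiters.end(), [](const Waiter &a, const Waiter &b) {
        return a.priority < b.priority;
    });
    auto last{waiters.begin() + static_cast<std::ptrdiff_t>(std::min(count, waiters.size()))};
    std::vector<Waiter> woken(waiters.begin(), last);
    waiters.erase(waiters.begin(), last);
    return woken;
}
```

Sorting on every signal costs O(n log n) in the number of waiters, which trades a little signal-time work for not having to keep the list ordered under the thread's mutexes.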