The layer stride provided by the depth register in Maxwell3D needs to be shifted by 2, this caused the stride to be 1/4th of what it needed to be resulting in OOB access.
When calculating mip-level dimensions in terms of GOBs, they need to be divided by 2 while rounding upwards rather than downwards. This fixes corrupted textures and OOB access on lower mip levels across a substantial amount of titles, reducing arbitrary crashes as a result.
mapped
Since we align up when allocating, not doing so when deallocating would result in a gradual buildup of boundary pages that eventually fill the whole address space.
These are about 100x as expensive on adreno than nvidia due to the lack of a dedicated instruction, since some games work fine without them add a hack to disable them.
The vulkan guest driver doesn't expect a 0xB return code from SyncptEventWait, even though this is valid when an event is being signalled. Just ignore the intermediate state instead as doing so avoids races without causing any more.
The excessive blocking caused by initial compilation happening async to the guest caused issues in some cases, now we have a Vulkan pipeline cache to speed it up we can wait for a full compile before launch without too many issues.
By only using what we need, and mirroring the descriptor structs to allow for much tighter packing (while keeping the same member names) we can reduce pipeline memory to about 1/3 of what it was before.
Certain titles such as Super Smash Bros Ultimate can use SVC `UnmapPhysicalMemory` to punch holes into physical memory mappings, this wasn't handled correctly as we completely deleted the portion after the hole. It has now been fixed which results in these titles which depend on this behavior to work now.
Since the waitermutex is only ever locked for a short amount of time, spinning in contention-heavy scenarios ends up quite a bit more efficient than a kernel wait.