We run into a lot of successive subpasses with the exact same framebuffer configuration which we now exploit to avoid the creation of a new subpass due to the overhead involved with this. This provides significant performance boosts in certain cases due to the magnitude of difference in the amount of subpasses being created while providing next to no benefit in other cases.
As we require a relaxed version of the Vulkan render pass compatibility clause for caching multi-subpass render passes, we now utilize a quirk to determine if this is supported which it is on Nvidia/Adreno while AMD/Mali where it isn't supported we force single-subpass render passes.
Previously constant buffer updates would be handled on the CPU and only the end result would be synced to the GPU before execute. This caused issues as if the constant buffer contents was changed between each draw in a renderpass (e.g. text rendering) the draws themselves would only see the final resulting constant buffer.
We had earlier tried to fix this by using vkCmdUpdateBuffer however this caused significant performance loss due to an oversight in Adreno drivers. We could have worked around this simply by using vkCmdCopy buffer however there would still be a performance loss due to renderpasses being split up with copies inbetween.
To avoid this we introduce 'megabuffers', a brand new technique not done before in any other switch emulators. Rather than replaying the copies in sequence on the GPU, we take advantage of the fact that buffers are generally small in order to replay buffers on the GPU instead. Each write and subsequent usage of a buffer will cause a copy of the buffer with that write, and all prior applied to be pushed into the megabuffer, this way at the start of execute the megabuffer will hold all used states of the buffer simultaneously. Draws then reference these individual states in sequence to allow everything to work without any copies. In order to support this buffers have been moved to an immediate sync model, with synchronisation being done at usage-time rather than execute (in order to keep contents properly sequenced) and GPU-side writes now need to be explictly marked (since they prevent megabuffering). It should also be noted that a fallback path using cmdCopyBuffer exists for the cases where buffers are too large or GPU dirty.
The terminology "Non-Graphics pass" was deemed to be fairly inaccurate since it simply covered all Vulkan commands (not "passes") outside the render-pass scope, these may be graphical operations such as blits and therefore it is more accurate to use the new terminology of "Outside-RenderPass command" due to the lack of such an implication while being consistent with the Vulkan specification.
Older Adreno proprietary drivers (5xx and below) will segfault while destroying the renderpass and associated objects if more than 64 subpasses are within a renderpass due to internal driver implementation details. This commit introduces checks to automatically break up a renderpass when that limit is hit.
This commit encapsulates a complex sequence of cascading changes in the process of supporting overlaps for buffers:
* We determined that it is impossible to resolve overlaps with multiple intervals per buffer within the constraints of each overlap being a contiguous view, support for multiple intervals was therefore dropped. The older buffer manager code was entirely reworked to be simpler due to only handling one interval per buffer with code now being based off `IntervalMap` but tailored specifically for buffers.
* During overlap resolution, the problem of how existing views into the buffer being recreated would be updated, it had to be replaced with a larger buffer that could contain all overlaps and all existing views would need to be repointed to it. This was addressed by a buffer owning all views to itself, we could automatically recalculate the offset of all views and update the buffers with it.
* We still needed to update usage of existing views which was done by handling all access (such as inside a recorded draw) to buffer view properties via `BufferView::RegisterUsage` which dispatches a callback with the view and the corresponding backing buffer. This callback can be stored and called during overlap resolution with the new buffer.
* We had issues with lifetime of the buffer with the handle-like semantics of `BufferView` introduced in the last buffer-related commit, if we updated the view to be owned by a new buffer we'd need to extend the lifetime of the new buffer not the older one and the only way to do this was a proxy owner object `BufferDelegate` which holds a shared pointer to the real `Buffer` which in-turn holds a pointer to all `BufferDelegate` objects to update on repointing. A `BufferView` is effectively just a wrapper around `std::shared_ptr<BufferDelegate>` with more favorable semantics but generally just forwarding calls.
It should be additionally noted that to support usage of `RegisterUsage` the code around buffers in `GraphicsContext` was refactored to defer truly binding till the recording phase.
We wanted views to extend the lifetime of the underlying buffers and at the same time preserve all views until the destruction of the buffer to prevent recreation which might be costly in the future when we need `VkBufferView`s of the buffer but also require a centralized list of all views for recreation of the buffer. It also removes the inconsistency between `BufferView*` being returned in `GetXView` in `GraphicsContext`.
Any usage of a resource in a command now requires attaching that resource externally and will not be implicitly attached on usage, this makes attaching of resources consistent and allows for more lax locking requirements on resources as they can be locked while attaching and don't need to be for any commands, it also avoids redundantly attaching a resource in certain cases.
The descriptor sets should now contain a combined image and sampler handle for any sampled textures in the guest shader from the supplied offset into the texture constant buffer.
Note: Games tend to rely on inline constant buffer updates for writing the texture constant buffer and due to it not being implemented, the value will be read as 0 which is incorrect.
The lifetime of a texture and buffer view is now bound by the `FenceCycle` in `CommandExecutor`, this ensures that a `VkImageView` isn't destroyed prior to usage leading to UB.
Implements `AddClearDepthStencilSubpass` in `CommandExecutor` which is similar to `ClearColorAttachment` in that it uses `VK_ATTACHMENT_LOAD_OP_CLEAR` for the clear which is far more efficient than using `VK_ATTACHMENT_LOAD_OP_LOAD` then doing the clear.
We require a handle to the current renderpass and the index of the subpass in certain cases, this is now tracked by the `CommandExecutor` and passed in as a parameter to `NextSubpassFunctionNode` and the newly-introduced `SubpassFunctionNode`.
Switch from `SubmitWithCycle` to manually allocating the active command buffer to tag dependencies with the `FenceCycle` that prevents them from being mutated prior to execution. This new paradigm could also allow eager recording of commands with only submission being deferred.
We want `TextureView`(s) to be disconnected from the backing on the host and instead represent a specific texture on the guest with a backing that can change depending on mapping of new textures which'd invalidate the backing but should now be automatically repointed to an appropriate new backing. This approach also requires locking of the backing to function as it is mutable till it has been locked or the backing has an attached `FenceCycle` that hasn't been signaled which will be added for `CommandExecutor` in a subsequent commit.
Infrastructure for always syncing textures has been introduced now, they will be synced prior to and after every execution. This does considerably reduce the performance alongside waiting on GPU execution to finish but it will be partially recouped once conditional syncing is performed.
* Move Shared Font TTFs to AAsset storage + Support external shared font loading from `/data/data/skyline.emu/data/fonts`
* Fix bug in `IApplicationFunctions::PopLaunchParameter` caused by ignoring `LaunchParameterKind`
* Fix bug with Choreographer causing it to be awoken and exit prior to the destruction of `PresentationEngine`
* Fix bug with `IDirectory::Read` where it used `inputBuf` for the output buffer rather than `outputBuf`
* Improve `GetFunctionStackTrace` logs when `dli_sname` or `dli_fname` are missing
* Support more RT Formats
Support for subpasses was added by reworking attachment reuse code to account for preserved attachments and subpass dependencies. A lot of RT formats were also added to allow SMO to boot up entirely, it should be noted that it doesn't render anything.
`FenceCycle` had a cyclic dependency which broke clean exit, we now utilize `std::weak_ptr<FenceCycle>` inside the `Texture` object. A minor fix for broken stack traces was also made caused by supplying a `nullptr` C-string to libfmt when a symbol was unresolved which caused an `abort` due to invocation of `strlen` with it.
This commit introduces the `CommandExecutor` which is responsible for creating and orchestrating a Vulkan command graph comprising of `CommandNode`s that construct all the objects required for rendering. As a result of the infrastructure provided by `CommandExecutor`, `ClearBuffers` could be implemented and be appropriately utilized.
A bug regarding scissors was also determined and fixed in the PR, the extent of them were previously inaccurate and this has now been fixed.
Note: We don't synchronize any textures from the guest for now as this would override the contents on the host, this'll be fixed with the appropriate write tracking but it also results in a black screen for anything that writes to FB