111 Commits

Author SHA1 Message Date
PixelyIon
e294fa8c91 Add subpass limit quirk to fix Adreno driver bug
Older Adreno proprietary drivers (5xx and below) segfault while destroying a renderpass and its associated objects if that renderpass contains more than 64 subpasses, due to internal driver implementation details. This commit introduces checks that automatically break up a renderpass when that limit is hit.
2022-04-14 14:14:52 +05:30
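A minimal sketch of how a renderpass could be broken up once the quirk limit is reached. `RenderPassSplitter`, `RenderPass` and `maxSubpassCount` are hypothetical names used for illustration, not the actual types in this codebase, and only the subpass count of each renderpass is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a renderpass: only the number of subpasses matters here
struct RenderPass {
    std::size_t subpassCount{};
};

// Hypothetical helper that starts a new renderpass whenever the current one is full
class RenderPassSplitter {
  public:
    explicit RenderPassSplitter(std::size_t maxSubpassCount) : maxSubpassCount{maxSubpassCount} {
        assert(maxSubpassCount > 0);
    }

    // Adds a subpass, ending the current renderpass first if it already holds the maximum
    void AddSubpass() {
        if (passes.empty() || passes.back().subpassCount == maxSubpassCount)
            passes.push_back(RenderPass{});
        passes.back().subpassCount++;
    }

    const std::vector<RenderPass> &Passes() const {
        return passes;
    }

  private:
    std::size_t maxSubpassCount;
    std::vector<RenderPass> passes;
};
```

On devices without the quirk, the limit can simply be set high enough that a split never happens.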
PixelyIon
cb1ec9a7f4 Rework BufferManager, Buffer and BufferView
This commit encapsulates a complex sequence of cascading changes in the process of supporting overlaps for buffers:
* We determined that it is impossible to resolve overlaps with multiple intervals per buffer within the constraint of each overlap being a contiguous view, so support for multiple intervals was dropped. The older buffer manager code was entirely reworked to be simpler since it now only handles one interval per buffer, with the code now based on `IntervalMap` but tailored specifically for buffers.
* During overlap resolution, the buffer being recreated has to be replaced with a larger buffer that can contain all overlaps, which raised the problem of how existing views into it would be updated: all of them need to be repointed to the new buffer. This was addressed by having a buffer own all views into itself, so we can automatically recalculate the offset of every view and update its buffer.
* We still needed to update existing usages of views. This was done by routing all access to buffer view properties (such as inside a recorded draw) through `BufferView::RegisterUsage`, which dispatches a callback with the view and its corresponding backing buffer. This callback can be stored and called again with the new buffer during overlap resolution.
* The handle-like semantics of `BufferView` introduced in the previous buffer-related commit caused lifetime issues: if we updated a view to be owned by a new buffer, we'd need to extend the lifetime of the new buffer rather than the old one. The only way to do this was a proxy owner object, `BufferDelegate`, which holds a shared pointer to the real `Buffer`, which in turn holds a pointer to every `BufferDelegate` it must update on repointing. A `BufferView` is effectively just a wrapper around `std::shared_ptr<BufferDelegate>` with more favorable semantics that generally forwards calls.
It should additionally be noted that, to support usage of `RegisterUsage`, the buffer code in `GraphicsContext` was refactored to defer actual binding until the recording phase.
2022-04-14 14:14:52 +05:30
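A simplified sketch of the delegate-based repointing described above, assuming one delegate per view and ignoring `RegisterUsage` callbacks and locking. The `Repoint` helper is hypothetical, and in this sketch `Buffer` tracks its delegates through `std::weak_ptr` so that a destroyed view never leaves a dangling entry behind; the real implementation may track them differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Buffer;

// Proxy owner: every view holds one, so repointing the delegate moves the view to a new buffer
struct BufferDelegate {
    std::shared_ptr<Buffer> buffer; // Keeps whichever buffer the view currently points into alive
    std::size_t offset;             // Offset of the view within `buffer`
};

struct Buffer {
    std::size_t size;
    std::vector<std::weak_ptr<BufferDelegate>> delegates; // Delegates currently pointing at this buffer

    explicit Buffer(std::size_t size) : size{size} {}
};

// Handle-like view that only forwards to its delegate
class BufferView {
  public:
    BufferView(const std::shared_ptr<Buffer> &buffer, std::size_t offset)
        : delegate{std::make_shared<BufferDelegate>(BufferDelegate{buffer, offset})} {
        assert(offset < buffer->size);
        buffer->delegates.push_back(delegate);
    }

    const std::shared_ptr<Buffer> &GetBuffer() const { return delegate->buffer; }
    std::size_t GetOffset() const { return delegate->offset; }

  private:
    std::shared_ptr<BufferDelegate> delegate;
};

// Moves every view of `oldBuffer` into `newBuffer`, where the old contents start at `baseOffset`
void Repoint(std::shared_ptr<Buffer> oldBuffer, const std::shared_ptr<Buffer> &newBuffer, std::size_t baseOffset) {
    assert(oldBuffer != newBuffer);
    for (const auto &weakDelegate : oldBuffer->delegates) {
        if (auto delegate = weakDelegate.lock()) { // Skip delegates whose views were already destroyed
            delegate->buffer = newBuffer;
            delegate->offset += baseOffset;
            newBuffer->delegates.push_back(delegate);
        }
    }
    oldBuffer->delegates.clear(); // The old buffer is freed once its last reference is dropped
}
```

Taking `oldBuffer` by value keeps it alive for the duration of the loop, even if the delegates held the last references to it.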
PixelyIon
a6781b38f4 Clear syncBuffers after CommandExecutor execution
Due to an oversight, we weren't clearing the list of buffers that needed to be synced after every execution, which led to them building up. Because buffer synchronization is relatively cheap and only done on faults, this wasn't caught until now, but it does depress the framerate significantly over time as the list grows to the range of 100k buffer views, depending on the title.
2022-04-14 14:14:52 +05:30
Robin Kertels
594f061b21 Implement SSBOs
Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-04-14 14:14:52 +05:30
Billy Laws
5c387f5c5a Fixup depth mode init value to allow ignoring redundant calls
2022-04-14 14:14:52 +05:30
PixelyIon
7a5c771f44 Rework GPU BufferView to have handle-like semantics
We wanted views to extend the lifetime of the underlying buffers while preserving all views until the buffer is destroyed. This prevents recreating views, which might become costly in the future once we need `VkBufferView`s of the buffer, while also providing the centralized list of all views that recreating the buffer requires. It also removes the inconsistency of `GetXView` in `GraphicsContext` returning a `BufferView*`.
2022-04-14 14:14:52 +05:30
Billy Laws
fc2c123ae2 Implement GPU depthMode register
This controls the depth range used by the shader. Hades already supports the necessary patching, so we only need to pass the current mode over to it and it'll do the rest.
2022-04-14 14:14:52 +05:30
Billy Laws
7e088ca465 Fix constbuf updates to actually increment the write offset
It now uses the register directly since, when we modify it, we want the changes to be visible to macros too.
2022-04-14 14:14:52 +05:30
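A sketch of the fixed behavior, assuming a hypothetical `ConstantBufferSelector` register block with a byte `offset` field and a 64 KiB CPU-side copy of the buffer: each update writes at the register's offset and then advances the register itself rather than a private copy of it, so anything else reading the register sees the new position.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical register block: `offset` is the byte position of the next constant buffer write
struct ConstantBufferSelector {
    std::uint32_t offset{};
};

// Hypothetical CPU-side copy of a bound constant buffer
struct ConstantBuffer {
    std::array<std::uint8_t, 0x10000> data{};
};

// Writes one 32-bit word at the register's offset, then advances the register
void ConstantBufferUpdate(ConstantBufferSelector &selector, ConstantBuffer &buffer, std::uint32_t value) {
    assert(selector.offset + sizeof(value) <= buffer.data.size());
    std::memcpy(buffer.data.data() + selector.offset, &value, sizeof(value));
    selector.offset += static_cast<std::uint32_t>(sizeof(value)); // Visible to macros reading the register
}
```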
PixelyIon
730bf504f8 Correct Adreno texture binding quirk
We incorrectly determined an Adreno driver bug to require padding between binding slots, but the real issue was that the driver doesn't support consecutive binding writes for `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER`; the padding slot unintentionally fixed it by requiring individual writes. The quirk has now been corrected to explicitly specify this as the bug, and the solution is more apt.
2022-04-14 14:14:52 +05:30
PixelyIon
881bb969c4 Implement access-driven Buffer synchronization
Similar to the constant redundant synchronization of textures, there is a lot of redundant synchronization of buffers. Although buffer synchronization is far cheaper than texture synchronization, it still has associated costs, which have now been reduced by only synchronizing on access.
2022-04-14 14:14:52 +05:30
PixelyIon
3268b3779a Implement access-driven Texture synchronization
Textures were constantly and redundantly synchronized to and from the host because we were not aware of guest memory accesses. This has now been averted by tracking all accesses to texture memory with the NCE Memory Trapping API and synchronizing only when required.
2022-04-14 14:14:52 +05:30
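A simplified model of access-driven synchronization, assuming a three-state dirty flag. `TrackedTexture`, `DirtyState` and the `On*` hooks are hypothetical names, and the actual trap registration through the NCE Memory Trapping API is omitted; the point is that a sync only happens when the other side has touched the memory since the last sync.

```cpp
#include <cassert>

// Hypothetical dirty state: which side holds the latest copy of the texture
enum class DirtyState {
    Clean,    // Guest and host copies match
    CpuDirty, // Guest wrote to texture memory, host copy is stale
    GpuDirty, // Host GPU wrote to the texture, guest memory is stale
};

class TrackedTexture {
  public:
    // Called by a (hypothetical) trap handler when the guest writes to the texture memory
    void OnGuestWrite() {
        if (state == DirtyState::GpuDirty)
            SyncToGuest(); // The guest must see GPU results before partially overwriting them
        state = DirtyState::CpuDirty;
    }

    // Called by the trap handler when the guest reads from the texture memory
    void OnGuestRead() {
        if (state == DirtyState::GpuDirty) {
            SyncToGuest();
            state = DirtyState::Clean;
        }
    }

    // Called before the host GPU uses the texture
    void OnHostUse(bool willWrite) {
        if (state == DirtyState::CpuDirty) {
            SyncToHost();
            state = DirtyState::Clean;
        }
        if (willWrite)
            state = DirtyState::GpuDirty;
    }

    int uploads{}, downloads{}; // Counts syncs for illustration

  private:
    DirtyState state{DirtyState::CpuDirty}; // Initial contents only exist in guest memory

    void SyncToHost() { uploads++; }
    void SyncToGuest() { downloads++; }
};
```

Repeated host uses with no guest access in between cost nothing, which is where the redundant synchronization used to come from.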
PixelyIon
5c9e42e384 Use mirror mappings for Textures and Buffers
This is a prerequisite to memory trapping: we need to write to the mirror to avoid a race condition with external threads writing to a texture/buffer while we do so ourselves during the sync on a read/write. It also avoids an additional `mprotect` to `-WX`/`RWX` on a read access.

An additional advantage, especially for textures, is that we now support split-mapping textures since they are laid out in a contiguous mirror, so they will not require costly algorithmic changes. Buffers should also benefit from not needing to iterate over every region when they are split into multiple mappings.
2022-04-14 14:14:52 +05:30
Billy Laws
011de98940 Rework formats to support passing through guest swizzle values
Almost every Maxwell format now directly corresponds to a Vulkan format. This allows formats to be passed through and the guest swizzle to be used directly (with some extra swizzle handling for edge cases), removing the need to explicitly support each swizzle combination, which adds a lot of code bloat. The format header is additionally reordered, with line breaks separating formats by their bits-per-block.
2022-04-14 14:14:52 +05:30
PixelyIon
727f83e969 Fix Incorrect Vertex Binding Divisor State Submission
We always submitted vertex binding divisor descriptions, regardless of whether a binding's input rate was per-vertex rather than per-instance. This is invalid and has been fixed by only submitting divisor descriptions for bindings whose input rate is per-instance.
2022-04-14 14:14:52 +05:30
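A minimal sketch of the corrected submission logic. To stay self-contained it uses stand-ins for `VkVertexInputBindingDescription` and `VkVertexInputBindingDivisorDescriptionEXT` instead of the Vulkan headers, and assumes a hypothetical per-binding `divisors` array. The Vulkan specification requires any binding referenced by a divisor description to use `VK_VERTEX_INPUT_RATE_INSTANCE`, which is why per-vertex bindings are skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal stand-ins for the Vulkan structures so the sketch compiles without Vulkan headers
enum class InputRate { Vertex, Instance };

struct BindingDescription {
    std::uint32_t binding;
    std::uint32_t stride;
    InputRate inputRate;
};

struct BindingDivisorDescription {
    std::uint32_t binding;
    std::uint32_t divisor;
};

// Emits a divisor description only for bindings with a per-instance input rate
std::vector<BindingDivisorDescription> CollectDivisors(const std::vector<BindingDescription> &bindings,
                                                       const std::vector<std::uint32_t> &divisors) {
    std::vector<BindingDivisorDescription> result;
    for (const auto &binding : bindings)
        if (binding.inputRate == InputRate::Instance)
            result.push_back({binding.binding, divisors.at(binding.binding)});
    return result;
}
```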
PixelyIon
9f7e80cf8f Fix Adreno Texture Sampler Binding Bug
Adreno proprietary drivers suffer from a bug where `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` requires 2 descriptor slots rather than one; we add a padding slot to fix this issue. `QuirkManager` was introduced to handle per-vendor/per-device errata and allow enabling this on Adreno proprietary drivers specifically, so as not to affect the performance of other devices.
2022-04-14 14:14:52 +05:30
PixelyIon
ddb2ba8a1b Rename QuirkManager to TraitManager
Quirk terminology was deemed inappropriate for describing the features/extensions of a device. It has been replaced with traits, which is far more fitting, while quirks will be used as the terminology for device errata.
2022-04-14 14:14:52 +05:30
PixelyIon
0b2ce6a8f3 Fix Texture Handle Offset Calculation
The texture handle offset calculation involved an unnecessary shift by the descriptor size, which resulted in an invalid handle with the wrong TIC/TSC index and caused broken rendering.
2022-04-14 14:14:52 +05:30
PixelyIon
aa57ec6d55 Destroy CommandExecutor Nodes Before Waiting on Execution
`nodes` and `syncTextures` were cleared after waiting on the `CommandExecutor` fence rather than before; this wasted time after the wait on work that could be performed prior to it.
2022-04-14 14:14:52 +05:30
PixelyIon
90a1b3348c Implement D24S8 + R11G11B10 Formats
2022-04-14 14:14:52 +05:30
PixelyIon
22ce531e6f Force Memory Barrier at VkRenderPass Start
We depend on past commands having completed execution before a renderpass begins, and a subpass dependency on all graphics stages from `VK_SUBPASS_EXTERNAL` to subpass #0 is used to enforce this. Nvidia and Adreno proprietary drivers do this implicitly, but Turnip and Mali drivers require it, otherwise they execute commands out of order.
2022-04-14 14:14:52 +05:30
PixelyIon
043be4d8f7 Implement Maxwell3D Two-Side Stencil Toggle
Stencil operations can either be the same for both faces or use independent stencil state for each face. This is controlled via the previously unimplemented `stencilTwoSideEnable`.
2022-04-14 14:14:52 +05:30
PixelyIon
80ae7b255a Implement Maxwell3D Front Face Flip
2022-04-14 14:14:52 +05:30
PixelyIon
40a3887695 Implement Maxwell3D Viewport Y Swizzle & Lower-Left Origin
2022-04-14 14:14:52 +05:30
Billy Laws
3be30e68c3 Add D16 depth format and ZF32 TIC format
Used by One Piece Unlimited World Red
2022-04-14 14:14:52 +05:30
Billy Laws
6e48460c0d Add BC2/3 format support
2022-04-14 14:14:52 +05:30
PixelyIon
41aad83c33 Tie Shader ObjectPool Lifetime to Shader Program
Shader programs allocate instructions and blocks within an `ObjectPool`. Previously, there was a single global pool that was never reaped except on destruction, which led to a leak where the pool retained resources from shader programs that had already been deleted. To avert this, pools are now tied to shader programs.
2022-04-14 14:14:52 +05:30
PixelyIon
e747de37cf Implement Blocklinear TIC Type
2022-04-14 14:14:52 +05:30
PixelyIon
723189a948 Calculate Blocklinear Texture Aligned Size Correctly
The size of blocklinear textures previously did not account for alignment to Block/ROB boundaries; it is now aligned to them. The incorrect sizes led to textures not being aliased correctly, since GraphicBufferProducer surfaces and Maxwell3D color RTs calculated different sizes.
2022-04-14 14:14:52 +05:30
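A sketch of the aligned size calculation for the simplest case (a single-layer, single-mip 2D surface with a block depth of 1), assuming the usual Tegra GOB dimensions of 64 bytes by 8 rows. A block is `blockHeight` GOBs stacked vertically and a ROB (row of blocks) spans the full surface width, so the row width is aligned to a GOB and the height to a whole block. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `value` up to a multiple of `alignment`
constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

constexpr std::size_t GobWidth{64}; // Bytes per GOB row
constexpr std::size_t GobHeight{8}; // Rows per GOB

// `widthBytes`: byte width of one row; `height`: rows (in format blocks for compressed formats);
// `blockHeight`: number of GOBs stacked vertically in one block
constexpr std::size_t BlockLinearSize(std::size_t widthBytes, std::size_t height, std::size_t blockHeight) {
    return AlignUp(widthBytes, GobWidth) * AlignUp(height, GobHeight * blockHeight);
}
```

For a 1280x720 RGBA8 surface (5120 bytes per row) with a block height of 16 GOBs, the height is aligned from 720 to 768 rows, giving 3,932,160 bytes rather than the 3,686,400 bytes a linear calculation would produce.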
Billy Laws
e7bfd93541 Implement BC7 format support
Used by ARMS
2022-04-14 14:14:52 +05:30
Billy Laws
99652c5eda Support partially mapped cbufs
Buggy games sometimes supply an incorrect cbuf size, so buffers are limited to the first unmapped region.
2022-04-14 14:14:52 +05:30
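A sketch of clamping a cbuf to its first unmapped region, assuming the GPU virtual range has already been resolved into a hypothetical list of `Mapping` spans in which an unmapped span has a null pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical contiguous span backing part of a GPU virtual range
struct Mapping {
    std::uint8_t *pointer; // nullptr if this part of the range is unmapped
    std::size_t size;
};

// Returns how many bytes from the start of the range are usable: stops at the first
// unmapped span and never exceeds the size the guest requested
std::size_t UsableConstantBufferSize(const std::vector<Mapping> &mappings, std::size_t requestedSize) {
    std::size_t usable{};
    for (const auto &mapping : mappings) {
        if (!mapping.pointer || usable >= requestedSize)
            break;
        usable += mapping.size;
    }
    return std::min(usable, requestedSize);
}
```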
PixelyIon
6a6f51ea84 Implement Maxwell3D Depth/Stencil State
Implements the entirety of Maxwell3D Depth/Stencil state for both faces, including compare/write masks and the reference value. The Maxwell3D register `stencilTwoSideEnable` is ignored and left unimplemented since its behavior is unknown: it could mean either that both faces share the same stencil state or that the back-facing stencil is disabled.
2022-04-14 14:14:52 +05:30
Billy Laws
ab4962c4e4 Implement additional texture formats, including BCn
BCeNabler is required for BCn textures; the pre-swizzled formats will be removed when arbitrary swizzle support is added later.
2022-04-14 14:14:52 +05:30
Billy Laws
600b94505c Fix A2R10G10B10 render target format
This was wrongly described as R10G10B10A2 in the enum when it's actually A2R10G10B10, a format natively supported in Vulkan with just a swizzle.
2022-04-14 14:14:52 +05:30
PixelyIon
edd51c3dfa Fix Color RT Disabling Bug
Color RTs are disabled by setting their format to `None`. Handling for this was removed while transitioning to macros, resulting in a missing format exception; it has been re-added as several applications depend on this behavior.
2022-04-14 14:14:52 +05:30
PixelyIon
a2285669b3 Use static vector for shader bytecode to prevent constant reallocation
Using `std::vector` for shader bytecode led to a lot of reallocation due to constant resizing. Switching over to a static vector allows a single static allocation of the maximum possible guest shader size (1 MiB) for every stage, resulting in a 6 MiB preallocation, which is unnoticeable given the total memory overhead of running a Switch application.
2022-04-14 14:14:52 +05:30
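A minimal fixed-capacity vector illustrating why this avoids reallocation: storage is reserved once, and resizing only moves a size marker. `StaticVector`, `MaxShaderBytecodeSize` and `ShaderBytecode` are hypothetical names rather than the container used in the codebase; the 1 MiB capacity matches the maximum guest shader size given above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-capacity vector: the storage never moves, so pointers into it stay valid across resizes
template<typename T, std::size_t Capacity>
class StaticVector {
  public:
    void resize(std::size_t newSize) {
        assert(newSize <= Capacity);
        count = newSize;
    }

    void push_back(const T &value) {
        assert(count < Capacity);
        storage[count++] = value;
    }

    void clear() { count = 0; }
    T *data() { return storage.data(); }
    std::size_t size() const { return count; }
    T &operator[](std::size_t index) { return storage[index]; }

  private:
    std::array<T, Capacity> storage{};
    std::size_t count{};
};

constexpr std::size_t MaxShaderBytecodeSize{1024 * 1024}; // 1 MiB per stage
using ShaderBytecode = StaticVector<std::uint8_t, MaxShaderBytecodeSize>;
```

Since every `ShaderBytecode` is 1 MiB, it should be allocated statically or on the heap rather than on the stack.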
PixelyIon
21a6866def Fix Maxwell3D Blend Enum Conversion Bugs
The `OneMinusSourceAlpha` blending factor was converted to `eOneMinusSrcColor` rather than `eOneMinusSrcAlpha`, leading to incorrect blending in certain titles. Similarly, the order of `MinimumGL`/`MaximumGL` and `SubtractGL`/`ReverseSubtractGL` was the opposite of what it should've been. Both issues have been fixed.
2022-04-14 14:14:52 +05:30
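A sketch of the corrected conversions using stand-in enums: the guest enumerator names follow the commit message and the host names mirror Vulkan-Hpp's `vk::BlendFactor`/`vk::BlendOp`, but the enums themselves are hypothetical subsets. Converting with a `switch` keyed by enumerator, rather than a table indexed by enum value, is one way to keep each pairing explicit and harder to get out of order.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical subsets of the guest enums
enum class GuestBlendFactor { OneMinusSourceColor, OneMinusSourceAlpha };
enum class GuestBlendOp { SubtractGL, ReverseSubtractGL, MinimumGL, MaximumGL };

// Stand-ins mirroring Vulkan-Hpp enumerator names
enum class HostBlendFactor { eOneMinusSrcColor, eOneMinusSrcAlpha };
enum class HostBlendOp { eSubtract, eReverseSubtract, eMin, eMax };

HostBlendFactor ConvertBlendFactor(GuestBlendFactor factor) {
    switch (factor) {
        case GuestBlendFactor::OneMinusSourceColor:
            return HostBlendFactor::eOneMinusSrcColor;
        case GuestBlendFactor::OneMinusSourceAlpha:
            return HostBlendFactor::eOneMinusSrcAlpha; // Was incorrectly eOneMinusSrcColor
    }
    throw std::invalid_argument("Unknown blend factor");
}

HostBlendOp ConvertBlendOp(GuestBlendOp op) {
    switch (op) {
        case GuestBlendOp::SubtractGL:
            return HostBlendOp::eSubtract;
        case GuestBlendOp::ReverseSubtractGL:
            return HostBlendOp::eReverseSubtract;
        case GuestBlendOp::MinimumGL:
            return HostBlendOp::eMin;
        case GuestBlendOp::MaximumGL:
            return HostBlendOp::eMax;
    }
    throw std::invalid_argument("Unknown blend op");
}
```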
PixelyIon
0a506088f4 Fix NextSubpassNode Subpass Index Bug
`NextSubpassNode` didn't increment `subpassIndex`, so commands ran with the wrong subpass index, resulting in them accessing invalid attachments or in other bugs that can arise from using the wrong subpass.
2022-04-14 14:14:52 +05:30
PixelyIon
defbfe8f78 Serialize Maxwell3D Draw State for Subpass
All Maxwell3D state was passed by reference to the draw command lambda, which would break if there was more than one pass or if the state was changed in any way before execution. All state is now serialized by value into the draw command lambda's capture, retaining it regardless of later mutations of the class state.
2022-04-14 14:14:52 +05:30
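A minimal illustration of the difference, assuming a hypothetical `Recorder` with a tiny `DrawState`: the draw lambda takes a copy of the state through an init-capture, so a register write made after recording but before execution doesn't leak into an earlier draw.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical slice of draw state; the real Maxwell3D state is far larger
struct DrawState {
    float viewportWidth{};
    unsigned int vertexCount{};
};

struct Recorder {
    DrawState state;                                     // Mutable register-backed state
    std::vector<std::function<unsigned int()>> commands; // Deferred commands, executed later

    void Draw() {
        // Copy the state into the capture so later register writes can't affect this draw
        commands.emplace_back([drawState = state]() { return drawState.vertexCount; });
    }
};
```

Had the lambda captured `[this]` and read `state` at execution time, both recorded commands would see the final value of `6`.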
PixelyIon
934130b3e6 Remove Implicit Command Executor Resource Attachment
Any usage of a resource in a command now requires attaching that resource externally; it will no longer be attached implicitly on usage. This makes resource attachment consistent and allows for laxer locking requirements on resources, since they can be locked while attaching and don't need to be locked for any commands. It also avoids redundantly attaching a resource in certain cases.
2022-04-14 14:14:52 +05:30
Billy Laws
3ff8075151 Move vertex and RT format conv to macros and fill them fully in
Makes the format conversions easier to read and shorter, and adds in
some new formats needed to complete the RT table properly.
2022-04-14 14:14:52 +05:30
Billy Laws
68f31c3688 Use macros for defining texture formats and their conversions
Avoids the need to repeat all the possible component types for each texture format while also making them simpler to add and easier to read.
2022-04-14 14:14:52 +05:30
PixelyIon
bc29b23972 Implement CPU-only Maxwell3D Inline Constant Buffer Updates
Implements inline constant buffer updates that are written to the CPU copy of the buffer rather than generating an actual inline buffer write. This works for TIC/TSC index updates but won't work when the buffer is expected to actually be updated inline with respect to command sequence rather than just as a buffer upload prior to rendering.

GPU-side constant buffer updates will be implemented later, with optimizations for updating an entire range by handling GPFIFO `Inc`/`NonInc` directly and submitting it as a host inline buffer update.
2022-04-14 14:14:52 +05:30
PixelyIon
bb14af4f7a Implement Maxwell3D Sampled Textures
The descriptor sets should now contain a combined image and sampler handle for any sampled textures in the guest shader from the supplied offset into the texture constant buffer.

Note: Games tend to rely on inline constant buffer updates for writing the texture constant buffer; since these aren't implemented yet, the value will incorrectly be read as 0.
2022-04-14 14:14:52 +05:30
PixelyIon
d9a9e52350 Use ConstantBuffer instead of BufferView for Shader Constant Buffers
We want read semantics inside the constant buffer object via the mappings to avoid a pointless GPU VMM mapping lookup. This is a fairly frequent operation, so it is necessary; the ability to write directly will be added in the future as well.
2022-04-14 14:14:52 +05:30
PixelyIon
adb0a16873 Implement Maxwell 3D Textures
Implements parsing for the Maxwell 3D TIC pool and conversion of a TIC into a `GuestTexture`. Support is limited to pitch-linear RGB565/A8R8G8B8 textures at the moment but will be extended as games utilize more formats and layouts. Support for 1D buffers is also omitted for now, since they need special handling: in Vulkan they are effectively treated as buffers rather than images.
2022-04-14 14:14:52 +05:30
PixelyIon
a9aa16798f Add -fsigned-bitfields for defined bitfield int behavior
We want consistent behavior between signed `int`s in bitfields and outside of bitfields; the `-fsigned-bitfields` flag enforces this behavior.
2022-04-14 14:14:52 +05:30
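A small illustration, assuming a hypothetical packed register with a 4-bit `int` field. Whether a plain `int` bit-field is signed has historically been implementation-defined; with `-fsigned-bitfields`, GCC and Clang treat it as signed, so negative values round-trip through the field just as they would through a regular `int`.

```cpp
#include <cassert>

// Hypothetical register layout with a plain `int` bit-field
struct PackedRegister {
    int offset : 4;         // Signed under -fsigned-bitfields: range -8..7
    unsigned int flags : 4; // Explicitly unsigned: range 0..15
};

// Stores `raw` in the 4-bit field and reads it back
int ReadOffset(int raw) {
    assert(raw >= -8 && raw <= 7);
    PackedRegister reg{};
    reg.offset = raw;
    return reg.offset;
}
```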
PixelyIon
87c8dc94d2 Implement Maxwell3D Samplers
Maxwell3D `TextureSamplerControl` (TSC) entries are fully converted into Vulkan samplers, with extension backing for all aspects that require it (border color/reduction mode) and approximations where Vulkan doesn't support certain functionality (sampler address mode) or where the extensions may not be present (border color).
2022-04-14 14:14:52 +05:30
PixelyIon
e48a7d7009 Fix Mapping Caching For Maxwell 3D Buffers
Code involving caching of mappings was copied from `RenderTarget` without much consideration for its applicability to buffers. Mappings are cached in RTs because the view may be invalidated by more than the IOVA/size changing, but this generally doesn't hold true for buffers. Invalidation can therefore only happen at the view level, with the mappings being looked up every time since an invalidation would likely change them.
2022-04-14 14:14:52 +05:30
PixelyIon
c11962e8e4 Implement Maxwell3D Bindless Texture Constant Buffer Index
The index of the constant buffer with bindless texture descriptors is now retrieved from Maxwell3D register state and passed to the shader compiler.
2022-04-14 14:14:52 +05:30
PixelyIon
1c3f62b7b4 Implement Maxwell3D Indexed Drawing
2022-04-14 14:14:52 +05:30