Billy Laws
040db37a28
Fix descriptor copies to be one per descriptor type
2022-11-02 17:46:07 +00:00
Billy Laws
d482e0ea98
Fix shader stage iteration to not miss the pixel stage
2022-11-02 17:46:07 +00:00
Billy Laws
0b808cc22b
Use Sint/Uint attribute type in place of Sscaled/Uscaled
...
Scaled formats are not supported by any mobile GPUs.
2022-11-02 17:46:07 +00:00
Billy Laws
92ce220d3a
Ignore constant buffer selector size for updates
...
Clustertruck performs a giant (0x3000 byte) update with a selector size of only 0x500; without this the update would only partially go through.
2022-11-02 17:46:07 +00:00
Billy Laws
33f16ca26e
Handle unmapped blocks in CachedMappedBufferView
2022-11-02 17:46:07 +00:00
Billy Laws
5c0e4a839d
Fix SW BC2 decoding pitch
2022-11-02 17:46:07 +00:00
Billy Laws
60863fa162
Make viewports fallback to viewport 0 when their dimensions are invalid
...
OpenGL games use a zero width/height for viewports 1-15 when they're unused
2022-11-02 17:46:07 +00:00
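The fallback described in this commit can be sketched roughly as below. This is a minimal illustration, not the project's code: `ViewportDimensions`, `IsValid` and `ResolveViewport` are hypothetical names, and treating only a zero width or height as invalid is an assumption based on the case the commit message mentions.

```cpp
#include <array>
#include <cstddef>

// Viewports 0-15, matching the range named in the commit message
constexpr std::size_t ViewportCount{16};

// Hypothetical stand-in for the guest viewport registers
struct ViewportDimensions {
    float width;
    float height;
};

// Assumption: a viewport is invalid only when its width or height is zero
constexpr bool IsValid(const ViewportDimensions &viewport) {
    return viewport.width != 0.0f && viewport.height != 0.0f;
}

// Returns the requested viewport, or viewport 0 when the requested one is invalid
inline const ViewportDimensions &ResolveViewport(const std::array<ViewportDimensions, ViewportCount> &viewports, std::size_t index) {
    return IsValid(viewports[index]) ? viewports[index] : viewports[0];
}
```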
Billy Laws
586f872655
Update indexed quad conversion for new API
2022-11-02 17:46:07 +00:00
Billy Laws
498b4966d3
Avoid crashing on unmapped buffers
...
Just log a warning instead
2022-11-02 17:46:07 +00:00
Billy Laws
9f2b20443b
Implement Maxwell3D draws
2022-11-02 17:46:07 +00:00
Billy Laws
5020478ace
Always rebind pipeline if it has changed from the previous draw
2022-11-02 17:46:07 +00:00
Billy Laws
af1b4ca4f8
Skip clears if attachments are invalid
2022-11-02 17:46:07 +00:00
Billy Laws
01a5e95ce1
Implement non-indexed quad conversion support in new GPU
...
Uses memory::Buffer directly to allow for cleaner recreation and allow for removing host-only buffers from Buffer in the future.
2022-11-02 17:46:07 +00:00
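For context, non-indexed quad conversion generally means generating an index buffer that draws each quad as two triangles, since Vulkan has no quad list topology. The sketch below shows one common way to do that on the CPU. `GenerateQuadListIndices` is a hypothetical name, the (0, 1, 2), (0, 2, 3) split is an assumption, and the result is returned as a plain vector rather than written into a memory::Buffer as the commit does.

```cpp
#include <cstdint>
#include <vector>

// Builds triangle-list indices for a non-indexed quad-list draw.
// Each quad (v0, v1, v2, v3) becomes the triangles (v0, v1, v2) and (v0, v2, v3).
std::vector<std::uint32_t> GenerateQuadListIndices(std::uint32_t firstVertex, std::uint32_t vertexCount) {
    std::uint32_t quadCount{vertexCount / 4}; // Trailing vertices that don't form a full quad are ignored

    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(quadCount) * 6);

    for (std::uint32_t quad{}; quad < quadCount; quad++) {
        std::uint32_t base{firstVertex + quad * 4};
        for (std::uint32_t offset : {0u, 1u, 2u, 0u, 2u, 3u})
            indices.push_back(base + offset);
    }

    return indices;
}
```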
Billy Laws
d42814bdc1
Add callback for when non-Maxwell3D engines alter pipeline state
...
This is required so that Maxwell3D knows to restore all dynamic state before the next draw
2022-11-02 17:46:07 +00:00
Billy Laws
7133c5d6b3
Drop exclusiveSubpass in favour of only ending RPs when attachments change
...
Significantly helps Mali performance by avoiding creating an excessive number of RPs.
2022-11-02 17:46:07 +00:00
Billy Laws
a3f38c0cf7
Add perfetto tracepoints to async record
2022-11-02 17:46:07 +00:00
Billy Laws
6dfef095e8
Up default record slot count to 6
2022-11-02 17:46:07 +00:00
Billy Laws
7d3a117a6f
Use a spinlock for descriptor set allocator
2022-11-02 17:46:07 +00:00
Billy Laws
78ddd03d1f
Mark newly allocated descriptor slots as active
2022-11-02 17:46:07 +00:00
Billy Laws
0867c593be
Support binding pipelines in state updater
2022-11-02 17:46:07 +00:00
Billy Laws
f42a0df72c
Use ctSelect register for colour targets and impl null color attachments
...
Some games have invalid colour attachments sandwiched in between valid ones; these need to be skipped in Vulkan with UNUSED_ATTACHMENT.
2022-11-02 17:46:07 +00:00
Billy Laws
38aad21d29
Share single flag variable for Maxwell3D batch draw/constant buffer update
...
Slightly cheaper
2022-11-02 17:46:07 +00:00
Billy Laws
bd7eee8e2b
Optimise GPFIFO command processing
...
GPFIFO code is very high throughput due to the sheer number of commands used for rendering. Adjust some types and switch to an if statement with hints to slightly increase processing speed.
2022-11-02 17:46:07 +00:00
Billy Laws
2cdf6c1fe6
Add branch hint attributes to a couple branches
2022-11-02 17:46:07 +00:00
Billy Laws
b310b99bdc
Handle unmapped ranges in TranslateRange
2022-11-02 17:46:07 +00:00
Billy Laws
acfa58ea8c
State updater MERGEBACK desC
2022-11-02 17:46:07 +00:00
Billy Laws
68b6b20f78
Bunch of cleanup/bugfixes for initial pipeline desc set impl
...
Cleans up code slightly, fixes keeping track of per-stage descriptor counts and fixes storage buffer caching + more random stuff.
2022-11-02 17:46:07 +00:00
Billy Laws
eb4a9bab11
Mark freshly allocated descriptor slots as active
2022-11-02 17:46:07 +00:00
Billy Laws
f3184cdff1
Reorder active descriptor set slots to end of list
...
Speeds up allocations by significantly reducing the number of needed atomic ops per alloc. Could be optimised further in the future if needed.
2022-11-02 17:46:07 +00:00
Billy Laws
128b68d8b2
Avoid resetting command buffers manually, it's implicit
...
Somewhat costly on adreno, and with the one time submit flag the reset is implicit for the next beginCommandBuffers call.
2022-11-02 17:46:07 +00:00
Billy Laws
e1717ed811
Implement Maxwell samplers
2022-11-02 17:46:07 +00:00
Billy Laws
f1600f5ad0
Support allocating into spans in the linear allocator
2022-11-02 17:46:07 +00:00
Billy Laws
04cea9239f
Implement descriptor set updating through StateUpdater
...
Will automatically resolve buffer views at record-time into descriptor write/copy structures then apply the write and bind the set.
2022-11-02 17:46:07 +00:00
Billy Laws
d174ca950b
Revert "Reset executor command buffers asynchronously"
...
This reverts commit fc7956df4ff56fdb2afc4b2bb0bbca82196179ca.
2022-11-02 17:46:07 +00:00
Billy Laws
2bbe975ea7
Reset executor command buffers asynchronously
...
This took a little while to do on qcom drivers, moving it to the cycle waiter thread gives a tiny speedup.
2022-11-02 17:46:07 +00:00
Billy Laws
054d32567d
Allow mutation of input data by callback in CircularQueue::AppendTranform
2022-11-02 17:46:07 +00:00
Billy Laws
7c9212743c
Implement asynchronous command recording
...
Recording of command nodes into Vulkan command buffers is very easily parallelisable as it can effectively be treated as part of the GPU execution, which is inherently async. By moving it to a separate thread we can shave off about 20% of GPFIFO execution time. It should be noted that the command scheduler command buffer infra is no longer used, since we need to record texture updates on the GPFIFO thread (while another slot is being recorded on the record thread) and then use the same command buffer on the record thread later. This ends up requiring a pool per slot, which is reasonable considering we only have four slots by default.
2022-11-02 17:46:07 +00:00
Billy Laws
a197dd2b28
Allow for creating signalled fence cycles
2022-11-02 17:46:07 +00:00
Billy Laws
542651232b
Add a mutex to allow preventing buffer recreation
2022-11-02 17:46:07 +00:00
Billy Laws
379b4f163d
Implement popping from CircularQueue
2022-11-02 17:46:07 +00:00
Billy Laws
6d9dc9c6fb
Implement some more of Draw
...
Now performs VK state updates on execution and takes advantage of quick descriptor binding.
2022-11-02 17:46:07 +00:00
Billy Laws
3b26f4f48a
Expose way to check inter-pipeline descriptor compatibility
...
Allows users to skip/use quick descriptor sync even after switching pipelines.
2022-11-02 17:46:07 +00:00
Billy Laws
943a38e168
Implement StateUpdater for rapid recording of VK state updates
...
Using command executor for each individual state update was found to be infeasible due to the sheer number of state updates per draw and it relying on per-node heap allocations. Instead this commit takes advantage of each state update being used only once to implement a system of linearly-allocated state update commands that are linked together. After setting up all draw state with StateUpdateBuilder, the built StateUpdater can then be used in the execution phase to record all of the draw state into the command buffer with almost zero overhead.
2022-11-02 17:46:07 +00:00
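The idea of linearly-allocated, linked state update commands can be sketched as below. This is a simplified illustration under stated assumptions, not the actual StateUpdater: `LinearArena`, `CmdHeader`, `Cmd`, `SetValue` and `RecordAll` are hypothetical names, the `StateUpdateBuilder` here only borrows its name from the commit message, and a `std::vector<int>` stands in for the Vulkan command buffer.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical bump allocator: allocations are never freed individually, only all at once via Reset()
class LinearArena {
    std::vector<std::byte> storage;
    std::size_t offset{};

  public:
    explicit LinearArena(std::size_t size) : storage(size) {}

    template<typename T>
    T *Allocate(T value) {
        static_assert(std::is_trivially_destructible_v<T>, "The arena never runs destructors");
        // Assumes the vector's backing memory is at least as aligned as T requires
        std::size_t aligned{(offset + alignof(T) - 1) & ~(alignof(T) - 1)};
        if (aligned + sizeof(T) > storage.size())
            throw std::bad_alloc{};
        offset = aligned + sizeof(T);
        return new (storage.data() + aligned) T(std::move(value));
    }

    void Reset() { offset = 0; }
};

using RecordedState = std::vector<int>; // Stand-in for a Vulkan command buffer

// Common header that links commands together and dispatches to the typed payload
struct CmdHeader {
    CmdHeader *next;
    void (*record)(CmdHeader *, RecordedState &);
};

template<typename Payload>
struct Cmd {
    CmdHeader header; // Must stay the first member so the cast in Record is valid
    Payload payload;

    static void Record(CmdHeader *header, RecordedState &out) {
        reinterpret_cast<Cmd *>(header)->payload.Record(out);
    }
};

// Example payload: "records" a single value
struct SetValue {
    int value;
    void Record(RecordedState &out) const { out.push_back(value); }
};

class StateUpdateBuilder {
    LinearArena &arena;
    CmdHeader *head{};
    CmdHeader *tail{};

  public:
    explicit StateUpdateBuilder(LinearArena &pArena) : arena(pArena) {}

    // Appends a command to the list without any heap allocation
    template<typename Payload>
    void Add(Payload payload) {
        auto *cmd = arena.Allocate(Cmd<Payload>{{nullptr, &Cmd<Payload>::Record}, payload});
        if (tail)
            tail->next = &cmd->header;
        else
            head = &cmd->header;
        tail = &cmd->header;
    }

    CmdHeader *Build() const { return head; }
};

// Walks the linked list once, recording every state update in insertion order
inline void RecordAll(CmdHeader *head, RecordedState &out) {
    for (auto *cmd = head; cmd; cmd = cmd->next)
        cmd->record(cmd, out);
}
```

Because each command is used exactly once, the arena can simply be reset after recording instead of freeing nodes one by one.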
Billy Laws
7b4da52445
Add a fast binding sync path for when only one cbuf has changed
...
SMO implements instanced draws by repeating the same draw just with a different constant buffer bound. Reduce the cost of this significantly by detecting such cases and instead of processing every descriptor, copy the previous descriptor set and update only the ones affected by the bound constant buffer.
Credits to ripinperiperi for the initial idea and making me aware of how SMO does these draws
2022-11-02 17:46:07 +00:00
Billy Laws
89edd9b303
Reset megabuffer binding for disabled vertex buffers
2022-11-02 17:46:07 +00:00
Billy Laws
6a1615a104
Expose color and depth attachments to Draw
2022-11-02 17:46:07 +00:00
Billy Laws
aae957819e
Simplify BufferView locking by requiring buffer manager be locked
...
Avoids the need to repeat the lookup after locking since recreations are impossible if buffer manager is locked.
2022-11-02 17:46:07 +00:00
Billy Laws
9449b52f36
Reduce minimum megabuffer alignment to 128 bytes
2022-11-02 17:46:07 +00:00
Billy Laws
b3cf9c40ba
Update megabuffer execution/sequence numbers after updating an allocation
2022-11-02 17:46:07 +00:00
Billy Laws
4b2b6fc6e9
Avoid calling SynchronizeGuest when attempting to megabuffer unless necessary
2022-11-02 17:46:07 +00:00
Billy Laws
e5919e84a1
Pipeline state if statement cleanups
2022-11-02 17:46:07 +00:00
Billy Laws
bf536aa168
Sync pipeline descriptors every draw
2022-11-02 17:46:07 +00:00
Billy Laws
9223d7f524
Fix descriptor initialisation order
...
They need to be setup before the pipeline is created to avoid passing in garbage data.
2022-11-02 17:46:07 +00:00
Billy Laws
4652cc5a0a
Avoid parsing descriptors for disabled shader stages
2022-11-02 17:46:07 +00:00
Billy Laws
3456fb39fa
Fix pipeline to shader stage conversion when filling in shader infos
...
The two vertex pipeline stages need to be both treated as a single stage, and all subsequent stages need to be offset by -1
2022-11-02 17:46:07 +00:00
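The index mapping this commit describes can be written as a one-line conversion. The sketch below assumes six pipeline stages with the two vertex stages at indices 0 and 1; `PipelineStageCount`, `ShaderStageCount` and `PipelineStageToShaderStage` are hypothetical names, not the project's.

```cpp
#include <cstddef>

// Assumption: pipeline stages are ordered VertexA, VertexB, then the remaining stages
constexpr std::size_t PipelineStageCount{6};
constexpr std::size_t ShaderStageCount{PipelineStageCount - 1};

// Both vertex pipeline stages (0 and 1) map to shader stage 0,
// every later pipeline stage is offset by -1
constexpr std::size_t PipelineStageToShaderStage(std::size_t pipelineStage) {
    return pipelineStage == 0 ? 0 : pipelineStage - 1;
}

static_assert(PipelineStageToShaderStage(PipelineStageCount - 1) == ShaderStageCount - 1);
```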
Billy Laws
a9213debc7
Implement constant buffer reading
2022-11-02 17:46:07 +00:00
Billy Laws
afcfe8a7fa
Don't update scissor state >0 unless multiview is supported
2022-11-02 17:46:07 +00:00
Billy Laws
55d77b7eb0
Update user code for new megabuffering
2022-11-02 17:46:07 +00:00
Billy Laws
cc776ae395
Keep track of an 'execution number' in CommandExecutor
...
Allows users to efficiently cache resources that are valid for only one execution without resorting to callbacks.
2022-11-02 17:46:07 +00:00
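One way such an execution number can be used for caching is to tag each cached value with the execution it was created in and recreate it when the tag no longer matches. The sketch below is illustrative only: `Executor` is a stand-in for CommandExecutor, and `PerExecutionCache` is a hypothetical helper.

```cpp
#include <cstdint>
#include <utility>

// Stand-in for CommandExecutor: the number is bumped on every submission
class Executor {
    std::uint64_t executionNumber{1};

  public:
    std::uint64_t ExecutionNumber() const { return executionNumber; }

    void Submit() { executionNumber++; }
};

// Caches a value that is only valid for the execution it was created in
template<typename T>
class PerExecutionCache {
    T value{};
    std::uint64_t taggedExecution{}; // 0 never matches, since execution numbers start at 1

  public:
    template<typename Create>
    const T &Get(const Executor &executor, Create &&create) {
        if (taggedExecution != executor.ExecutionNumber()) {
            value = std::forward<Create>(create)();
            taggedExecution = executor.ExecutionNumber();
        }
        return value;
    }
};
```

Compared to registering a callback per cached resource, invalidation here costs a single integer comparison at lookup time.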
Billy Laws
99a34df4cc
Avoid trapping frequently synced buffers by using megabuffer copies
...
When a buffer is trapped nearly every frame, the cost of trapping and synchronising its contents starts to quickly add up. By always using the megabuffer when this is the case, since megabuffer copies are done directly from the guest, we skip the need to synchronise/trap the backing.
2022-11-02 17:46:07 +00:00
Billy Laws
a24aec03a6
Rework per-view megabuffering to cache allocs in the buffer itself
...
The original intention was to cache on the user side, but especially with shader constant buffers that's difficult and costly. Instead we can cache on the buffer side, with a page-table like structure to hold variable sized allocations indexed by the aligned view base address. This avoids most redundant copies from repeated use of the same buffer without updates in between.
2022-11-02 17:46:07 +00:00
Billy Laws
b810470601
Invalidate HLE macro state on macro updates
2022-11-02 17:46:07 +00:00
Billy Laws
2360ca24da
Implement constant buffer and storage buffer pipeline descriptor types
2022-11-02 17:46:07 +00:00
Billy Laws
25255b01c7
Keep track of more pipeline descriptor information
...
This is needed in order to allow allocating per-draw descriptor arrays without recounting their lengths each time
2022-11-02 17:46:07 +00:00
Billy Laws
ad0275dbef
Expose active pipeline for access by Maxwell3D class
2022-11-02 17:46:07 +00:00
Billy Laws
6e22373b59
Add array support to AllocateUntracked
2022-11-02 17:46:07 +00:00
Billy Laws
388cff3353
Implement simple pipeline transition cache
...
Avoids the need to hash PipelineState when we can guess the pipeline that will be used next. This could very easily be optimised in the future with generational, usage-based caching if necessary.
2022-11-02 17:46:07 +00:00
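A transition cache of this kind can be sketched as each pipeline remembering a few pipelines that followed it, so the next lookup can try a direct state comparison before falling back to a hashed lookup. Everything below is an assumption for illustration: `PackedState` is a stand-in for the real pipeline state, and the four-entry round-robin list is an arbitrary choice rather than the project's design.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Stand-in for the real packed pipeline state
struct PackedState {
    std::uint32_t bits;

    bool operator==(const PackedState &other) const { return bits == other.bits; }
};

struct Pipeline {
    PackedState state;
    std::array<Pipeline *, 4> transitions{}; // Pipelines recently used right after this one
    std::size_t nextSlot{};

    // Returns a remembered successor matching the wanted state, or nullptr on a miss
    Pipeline *LookupNext(const PackedState &wanted) const {
        for (auto *candidate : transitions)
            if (candidate && candidate->state == wanted)
                return candidate;
        return nullptr; // The caller falls back to the full hashed lookup
    }

    // Remembers a successor, overwriting the oldest entry once the list is full
    void AddTransition(Pipeline *next) {
        transitions[nextSlot] = next;
        nextSlot = (nextSlot + 1) % transitions.size();
    }
};
```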
Billy Laws
302b2fcc3f
Force flush when dirty refresh returns true
2022-11-02 17:46:07 +00:00
Billy Laws
ec4ea5c5d7
Supply dispatcher manually for shader creation
2022-11-02 17:46:07 +00:00
Billy Laws
3404a3abdb
Implement macro HLE for instanced draw macros
...
gm20b performs instanced draws by repeating draw methods for each instance. The code to detect this, together with the cost of interpreting macros, took up around 6% of GPFIFO time in Metro Kingdom. By detecting these specific macros and performing an instanced draw directly, much of that cost can be avoided.
2022-11-02 17:46:07 +00:00
Billy Laws
cf0752f937
Use NCE memory tracking for guest shaders
...
Prevents needing to hash them for every single pipeline state update; without this, just hashing shaders takes up a significant amount of time.
2022-11-02 17:46:07 +00:00
Billy Laws
19a75c3f65
Bind all pipeline states to main pipeline dirty state
2022-11-02 17:46:07 +00:00
Billy Laws
a04d8fb5cf
Setup minimal viewport Vulkan pipeline state
2022-11-02 17:46:07 +00:00
Billy Laws
fe51db366b
Mark all dirty resources as dirty initially
2022-11-02 17:46:07 +00:00
Billy Laws
abfa5929f1
Treat vertex buffers with base addr 0 as disabled
2022-11-02 17:46:07 +00:00
Billy Laws
e71ca05f19
Avoid bitfields for signed enum types in PackedPipelineState
2022-11-02 17:46:07 +00:00
Billy Laws
2f2b615780
Add dynamic state support to VK graphics pipeline cache
2022-11-02 17:46:07 +00:00
Billy Laws
ad045058ee
Cleanup some redundant comments and includes in pipeline state
2022-11-02 17:46:07 +00:00
Billy Laws
9b05c9c0c3
Introduce a pipeline manager and partial pipeline object
...
gpu-new will use a monolithic pipeline object for each pipeline to store state, keyed by the PackedPipelineState contents. This allows for a greater level of per-pipeline optimisations and a reduction in the overall number of lookups in a draw compared to the previous system.
2022-11-02 17:46:07 +00:00
Billy Laws
2c1b40c9a8
Add some missed pipeline state members used by Hades
2022-11-02 17:46:07 +00:00
Billy Laws
0c6fa22c6b
Introduce pipeline shader stage state
...
Simple state that generates a hash that can be used in the packed state and spans over the binary for pipeline creation.
2022-11-02 17:46:07 +00:00
Billy Laws
a6bb716123
Move packed pipeline state to a separate file
2022-11-02 17:46:07 +00:00
Billy Laws
4dcbf5c3a0
Drop the caching aspect of shader manager entirely
...
Caching here was deemed unnecessary since it will be done implicitly by the pipeline cache and creates issues with the legacy attribute conversion pass. It now purely serves as a frontend for Hades.
2022-11-02 17:46:07 +00:00
Billy Laws
e77e4891dc
Tidy up Maxwell 3D regs a bit
2022-11-02 17:46:07 +00:00
Billy Laws
405d26fc22
Introduce Maxwell 3D shader state
...
Simple state holder that hashes the stored shader and reads it into a buffer
2022-11-02 17:46:07 +00:00
Billy Laws
6865f0bdaf
Transition color blend state to packed pipeline state
2022-11-02 17:46:07 +00:00
Billy Laws
7049a521d2
Use Vulkan types directly in PackedPipelineState where possible
2022-11-02 17:46:07 +00:00
Billy Laws
effeb074b6
Move pipeline cache Key to cpp file and rename to PackedPipelineState
...
The new name is more indicative of what the struct contains and how it will be used as more than just a key.
2022-11-02 17:46:07 +00:00
Billy Laws
e1512c91a0
Transition depth stencil state to pipeline cache key
2022-11-02 17:46:07 +00:00
Billy Laws
1f844e2c18
Transition rasterization state to pipeline cache key
2022-11-02 17:46:07 +00:00
Billy Laws
9a6efb091c
Transition tessellation state to pipeline cache key
...
Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.
2022-11-02 17:46:07 +00:00
Billy Laws
ae5d419586
Transition input assembly state to pipeline cache key
2022-11-02 17:46:07 +00:00
Billy Laws
3f9161fb74
Transition vertex input state to pipeline cache key
...
Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.
2022-11-02 17:46:07 +00:00
Billy Laws
ffe24aa075
Introduce a base pipeline cache key starting with RTs
...
It was determined that a general purpose Vulkan pipeline cache isn't viable for the significant performance reqs of Draw(). By using a Maxwell 3D specific key we can shrink state significantly more than if we used Vulkan structs.
2022-11-02 17:46:07 +00:00
Billy Laws
1e8f7d7fcb
Implement dynamic pipeline state
2022-11-02 17:46:07 +00:00
Billy Laws
a94040ac7d
Add default cases to enum conversions where necessary
2022-11-02 17:46:07 +00:00
Billy Laws
6e55d4dcf4
Implement color blend pipeline state
2022-11-02 17:46:07 +00:00
Billy Laws
cb11662ea5
Implement depth stencil pipeline state
2022-11-02 17:46:07 +00:00
Billy Laws
2484a2d6b5
Add A1R5G5B5Unorm and S8Uint formats
2022-11-02 17:46:07 +00:00
Billy Laws
690e96bce0
Implement rasterization pipeline state
2022-11-02 17:46:07 +00:00