Commit Graph

1392 Commits

Author SHA1 Message Date
Billy Laws
acfa58ea8c State uopdater MERGEBACK desC 2022-11-02 17:46:07 +00:00
Billy Laws
68b6b20f78 Bunch of cleanup/bugfixes for initial pipeline desc set impl
Cleans up code slightly, fixes keeping track of per-stage descriptor counts and fixes storage buffer caching + more random stuff.
2022-11-02 17:46:07 +00:00
Billy Laws
eb4a9bab11 Mark freshly allocated descriptor slots as active 2022-11-02 17:46:07 +00:00
Billy Laws
f3184cdff1 Reorder active descriptor set slots to end of list
Speeds up allocations by significantly reducing the number of needed atomic ops per alloc. Could be optimised further in the future if needed.
2022-11-02 17:46:07 +00:00
Billy Laws
128b68d8b2 Avoid resetting command buffers manually it's implicit
Somewhat costly on adreno, and with the one time submit flag the reset is implicit for the next beginCommandBuffers call.
2022-11-02 17:46:07 +00:00
Billy Laws
e1717ed811 Implement Maxwell samplers 2022-11-02 17:46:07 +00:00
Billy Laws
f1600f5ad0 Support allocating into spans in the linear allocator 2022-11-02 17:46:07 +00:00
Billy Laws
04cea9239f Implement descriptor set updating through StateUpdater
Will automatically resolve buffer views at record-time into descriptor write/copy structures then apply the write and bind the set.
2022-11-02 17:46:07 +00:00
Billy Laws
d174ca950b Revert "Reset executor command buffers asynchronously"
This reverts commit fc7956df4ff56fdb2afc4b2bb0bbca82196179ca.
2022-11-02 17:46:07 +00:00
Billy Laws
2bbe975ea7 Reset executor command buffers asynchronously
This took a little while to do on qcom drivers, moving it to the cycle waiter thread gives a tiny speedup.
2022-11-02 17:46:07 +00:00
Billy Laws
054d32567d Allow mutation of input data by callback in CircularQueue::AppendTranform 2022-11-02 17:46:07 +00:00
Billy Laws
7c9212743c Implement asynchronous command recording
Recording of command nodes into Vulkan command buffers is very easily parallelisable as it can effectively be treated as part of the GPU execution, which is inherently async. By moving it to a separate thread we can shave off about 20% of GPFIFO execution time. It should be noted that the command scheduler command buffer infra is no longer used, since we need to record texture updates on the GPFIFO thread (while another slot is being recorded on the record thread) and then use the same command buffer on the record thread later. This ends up requiring a pool per slot, which is reasonable considering we only have four slots by default.
2022-11-02 17:46:07 +00:00
Billy Laws
a197dd2b28 Allow for creating signalled fence cycles 2022-11-02 17:46:07 +00:00
Billy Laws
542651232b Add a mutex to allow preventing buffer recreation 2022-11-02 17:46:07 +00:00
Billy Laws
379b4f163d Implement popping from CircularQueue 2022-11-02 17:46:07 +00:00
Billy Laws
6d9dc9c6fb Implement some more of Draw
Now performs VK state updates on execution and takes advantage of quick descriptor binding.
2022-11-02 17:46:07 +00:00
Billy Laws
3b26f4f48a Expose way to check inter-pipeline descriptor compatibility
Allows users to skip/use quick descriptor sync even after switching pipelines.
2022-11-02 17:46:07 +00:00
Billy Laws
943a38e168 Implement StateUpdater for rapid recording of VK state updates
Using the command executor for each individual state update was found to be infeasible due to the sheer number of state updates per draw and its reliance on per-node heap allocations. Instead this commit takes advantage of each state update being used only once to implement a system of linearly-allocated state update commands that are linked together. After setting up all draw state with StateUpdateBuilder, the built StateUpdater can then be used in the execution phase to record all of the draw state into the command buffer with almost zero overhead.
2022-11-02 17:46:07 +00:00
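The linked, linearly-allocated command scheme described in 943a38e168 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: `Arena`, `CmdBase`, `Cmd`, `RecordAll`, the two payload structs and the string-based `CommandBuffer` stand-in are all hypothetical, and only the `StateUpdateBuilder` name comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

using CommandBuffer = std::vector<std::string>; // Stand-in for a Vulkan command buffer

// Hypothetical bump arena standing in for a linear allocator.
class Arena {
    std::vector<unsigned char> storage;
    size_t used{0};

  public:
    explicit Arena(size_t capacity) : storage(capacity) {}

    // Bump-allocates and constructs a T; nothing is ever freed individually.
    template <typename T, typename... Args>
    T *Emplace(Args &&...args) {
        static_assert(std::is_trivially_destructible<T>::value, "nodes are never destroyed");
        size_t aligned = (used + alignof(T) - 1) & ~(alignof(T) - 1);
        assert(aligned + sizeof(T) <= storage.size());
        used = aligned + sizeof(T);
        return new (storage.data() + aligned) T(std::forward<Args>(args)...);
    }
};

// Type-erased node header: a record function and a link to the next command.
struct CmdBase {
    void (*record)(CmdBase *, CommandBuffer &);
    CmdBase *next;
};

template <typename Payload>
struct Cmd : CmdBase {
    Payload payload;

    explicit Cmd(Payload payload) : CmdBase{&Record, nullptr}, payload(payload) {}

    static void Record(CmdBase *self, CommandBuffer &out) {
        static_cast<Cmd *>(self)->payload.Record(out);
    }
};

// Appends commands to a singly linked list whose nodes live in the arena.
class StateUpdateBuilder {
    Arena &arena;
    CmdBase *head{nullptr};
    CmdBase *tail{nullptr};

  public:
    explicit StateUpdateBuilder(Arena &arena) : arena(arena) {}

    template <typename Payload>
    void Add(Payload payload) {
        CmdBase *cmd = arena.Emplace<Cmd<Payload>>(payload);
        if (tail)
            tail->next = cmd;
        else
            head = cmd;
        tail = cmd;
    }

    CmdBase *Build() const { return head; }
};

// Execution phase: walk the list once and record each state update in order.
inline void RecordAll(CmdBase *head, CommandBuffer &out) {
    for (CmdBase *cmd = head; cmd; cmd = cmd->next)
        cmd->record(cmd, out);
}

// Illustrative payloads; real ones would call into Vulkan instead.
struct SetScissor {
    int index;
    void Record(CommandBuffer &out) const { out.push_back("scissor " + std::to_string(index)); }
};

struct SetStencilRef {
    int ref;
    void Record(CommandBuffer &out) const { out.push_back("stencilRef " + std::to_string(ref)); }
};
```

Because every node is used exactly once and the arena is reset wholesale, adding a command costs a pointer bump and one link update rather than a heap allocation.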
Billy Laws
7b4da52445 Add a fast binding sync path for when only one cbuf has changed
SMO implements instanced draws by repeating the same draw just with a different constant buffer bound. Reduce the cost of this significantly by detecting such cases and instead of processing every descriptor, copy the previous descriptor set and update only the ones affected by the bound constant buffer.

Credits to ripinperiperi for the initial idea and making me aware of how SMO does these draws
2022-11-02 17:46:07 +00:00
Billy Laws
89edd9b303 Reset megabuffer binding for disabled vertex buffers 2022-11-02 17:46:07 +00:00
Billy Laws
6a1615a104 Expose color and depth attachments to Draw 2022-11-02 17:46:07 +00:00
Billy Laws
aae957819e Simplify BufferView locking by requiring buffer manager be locked
Avoids the need to repeat the lookup after locking since recreations are impossible if buffer manager is locked.
2022-11-02 17:46:07 +00:00
Billy Laws
9449b52f36 Reduce minimum megabuffer alignment to 128 bytes 2022-11-02 17:46:07 +00:00
Billy Laws
b3cf9c40ba Update megabuffer execution/sequence numbers after updating an allocation 2022-11-02 17:46:07 +00:00
Billy Laws
4b2b6fc6e9 Avoid calling SynchronizeGuest when attempting to megabuffer unless necessary 2022-11-02 17:46:07 +00:00
Billy Laws
e5919e84a1 Pipeline state if statement cleanups 2022-11-02 17:46:07 +00:00
Billy Laws
bf536aa168 Sync pipeline descriptors every draw 2022-11-02 17:46:07 +00:00
Billy Laws
9223d7f524 Fix descriptor initialisation order
They need to be setup before the pipeline is created to avoid passing in garbage data.
2022-11-02 17:46:07 +00:00
Billy Laws
4652cc5a0a Avoid parsing descriptors for disabled shader stages 2022-11-02 17:46:07 +00:00
Billy Laws
3456fb39fa Fix pipeline to shader stage conversion when filling in shader infos
The two vertex pipeline stages need to be both treated as a single stage, and all subsequent stages need to be offset by -1
2022-11-02 17:46:07 +00:00
Billy Laws
a9213debc7 Implement constant buffer reading 2022-11-02 17:46:07 +00:00
Billy Laws
afcfe8a7fa Don't update scissor state >0 unless multiview is supported 2022-11-02 17:46:07 +00:00
Billy Laws
55d77b7eb0 Update user code for new megabuffering 2022-11-02 17:46:07 +00:00
Billy Laws
cc776ae395 Keep track of an 'execution number' in CommandExecutor
Allows users to efficiently cache resources that are valid for only one execution without resorting to callbacks.
2022-11-02 17:46:07 +00:00
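One way a user might exploit the execution number from cc776ae395 is sketched below. The names `ExecutorSketch` and `PerExecutionCache` are hypothetical and the real CommandExecutor interface may differ; the sketch only assumes that the number changes whenever a new execution begins.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical executor exposing a counter that is bumped on every submission.
struct ExecutorSketch {
    uint64_t executionNumber{1};

    void Submit() { executionNumber++; }
};

// Caches a value that is only valid for the execution it was created in.
template <typename T>
class PerExecutionCache {
    uint64_t cachedExecution{0}; // 0 never matches, since executions start at 1
    T value{};

  public:
    // Recreates the value only when the execution number has changed.
    template <typename F>
    const T &Get(uint64_t executionNumber, F &&create) {
        if (cachedExecution != executionNumber) {
            value = create();
            cachedExecution = executionNumber;
        }
        return value;
    }
};
```

Validity is checked with a single integer comparison, so no callback has to be registered with the executor to invalidate the cached value.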
Billy Laws
99a34df4cc Avoid trapping frequently synced buffers by using megabuffer copies
When a buffer is trapped nearly every frame, the cost of trapping and synchronising its contents starts to quickly add up. By always using the megabuffer when this is the case, since megabuffer copies are done directly from the guest, we skip the need to synchronise/trap the backing.
2022-11-02 17:46:07 +00:00
Billy Laws
a24aec03a6 Rework per-view megabuffering to cache allocs in the buffer itself
The original intention was to cache on the user side, but especially with shader constant buffers that's difficult and costly. Instead we can cache on the buffer side, with a page-table like structure to hold variable sized allocations indexed by the aligned view base address. This avoids most redundant copies from repeated use of the same buffer without updates in between.
2022-11-02 17:46:07 +00:00
Billy Laws
b810470601 Invalidate HLE macro state on macro updates 2022-11-02 17:46:07 +00:00
Billy Laws
2360ca24da Implement constant buffer and storage buffer pipeline descriptor types 2022-11-02 17:46:07 +00:00
Billy Laws
25255b01c7 Keep track of more pipeline descriptor information
This is needed in order to allow allocating per-draw descriptor arrays without recounting their lengths each time
2022-11-02 17:46:07 +00:00
Billy Laws
ad0275dbef Expose active pipeline for access by Maxwell3D class 2022-11-02 17:46:07 +00:00
Billy Laws
6e22373b59 Add array support to AllocateUntracked 2022-11-02 17:46:07 +00:00
Billy Laws
388cff3353 Implement simple pipeline transition cache
Avoids the need to hash PipelineState when we can guess the pipeline that will be used next. This could very easily be optimised in the future with generational, usage-based caching if necessary.
2022-11-02 17:46:07 +00:00
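The transition cache idea in 388cff3353 might look something like the sketch below. It is an assumption-laden illustration: `PackedState`, `Pipeline`, `PipelineManager`, the four-entry transition array and the round-robin replacement are all made up for the example, and the two-field state is far smaller than the real PackedPipelineState.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Tiny stand-in for the packed pipeline state.
struct PackedState {
    uint32_t blend;
    uint32_t depth;

    bool operator==(const PackedState &other) const {
        return blend == other.blend && depth == other.depth;
    }
};

struct Pipeline {
    PackedState state;
    std::array<Pipeline *, 4> transitions{}; // Pipelines recently used right after this one
    size_t nextSlot{0};

    explicit Pipeline(const PackedState &state) : state(state) {}
};

class PipelineManager {
    std::unordered_map<uint64_t, Pipeline> pipelines; // Node-based, so element pointers stay valid
    size_t hashedLookups{0};

  public:
    // Tries the current pipeline's transition list before falling back to the keyed map.
    Pipeline *Find(Pipeline *current, const PackedState &wanted) {
        if (current)
            for (Pipeline *candidate : current->transitions)
                if (candidate && candidate->state == wanted)
                    return candidate; // Guessed correctly, no hashing needed

        hashedLookups++;
        uint64_t key = (uint64_t{wanted.blend} << 32) | wanted.depth; // Collision-free for this toy state
        Pipeline *found = &pipelines.emplace(key, Pipeline(wanted)).first->second;

        if (current) { // Remember the transition for next time (round-robin replacement)
            current->transitions[current->nextSlot] = found;
            current->nextSlot = (current->nextSlot + 1) % current->transitions.size();
        }
        return found;
    }

    size_t HashedLookups() const { return hashedLookups; }
};
```

Games tend to switch between the same few pipelines in the same order every frame, so a short per-pipeline list of likely successors is enough to skip most hashed lookups.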
Billy Laws
302b2fcc3f Force flush when dirty refresh returns true 2022-11-02 17:46:07 +00:00
Billy Laws
ec4ea5c5d7 Supply dispatcher manually for shader creation 2022-11-02 17:46:07 +00:00
Billy Laws
3404a3abdb Implement macro HLE for instanced draw macros
gm20b performs instanced draws by repeating draw methods for each instance, the code to detect this together with the cost of interpreting macros took up around 6% of GPFIFO time in Metro Kingdom. By detecting these specific macros and performing an instanced draw directly much of that cost can be avoided.
2022-11-02 17:46:07 +00:00
Billy Laws
cf0752f937 Use NCE memory tracking for guest shaders
Prevents needing to hash them for every single pipeline state update, without this just hashing shaders takes up a significant amount of time.
2022-11-02 17:46:07 +00:00
Billy Laws
19a75c3f65 Bind all pipeline states to main pipeline dirty state 2022-11-02 17:46:07 +00:00
Billy Laws
a04d8fb5cf Setup minimal viewport Vulkan pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
fe51db366b Mark all dirty resources as dirty initially 2022-11-02 17:46:07 +00:00
Billy Laws
abfa5929f1 Treat vertex buffers with base addr 0 as disabled 2022-11-02 17:46:07 +00:00
Billy Laws
e71ca05f19 Avoid bitfields for signed enum types in PackedPipelineState 2022-11-02 17:46:07 +00:00
Billy Laws
2f2b615780 Add dynamic state support to VK graphics pipeline cache 2022-11-02 17:46:07 +00:00
Billy Laws
ad045058ee Cleanup some redundant comments and includes in pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
9b05c9c0c3 Introduce a pipeline manager and partial pipeline object
gpu-new will use a monolithic pipeline object for each pipeline to store state, keyed by the PackedPipelineState contents. This allows for a greater level of per-pipeline optimisations and a reduction in the overall number of lookups in a draw compared to the previous system.
2022-11-02 17:46:07 +00:00
Billy Laws
2c1b40c9a8 Add some missed pipeline state members used by Hades 2022-11-02 17:46:07 +00:00
Billy Laws
0c6fa22c6b Introduce pipeline shader stage state
Simple state that generates a hash that can be used in the packed state and spans over the binary for pipeline creation.
2022-11-02 17:46:07 +00:00
Billy Laws
a6bb716123 Move packed pipeline state to a separate file 2022-11-02 17:46:07 +00:00
Billy Laws
4dcbf5c3a0 Drop the caching aspect of shader manager entirely
Caching here was deemed unnecessary since it will be done implicitly by the pipeline cache and creates issues with the legacy attribute conversion pass. It now purely serves as a frontend for Hades.
2022-11-02 17:46:07 +00:00
Billy Laws
e77e4891dc Tidy up Maxwell 3D regs a bit 2022-11-02 17:46:07 +00:00
Billy Laws
405d26fc22 Introduce Maxwell 3D shader state
Simple state holder that hashes the stored shader and reads it into a buffer
2022-11-02 17:46:07 +00:00
Billy Laws
6865f0bdaf Transition color blend state to packed pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
7049a521d2 Use Vulkan types directly in PackedPipelineState where possible 2022-11-02 17:46:07 +00:00
Billy Laws
effeb074b6 Move pipeline cache Key to cpp file and rename to PackedPipelineState
The new name is more indicative of what the struct contains and how it will be used as more than just a key.
2022-11-02 17:46:07 +00:00
Billy Laws
e1512c91a0 Transition depth stencil state to pipeline cache key 2022-11-02 17:46:07 +00:00
Billy Laws
1f844e2c18 Transition rasterization state to pipeline cache key 2022-11-02 17:46:07 +00:00
Billy Laws
9a6efb091c Transition tessellation state to pipeline cache key
Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.
2022-11-02 17:46:07 +00:00
Billy Laws
ae5d419586 Transition input assembly state to pipeline cache key 2022-11-02 17:46:07 +00:00
Billy Laws
3f9161fb74 Transition vertex input state to pipeline cache key
Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.
2022-11-02 17:46:07 +00:00
Billy Laws
ffe24aa075 Introduce a base pipeline cache key starting with RTs
It was determined that a general purpose Vulkan pipeline cache isn't viable for the significant performance reqs of Draw(), by using a Maxwell 3D specific key we can shrink state significantly more than if we used Vulkan structs.
2022-11-02 17:46:07 +00:00
Billy Laws
1e8f7d7fcb Implement dynamic pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
a94040ac7d Add default cases to enum conversions where necessary 2022-11-02 17:46:07 +00:00
Billy Laws
6e55d4dcf4 Implement color blend pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
cb11662ea5 Implement depth stencil pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
2484a2d6b5 Add A1R5G5B5Unorm and S8Uint formats 2022-11-02 17:46:07 +00:00
Billy Laws
690e96bce0 Implement rasterization pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
90cd6adb91 Implement tessellation pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
3649d4c779 Adapt Maxwell 3D engine to new interconnect code
Removes all usage of graphics_context.h from the codebase, exclusively using the new interconnect and its dirty tracking system. While porting the code a number of bugs were discovered such as not respecting the base instance or primitive type override, which have all been fixed. Currently only clears and constant buffer updates are implemented but due to the dirty state system allowing register handling on the interconnect end there shouldn't end up being many more changes.
2022-11-02 17:46:07 +00:00
Billy Laws
ef11900a39 Introduce reworked Maxwell 3D core interconnect
This mainly distributes operations down to activeState and pipelineState, aside from clears which are implemented in-place. The exposed interface is much reduced compared to the previous GraphicsContext system due to the newly introduced dirty system; this should hopefully make the code more maintainable and keep actual rendering operations separate from primitive restart state or whatever. Currently draws are unimplemented and the only fully implemented things are clears and constant buffer operations.
2022-11-02 17:46:07 +00:00
Billy Laws
37b821a4dc Introduce Maxwell 3D interconnect active state
Active state encapsulates all state that isn't part of a pipeline and can be set dynamically with Vulkan calls. This includes both dynamic state like stencil faces, and command buffer state like vertex buffer bindings.
Similarly to the last commit, the main goal of this is to reduce the amount of redundant work done per draw by employing dirty state as much as possible. Without using dirty state for this every active state operation would need to be performed every draw, which gets very expensive when things like buffer lookups end up being required. Code has also been heavily cleaned up as is described in the previous commit.
2022-11-02 17:46:07 +00:00
Billy Laws
5fdda78073 Introduce Maxwell 3D interconnect pipeline state
The main goal of this is to reduce the number of redundant lookups and work done per draw as much as possible; this is mainly achieved through heavy use of dirty tracking, though other optimisations like heavily using the linear allocator are also in play. In addition to the goal of performance, the code has been cleaned up and abstracted significantly from its state in graphics_context, hopefully making the GPU interconnect code much more maintainable in the future and reducing the boilerplate needed to add even simple functionality. This commit includes partial pipeline state, enough for implementing clears + a slight bit extra.
2022-11-02 17:46:07 +00:00
Billy Laws
21f5611231 Rewrite constant buffer interconnect code
Adapted from the previous code to use dirty state tracking. The cache has also been removed since with the new buffer view and GMMU optimisations it actually ended up slowing lookups down. Another result of the buffer view optimisations is that raw pointers are no longer used for buffer views since destruction is now much cheaper.
2022-11-02 17:46:07 +00:00
Billy Laws
d1e7bbc1d8 Introduce common code for Maxwell 3D interconnect rewrite
This common code will be used across the entirety of the 3D rewrite, it also includes a stub for StateUpdateBuilder, which will be used by active state code to apply state updates.
2022-11-02 17:46:07 +00:00
Billy Laws
a6c49115f9 Rewrite all Maxwell 3D registers up to clears to match Nvidia docs
All the names are directly translated from Nvidia docs, with minimal conversions to enums/structs when appropriate. Not all registers have been rewritten, only those that are needed to implement clears and dynamic state, the rest will be added as they are used in the GPU rework.
2022-11-02 17:46:07 +00:00
Billy Laws
d7eab40f1c Introduce resource based dirty tracking infrastructure
This will be heavily used by the upcoming GPU rework. It provides an intuitive way to track dirtiness based on using the underlying pointers of objects, as opposed to other methods which often need an enum entry per dirty state and don't support overlaps. Wrappers for dirty state objects are also provided to abstract as much of the dirty tracking as possible from user code. The pointer based mechanism also serves to avoid having to handle dirty bindings on the user side of the dirty resources, allowing them to bind things internally instead.
2022-11-02 17:46:07 +00:00
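A rough sketch of pointer-keyed dirty tracking in the spirit of d7eab40f1c follows. The real infrastructure is certainly more elaborate; `DirtyManager`, `Bind` and `MarkDirty` are hypothetical names, and a hash map keyed by address is just one simple way to show the idea that a state is found through the address of the register it depends on.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: dirty flags are looked up by the address of the written register.
class DirtyManager {
    std::unordered_map<const void *, std::vector<bool *>> bindings;

  public:
    // Binds a dirty flag to a register object, identified purely by its address.
    template <typename T>
    void Bind(bool &dirtyFlag, const T &reg) {
        bindings[&reg].push_back(&dirtyFlag);
    }

    // Marks every state bound to the written register as dirty.
    template <typename T>
    void MarkDirty(const T &reg) {
        auto it = bindings.find(&reg);
        if (it != bindings.end())
            for (bool *flag : it->second)
                *flag = true;
    }
};
```

Several states can bind to the same register without needing an enum entry per dirty state, which is the overlap support the commit message refers to.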
Billy Laws
8471ab754d Introduce a spin lock for resources locked at a very high frequency
Constant buffer updates result in a barrage of std::mutex calls that take a lot of time even under no contention (around 5%). Using a custom spinlock in cases like these allows inlining locking code reducing the cost of locks under no contention to almost 0.
2022-11-02 17:46:07 +00:00
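For illustration, a spin lock of the kind 8471ab754d describes could be as small as the sketch below. It is not the project's implementation: the `SpinLock` name, the use of `std::atomic<bool>` and the yield-based backoff are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical minimal spin lock usable with std::lock_guard.
class SpinLock {
    std::atomic<bool> locked{false};

  public:
    // The uncontended path is a single atomic exchange that is trivially inlinable.
    void lock() {
        while (locked.exchange(true, std::memory_order_acquire))
            std::this_thread::yield(); // Back off while another thread holds the lock
    }

    bool try_lock() {
        return !locked.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};
```

Since it provides `lock()`, `try_lock()` and `unlock()`, the class can be dropped in wherever a `std::mutex` was used with `std::lock_guard` or `std::unique_lock`.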
Billy Laws
d810619203 Drop 3D engine method calling fast path in GPFIFO
This ended up actually turning out to be a slow path when Maxwell 3D method handling code was inlined.
2022-11-02 17:46:07 +00:00
Billy Laws
ded02e3eac Small engine.h fixups 2022-11-02 17:46:07 +00:00
Billy Laws
38ba963311 Drop usage of unique_ptr for Maxwell3D
Since graphics context is being replaced and split into cpp files there will no longer be any circular includes that previously prevented this.
2022-11-02 17:46:07 +00:00
Billy Laws
90db743c56 Source AsGpu GMMU page sizes from GMMU class 2022-11-02 17:46:07 +00:00
Billy Laws
e72fe02c15 Add inline fast-path for Buffer::FindOrCreate()
This can be inlined by the compiler much easier which helps perf a fair bit due to the number of times buffers are looked up, also avoids the need for small vector construction that was done in the previous fast-path.
2022-11-02 17:46:07 +00:00
Billy Laws
49478e178a Avoid redundantly syncing buffers before every Write in an execution
This isn't a guarantee provided by actual HW so we don't need to provide it either; the sync can be skipped once the buffer has already been synced at least once within the execution.
2022-11-02 17:46:07 +00:00
Billy Laws
f7a726e452 Allow attempting to write to buffers without passing a GPU copy callback
Constructing the GPU copy callback in `ConstantBuffers::Load()` ended up taking a fair amount of time despite it almost never being used in practice. By making it optional it can be skipped most of the time and only done when it's actually necessary by calling `Write()` again if the initial call returned true.
2022-11-02 17:46:07 +00:00
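The retry pattern from f7a726e452 can be sketched as below. `BufferSketch`, its `gpuDirty` flag and `WriteWithFallback` are hypothetical, and the real `Write()` signature is not shown in the commit message; the sketch only assumes that a `true` return means "call again with a GPU copy callback".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical stand-in for a guest buffer; names are illustrative only.
class BufferSketch {
    std::vector<uint8_t> backing;

  public:
    bool gpuDirty{false}; // True when the newest contents live on the GPU

    explicit BufferSketch(size_t size) : backing(size) {}

    // Returns true when a GPU copy is required but no callback was supplied,
    // so the caller only builds the (costly) callback on that rare path.
    bool Write(const uint8_t *data, size_t size, size_t offset,
               const std::function<void()> &gpuCopyCallback = nullptr) {
        if (gpuDirty) {
            if (!gpuCopyCallback)
                return true;
            gpuCopyCallback(); // Caller records a GPU-side copy instead
            return false;
        }
        std::memcpy(backing.data() + offset, data, size);
        return false;
    }

    uint8_t At(size_t index) const { return backing[index]; }
};

// Caller-side pattern: try the cheap path first, construct the callback only on demand.
inline bool WriteWithFallback(BufferSketch &buffer, const uint8_t *data, size_t size,
                              size_t offset, int &gpuCopies) {
    if (!buffer.Write(data, size, offset))
        return false; // CPU write succeeded, no callback was ever constructed
    buffer.Write(data, size, offset, [&gpuCopies] { gpuCopies++; });
    return true;
}
```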
Billy Laws
5dca5cc10e Redesign buffer view infra to remarkably reduce creation overhead
Buffer view creation was a significant pain point, requiring several layers of caching to reduce the number of creations, which introduced a lot of complexity. By reworking delegates to be per-buffer rather than per-view and then linearly allocating delegates (without ever freeing) views can be reduced to just {delegatePtr, offset, size}, avoiding the need for any allocations or set operations in GetView. The one difficulty with this is the need to support buffer recreation, which is achieved by allowing delegates to be chained - during recreation all source buffers have their delegates modified to point to the newly created buffer's delegate. Upon accessing a view with such a chained delegate the view will be modified to point directly to the end delegate with offset being updated accordingly, skipping the need to traverse the chain for future accesses.
2022-11-02 17:46:07 +00:00
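The delegate chaining described in 5dca5cc10e can be illustrated with the following sketch. Only the {delegatePtr, offset, size} view layout comes from the commit message; `Buffer`, `BufferDelegate`, `BufferView::Resolve` and `Chain` are hypothetical names, and locking and lifetime concerns are left out entirely.

```cpp
#include <cassert>
#include <cstddef>

struct Buffer {
    int id; // Only the buffer's identity matters for this sketch
};

// Hypothetical per-buffer delegate: either names a buffer or links to a newer delegate.
struct BufferDelegate {
    Buffer *buffer;       // Backing buffer, meaningful only while link is null
    BufferDelegate *link; // Set once the buffer has been recreated into another one
    size_t linkOffset;    // Offset of the old buffer inside the new buffer
};

// A view is just {delegatePtr, offset, size}; creating one needs no allocation.
struct BufferView {
    BufferDelegate *delegate;
    size_t offset;
    size_t size;

    // Follows the chain once and rewrites the view so later accesses skip the traversal.
    Buffer *Resolve() {
        while (delegate->link) {
            offset += delegate->linkOffset;
            delegate = delegate->link;
        }
        return delegate->buffer;
    }
};

// Called on recreation: redirect the old delegate to the new buffer's delegate.
inline void Chain(BufferDelegate &old, BufferDelegate &replacement, size_t offsetInReplacement) {
    old.link = &replacement;
    old.linkOffset = offsetInReplacement;
}
```

Existing views never need to be tracked or patched at recreation time; each one fixes itself up lazily the next time it is resolved.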
Billy Laws
09f376e500 Add const accessors to OffsetMember 2022-11-02 17:46:07 +00:00
Billy Laws
64a9db2e82 Introduce MergeInto helper for simplified construction of arrays of structs
In the upcoming GPU code each state member will hold a reference to its corresponding Maxwell 3D regs; this helper is needed to allow easy transformation from the main 3D register struct into them.

Example:
```c++
struct Regs {
    std::array<View, 10> viewRegs;
    u32 enable;
} regs;

struct ViewState {
    const View &view;
    const u32 &enable;
    size_t index;
};

std::array<ViewState, 10> viewStates{MergeInto<ViewState, 10>(regs.viewRegs, regs.enable, IncrementingT{})};
```
2022-11-02 17:46:07 +00:00
Billy Laws
2c682f19a6 Add untracked linear allocator emplace/allocate functions
Useful for cases where allocations are guaranteed to be unused by the time `Reset()` is called and calling `Free()` would be difficult or add extra performance cost due to how the allocation is used.
2022-11-02 17:46:07 +00:00
Billy Laws
6359852652 Introduce page size constants and replace all usages of PAGE_SIZE
Avoids using macros and results in code which looks slightly cleaner.
2022-11-02 17:46:07 +00:00
Billy Laws
30ec844a1b Use GPFIFO pushbuffer contents in-place if possible
The memcpy within `Read()` was taking up a fair amount of time, avoid this by using the mapped range in-place when the mapping isn't split.
2022-11-02 17:46:07 +00:00
Billy Laws
be825b7aad Utilise SegmentTable for rapid FlatMemoryManager lookups
In some games performing the binary search in `TranslateRange()` ended up taking a fairly large (~8%) proportion of GPFIFO time. By using a segment table for O(1) lookups this is reduced to <2% for non-split mappings at the cost of slightly increased memory usage (2GiB in the absolute worst case but more like 50MiB in real world situations).

In addition to adapting `TranslateRange()` to use the segment table, a new function `LookupBlock()` is introduced for cases where only a single mapping would ever be looked up, so the small_vector handling and fallback paths can be skipped and the entire lookup can be inlined.
2022-11-02 17:46:07 +00:00
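To show why the segment table in be825b7aad gives O(1) lookups, here is a minimal sketch. It does not mirror the real SegmentTable or FlatMemoryManager: `Mapping`, `SegmentTableSketch`, the 64KiB segment size and the segment-aligned mapping requirement are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mapping record: a virtual range backed by host memory.
struct Mapping {
    uint64_t virtBase; // Start of the mapping in the virtual address space
    uint8_t *hostBase; // Host pointer backing virtBase (null when unmapped)
    uint64_t size;
};

// One table entry per fixed-size segment of the address space.
class SegmentTableSketch {
    static constexpr uint64_t SegmentBits = 16; // 64KiB segments, an assumed value
    std::vector<Mapping> segments;              // Value-initialised, so all entries start unmapped

  public:
    explicit SegmentTableSketch(uint64_t addressSpaceSize)
        : segments((addressSpaceSize >> SegmentBits) + 1) {}

    // Fills every segment covered by the mapping; assumes virtBase is segment-aligned.
    void Map(const Mapping &mapping) {
        for (uint64_t addr = mapping.virtBase; addr < mapping.virtBase + mapping.size;
             addr += uint64_t{1} << SegmentBits)
            segments[addr >> SegmentBits] = mapping;
    }

    // O(1): index by the upper address bits, no binary search required.
    uint8_t *Lookup(uint64_t addr) const {
        uint64_t index = addr >> SegmentBits;
        if (index >= segments.size())
            return nullptr;
        const Mapping &mapping = segments[index];
        if (!mapping.hostBase || addr >= mapping.virtBase + mapping.size)
            return nullptr;
        return mapping.hostBase + (addr - mapping.virtBase);
    }
};
```

The memory cost grows with the size of the address space divided by the segment size, which is the trade-off the commit message quantifies.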
Morph
4ea0b0e1e5 fssrv: IFileSystemProxy: Implement OpenReadOnlySaveDataFileSystem
Forward this function to OpenSaveDataFileSystem for now. A proper implementation should wrap the underlying filesystem with nn::fs::ReadOnlyFileSystem.
2022-10-30 20:04:40 +00:00