Billy Laws
14af383238
Only allow submitting swapchainImageCount images for host present at a time
...
Prevents situations where nothing would otherwise be waiting on the GPU and, since presentation no longer blocks, too many images would be submitted for presentation.
2022-11-02 17:46:07 +00:00
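The cap described in 14af383238 can be pictured as a counting gate around present submission. The following is a minimal sketch under stated assumptions, not the code from the commit: `PresentLimiter` and its member names are hypothetical, and the limit is assumed to equal swapchainImageCount.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical limiter: caps how many images may be queued for host presentation at once
class PresentLimiter {
  private:
    std::mutex mutex;
    std::condition_variable slotFreed;
    std::size_t inFlight{0};
    const std::size_t limit; // Assumed to equal swapchainImageCount

  public:
    explicit PresentLimiter(std::size_t swapchainImageCount) : limit{swapchainImageCount} {}

    // Non-blocking variant: claims a slot only if one is free
    bool TryBeginPresent() {
        std::lock_guard<std::mutex> lock{mutex};
        if (inFlight >= limit)
            return false;
        ++inFlight;
        return true;
    }

    // Blocking variant: waits until fewer than `limit` images are in flight
    void BeginPresent() {
        std::unique_lock<std::mutex> lock{mutex};
        slotFreed.wait(lock, [this] { return inFlight < limit; });
        ++inFlight;
    }

    // Called once the host has finished presenting an image
    void EndPresent() {
        {
            std::lock_guard<std::mutex> lock{mutex};
            assert(inFlight > 0);
            --inFlight;
        }
        slotFreed.notify_one();
    }
};
```

With a limit of three, a fourth `TryBeginPresent()` fails until `EndPresent()` has been called once, which is the back-pressure the commit message asks for.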
Billy Laws
bcd96ac77d
Fixup A8R8G8B8 TIC format mapping
...
8-bit formats are inverted in TICs compared to Vulkan
2022-11-02 17:46:07 +00:00
Billy Laws
90466b8830
Implement depth clamp rasterisation state
...
Used in SMO for shadows.
2022-11-02 17:46:07 +00:00
Billy Laws
1cfc4278f9
Disable preserve buffer/texture attachment opt for now
...
Causes several issues and crashes in Pokemon without an obvious cause.
2022-11-02 17:46:07 +00:00
Billy Laws
e483cf9634
Use shader memory mirror when reading guest shaders
...
Avoids triggering any traps that may be present on the region
2022-11-02 17:46:07 +00:00
Billy Laws
f6e4328b5a
Ensure blit src/dst textures are attached as execution cycle dependencies
...
Since they're not in the TIC pool they would otherwise be freed
2022-11-02 17:46:07 +00:00
Billy Laws
77a131df60
Support using in-app renderdoc API to capture individual executions
2022-11-02 17:46:07 +00:00
Billy Laws
576bc6f37e
Add CommandExecutor slot count setting
2022-11-02 17:46:07 +00:00
Billy Laws
1a0819fb76
Use semaphores for presentation engine frame synchronisation
...
Avoids waits on the CPU, which can be costly and confuse the scheduler; also reduces latency significantly.
2022-11-02 17:46:07 +00:00
Billy Laws
0670e0e0dc
Support using Vulkan semaphores with fence cycles
...
In some cases, like presentation, it may be possible to avoid waiting on the CPU by using a semaphore to indicate GPU completion. Due to the binary nature of Vulkan semaphores this requires a fair bit of code, as we need to ensure semaphores are always unsignalled before they are waited on and signalled again. This is achieved with a special kind of chained cycle that can be added even after guest GPFIFO processing for a given cycle: the main cycle's semaphore can be waited on, then the cycle for the wait is attached to the main cycle and will be waited on before signalling.
2022-11-02 17:46:07 +00:00
Billy Laws
5b72be88c3
Stub ldn:u service
2022-11-02 17:46:07 +00:00
Billy Laws
77d76ed05a
Batch contiguous GMMU ranges into one
2022-11-02 17:46:07 +00:00
Billy Laws
e52dbf202f
Pass more Maxwell3D registers into interconnect
2022-11-02 17:46:07 +00:00
Billy Laws
83c7ed314e
Setup KThread pthread handle in StartThread
...
Avoids a race with starting the thread and the handle not being set yet
2022-11-02 17:46:07 +00:00
Billy Laws
9784ae23e9
Skip checking affinity before taking load-balance WaitScheduler path
...
The affinity mask may be set after the wait has begun
2022-11-02 17:46:07 +00:00
Billy Laws
ad3195e06f
Split out guest texture layer size calcs into a separate func
2022-11-02 17:46:07 +00:00
Billy Laws
8fa83fdf13
Fix deswizzling non-pow2 block size formats
...
We need to use DivideCeil to avoid rounding off part of the texture.
Fixes texture in Nier Automata: Game of the YoRHa edition.
2022-11-02 17:46:07 +00:00
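The rounding issue fixed in 8fa83fdf13 is easy to see with a small number. The sketch below is illustrative only: `DivideCeil` is named in the commit message, but its signature here and the two helper functions are assumptions, and the 5-texel block width is just an example of a non-power-of-two block size.

```cpp
#include <cassert>
#include <cstddef>

// Integer division that rounds up instead of truncating (assumed signature)
inline std::size_t DivideCeil(std::size_t dividend, std::size_t divisor) {
    assert(divisor != 0);
    return (dividend + divisor - 1) / divisor;
}

// Hypothetical helpers: number of blocks needed to cover `texels` texels along one axis
inline std::size_t BlocksTruncated(std::size_t texels, std::size_t blockSize) {
    return texels / blockSize; // Drops the partially covered last block
}

inline std::size_t BlocksRoundedUp(std::size_t texels, std::size_t blockSize) {
    return DivideCeil(texels, blockSize); // Keeps the partially covered last block
}
```

For a 16-texel row and a block width of 5, truncating division yields 3 blocks and loses the last texel, while rounding up yields the 4 blocks that are actually stored.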
Billy Laws
27de42f8df
Use surfaceClip as a hint for the underlying rendertarget size
...
TIC sizes may not be aligned to block linear dimensions whereas RT sizes are and then limited by the surface clip. By using this to determine surface size we are more likely to get a match in texture manager for any future usages.
2022-11-02 17:46:07 +00:00
Billy Laws
297597f697
Fix texture manager depth compat comparison
2022-11-02 17:46:07 +00:00
Billy Laws
500f817a28
Synchronize all non-matching textures back to host before recreation
2022-11-02 17:46:07 +00:00
Billy Laws
05581f2230
Remove now redundant buffer/texture/megabuffer manager locks
...
They have been superseded by the global channel lock
2022-11-02 17:46:07 +00:00
Billy Laws
f5a141a621
Add dirty resource operator*
2022-11-02 17:46:07 +00:00
Billy Laws
b72720e8db
Finish off transform feedback implementation
2022-11-02 17:46:07 +00:00
Billy Laws
36fd885b49
Pack all draw state into a struct to avoid std::function allocations
2022-11-02 17:46:07 +00:00
Billy Laws
b5d0060c3f
Only use scissor for clear rect when enabled
2022-11-02 17:46:07 +00:00
Billy Laws
f93df35e6c
Only set line width when wideLines feature is supported
2022-11-02 17:46:07 +00:00
Billy Laws
4cebdfc8d3
Pass texture and cbuf state into pipeline manager for hades callbacks
2022-11-02 17:46:07 +00:00
Billy Laws
9ce848d4e0
Implement descriptor update batching and push descriptors
...
Batching helps to avoid the need to attach so many objects to the fence cycle, which ends up taking a fair bit of time due to the allocation required.
2022-11-02 17:46:07 +00:00
Billy Laws
62a165b51e
Reformat maxwell3d interconnect codebase
2022-11-02 17:46:07 +00:00
Billy Laws
3766be59e7
Zero out vertex attribute state when disabled to avoid creating redundant pipelines
2022-11-02 17:46:07 +00:00
Billy Laws
751e3356e1
Keep shader trap lock held for the duration of an execution
...
Avoids constant relocking on the GPFIFO thread (~0.5% of total time)
2022-11-02 17:46:07 +00:00
Billy Laws
314a9bccbc
Allow megabuffering readonly SSBOs
2022-11-02 17:46:07 +00:00
Billy Laws
4c2db0ba01
Implement ReadCbufValue and ReadTextureType hades callbacks
...
Used for bindless and BRX instruction emulation.
2022-11-02 17:46:07 +00:00
Billy Laws
2163f8cde6
Implement alpha test pipeline state
2022-11-02 17:46:07 +00:00
Billy Laws
c86ad638c4
Keep track of transform feedback varyings pipeline state
2022-11-02 17:46:07 +00:00
Billy Laws
7ad2d94345
Zero out blend state when disabled to avoid creating redundant pipelines
2022-11-02 17:46:07 +00:00
Billy Laws
4052a93051
Force non-pushdescriptors for blit helper shader
2022-11-02 17:46:07 +00:00
Billy Laws
cb2a8c6d24
Enable wideLines Vulkan feature
2022-11-02 17:46:07 +00:00
Billy Laws
a3369637a9
Don't entirely wipe out per-index TIC cache after each execution
...
Keep a copy of the old TIC entry and view even after caches are purged, and use the execution number to check validity instead. If that doesn't match, a plain memcmp can be used as opposed to a full hash and map lookup.
2022-11-02 17:46:07 +00:00
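The lookup order described in a3369637a9 (execution number first, then memcmp, then the slow path) might look roughly like the sketch below. All types and names here are hypothetical stand-ins: `Tic` is not the real descriptor layout, an `int` stands in for the host texture view, and the slow hash and map lookup is passed in as a callback. It also assumes an entry cannot change within a single execution.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Tic { std::uint32_t words[8]; }; // Stand-in for a guest TIC entry (hypothetical layout)

struct CachedEntry {
    Tic tic{};
    int view{-1};                     // Stand-in for a host texture view
    std::uint64_t executionNumber{0}; // Execution in which the entry was last validated
    bool valid{false};
};

class TicIndexCache {
  private:
    std::vector<CachedEntry> entries;

  public:
    explicit TicIndexCache(std::size_t size) : entries(size) {}

    // Returns the view for `index`, calling `slowLookup` only when the cheap checks fail
    template<typename SlowLookup>
    int Get(std::size_t index, const Tic &tic, std::uint64_t currentExecution, SlowLookup &&slowLookup) {
        CachedEntry &entry{entries[index]};
        if (entry.valid) {
            if (entry.executionNumber == currentExecution)
                return entry.view; // Already validated during this execution
            if (std::memcmp(&entry.tic, &tic, sizeof(Tic)) == 0) {
                entry.executionNumber = currentExecution; // Unchanged since last time, revalidate cheaply
                return entry.view;
            }
        }
        entry.tic = tic;
        entry.view = slowLookup(tic); // Full lookup (hash + map in the commit's wording)
        entry.executionNumber = currentExecution;
        entry.valid = true;
        return entry.view;
    }
};
```

The point of keeping the old entry around is that a new execution costs one memcmp per reused index rather than a hash of the whole TIC structure.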
Billy Laws
98c0cc3e7f
Impl preserve attached buffers/textures to avoid GPFIFO lock thrashing
...
When profiling SMO, it became obvious that the constant locking of textures and buffers in SyncDescriptors took up a large amount of CPU time (3-5%), a precious resource in intensive areas like Metro. This commit implements somewhat of a workaround to avoid constant relocking: if a buffer is frequently attached on the GPU and almost never used on the CPU, we can keep the lock held between executions. Of course it's not that simple though; if the guest tries to lock a texture for the first time which has already been locked as preserve on the GPFIFO, we need to avoid a deadlock. This is achieved through a combination of two things. First, we periodically clear the locked attachments every 2*SlotCount submissions, preventing a complete deadlock on the CPU (just a long wait instead) and meaning that the next time the resource is attached on the GPU it will not be marked for preservation, due to having been locked on the guest before. Second, we always need to unlock everything when the GPU thread runs out of work, as the periodic clearing will not execute in this case, which would otherwise leave the textures locked on the GPFIFO thread forever (if the guest was waiting on a lock to submit work). It should be noted that we don't clear preserve attached resources in the latter scenario, only unlock them and then relock when more work is available.
2022-11-02 17:46:07 +00:00
Billy Laws
0e8ccf1e99
Use memory_order_release for new descriptor set allocations
2022-11-02 17:46:07 +00:00
Billy Laws
34db5097da
Avoid using a shared_ptr reference to cycle for command buffer submission
...
Somehow without this we can sometimes get crashes during appending to circular queue
2022-11-02 17:46:07 +00:00
Billy Laws
0428e8c7da
Support forcing regular descriptor sets in VK pipeline cache
2022-11-02 17:46:07 +00:00
Billy Laws
0f394d516b
Lock FenceCycle in between waiting on chained cycles and checking signalled
...
Avoids one race where we would end up hogging all the locks of chained cycles and ourself when waiting for submission of previous cycles and prevent any forward progress due to another thread locking one of the chained cycles.
2022-11-02 17:46:07 +00:00
Billy Laws
a015fe753d
Only write npad controllerInfo entry on the HID thread if it is valid
2022-11-02 17:46:07 +00:00
Billy Laws
b5446846f7
Stub IsSixAxisSensorAtRest
2022-11-02 17:46:07 +00:00
Billy Laws
6719572b3b
Keep track of how often textures/buffers are locked on the CPU
...
For the upcoming preserve attachment optimisation, which will keep buffers/textures locked on the GPU between executions, we don't want to preserve any which are frequently locked on the CPU as that would result in lots of needless waiting for a resource to be unlocked by the GPU when it occasionally frees all preserve attachments when it could have been done much sooner. By checking if a resource has ever been locked on the CPU and using that to choose whether we preserve it we can avoid such waiting.
2022-11-02 17:46:07 +00:00
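The bookkeeping described in 6719572b3b amounts to remembering whether a resource has ever been locked from the CPU side. A minimal sketch, assuming hypothetical names throughout (`TrackedResource`, `LockFromCpu`, `LockFromGpu`, `ShouldPreserve`) and a plain `std::mutex` in place of whatever lock the real resources use:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical resource wrapper that records whether the CPU side has ever locked it
class TrackedResource {
  private:
    std::mutex mutex;
    std::atomic<bool> everLockedOnCpu{false};

  public:
    // Lock path taken by guest/CPU-side code; marks the resource before blocking
    void LockFromCpu() {
        everLockedOnCpu.store(true, std::memory_order_relaxed);
        mutex.lock();
    }

    // Lock path taken by the GPU-side (GPFIFO) code; leaves the flag untouched
    void LockFromGpu() {
        mutex.lock();
    }

    void Unlock() {
        mutex.unlock();
    }

    // Only resources never locked from the CPU are candidates for staying locked between executions
    bool ShouldPreserve() const {
        return !everLockedOnCpu.load(std::memory_order_relaxed);
    }
};
```

Setting the flag before taking the lock means a CPU thread that is stuck waiting still disqualifies the resource from being preserved the next time it is attached.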
Billy Laws
993ffb56f4
Avoid waiting on texture/buffer fence with trapMutex locked
...
This could cause deadlocks under certain circumstances by preventing the GPFIFO from making any progress while also waiting on it at the same time.
2022-11-02 17:46:07 +00:00
Billy Laws
3e8bd26978
Add a global gm20b channel lock
...
Allowing for parallel execution of channels never really benefitted many games and prevented optimisations such as keeping frequently used resources always locked to avoid the constant overhead of locking on the hot path.
2022-11-02 17:46:07 +00:00
Billy Laws
57a4699bd1
Add IOCTL trace events
2022-11-02 17:46:07 +00:00
Billy Laws
7861968c05
Fix memory::Buffer move constructor
2022-11-02 17:46:07 +00:00
Billy Laws
ef0ae30667
Implement Maxwell3D texture pool management and view creation
...
On top of the TIC cache from previous code, a simple index-based lookup has been added, which vastly speeds things up by avoiding the need to hash the TIC structure every time.
2022-11-02 17:46:07 +00:00
Billy Laws
5542459c75
Use a SpinLock for guest shader code cache trap mutex
2022-11-02 17:46:07 +00:00
Billy Laws
3e12cde4d5
Make active Vulkan pipeline public
2022-11-02 17:46:07 +00:00
Billy Laws
2556966ec5
Don't attach textures to the active cycle in AttachTexture
2022-11-02 17:46:07 +00:00
Billy Laws
7dc3dde815
Introduce support for waiting for submission to FenceCycle
...
Introducing async record broke the assumption that any work submitted through the command scheduler would be submitted in order with graphics submits. Since async record now unlocks the texture before it's submitted, a separate mechanism is needed to ensure ordering of submits. This is achieved by building support into the fence cycle itself, with a condition variable that is waited on for submission before any fence waits occur.
2022-11-02 17:46:07 +00:00
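The "wait for submission before waiting on the fence" step from 7dc3dde815 can be sketched with a standard condition variable. This is an illustration under assumptions, not FenceCycle itself: `SubmitGate` and its member names are hypothetical, and the actual fence wait that would follow `WaitSubmit()` is left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical gate: blocks waiters until the associated work has been submitted
class SubmitGate {
  private:
    std::mutex mutex;
    std::condition_variable submitCondition;
    bool submitted{false};

  public:
    // Called by the submitting thread once the command buffer has been handed to the queue
    void NotifySubmitted() {
        {
            std::lock_guard<std::mutex> lock{mutex};
            submitted = true;
        }
        submitCondition.notify_all();
    }

    // Called before any fence wait; returns immediately if submission already happened
    void WaitSubmit() {
        std::unique_lock<std::mutex> lock{mutex};
        submitCondition.wait(lock, [this] { return submitted; });
    }

    bool IsSubmitted() {
        std::lock_guard<std::mutex> lock{mutex};
        return submitted;
    }
};
```

A waiter would call `WaitSubmit()` and only then wait on the GPU fence, so a fence that has not yet been submitted is never waited on.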
Billy Laws
54b85583ae
Fix layout transition in Texture::CopyFrom
2022-11-02 17:46:07 +00:00
Billy Laws
0f7c04ffb4
Use target format bpb when calculating linear mip level size
2022-11-02 17:46:07 +00:00
Billy Laws
849184452c
Add function to check if any guest texture mappings are unmapped
2022-11-02 17:46:07 +00:00
Billy Laws
98cb94ca6c
Bind an empty uniform buffer in place of unbound constant buffers
2022-11-02 17:46:07 +00:00
Billy Laws
55b85d0691
Implement combined image samplers and make descriptor code common between quick/normal updates
2022-11-02 17:46:07 +00:00
Billy Laws
ccf2d59351
Fixup input rate reading from packed pipeline state
2022-11-02 17:46:07 +00:00
Billy Laws
13970a5644
Refresh pipeline cached storage buffer bindings after each execution
2022-11-02 17:46:07 +00:00
Billy Laws
6bb2853ca0
Keep track of combined image samplers for quick bind
2022-11-02 17:46:07 +00:00
Billy Laws
040db37a28
Fix descriptor copies to be one per descriptor type
2022-11-02 17:46:07 +00:00
Billy Laws
d482e0ea98
Fix shader stage iteration to not miss the pixel stage
2022-11-02 17:46:07 +00:00
Billy Laws
0b808cc22b
Use Sint/Uint attribute type in place of Sscaled/Uscaled
...
Scaled formats are not supported by any mobile GPUs.
2022-11-02 17:46:07 +00:00
Billy Laws
92ce220d3a
Ignore constant buffer selector size for updates
...
Clustertruck performs a giant (0x3000 byte) update with a selector size of only 0x500; without this, the update would only partially go through
2022-11-02 17:46:07 +00:00
Billy Laws
33f16ca26e
Handle unmapped blocks in CachedMappedBufferView
2022-11-02 17:46:07 +00:00
Billy Laws
5c0e4a839d
Fix SW BC2 decoding pitch
2022-11-02 17:46:07 +00:00
Billy Laws
60863fa162
Make viewports fallback to viewport 0 when their dimensions are invalid
...
OpenGL games use a zero width/height for viewports 1-15 when they're unused
2022-11-02 17:46:07 +00:00
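The fallback in 60863fa162 can be sketched as a small resolve step. The `Viewport` struct and function names below are hypothetical, and "invalid" is assumed to mean a zero width or height, as in the OpenGL case the commit message mentions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Viewport {
    float x, y, width, height;
};

constexpr std::size_t ViewportCount{16}; // Viewports 0-15

// Assumed validity rule: a viewport with zero width or height is treated as unused
inline bool IsValidViewport(const Viewport &viewport) {
    return viewport.width != 0.0f && viewport.height != 0.0f;
}

// Returns the viewport at `index`, or viewport 0 when that viewport is invalid
inline const Viewport &ResolveViewport(const std::array<Viewport, ViewportCount> &viewports, std::size_t index) {
    const Viewport &viewport{viewports[index]};
    return IsValidViewport(viewport) ? viewport : viewports[0];
}
```

With only viewports 0 and 2 populated, resolving index 1 returns viewport 0 and resolving index 2 returns viewport 2 unchanged.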
Billy Laws
586f872655
Update indexed quad conversion for new API
2022-11-02 17:46:07 +00:00
Billy Laws
498b4966d3
Avoid crashing on unmapped buffers
...
Just log a warning instead
2022-11-02 17:46:07 +00:00
Billy Laws
9f2b20443b
Implement Maxwell3D draws
2022-11-02 17:46:07 +00:00
Billy Laws
5020478ace
Always rebind pipeline if it has changed from the previous draw
2022-11-02 17:46:07 +00:00
Billy Laws
af1b4ca4f8
Skip clears if attachments are invalid
2022-11-02 17:46:07 +00:00
Billy Laws
01a5e95ce1
Implement non-indexed quad conversion support in new GPU
...
Uses memory::Buffer directly to allow for cleaner recreation and allow for removing host-only buffers from Buffer in the future.
2022-11-02 17:46:07 +00:00
Billy Laws
d42814bdc1
Add callback for when non-Maxwell3D engines alter pipeline state
...
This is required so that Maxwell3D knows to restore all dynamic state before the next draw
2022-11-02 17:46:07 +00:00
Billy Laws
7133c5d6b3
Drop exclusiveSubpass in favour of only ending RPs when attachments change
...
Significantly helps Mali performance by avoiding creating an excessive number of RPs.
2022-11-02 17:46:07 +00:00
Billy Laws
a3f38c0cf7
Add perfetto tracepoints to async record
2022-11-02 17:46:07 +00:00
Billy Laws
6dfef095e8
Up default record slot count to 6
2022-11-02 17:46:07 +00:00
Billy Laws
7d3a117a6f
Use a spinlock for descriptor set allocator
2022-11-02 17:46:07 +00:00
Billy Laws
78ddd03d1f
Mark newly allocated descriptor slots as active
2022-11-02 17:46:07 +00:00
Billy Laws
0867c593be
Support binding pipelines in state updater
2022-11-02 17:46:07 +00:00
Billy Laws
f42a0df72c
Use ctSelect register for colour targets and impl null color attachments
...
Some games have invalid colour attachments sandwiched in between valid ones; these need to be skipped in Vulkan with UNUSED_ATTACHMENT
2022-11-02 17:46:07 +00:00
Billy Laws
38aad21d29
Share single flag variable for Maxwell3D batch draw/constant buffer update
...
Slightly cheaper
2022-11-02 17:46:07 +00:00
Billy Laws
bd7eee8e2b
Optimise GPFIFO command processing
...
GPFIFO code is very high throughput due to the sheer number of commands used for rendering. Adjust some types and switch to an if statement with hints to slightly increase processing speed.
2022-11-02 17:46:07 +00:00
Billy Laws
2cdf6c1fe6
Add branch hint attributes to a couple branches
2022-11-02 17:46:07 +00:00
Billy Laws
b310b99bdc
Handle unmapped ranges in TranslateRange
2022-11-02 17:46:07 +00:00
Billy Laws
acfa58ea8c
State updater MERGEBACK desc
2022-11-02 17:46:07 +00:00
Billy Laws
68b6b20f78
Bunch of cleanup/bugfixes for initial pipeline desc set impl
...
Cleans up code slightly, fixes keeping track of per-stage descriptor counts and fixes storage buffer caching + more random stuff.
2022-11-02 17:46:07 +00:00
Billy Laws
eb4a9bab11
Mark freshly allocated descriptor slots as active
2022-11-02 17:46:07 +00:00
Billy Laws
f3184cdff1
Reorder active descriptor set slots to end of list
...
Speeds up allocations by significantly reducing the number of needed atomic ops per alloc. Could be optimised further in the future if needed.
2022-11-02 17:46:07 +00:00
Billy Laws
128b68d8b2
Avoid resetting command buffers manually as it's implicit
...
Somewhat costly on Adreno, and with the one time submit flag the reset is implicit for the next beginCommandBuffers call.
2022-11-02 17:46:07 +00:00
Billy Laws
e1717ed811
Implement Maxwell samplers
2022-11-02 17:46:07 +00:00
Billy Laws
f1600f5ad0
Support allocating into spans in the linear allocator
2022-11-02 17:46:07 +00:00
Billy Laws
04cea9239f
Implement descriptor set updating through StateUpdater
...
Will automatically resolve buffer views at record-time into descriptor write/copy structures then apply the write and bind the set.
2022-11-02 17:46:07 +00:00
Billy Laws
d174ca950b
Revert "Reset executor command buffers asynchronously"
...
This reverts commit fc7956df4ff56fdb2afc4b2bb0bbca82196179ca.
2022-11-02 17:46:07 +00:00
Billy Laws
2bbe975ea7
Reset executor command buffers asynchronously
...
This took a little while to do on qcom drivers, moving it to the cycle waiter thread gives a tiny speedup.
2022-11-02 17:46:07 +00:00
Billy Laws
054d32567d
Allow mutation of input data by callback in CircularQueue::AppendTranform
2022-11-02 17:46:07 +00:00