skyline

mirror of https://github.com/skyline-emu/skyline.git synced 2024-12-23 20:41:50 +01:00

Author	SHA1	Message	Date
Billy Laws	0670e0e0dc	Support using Vulkan semaphores with fence cycles In some cases like presentation, it may be possible to avoid waiting on the CPU by using a semaphore to indicate GPU completion. Due to the binary nature of Vulkan semaphores this requires a fair bit of code as we need to ensure semaphores are always unsignalled before they are waited on and signalled again. This is achieved with a special kind of chained cycle that can be added even after guest GPFIFO processing for a given cycle, the main cycle's semaphore can be waited and then the cycle for the wait attached to the main cycle and it will be waited on before signalling.	2022-11-02 17:46:07 +00:00
Billy Laws	5b72be88c3	Stub ldn:u service	2022-11-02 17:46:07 +00:00
Billy Laws	77d76ed05a	Batch contiguous GMMU ranges into one	2022-11-02 17:46:07 +00:00
Billy Laws	e52dbf202f	Pass more Maxwell3D registers into interconnect	2022-11-02 17:46:07 +00:00
Billy Laws	83c7ed314e	Setup KThread pthread handle in StartThread Avoids a race with starting the thread and the handle not being set yet	2022-11-02 17:46:07 +00:00
Billy Laws	9784ae23e9	Skip checking affinity before taking load-balance WaitScheduler path The affinity mask may be set after the wait has begun	2022-11-02 17:46:07 +00:00
Billy Laws	ad3195e06f	Split out guest texture layer size calcs into a separate func	2022-11-02 17:46:07 +00:00
Billy Laws	8fa83fdf13	Fix deswizzling non-pow2 block size formats We need to use DivideCeil to avoid rounding off part of the texture. Fixes texture in Nier Automata: Game of the YoRHa edition.	2022-11-02 17:46:07 +00:00
Billy Laws	27de42f8df	Use surfaceClip as a hint for the underlying rendertarget size TIC sizes may not be aligned to block linear dimensions whereas RT sizes are and then limited by the surface clip. By using this to determine surface size we are more likely to get a match in texture manager for any future usages.	2022-11-02 17:46:07 +00:00
Billy Laws	297597f697	Fix texture manager depth compat comparison	2022-11-02 17:46:07 +00:00
Billy Laws	500f817a28	Synchronize all non-matching textures back to host before recreation	2022-11-02 17:46:07 +00:00
Billy Laws	05581f2230	Remove now redundant buffer/texture/megabuffer manager locks They have been superseded by the global channel lock	2022-11-02 17:46:07 +00:00
Billy Laws	f5a141a621	Add dirty resource operator*	2022-11-02 17:46:07 +00:00
Billy Laws	b72720e8db	Finish off transform feedback implementation	2022-11-02 17:46:07 +00:00
Billy Laws	36fd885b49	Pack all draw state into a struct to avoid std::function allocations	2022-11-02 17:46:07 +00:00
Billy Laws	b5d0060c3f	Only use scissor for clear rect when enabled	2022-11-02 17:46:07 +00:00
Billy Laws	f93df35e6c	Only set line width when wideLines feature is supported	2022-11-02 17:46:07 +00:00
Billy Laws	4cebdfc8d3	Pass texture and cbuf state into pipeline manager for hades callbacks	2022-11-02 17:46:07 +00:00
Billy Laws	9ce848d4e0	Implement descriptor update batching and push descriptors Batching helps to avoid the need to attach so many objects to the fence cycle, which ends up taking a fair bit of time due to the allocation required.	2022-11-02 17:46:07 +00:00
Billy Laws	62a165b51e	Reformat maxwell3d interconnect codebase	2022-11-02 17:46:07 +00:00
Billy Laws	3766be59e7	Zero out vertex attribute state when disabled to avoid creating redundant pipelines	2022-11-02 17:46:07 +00:00
Billy Laws	751e3356e1	Keep shader trap lock held for the duration of an execution Avoids constant relocking on the GPFIFO thread (~0.5% of total time)	2022-11-02 17:46:07 +00:00
Billy Laws	314a9bccbc	Allow megabuffering readonly SSBOs	2022-11-02 17:46:07 +00:00
Billy Laws	4c2db0ba01	Implement ReadCbufValue and ReadTextureType hades callbacks Used for bindless and BRX instruction emulation.	2022-11-02 17:46:07 +00:00
Billy Laws	2163f8cde6	Implement alpha test pipeline state	2022-11-02 17:46:07 +00:00
Billy Laws	c86ad638c4	Keep track of transform feedback varyings pipeline state	2022-11-02 17:46:07 +00:00
Billy Laws	7ad2d94345	Zero out blend state when disabled to avoid creating redundant pipelines	2022-11-02 17:46:07 +00:00
Billy Laws	4052a93051	Force non-pushdescriptors for blit helper shader	2022-11-02 17:46:07 +00:00
Billy Laws	cb2a8c6d24	Enable wideLines Vulkan feature	2022-11-02 17:46:07 +00:00
Billy Laws	a3369637a9	Don't entirely wipe out per-index TIC cache after each execution Keep a copy of the old TIC entry and view even after purge caches and use the execution number to check validity instead, if that doesn't match then just memcmp can be used as opposed to a full hash and map lookup.	2022-11-02 17:46:07 +00:00
Billy Laws	98c0cc3e7f	Impl preserve attached buffers/textures to avoid GPFIFO lock thrashing When profiling SMO, it became obvious that the constant locking of textures and buffers in SyncDescriptors took up a large amount of CPU time (3-5%), a precious resource in intensive areas like Metro. This commit implements somewhat of a workaround to avoid constant relocking, if a buffer is frequently attached on the GPU and almost never used on the CPU we can keep the lock held between executions. Of course it's not that simple though, if the guest tries to lock a texture for the first time which has already been locked as preserve on the GPFIFO we need to avoid a deadlock. This is achieved through a combination of two things: first we periodically clear the locked attachments every 2*SlotCount submissions, preventing a complete deadlock on the CPU (just a long wait instead) and meaning that the next time the resource is attached on the GPU it will not be marked for preservation due to having been locked on the guest before; second, we always need to unlock everything when the GPU thread runs out of work, as the periodic clearing will not execute in this case which would otherwise leave the textures locked on the GPFIFO thread forever (if guest was waiting on a lock to submit work). It should be noted that we don't clear preserve attached resources in the latter scenario, only unlock them and then relock when more work is available.	2022-11-02 17:46:07 +00:00
Billy Laws	0e8ccf1e99	Use memory_order_release for new descriptor set allocations	2022-11-02 17:46:07 +00:00
Billy Laws	34db5097da	Avoid using a shared_ptr reference to cycle for command buffer submission Somehow without this we can sometimes get crashes during appending to circular queue	2022-11-02 17:46:07 +00:00
Billy Laws	0428e8c7da	Support forcing regular descriptor sets in VK pipeline cache	2022-11-02 17:46:07 +00:00
Billy Laws	0f394d516b	Lock FenceCycle in between waiting on chained cycles and checking signalled Avoids one race where we would end up hogging all the locks of chained cycles and ourself when waiting for submission of previous cycles and prevent any forward progress due to another thread locking one of the chained cycles.	2022-11-02 17:46:07 +00:00
Billy Laws	a015fe753d	Only write npad controllerInfo entry on the HID thread if it is valid	2022-11-02 17:46:07 +00:00
Billy Laws	b5446846f7	Stub IsSixAxisSensorAtRest	2022-11-02 17:46:07 +00:00
Billy Laws	6719572b3b	Keep track of how often textures/buffers are locked on the CPU For the upcoming preserve attachment optimisation, which will keep buffers/textures locked on the GPU between executions, we don't want to preserve any which are frequently locked on the CPU as that would result in lots of needless waiting for a resource to be unlocked by the GPU when it occasionally frees all preserve attachments when it could have been done much sooner. By checking if a resource has ever been locked on the CPU and using that to choose whether we preserve it we can avoid such waiting.	2022-11-02 17:46:07 +00:00
Billy Laws	993ffb56f4	Avoid waiting on texture/buffer fence with trapMutex locked This could cause deadlocks under certain circumstances by preventing the GPFIFO from making any progress while also waiting on it at the same time.	2022-11-02 17:46:07 +00:00
Billy Laws	3e8bd26978	Add a global gm20b channel lock Allowing for parallel execution of channels never really benefitted many games and prevented optimisations such as keeping frequently used resources always locked to avoid the constant overhead of locking on the hot path.	2022-11-02 17:46:07 +00:00
Billy Laws	57a4699bd1	Add IOCTL trace events	2022-11-02 17:46:07 +00:00
Billy Laws	7861968c05	Fix memory::Buffer move constructor	2022-11-02 17:46:07 +00:00
Billy Laws	ef0ae30667	Implement Maxwell3D texture pool management and view creation On top of the TIC cache from previous code a simple index based lookup has been added which vastly speeds things up by avoiding the need to hash the TIC structure every time.	2022-11-02 17:46:07 +00:00
Billy Laws	5542459c75	Use a SpinLock for guest shader code cache trap mutex	2022-11-02 17:46:07 +00:00
Billy Laws	3e12cde4d5	Make active Vulkan pipeline public	2022-11-02 17:46:07 +00:00
Billy Laws	2556966ec5	Don't attach textures to the active cycle in AttachTexture	2022-11-02 17:46:07 +00:00
Billy Laws	7dc3dde815	Introduce support for waiting for submission to FenceCycle Introducing async record resulted in breaking the assumption that any work submitted through command scheduler would be submitted in order with graphics submits. Since async record now unlocks the texture before it's submitted a separate mechanism is needed to ensure ordering of submits. This is achieved by building support into fence cycle itself, with a condition variable that is waited on for submission before any fence waits occur.	2022-11-02 17:46:07 +00:00
Billy Laws	54b85583ae	Fix layout transition in Texture::CopyFrom	2022-11-02 17:46:07 +00:00
Billy Laws	0f7c04ffb4	Use target format bpb when calculating linear mip level size	2022-11-02 17:46:07 +00:00
Billy Laws	849184452c	Add function to check if any guest texture mappings are unmapped	2022-11-02 17:46:07 +00:00
Billy Laws	98cb94ca6c	Bind an empty uniform buffer in place of unbound constant buffers	2022-11-02 17:46:07 +00:00
Billy Laws	55b85d0691	Implement combined image samplers and make descriptor code common between quick/normal updates	2022-11-02 17:46:07 +00:00
Billy Laws	ccf2d59351	Fixup input rate reading from packed pipeline state	2022-11-02 17:46:07 +00:00
Billy Laws	13970a5644	Refresh pipeline cached storage buffer bindings after each execution	2022-11-02 17:46:07 +00:00
Billy Laws	6bb2853ca0	Keep track of combined image samplers for quick bind	2022-11-02 17:46:07 +00:00
Billy Laws	040db37a28	Fix descriptor copies to be one per descriptor type	2022-11-02 17:46:07 +00:00
Billy Laws	d482e0ea98	Fix shader stage iteration to not miss the pixel stage	2022-11-02 17:46:07 +00:00
Billy Laws	0b808cc22b	Use Sint/Uint attribute type in place of Sscaled/Uscaled Scaled formats are not supported by any mobile GPUs.	2022-11-02 17:46:07 +00:00
Billy Laws	92ce220d3a	Ignore constant buffer selector size for updates Clustertruck performs a giant (0x3000 byte) update with a selector size of only 0x500, without this the update would only partially go through	2022-11-02 17:46:07 +00:00
Billy Laws	33f16ca26e	Handle unmapped blocks in CachedMappedBufferView	2022-11-02 17:46:07 +00:00
Billy Laws	5c0e4a839d	Fix SW BC2 decoding pitch	2022-11-02 17:46:07 +00:00
Billy Laws	60863fa162	Make viewports fallback to viewport 0 when their dimensions are invalid OpenGL games use a zero width/height for viewports 1-15 when they're unused	2022-11-02 17:46:07 +00:00
Billy Laws	586f872655	Update indexed quad conversion for new API	2022-11-02 17:46:07 +00:00
Billy Laws	498b4966d3	Avoid crashing on unmapped buffers Just log a warning instead	2022-11-02 17:46:07 +00:00
Billy Laws	9f2b20443b	Implement Maxwell3D draws	2022-11-02 17:46:07 +00:00
Billy Laws	5020478ace	Always rebind pipeline if it has changed from the previous draw	2022-11-02 17:46:07 +00:00
Billy Laws	af1b4ca4f8	Skip clears if attachments are invalid	2022-11-02 17:46:07 +00:00
Billy Laws	01a5e95ce1	Implement non-indexed quad conversion support in new GPU Uses memory::Buffer directly to allow for cleaner recreation and allow for removing host-only buffers from Buffer in the future.	2022-11-02 17:46:07 +00:00
Billy Laws	d42814bdc1	Add callback for when non-Maxwell3D engines alter pipeline state This is required so that Maxwell3D knows to restore all dynamic state before the next draw	2022-11-02 17:46:07 +00:00
Billy Laws	7133c5d6b3	Drop exclusiveSubpass in favour of only ending RPs when attachments change Significantly helps Mali performance by avoiding creating an excessive number of RPs.	2022-11-02 17:46:07 +00:00
Billy Laws	a3f38c0cf7	Add perfetto tracepoints to async record	2022-11-02 17:46:07 +00:00
Billy Laws	6dfef095e8	Up default record slot count to 6	2022-11-02 17:46:07 +00:00
Billy Laws	7d3a117a6f	Use a spinlock for descriptor set allocator	2022-11-02 17:46:07 +00:00
Billy Laws	78ddd03d1f	Mark newly allocated descriptor slots as active	2022-11-02 17:46:07 +00:00
Billy Laws	0867c593be	Support binding pipelines in state updater	2022-11-02 17:46:07 +00:00
Billy Laws	f42a0df72c	Use ctSelect register for colour targets and impl null color attachments Some games have invalid colour attachments sandwiched in between valid ones, these need to be skipped in Vulkan with UNUSED_ATTACHMENT	2022-11-02 17:46:07 +00:00
Billy Laws	38aad21d29	Share single flag variable for Maxwell3D batch draw/constant buffer update Slightly cheaper	2022-11-02 17:46:07 +00:00
Billy Laws	bd7eee8e2b	Optimise GPFIFO command processing GPFIFO code is very high throughput due to the sheer number of commands used for rendering. Adjust some types and switch to an if statement with hints to slightly increase processing speed.	2022-11-02 17:46:07 +00:00
Billy Laws	2cdf6c1fe6	Add branch hint attributes to a couple branches	2022-11-02 17:46:07 +00:00
Billy Laws	b310b99bdc	Handle unmapped ranges in TranslateRange	2022-11-02 17:46:07 +00:00
Billy Laws	acfa58ea8c	State uopdater MERGEBACK desC	2022-11-02 17:46:07 +00:00
Billy Laws	68b6b20f78	Bunch of cleanup/bugfixes for initial pipeline desc set impl Cleans up code slightly, fixes keeping track of per-stage descriptor counts and fixes storage buffer caching + more random stuff.	2022-11-02 17:46:07 +00:00
Billy Laws	eb4a9bab11	Mark freshly allocated descriptor slots as active	2022-11-02 17:46:07 +00:00
Billy Laws	f3184cdff1	Reorder active descriptor set slots to end of list Speeds up allocations by significantly reducing the number of needed atomic ops per alloc. Could be optimised further in the future if needed.	2022-11-02 17:46:07 +00:00
Billy Laws	128b68d8b2	Avoid resetting command buffers manually it's implicit Somewhat costly on adreno, and with the one time submit flag the reset is implicit for the next beginCommandBuffers call.	2022-11-02 17:46:07 +00:00
Billy Laws	e1717ed811	Implement Maxwell samplers	2022-11-02 17:46:07 +00:00
Billy Laws	f1600f5ad0	Support allocating into spans in the linear allocator	2022-11-02 17:46:07 +00:00
Billy Laws	04cea9239f	Implement descriptor set updating through StateUpdater Will automatically resolve buffer views at record-time into descriptor write/copy structures then apply the write and bind the set.	2022-11-02 17:46:07 +00:00
Billy Laws	d174ca950b	Revert "Reset executor command buffers asynchronously" This reverts commit fc7956df4ff56fdb2afc4b2bb0bbca82196179ca.	2022-11-02 17:46:07 +00:00
Billy Laws	2bbe975ea7	Reset executor command buffers asynchronously This took a little while to do on qcom drivers, moving it to the cycle waiter thread gives a tiny speedup.	2022-11-02 17:46:07 +00:00
Billy Laws	054d32567d	Allow mutation of input data by callback in CircularQueue::AppendTranform	2022-11-02 17:46:07 +00:00
Billy Laws	7c9212743c	Implement asynchronous command recording Recording of command nodes into Vulkan command buffers is very easily parallelisable as it can effectively be treated as part of the GPU execution, which is inherently async. By moving it to a separate thread we can shave off about 20% of GPFIFO execution time. It should be noted that the command scheduler command buffer infra is no longer used, since we need to record texture updates on the GPFIFO thread (while another slot is being recorded on the record thread) and then use the same command buffer on the record thread later. This ends up requiring a pool per slot, which is reasonable considering we only have four slots by default.	2022-11-02 17:46:07 +00:00
Billy Laws	a197dd2b28	Allow for creating signalled fence cycles	2022-11-02 17:46:07 +00:00
Billy Laws	542651232b	Add a mutex to allow preventing buffer recreation	2022-11-02 17:46:07 +00:00
Billy Laws	379b4f163d	Implement popping from CircularQueue	2022-11-02 17:46:07 +00:00
Billy Laws	6d9dc9c6fb	Implement some more of Draw Now performs VK state updates on execution and takes advantage of quick descriptor binding.	2022-11-02 17:46:07 +00:00
Billy Laws	3b26f4f48a	Expose way to check inter-pipeline descriptor compatibility Allows users to skip/use quick descriptor sync even after switching pipelines.	2022-11-02 17:46:07 +00:00
Billy Laws	943a38e168	Implement StateUpdater for rapid recording of VK state updates Using command executor for each individual state update was found to be infeasible due to the sheer number of state updates per draw and it relying on per-node heap allocations. Instead this commit takes advantage of each state update being used only once to implement a system of linearly-allocated state update commands that are linked together. After setting up all draw state with StateUpdateBuilder, the built StateUpdater can then be used in the execution phase to record all of the draw state into the command buffer with almost zero overhead.	2022-11-02 17:46:07 +00:00
Billy Laws	7b4da52445	Add a fast binding sync path for when only one cbuf has changed SMO implements instanced draws by repeating the same draw just with a different constant buffer bound. Reduce the cost of this significantly by detecting such cases and instead of processing every descriptor, copy the previous descriptor set and update only the ones affected by the bound constant buffer. Credits to ripinperiperi for the initial idea and making me aware of how SMO does these draws	2022-11-02 17:46:07 +00:00
Billy Laws	89edd9b303	Reset megabuffer binding for disabled vertex buffers	2022-11-02 17:46:07 +00:00


1472 Commits