Register writes technically take effect after the draw, so issues can occur if they are applied before it: e.g. Skyrim updates ctSelect and disables all RTs after a draw, but previously this was applied before the draw and crashed the driver.
Vulkan doesn't allow sampling a texture and using it as an RT in the same RP. By tracking texture usage status and splitting RPs when this occurs, we can avoid such potential sync errors.
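A minimal sketch of the usage tracking described above. `RenderPassTracker`, `TextureId` and all method names are hypothetical and not taken from the emulator's code; a real implementation would presumably also carry any still-bound RTs over into the new RP rather than starting from an empty set.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for a texture handle, not the emulator's real type
using TextureId = std::uint32_t;

// Tracks how textures are used within the current render pass (RP)
class RenderPassTracker {
  public:
    // Records a sampled texture; returns true if the RP had to be split first
    bool Sample(TextureId texture) {
        bool split{attachments.count(texture) != 0};
        if (split)
            Split();
        sampled.insert(texture);
        return split;
    }

    // Records a render target; returns true if the RP had to be split first
    bool Attach(TextureId texture) {
        bool split{sampled.count(texture) != 0};
        if (split)
            Split();
        attachments.insert(texture);
        return split;
    }

    std::uint32_t SplitCount() const {
        return splitCount;
    }

  private:
    std::unordered_set<TextureId> sampled;     // Textures sampled in the current RP
    std::unordered_set<TextureId> attachments; // Textures used as RTs in the current RP
    std::uint32_t splitCount{0};

    // Ends the current RP; usage tracking starts afresh for the next one
    void Split() {
        sampled.clear();
        attachments.clear();
        splitCount++;
    }
};

// Usage: rendering to a texture and then sampling it forces a new RP
inline void ExampleRenderThenSample() {
    RenderPassTracker tracker;
    assert(!tracker.Attach(1)); // Texture 1 is bound as an RT
    assert(tracker.Sample(1));  // Sampling it needs a split first
    assert(tracker.SplitCount() == 1);
}
```

Splitting lazily, only when a conflicting usage is seen, keeps draws that never alias a texture in a single RP.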
Previously, both I2M uploads and DMA copies would force GPU serialisation if they happened to hit a trap or were used to copy GPU-dirty buffers. By using the buffer manager to implement them on the host GPU, we can avoid such slowdowns entirely.
The lock is released while waiting for submission, which means that another thread could end up signalling the cycle in the meantime, and the VK wait would then still happen once the lock has been reacquired.
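One way to guard against this window is to re-check the signalled state after the lock is reacquired. The sketch below assumes that approach; `FenceCycle`, `waitForSubmission` and `deviceWaits` are hypothetical names, and the counter merely stands in for the actual VK fence wait.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical fence cycle; names are illustrative, not the emulator's real API
class FenceCycle {
  public:
    // Stand-in for blocking until the command buffer has been submitted
    std::function<void()> waitForSubmission{[] {}};
    int deviceWaits{0}; // Counts how often the simulated VK fence wait ran

    void Signal() {
        std::lock_guard<std::mutex> lock{mutex};
        signalled = true;
    }

    void Wait() {
        std::unique_lock<std::mutex> lock{mutex};
        if (signalled)
            return;

        lock.unlock(); // The lock is dropped here, so another thread may signal the cycle
        waitForSubmission();
        lock.lock();

        if (signalled) // Re-check: the cycle may have been signalled while unlocked
            return;

        deviceWaits++; // Stand-in for the actual VK fence wait
        signalled = true;
    }

  private:
    std::mutex mutex;
    bool signalled{false};
};

// Usage: a signal that lands during the unlocked window suppresses the device wait
inline void ExampleSignalledWhileUnlocked() {
    FenceCycle cycle;
    cycle.waitForSubmission = [&cycle] { cycle.Signal(); }; // Simulates the other thread
    cycle.Wait();
    assert(cycle.deviceWaits == 0);
}
```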
Readback can be especially slow on mobile due to the varying load pattern it creates, which often prevents the CPU/GPU from clocking up. Since some games perform texture readback but don't actually use it for anything significant, implement a hack to skip it and significantly improve performance in such cases.
Due to the frequency at which it is called, megabuffering performance is critical to the performance of the entire emulator, especially in high-drawcall-count scenarios. After the view redesign, megabuffering on a per-view level was neither possible nor desirable, and thus megabuffering was modified to just copy for every usage of a view. This worked great at the time since there were other bottlenecks; however, gpu-new has since removed almost all of them and megabuffering is now a major sore point. Fix this by megabuffering small chunks and storing them in a page-table-like structure within the buffer. These chunks can be referenced by multiple views and are invalidated whenever the sequence number or execution number changes, avoiding any sequencing issues. In addition, to help the case where almost the whole buffer is read every single frame across a set of multiple views, an optimisation has been introduced that skips the chunked tracking and uses a single large megabuffer allocation with a single memcpy. This reduces the overall time spent in memcpy, since one large memcpy is quicker than many small ones.
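The chunked scheme can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: `ChunkSize`, `MegaBuffer`, `Buffer::Acquire` and the `Entry` layout are all hypothetical, ranges that span chunks simply fall back to an uncached copy, and the whole-buffer fast path is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t ChunkSize{256}; // Hypothetical chunk granularity

// Simplified linear megabuffer that only ever appends
struct MegaBuffer {
    std::vector<std::uint8_t> data;
    std::size_t copies{0}; // Number of uploads performed

    std::size_t Push(const std::uint8_t *src, std::size_t size) {
        std::size_t offset{data.size()};
        data.insert(data.end(), src, src + size);
        copies++;
        return offset;
    }
};

class Buffer {
  public:
    explicit Buffer(std::size_t size) : backing(size), table((size + ChunkSize - 1) / ChunkSize) {}

    // CPU write: bumps the sequence number so every cached chunk becomes stale
    void Write(std::size_t offset, const void *src, std::size_t size) {
        assert(offset + size <= backing.size());
        std::memcpy(backing.data() + offset, src, size);
        sequence++;
    }

    // Returns the megabuffer offset holding a copy of [offset, offset + size)
    std::size_t Acquire(MegaBuffer &mega, std::size_t offset, std::size_t size, std::uint64_t execution) {
        assert(size != 0 && offset + size <= backing.size());
        std::size_t chunk{offset / ChunkSize};
        if ((offset + size - 1) / ChunkSize != chunk)
            return mega.Push(backing.data() + offset, size); // Spans chunks: uncached copy

        Entry &entry{table[chunk]};
        if (entry.sequence != sequence || entry.execution != execution) {
            // Stale or never copied: upload the whole chunk once
            std::size_t start{chunk * ChunkSize};
            std::size_t length{std::min(ChunkSize, backing.size() - start)};
            entry = Entry{sequence, execution, mega.Push(backing.data() + start, length)};
        }
        return entry.megaOffset + (offset % ChunkSize);
    }

  private:
    struct Entry {
        std::uint64_t sequence;  // Buffer sequence number at copy time, 0 = never copied
        std::uint64_t execution; // Execution number the copy belongs to
        std::size_t megaOffset;  // Where the chunk copy lives in the megabuffer
    };

    std::vector<std::uint8_t> backing;
    std::vector<Entry> table; // One entry per chunk, like a page table
    std::uint64_t sequence{1};
};

// Usage: two views into the same chunk share a single copy
inline void ExampleSharedChunk() {
    Buffer buffer{1024};
    MegaBuffer mega;
    std::size_t a{buffer.Acquire(mega, 16, 4, 1)};
    std::size_t b{buffer.Acquire(mega, 32, 4, 1)};
    assert(mega.copies == 1 && b - a == 16);
}
```

Keying each entry on both numbers means a CPU write or a new execution invalidates a chunk lazily, on its next use, without walking the table.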
Rather than using just bpb for format compat, additionally check that the exact component bit layout matches, since many games end up reusing RTs for unrelated textures. The texture size requirements have also been weakened to only check the resulting layer size as opposed to width/height; this is somewhat hacky but it gets around the problem of blocklinear alignment.
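As a rough illustration of the stricter check, assuming a format can be described by its bpb plus per-component bit widths: `Format`, `TextureInfo`, `IsFormatCompatible` and `CanReuse` are hypothetical names, and the bit layouts in the usage example are made up for the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical format description; field names are illustrative
struct Format {
    std::uint8_t bpb;                          // Bytes per block
    std::array<std::uint8_t, 4> componentBits; // Bit width of each component, in order
};

struct TextureInfo {
    Format format;
    std::uint32_t width, height;
    std::size_t layerSize; // Size of one layer in bytes after layout alignment
};

// Formats alias only if both the block size and the exact component layout match
inline bool IsFormatCompatible(const Format &a, const Format &b) {
    return a.bpb == b.bpb && a.componentBits == b.componentBits;
}

// Dimensions are deliberately ignored: only the resulting layer size is compared
inline bool CanReuse(const TextureInfo &existing, const TextureInfo &requested) {
    return IsFormatCompatible(existing.format, requested.format) &&
           existing.layerSize == requested.layerSize;
}

// Usage: the same bpb with a different bit layout is no longer treated as compatible
inline void ExampleSameBpbDifferentLayout() {
    Format rgba8{4, {8, 8, 8, 8}};
    Format rgb10a2{4, {10, 10, 10, 2}};
    assert(!IsFormatCompatible(rgba8, rgb10a2));
}
```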