Commit Graph

1967 Commits

Author SHA1 Message Date
Billy Laws
1a2351386d Add u64 iova ctor 2022-11-02 17:46:07 +00:00
Billy Laws
93d43e0115 Fully fill in swizzle component mappings
Avoids the rest being default initialised to identity, which would break the intended effect of them.
2022-11-02 17:46:07 +00:00
Billy Laws
37ff0ab814 Add buffer manager support for accelerated copies
These will be sequenced on the GPU/CPU depending on what's optimal and avoid any serialisation
2022-11-02 17:46:07 +00:00
Billy Laws
cac287d9fd Implement accelerated uploads/copies through buffer manager
Previously, both I2M uploads and DMA copies would force GPU serialisation if they happened to hit a trap or were used to copy GPU dirty buffers. By using the buffer manager to implement them on the host GPU we can avoid such slowdowns entirely.
2022-11-02 17:46:07 +00:00
Billy Laws
c5ec484d9a Avoid redundantly passing executor in ctors when it's already in ChannelCtx 2022-11-02 17:46:07 +00:00
Billy Laws
463394ba72 Pass correct size for XFB buffers 2022-11-02 17:46:07 +00:00
Billy Laws
bd976676f4 Fix SNorm vertex formats 2022-11-02 17:46:07 +00:00
Billy Laws
b74098570f Zero-out unused XFB varyings before passing to hades 2022-11-02 17:46:07 +00:00
Billy Laws
22f3ba6b93 Mark XFB buffers as GPU dirty 2022-11-02 17:46:07 +00:00
Billy Laws
26aeeaecf5 Add constant buffer GPU write pipeline barrier 2022-11-02 17:46:07 +00:00
Billy Laws
0b5d9308c4 Be more careful about potentially-unneeded GPU->CPU syncs
These can be especially expensive so should be avoided as much as possible.
2022-11-02 17:46:07 +00:00
Billy Laws
e6530e2386 Delete graphics_context
F
2022-11-02 17:46:07 +00:00
Billy Laws
ac2e6c125b Switch to Roboto for Korean font 2022-11-02 17:46:07 +00:00
Billy Laws
b24a8465da Don't require depthClamp 2022-11-02 17:46:07 +00:00
Billy Laws
9055c98e09 Only enable debug/verbose logs in (rel)debug builds 2022-11-02 17:46:07 +00:00
Billy Laws
0ebdbcf0ff Don't lock stateMutex when updating buffer cycle 2022-11-02 17:46:07 +00:00
Billy Laws
dd360b8f75 Pass correct wait semaphore array size to queue submit 2022-11-02 17:46:07 +00:00
Billy Laws
c78a4b9699 Fixup buffer recreation to avoid deadlock when waiting on srcs 2022-11-02 17:46:07 +00:00
Billy Laws
d236bfe454 Enable depthClamp VK device feature 2022-11-02 17:46:07 +00:00
Billy Laws
95d849e1f6 Check FenceCycle signalled flag immediately before waiting
The lock release within the wait for submission means that another thread could end up signalling the cycle, and the VK wait would still happen afterwards, once the lock has been reacquired.
2022-11-02 17:46:07 +00:00
Billy Laws
1a23b929a7 Avoid chaining cycles in buffer recreation
This had a chance of creating circular chains, which obviously caused issues; just do a wait instead for now.
2022-11-02 17:46:07 +00:00
Billy Laws
a15db9cb06 Update hades submodule 2022-11-02 17:46:07 +00:00
Billy Laws
cfc55e60b0 Add robin map submodule 2022-11-02 17:46:07 +00:00
Billy Laws
6c0f084aae Introduce hack to ignore frequently read-back textures
Readback can be especially slow on mobile due to the varying load pattern it creates, which often prevents the CPU/GPU from clocking up. Since some games perform texture readback but don't actually use it for anything significant, implement a hack to skip it and significantly improve performance in such cases.
2022-11-02 17:46:07 +00:00
Billy Laws
e45e7546c8 Redesign buffer megabuffering
Due to the frequency at which it is called, megabuffering performance is critical to the performance of the entire emulator, especially in high-drawcall-count scenarios. After the view redesign, megabuffering on a per-view level was no longer possible nor desirable, and thus megabuffering was modified to just copy for every usage of a view. This worked great at the time since there were other bottlenecks; however, gpu-new has since removed almost all of them and megabuffering is now a major sore point. Fix this by megabuffering small chunks and storing them in a page-table-like structure within the buffer; these chunks can be referenced by multiple views and will be smartly invalidated whenever the sequence number or execution number changes to avoid any sequencing issues. In addition to this, to help the case where almost the whole buffer is read every single frame across a set of multiple views, an optimisation to skip the chunked tracking and use one large single megabuffer allocation and one single memcpy has been introduced. This reduces the overall amount of time spent in memcpy since large memcpys are quicker.
2022-11-02 17:46:07 +00:00
Billy Laws
7ea9aa52f5 Speed up reported guest GPU time
Avoids triggering DRS in games in cases where it wouldn't actually benefit anything due to being CPU bottlenecked.
2022-11-02 17:46:07 +00:00
Billy Laws
31c2fb7d7a Fixup IDirectory read 2022-11-02 17:46:07 +00:00
Billy Laws
7491178a9e Pass base array layer to texture views 2022-11-02 17:46:07 +00:00
Billy Laws
ff57d2fbbf Enforce stronger format and weaker dimension texture compat checks
Rather than using just bpb for format compat, additionally check that the exact component bit layout matches, since many games end up reusing RTs for unrelated textures. The texture size requirements have also been weakened to only check the resulting layer size as opposed to width/height; this is somewhat hacky but it gets around the problem of blocklinear alignment.
2022-11-02 17:46:07 +00:00
Billy Laws
14af383238 Only allow submitting swapchainImageCount images for host present at a time
Prevents situations where nothing would otherwise be waiting on the GPU and, since presentation no longer blocks, too many images would be submitted for presentation.
2022-11-02 17:46:07 +00:00
Billy Laws
bcd96ac77d Fixup A8R8G8B8 TIC format mapping
8-bit formats are inverted in TICs compared to Vulkan
2022-11-02 17:46:07 +00:00
Billy Laws
90466b8830 Implement depth clamp rasterisation state
Used in SMO for shadows.
2022-11-02 17:46:07 +00:00
Billy Laws
1cfc4278f9 Disable preserve buffer/texture attachment opt for now
Causes several issues and crashes in Pokemon without an obvious cause.
2022-11-02 17:46:07 +00:00
Billy Laws
e483cf9634 Use shader memory mirror when reading guest shaders
Avoids triggering any traps that may be present on the region
2022-11-02 17:46:07 +00:00
Billy Laws
f6e4328b5a Ensure blit src/dst textures are attached as execution cycle dependencies
Since they're not in the TIC pool they would otherwise be freed
2022-11-02 17:46:07 +00:00
Billy Laws
77a131df60 Support using in-app renderdoc API to capture individual executions 2022-11-02 17:46:07 +00:00
Billy Laws
576bc6f37e Add CommandExecutor slot count setting 2022-11-02 17:46:07 +00:00
Billy Laws
1a0819fb76 Use semaphores for presentation engine frame synchronisation
Avoids waits on the CPU, which can be costly and confuse the scheduler; also reduces latency significantly.
2022-11-02 17:46:07 +00:00
Billy Laws
0670e0e0dc Support using Vulkan semaphores with fence cycles
In some cases like presentation, it may be possible to avoid waiting on the CPU by using a semaphore to indicate GPU completion. Due to the binary nature of Vulkan semaphores this requires a fair bit of code, as we need to ensure semaphores are always unsignalled before they are waited on and signalled again. This is achieved with a special kind of chained cycle that can be added even after guest GPFIFO processing for a given cycle: the main cycle's semaphore can be waited on, then the cycle for the wait is attached to the main cycle and will be waited on before signalling.
2022-11-02 17:46:07 +00:00
Billy Laws
5b72be88c3 Stub ldn:u service 2022-11-02 17:46:07 +00:00
Billy Laws
77d76ed05a Batch contiguous GMMU ranges into one 2022-11-02 17:46:07 +00:00
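The batching named in the commit title above can be illustrated with a minimal sketch. The Range struct and the BatchContiguous function are hypothetical names chosen for illustration, and the sketch assumes the input ranges are already ordered by address; the actual GMMU code may be structured quite differently.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of a mapped range; not the emulator's real type.
struct Range {
    std::uint64_t address; // Start address of the range
    std::uint64_t size;    // Length of the range in bytes
};

// Merges every run of ranges in which each range starts exactly where the
// previous one ends. Assumes `ranges` is ordered by address.
std::vector<Range> BatchContiguous(const std::vector<Range> &ranges) {
    std::vector<Range> batched;
    for (const auto &range : ranges) {
        if (!batched.empty() && batched.back().address + batched.back().size == range.address)
            batched.back().size += range.size; // Contiguous: extend the previous entry
        else
            batched.push_back(range);          // Gap: start a new entry
    }
    return batched;
}
```

With this approach, three ranges of which the first two are adjacent collapse into two entries, so whatever per-range work follows runs fewer times.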
Billy Laws
e52dbf202f Pass more Maxwell3D registers into interconnect 2022-11-02 17:46:07 +00:00
Billy Laws
83c7ed314e Setup KThread pthread handle in StartThread
Avoids a race with starting the thread and the handle not being set yet
2022-11-02 17:46:07 +00:00
Billy Laws
9784ae23e9 Skip checking affinity before taking load-balance WaitScheduler path
The affinity mask may be set after the wait has begun
2022-11-02 17:46:07 +00:00
Billy Laws
ad3195e06f Split out guest texture layer size calcs into a separate func 2022-11-02 17:46:07 +00:00
Billy Laws
8fa83fdf13 Fix deswizzling non-pow2 block size formats
We need to use DivideCeil to avoid rounding off part of the texture.
Fixes texture in Nier Automata: Game of the YoRHa edition.
2022-11-02 17:46:07 +00:00
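The rounding issue described in the commit above can be shown with a small sketch. The DivideCeil body here is an assumed implementation (the name comes from the commit message, the definition does not), and BlocksTruncated/BlocksCovering are hypothetical helpers added only to contrast the two behaviours.

```cpp
#include <cstddef>

// Assumed implementation of an integer ceiling division; the helper in the
// actual codebase may be written differently.
constexpr std::size_t DivideCeil(std::size_t dividend, std::size_t divisor) {
    return (dividend + divisor - 1) / divisor;
}

// Hypothetical: block count along one axis using plain integer division,
// which drops any partial block at the edge of the texture.
constexpr std::size_t BlocksTruncated(std::size_t texels, std::size_t blockSize) {
    return texels / blockSize;
}

// Hypothetical: block count along one axis that also covers a partial block.
constexpr std::size_t BlocksCovering(std::size_t texels, std::size_t blockSize) {
    return DivideCeil(texels, blockSize);
}
```

For a texture 12 texels wide with a block width of 5 (not a power of two), plain division yields 2 blocks and leaves the last 2 texels uncovered, while the ceiling division yields 3.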
Billy Laws
27de42f8df Use surfaceClip as a hint for the underlying rendertarget size
TIC sizes may not be aligned to block linear dimensions, whereas RT sizes are aligned and then limited by the surface clip. By using this to determine surface size we are more likely to get a match in texture manager for any future usages.
2022-11-02 17:46:07 +00:00
Billy Laws
297597f697 Fix texture manager depth compat comparison 2022-11-02 17:46:07 +00:00
Billy Laws
500f817a28 Synchronize all non-matching textures back to host before recreation 2022-11-02 17:46:07 +00:00
Billy Laws
05581f2230 Remove now redundant buffer/texture/megabuffer manager locks
They have been superseded by the global channel lock
2022-11-02 17:46:07 +00:00