1821 Commits

Author SHA1 Message Date
Dima
2cdfc7640c Stub GetPreviousProgramIndex 2022-11-18 12:52:25 +00:00
Dima
360306eb61 Stub GetAddOnContentListChangedEventWithProcessId 2022-11-18 12:52:25 +00:00
Dima
3d475ca122 Stub GetAccountId 2022-11-18 12:52:25 +00:00
Dima
0b452fe36b Stub GetFriendList 2022-11-18 12:52:25 +00:00
Dima
cc37d2231d Stub CheckFreeCommunicationPermission and IsFreeCommunicationAvailable 2022-11-18 12:52:25 +00:00
Dima
ec81c97fa9 Stub TryPopFromFriendInvitationStorageChannel 2022-11-18 12:52:25 +00:00
Dima
413f162cf2 Stub some account functions 2022-11-18 12:52:25 +00:00
lynxnb
b209ae8e90 Account for stick flat area when retrieving axes value from a MotionEvent 2022-11-17 21:54:15 +01:00
lynxnb
c966220bab Zero-initialize axes history instead of using null values
Use zero initialization for axes history instead of using null values. Fixes the first axis event after launching a game being completely ignored.
2022-11-17 21:54:15 +01:00
lynxnb
3a657c44cc Don't ignore HAT axes input events
Capture HAT axes events ourselves instead of relying on the Android framework to turn them into KeyCodes. Fixes handling of DPAD button presses on most controllers.
2022-11-17 21:54:15 +01:00
lynxnb
f1ec771944 Fix inverted axis polarity
In the case of axis value being zero, polarity would favor one side of the stick resulting in invalid values. Fix that by taking into account axis history when calculating polarity.
2022-11-17 21:54:15 +01:00
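The polarity fix described in f1ec771944 can be pictured with a minimal sketch. This is not the project's input code: `AxisPolarity` and its parameters are hypothetical names, and the sketch assumes the history holds only the previous value of the same axis.

```cpp
// Hypothetical sketch: returns true when the axis should be treated as
// pointing in the positive direction.
// A non-zero value decides its own polarity. A zero value carries no sign,
// so the previous value of the same axis (the "history") decides instead.
constexpr bool AxisPolarity(float value, float previousValue) {
    return value != 0.0f ? value > 0.0f : previousValue > 0.0f;
}
```

With this shape, a stick returning to zero from the negative side keeps its negative polarity rather than always resolving to one fixed side.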
lynxnb
675e8dbb2e Move input handling code to a dedicated class 2022-11-17 21:54:15 +01:00
lybxlpsv
18861d73a3 Set systemUiVisibility during onResume for A11 2022-11-17 14:04:57 +01:00
Dima
262ee28611 Stub some bsd functions
Co-authored-by: Lunar-Pixel <83507264+Lunar-Pixel@users.noreply.github.com>
2022-11-15 16:24:33 +00:00
Dima
9afa8b881e Stub nsd:u/nsd:a and sfdnsres services 2022-11-15 16:24:33 +00:00
Billy Laws
01e27bd2dd Implement ldr:ro LoadModule 2022-11-15 16:23:40 +00:00
Billy Laws
e571066409 Stub ldr:ro IRoInterface
Some games initialise this service on startup but don't actually use it. Add a simple stub to allow such games to boot.
2022-11-15 16:23:40 +00:00
Billy Laws
1fc2641746 Stub the web applet 2022-11-13 11:37:18 +00:00
Billy Laws
021f82ef08 Stub ListOpenContextStoredUsers 2022-11-13 11:35:40 +00:00
Billy Laws
e7bab27d85 Fixup nvdrv channel private memory allocation
This was incorrectly allocated in words, rather than bytes, meaning that guest allocations could overwrite the private memory and break inline syncpt operations
2022-11-13 11:35:40 +00:00
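The unit mix-up fixed in e7bab27d85 can be illustrated with a small sketch. It assumes 32-bit words, which the commit message does not state, and `WordsToBytes` is a hypothetical helper rather than a function from the codebase.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts a size given as a count of 32-bit words
// (an assumption) into the byte count an allocator expects.
constexpr std::size_t WordsToBytes(std::size_t wordCount) {
    return wordCount * sizeof(std::uint32_t);
}
```

Under that assumption, passing the raw word count where bytes are expected reserves only a quarter of the intended region, which is how later allocations could land on top of it.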
Billy Laws
8b523fa1f0 Avoid inline syncpt increments sending OOB GpEntries
In cases where no WFI is required, the space where the WFI commands would go needs to be zeroed out to avoid the GPU reading uninitialised memory.
2022-11-13 11:35:40 +00:00
Billy Laws
cd0b2636e5 Prevent truncation of big page start in GetVaRegions 2022-11-13 11:35:40 +00:00
Billy Laws
f650f32bf0 Avoid duplicating NvDrv buffer unmap code 2022-11-13 11:35:40 +00:00
Billy Laws
001064b7bf Fix GraphicsBufferProducer recreation
We need to use a shared_ptr to ensure that the present callback doesn't cause any UAFs. This also unlocks the GBP during presentation, since if the queue is full a deadlock could arise where the present callback wouldn't be able to run due to the (waiting) DequeueBuffer thread holding the lock.
2022-11-13 11:35:40 +00:00
Billy Laws
29e89a3950 Fix crashes when opening non-existent directories 2022-11-13 11:35:40 +00:00
Billy Laws
ec139b3027 Fixup CancelBuffer fence handling 2022-11-13 11:35:40 +00:00
Billy Laws
7f24c7b857 Store KMemory object ptrs in memory class to avoid linear-time unmap
This is quite a horrible solution but fixing it properly would require a whole rewrite of how we handle memory.
2022-11-13 11:35:15 +00:00
lynxnb
cc71e7b56c Run emulation in a separate process
Exiting from emulation has always been a big issue for Skyline, with guest and host threads that would keep running in the background unless the app was manually killed. Running emulation in a separate process allows us to kill it when we are done, avoiding the need for complex exiting management code.
2022-11-11 11:49:33 +00:00
lynxnb
281562ccdb Fix FABs ripple effect in OnScreenEditActivity 2022-11-09 23:07:23 +05:30
lynxnb
56f6f8a362 Reword the unsupported gpu drivers message
The old message was being misinterpreted as if the device's GPU was not supported by the emulator. Reword that message to explicitly mention custom drivers.
2022-11-09 23:07:23 +05:30
lynxnb
e2a5da1d67 Fix AppDialog layout
* Add a drag indicator at the top
* Fix flex layout wrapping when buttons didn't fit on a single line
* Fix BottomSheetDialog peek height too small on landscape orientation
* General cleanup of the layout
2022-11-09 23:07:23 +05:30
lynxnb
4146261069 Create a unified style for section titles 2022-11-09 23:07:23 +05:30
lynxnb
86364a84a2 Allow content to be drawn behind the navigation bar 2022-11-09 23:07:23 +05:30
lynxnb
6a6e89f070 Make BottomSheetDialog go fullscreen when fully expanded 2022-11-09 23:07:23 +05:30
lynxnb
f93d3b78d3 Add a drag indicator element at the top of LicenceDialog
A new `DragIndicatorView` has been introduced, which draws a small drag handle element. When used inside a `BottomSheetDialog`, this view will add a callback for hiding the indicator when the dialog is fully expanded.
2022-11-09 23:07:23 +05:30
lynxnb
6848e69638 Improve design consistency across the app
Game images, buttons and dialogs now have a consistent corner radius, across all game list layouts.
2022-11-09 23:07:23 +05:30
lynxnb
f63fdf26c9 Partially revert 'Make all Dialogs use @color/backgroundColor as the background color'
This partially reverts commit 36a1f2a2ec.
2022-11-09 23:07:23 +05:30
lynxnb
dec04db647 AppListItem misc tweaks
* Restore text marquee on all layouts
* Text size and color tweaks
* List layout image has round corners
* Clean up unneeded attributes
2022-11-09 23:07:23 +05:30
lynxnb
5c76a57e6e Use NestedScrollView for licence dialogs and minor layout tweaks 2022-11-09 23:07:23 +05:30
Billy Laws
388245789f Restructure ConditionalVariableSignal to avoid potential deadlock
Since InsertThread can block for paused threads, we need to ensure we unlock syncWaiterMutex when calling it.
2022-11-09 23:02:26 +05:30
PixelyIon
f4a8328cef Implement Symbol Hooking
Symbol hooking is required for HLE implementations of certain features in the future, such as `nvdec`, and for more in-depth debugging of games, as we can inspect them on an SDK function level which allows us to debug issues far more easily.
2022-11-07 23:56:22 +05:30
PixelyIon
8892eb08e6 Fix MoveRegister to clear when value is 0
The register wouldn't be cleared with a `MOVZ` when a value was zero due to the condition for writing an instruction requiring the `offsetValue` to be non-zero.
2022-11-07 23:56:22 +05:30
Billy Laws
f7ab3abb86 Allow load balancing when waiting on condvars 2022-11-06 20:47:26 +00:00
german77
b6e2fb894c service: bcat: Stub CreateDeliveryCacheStorageService 2022-11-06 20:39:41 +00:00
Billy Laws
80b65d5094 Update XDR names 2022-11-03 22:53:01 +00:00
Billy Laws
026bb04386 Impl some more texture formats 2022-11-02 17:46:07 +00:00
Billy Laws
133f08ed14 Stash new register value before executing deferred draws/updates
Since the register writes technically happen after the draw, issues can occur if they happen before: e.g. Skyrim updates ctSelect and disables all RTs after a draw, but previously this would happen before the draw and crash the driver.
2022-11-02 17:46:07 +00:00
Billy Laws
c50852e546 Implement the draw(...)BeginEnd Maxwell3D draw registers
Used by guest Vulkan games and nouveau.
2022-11-02 17:46:07 +00:00
Billy Laws
270ef3e0d2 Implement GPFIFO semaphore acquire operations 2022-11-02 17:46:07 +00:00
Billy Laws
2ce146e28f Don't crash on the Grp0SetSubDevMask TertOp
Used by Vulkan games to set the SLI mask, not applicable to the Switch.
2022-11-02 17:46:07 +00:00
Billy Laws
1d83dadefb Drop size restriction bypass for frequently synced buffers
In cases where large buffers are updated every draw this could seriously increase memory usage beyond 3GB in the megabuffer.
2022-11-02 17:46:07 +00:00
Billy Laws
1088ed514c Introduce texture usage system to ensure RPs are split when necessary
Vulkan doesn't allow sampling a texture and using it as an RT in the same RP. By tracking the texture usage status and splitting RPs when this occurs, we can avoid such potential sync errors.
2022-11-02 17:46:07 +00:00
Billy Laws
2dd4698441 Adjust texture matching hacks 2022-11-02 17:46:07 +00:00
Billy Laws
4f5c9047ef Add some additional texture formats used by Vulkan games 2022-11-02 17:46:07 +00:00
Billy Laws
6a830dfac5 Use shader-compiler side {S,U}Scaled format emulation 2022-11-02 17:46:07 +00:00
Billy Laws
579fd04117 Fixup ReadTextureType shader compiler callback 2022-11-02 17:46:07 +00:00
Billy Laws
b04d18eba5 Add support for split mappings to I2M uploads
Used by Super Mario Sunshine and other Vulkan games.
2022-11-02 17:46:07 +00:00
Billy Laws
db5e208379 Clear images even when aspects mismatch 2022-11-02 17:46:07 +00:00
Billy Laws
3c8df327f1 Fixup subpass barriers and flags 2022-11-02 17:46:07 +00:00
Billy Laws
5ab80901c6 Drop some debug code 2022-11-02 17:46:07 +00:00
Billy Laws
4de89c8839 GPU NEW MARGEBAC 2022-11-02 17:46:07 +00:00
Billy Laws
7670c83405 Ensure textures are clean before paging them out 2022-11-02 17:46:07 +00:00
Billy Laws
1a2351386d Add u64 iova ctor 2022-11-02 17:46:07 +00:00
Billy Laws
93d43e0115 Fully fill in swizzle component mappings
Avoids the rest being default initialised to identity, which would break the intended effect of them.
2022-11-02 17:46:07 +00:00
Billy Laws
37ff0ab814 Add buffer manager support for accelerated copies
These will be sequenced on the GPU/CPU depending on what's optimal and avoid any serialisation
2022-11-02 17:46:07 +00:00
Billy Laws
cac287d9fd Implement accelerated uploads/copies through buffer manager
Previously, both I2M uploads and DMA copies would force GPU serialisation if they happened to hit a trap or were used to copy GPU dirty buffers. By using the buffer manager to implement them on the host GPU we can avoid such slowdowns entirely.
2022-11-02 17:46:07 +00:00
Billy Laws
c5ec484d9a Avoid redundantly passing executor in ctors when it's already in ChannelCtx 2022-11-02 17:46:07 +00:00
Billy Laws
463394ba72 Pass correct size for XFB buffers 2022-11-02 17:46:07 +00:00
Billy Laws
bd976676f4 Fix SNorm vertex formats 2022-11-02 17:46:07 +00:00
Billy Laws
b74098570f Zero-out unused XFB varyings before passing to hades 2022-11-02 17:46:07 +00:00
Billy Laws
22f3ba6b93 Mark XFB buffers as GPU dirty 2022-11-02 17:46:07 +00:00
Billy Laws
26aeeaecf5 Add constant buffer GPU write pipeline barrier 2022-11-02 17:46:07 +00:00
Billy Laws
0b5d9308c4 Be more careful about potentially-unneeded GPU->CPU syncs
These can be especially expensive so should be avoided as much as possible.
2022-11-02 17:46:07 +00:00
Billy Laws
e6530e2386 Delete graphics_context
F
2022-11-02 17:46:07 +00:00
Billy Laws
ac2e6c125b Switch to Roboto for Korean font 2022-11-02 17:46:07 +00:00
Billy Laws
b24a8465da Don't require depthClamp 2022-11-02 17:46:07 +00:00
Billy Laws
9055c98e09 Only enable debug/verbose logs in (rel)debug builds 2022-11-02 17:46:07 +00:00
Billy Laws
0ebdbcf0ff Don't lock stateMutex when updating buffer cycle 2022-11-02 17:46:07 +00:00
Billy Laws
dd360b8f75 Pass correct wait semaphore array size to queue submit 2022-11-02 17:46:07 +00:00
Billy Laws
c78a4b9699 Fixup buffer recreation to avoid deadlock when waiting on srcs 2022-11-02 17:46:07 +00:00
Billy Laws
d236bfe454 Enable depthClamp VK device feature 2022-11-02 17:46:07 +00:00
Billy Laws
95d849e1f6 Check FenceCycle signalled flag immediately before waiting
The lock release within the wait for submission means that another thread could end up signalling the cycle, and the VK wait would then still happen afterwards once the lock has been reacquired.
2022-11-02 17:46:07 +00:00
Billy Laws
1a23b929a7 Avoid chaining cycles in buffer recreation
This had a chance of creating circular chains, which obviously caused issues; just do a wait instead for now.
2022-11-02 17:46:07 +00:00
Billy Laws
6c0f084aae Introduce hack to ignore frequently read-back textures
Readback can be especially slow on mobile due to the varying load pattern it creates, which often prevents the CPU/GPU from clocking up. Since some games perform texture readback but don't actually use it for anything significant, implement a hack to skip it and significantly improve performance in such cases.
2022-11-02 17:46:07 +00:00
Billy Laws
e45e7546c8 Redesign buffer megabuffering
Due to the frequency at which it is called, megabuffering performance is critical to the performance of the entire emulator, especially in high-drawcall-count scenarios. After the view redesign, megabuffering on a per-view level was neither possible nor desirable, and thus megabuffering was modified to just copy for every usage of a view. This worked great at the time since there were other bottlenecks; however, gpu-new has since removed almost all of them and megabuffering is now a major sore point. Fix this by megabuffering small chunks and storing them in a page-table-like structure within the buffer. These chunks can be referenced by multiple views and will be smartly invalidated whenever the sequence number or execution number changes to avoid any sequencing issues. In addition to this, to help the case where almost the whole buffer is read every single frame across a set of multiple views, an optimisation to skip the chunked tracking and use one large single megabuffer allocation and one single memcpy has been introduced. This reduces the overall amount of time spent in memcpy since large memcpys are quicker.
2022-11-02 17:46:07 +00:00
Billy Laws
7ea9aa52f5 Speed up reported guest GPU time
Avoids triggering DRS in games in cases where it wouldn't actually benefit anything due to being CPU bottlenecked.
2022-11-02 17:46:07 +00:00
Billy Laws
31c2fb7d7a Fixup IDirectory read 2022-11-02 17:46:07 +00:00
Billy Laws
7491178a9e Pass base array layer to texture views 2022-11-02 17:46:07 +00:00
Billy Laws
ff57d2fbbf Enforce stronger format and weaker dimension texture compat checks
Rather than using just bpb for format compat, additionally check that the exact component bit layout matches, since many games end up reusing RTs for unrelated textures. The texture size requirements have also been weakened to only check the resulting layer size as opposed to width/height. This is somewhat hacky, but it gets around the problem of blocklinear alignment.
2022-11-02 17:46:07 +00:00
Billy Laws
14af383238 Only allow submitting swapchainImageCount images for host present at a time
Prevents situations where nothing would otherwise be waiting on the GPU and, since presentation no longer blocks, too many images would be submitted for presentation.
2022-11-02 17:46:07 +00:00
Billy Laws
bcd96ac77d Fixup A8R8G8B8 TIC format mapping
8-bit formats are inverted in TICs compared to Vulkan
2022-11-02 17:46:07 +00:00
Billy Laws
90466b8830 Implement depth clamp rasterisation state
Used in SMO for shadows.
2022-11-02 17:46:07 +00:00
Billy Laws
1cfc4278f9 Disable preserve buffer/texture attachment opt for now
Causes several issues and crashes in Pokemon without an obvious cause.
2022-11-02 17:46:07 +00:00
Billy Laws
e483cf9634 Use shader memory mirror when reading guest shaders
Avoids triggering any traps that may be present on the region
2022-11-02 17:46:07 +00:00
Billy Laws
f6e4328b5a Ensure blit src/dst textures are attached as execution cycle dependencies
Since they're not in the TIC pool they would otherwise be freed
2022-11-02 17:46:07 +00:00
Billy Laws
77a131df60 Support using in-app renderdoc API to capture individual executions 2022-11-02 17:46:07 +00:00
Billy Laws
576bc6f37e Add CommandExecutor slot count setting 2022-11-02 17:46:07 +00:00
Billy Laws
1a0819fb76 Use semaphores for presentation engine frame synchronisation
Avoids waits on the CPU which can be costly and confuse the scheduler, also reduces latency significantly.
2022-11-02 17:46:07 +00:00
Billy Laws
0670e0e0dc Support using Vulkan semaphores with fence cycles
In some cases like presentation, it may be possible to avoid waiting on the CPU by using a semaphore to indicate GPU completion. Due to the binary nature of Vulkan semaphores, this requires a fair bit of code, as we need to ensure semaphores are always unsignalled before they are waited on and signalled again. This is achieved with a special kind of chained cycle that can be added even after guest GPFIFO processing for a given cycle: the main cycle's semaphore can be waited on, and the cycle for that wait is then attached to the main cycle so that it will be waited on before signalling.
2022-11-02 17:46:07 +00:00
Billy Laws
5b72be88c3 Stub ldn:u service 2022-11-02 17:46:07 +00:00
Billy Laws
77d76ed05a Batch contiguous GMMU ranges into one 2022-11-02 17:46:07 +00:00
Billy Laws
e52dbf202f Pass more Maxwell3D registers into interconnect 2022-11-02 17:46:07 +00:00
Billy Laws
83c7ed314e Setup KThread pthread handle in StartThread
Avoids a race with starting the thread and the handle not being set yet
2022-11-02 17:46:07 +00:00
Billy Laws
9784ae23e9 Skip checking affinity before taking load-balance WaitScheduler path
The affinity mask may be set after the wait has begun
2022-11-02 17:46:07 +00:00
Billy Laws
ad3195e06f Split out guest texture layer size calcs into a separate func 2022-11-02 17:46:07 +00:00
Billy Laws
8fa83fdf13 Fix deswizzling non-pow2 block size formats
We need to use DivideCeil to avoid rounding off part of the texture.
Fixes texture in Nier Automata: Game of the YoRHa edition.
2022-11-02 17:46:07 +00:00
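The rounding issue behind 8fa83fdf13 can be shown with a short sketch. Only the name `DivideCeil` comes from the commit message; the signature below and the `BlocksToCover` helper are assumptions made for illustration.

```cpp
#include <cstddef>

// Assumed shape of a DivideCeil helper: integer division that rounds up
// instead of truncating.
constexpr std::size_t DivideCeil(std::size_t dividend, std::size_t divisor) {
    return (dividend + divisor - 1) / divisor;
}

// Hypothetical helper: number of compressed blocks needed to cover `texels`
// along one axis when each block spans `blockDimension` texels.
constexpr std::size_t BlocksToCover(std::size_t texels, std::size_t blockDimension) {
    return DivideCeil(texels, blockDimension);
}
```

For a 16-texel-wide row with 6-texel-wide blocks, plain integer division yields 2 blocks and covers only 12 texels, while rounding up yields the 3 blocks needed to cover the whole row.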
Billy Laws
27de42f8df Use surfaceClip as a hint for the underlying rendertarget size
TIC sizes may not be aligned to block linear dimensions whereas RT sizes are and then limited by the surface clip. By using this to determine surface size we are more likely to get a match in texture manager for any future usages.
2022-11-02 17:46:07 +00:00
Billy Laws
297597f697 Fix texture manager depth compat comparison 2022-11-02 17:46:07 +00:00
Billy Laws
500f817a28 Synchronize all non-matching textures back to host before recreation 2022-11-02 17:46:07 +00:00
Billy Laws
05581f2230 Remove now redundant buffer/texture/megabuffer manager locks
They have been superseded by the global channel lock
2022-11-02 17:46:07 +00:00
Billy Laws
f5a141a621 Add dirty resource operator* 2022-11-02 17:46:07 +00:00
Billy Laws
b72720e8db Finish off transform feedback implementation 2022-11-02 17:46:07 +00:00
Billy Laws
36fd885b49 Pack all draw state into a struct to avoid std::function allocations 2022-11-02 17:46:07 +00:00
Billy Laws
b5d0060c3f Only use scissor for clear rect when enabled 2022-11-02 17:46:07 +00:00
Billy Laws
f93df35e6c Only set line width when wideLines feature is supported 2022-11-02 17:46:07 +00:00
Billy Laws
4cebdfc8d3 Pass texture and cbuf state into pipeline manager for hades callbacks 2022-11-02 17:46:07 +00:00
Billy Laws
9ce848d4e0 Implement descriptor update batching and push descriptors
Batching helps to avoid the need to attach so many objects to the fence cycle, which ends up taking a fair bit of time due to the allocation required.
2022-11-02 17:46:07 +00:00
Billy Laws
62a165b51e Reformat maxwell3d interconnect codebase 2022-11-02 17:46:07 +00:00
Billy Laws
3766be59e7 Zero out vertex attribute state when disabled to avoid creating redundant pipelines 2022-11-02 17:46:07 +00:00
Billy Laws
751e3356e1 Keep shader trap lock held for the duration of an execution
Avoids constant relocking on the GPFIFO thread (~0.5% of total time)
2022-11-02 17:46:07 +00:00
Billy Laws
314a9bccbc Allow megabuffering readonly SSBOs 2022-11-02 17:46:07 +00:00
Billy Laws
4c2db0ba01 Implement ReadCbufValue and ReadTextureType hades callbacks
Used for bindless and BRX instruction emulation.
2022-11-02 17:46:07 +00:00
Billy Laws
2163f8cde6 Implement alpha test pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
c86ad638c4 Keep track of transform feedback varyings pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
7ad2d94345 Zero out blend state when disabled to avoid creating redundant pipelines 2022-11-02 17:46:07 +00:00
Billy Laws
4052a93051 Force non-pushdescriptors for blit helper shader 2022-11-02 17:46:07 +00:00
Billy Laws
cb2a8c6d24 Enable wideLines Vulkan feature 2022-11-02 17:46:07 +00:00
Billy Laws
a3369637a9 Don't entirely wipe out per-index TIC cache after each execution
Keep a copy of the old TIC entry and view even after purge caches and use the execution number to check validity instead, if that doesn't match then just memcmp can be used as opposed to a full hash and map lookup.
2022-11-02 17:46:07 +00:00
Billy Laws
98c0cc3e7f Impl preserve attached buffers/textures to avoid GPFIFO lock thrashing
When profiling SMO, it became obvious that the constant locking of textures and buffers in SyncDescriptors took up a large amount of CPU time (3-5%), a precious resource in intensive areas like Metro. This commit implements somewhat of a workaround to avoid constant relocking: if a buffer is frequently attached on the GPU and almost never used on the CPU, we can keep the lock held between executions. Of course it's not that simple though: if the guest tries to lock a texture for the first time which has already been locked as preserve on the GPFIFO, we need to avoid a deadlock. This is achieved through a combination of two things: first, we periodically clear the locked attachments every 2*SlotCount submissions, preventing a complete deadlock on the CPU (just a long wait instead) and meaning that the next time the resource is attached on the GPU it will not be marked for preservation due to having been locked on the guest before; second, we always need to unlock everything when the GPU thread runs out of work, as the periodic clearing will not execute in this case, which would otherwise leave the textures locked on the GPFIFO thread forever (if the guest was waiting on a lock to submit work). It should be noted that we don't clear preserve attached resources in the latter scenario, only unlock them and then relock when more work is available.
2022-11-02 17:46:07 +00:00
Billy Laws
0e8ccf1e99 Use memory_order_release for new descriptor set allocations 2022-11-02 17:46:07 +00:00
Billy Laws
34db5097da Avoid using a shared_ptr reference to cycle for command buffer submission
Somehow without this we can sometimes get crashes during appending to circular queue
2022-11-02 17:46:07 +00:00
Billy Laws
0428e8c7da Support forcing regular descriptor sets in VK pipeline cache 2022-11-02 17:46:07 +00:00
Billy Laws
0f394d516b Lock FenceCycle in between waiting on chained cycles and checking signalled
Avoids one race where we would end up hogging all the locks of chained cycles and ourself when waiting for submission of previous cycles and prevent any forward progress due to another thread locking one of the chained cycles.
2022-11-02 17:46:07 +00:00
Billy Laws
a015fe753d Only write npad controllerInfo entry on the HID thread if it is valid 2022-11-02 17:46:07 +00:00
Billy Laws
b5446846f7 Stub IsSixAxisSensorAtRest 2022-11-02 17:46:07 +00:00
Billy Laws
6719572b3b Keep track of how often textures/buffers are locked on the CPU
For the upcoming preserve attachment optimisation, which will keep buffers/textures locked on the GPU between executions, we don't want to preserve any which are frequently locked on the CPU, as that would result in lots of needless waiting for a resource to be unlocked by the GPU when it occasionally frees all preserve attachments when it could have been done much sooner. By checking if a resource has ever been locked on the CPU and using that to choose whether we preserve it, we can avoid such waiting.
2022-11-02 17:46:07 +00:00
Billy Laws
993ffb56f4 Avoid waiting on texture/buffer fence with trapMutex locked
This could cause deadlocks under certain circumstances by preventing the GPFIFO from making any progress while also waiting on it at the same time.
2022-11-02 17:46:07 +00:00
Billy Laws
3e8bd26978 Add a global gm20b channel lock
Allowing for parallel execution of channels never really benefitted many games and prevented optimisations such as keeping frequently used resources always locked to avoid the constant overhead of locking on the hot path.
2022-11-02 17:46:07 +00:00
Billy Laws
57a4699bd1 Add IOCTL trace events 2022-11-02 17:46:07 +00:00
Billy Laws
7861968c05 Fix memory::Buffer move constructor 2022-11-02 17:46:07 +00:00
Billy Laws
ef0ae30667 Implement Maxwell3D texture pool management and view creation
On top of the TIC cache from previous code, a simple index-based lookup has been added which vastly speeds things up by avoiding the need to hash the TIC structure every time.
2022-11-02 17:46:07 +00:00
Billy Laws
5542459c75 Use a SpinLock for guest shader code cache trap mutex 2022-11-02 17:46:07 +00:00
Billy Laws
3e12cde4d5 Make active Vulkan pipeline public 2022-11-02 17:46:07 +00:00
Billy Laws
2556966ec5 Don't attach textures to the active cycle in AttachTexture 2022-11-02 17:46:07 +00:00
Billy Laws
7dc3dde815 Introduce support for waiting for submission to FenceCycle
Introducing async record resulted in breaking the assumption that any work submitted through command scheduler would be submitted in order with graphics submits. Since async record now unlocks the texture before it's submitted, a separate mechanism is needed to ensure ordering of submits. This is achieved by building support into fence cycle itself, with a condition variable that is waited on for submission before any fence waits occur.
2022-11-02 17:46:07 +00:00
Billy Laws
54b85583ae Fix layout transition in Texture::CopyFrom 2022-11-02 17:46:07 +00:00
Billy Laws
0f7c04ffb4 Use target format bpb when calculating linear mip level size 2022-11-02 17:46:07 +00:00
Billy Laws
849184452c Add function to check if any guest texture mappings are unmapped 2022-11-02 17:46:07 +00:00
Billy Laws
98cb94ca6c Bind an empty uniform buffer in place of unbound constant buffers 2022-11-02 17:46:07 +00:00
Billy Laws
55b85d0691 Implement combined image samplers and make descriptor code common between quick/normal updates 2022-11-02 17:46:07 +00:00
Billy Laws
ccf2d59351 Fixup input rate reading from packed pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
13970a5644 Refresh pipeline cached storage buffer bindings after each execution 2022-11-02 17:46:07 +00:00
Billy Laws
6bb2853ca0 Keep track of combined image samplers for quick bind 2022-11-02 17:46:07 +00:00
Billy Laws
040db37a28 Fix descriptor copies to be one per descriptor type 2022-11-02 17:46:07 +00:00
Billy Laws
d482e0ea98 Fix shader stage iteration to not miss the pixel stage 2022-11-02 17:46:07 +00:00
Billy Laws
0b808cc22b Use Sint/Uint attribute type in place of Sscaled/Uscaled
Scaled formats are not supported by any mobile GPUs.
2022-11-02 17:46:07 +00:00
Billy Laws
92ce220d3a Ignore constant buffer selector size for updates
Clustertruck performs a giant (0x3000 byte) update with a selector size of only 0x500; without this the update would only partially go through
2022-11-02 17:46:07 +00:00
Billy Laws
33f16ca26e Handle unmapped blocks in CachedMappedBufferView 2022-11-02 17:46:07 +00:00
Billy Laws
5c0e4a839d Fix SW BC2 decoding pitch 2022-11-02 17:46:07 +00:00
Billy Laws
60863fa162 Make viewports fallback to viewport 0 when their dimensions are invalid
OpenGL games use a zero width/height for viewports 1-15 when they're unused
2022-11-02 17:46:07 +00:00
Billy Laws
586f872655 Update indexed quad conversion for new API 2022-11-02 17:46:07 +00:00
Billy Laws
498b4966d3 Avoid crashing on unmapped buffers
Just log a warning instead
2022-11-02 17:46:07 +00:00
Billy Laws
9f2b20443b Implement Maxwell3D draws 2022-11-02 17:46:07 +00:00
Billy Laws
5020478ace Always rebind pipeline if it has changed from the previous draw 2022-11-02 17:46:07 +00:00
Billy Laws
af1b4ca4f8 Skip clears if attachments are invalid 2022-11-02 17:46:07 +00:00
Billy Laws
01a5e95ce1 Implement non-indexed quad conversion support in new GPU
Uses memory::Buffer directly to allow for cleaner recreation and allow for removing host-only buffers from Buffer in the future.
2022-11-02 17:46:07 +00:00
Billy Laws
d42814bdc1 Add callback for when non-Maxwell3D engines alter pipeline state
This is required so that Maxwell3D knows to restore all dynamic state before the next draw
2022-11-02 17:46:07 +00:00
Billy Laws
7133c5d6b3 Drop exclusiveSubpass in favour of only ending RPs when attachments change
Significantly helps Mali performance by avoiding creating an excessive number of RPs.
2022-11-02 17:46:07 +00:00
Billy Laws
a3f38c0cf7 Add perfetto tracepoints to async record 2022-11-02 17:46:07 +00:00
Billy Laws
6dfef095e8 Up default record slot count to 6 2022-11-02 17:46:07 +00:00
Billy Laws
7d3a117a6f Use a spinlock for descriptor set allocator 2022-11-02 17:46:07 +00:00
Billy Laws
78ddd03d1f Mark newly allocated descriptor slots as active 2022-11-02 17:46:07 +00:00
Billy Laws
0867c593be Support binding pipelines in state updater 2022-11-02 17:46:07 +00:00
Billy Laws
f42a0df72c Use ctSelect register for colour targets and impl null color attachments
Some games have invalid colour attachments sandwiched in between valid ones; these need to be skipped in Vulkan with UNUSED_ATTACHMENT
2022-11-02 17:46:07 +00:00
Billy Laws
38aad21d29 Share single flag variable for Maxwell3D batch draw/constant buffer update
Slightly cheaper
2022-11-02 17:46:07 +00:00
Billy Laws
bd7eee8e2b Optimise GPFIFO command processing
GPFIFO code is very high throughput due to the sheer number of commands used for rendering. Adjust some types and switch to an if statement with hints to slightly increase processing speed.
2022-11-02 17:46:07 +00:00
Billy Laws
2cdf6c1fe6 Add branch hint attributes to a couple branches 2022-11-02 17:46:07 +00:00
Billy Laws
b310b99bdc Handle unmapped ranges in TranslateRange 2022-11-02 17:46:07 +00:00
Billy Laws
acfa58ea8c State uopdater MERGEBACK desC 2022-11-02 17:46:07 +00:00
Billy Laws
68b6b20f78 Bunch of cleanup/bugfixes for initial pipeline desc set impl
Cleans up code slightly, fixes keeping track of per-stage descriptor counts and fixes storage buffer caching + more random stuff.
2022-11-02 17:46:07 +00:00
Billy Laws
eb4a9bab11 Mark freshly allocated descriptor slots as active 2022-11-02 17:46:07 +00:00
Billy Laws
f3184cdff1 Reorder active descriptor set slots to end of list
Speeds up allocations by significantly reducing the number of needed atomic ops per alloc. Could be optimised further in the future if needed.
2022-11-02 17:46:07 +00:00
Billy Laws
128b68d8b2 Avoid resetting command buffers manually as it's implicit
Somewhat costly on Adreno, and with the one time submit flag the reset is implicit for the next beginCommandBuffers call.
2022-11-02 17:46:07 +00:00
Billy Laws
e1717ed811 Implement Maxwell samplers 2022-11-02 17:46:07 +00:00
Billy Laws
f1600f5ad0 Support allocating into spans in the linear allocator 2022-11-02 17:46:07 +00:00
Billy Laws
04cea9239f Implement descriptor set updating through StateUpdater
Will automatically resolve buffer views at record-time into descriptor write/copy structures then apply the write and bind the set.
2022-11-02 17:46:07 +00:00
Billy Laws
d174ca950b Revert "Reset executor command buffers asynchronously"
This reverts commit fc7956df4ff56fdb2afc4b2bb0bbca82196179ca.
2022-11-02 17:46:07 +00:00
Billy Laws
2bbe975ea7 Reset executor command buffers asynchronously
This took a little while to do on qcom drivers, moving it to the cycle waiter thread gives a tiny speedup.
2022-11-02 17:46:07 +00:00
Billy Laws
054d32567d Allow mutation of input data by callback in CircularQueue::AppendTranform 2022-11-02 17:46:07 +00:00
Billy Laws
7c9212743c Implement asynchronous command recording
Recording of command nodes into Vulkan command buffers is very easily parallelisable as it can effectively be treated as part of the GPU execution, which is inherently async. By moving it to a separate thread we can shave off about 20% of GPFIFO execution time. It should be noted that the command scheduler command buffer infra is no longer used, since we need to record texture updates on the GPFIFO thread (while another slot is being recorded on the record thread) and then use the same command buffer on the record thread later. This ends up requiring a pool per slot, which is reasonable considering we only have four slots by default.
2022-11-02 17:46:07 +00:00
Billy Laws
a197dd2b28 Allow for creating signalled fence cycles 2022-11-02 17:46:07 +00:00
Billy Laws
542651232b Add a mutex to allow preventing buffer recreation 2022-11-02 17:46:07 +00:00
Billy Laws
379b4f163d Implement popping from CircularQueue 2022-11-02 17:46:07 +00:00
Billy Laws
6d9dc9c6fb Implement some more of Draw
Now performs VK state updates on execution and takes advantage of quick descriptor binding.
2022-11-02 17:46:07 +00:00
Billy Laws
3b26f4f48a Expose way to check inter-pipeline descriptor compatibility
Allows users to skip/use quick descriptor sync even after switching pipelines.
2022-11-02 17:46:07 +00:00
Billy Laws
943a38e168 Implement StateUpdater for rapid recording of VK state updates
Using the command executor for each individual state update was found to be infeasible due to the sheer number of state updates per draw and its reliance on per-node heap allocations. Instead this commit takes advantage of each state update being used only once to implement a system of linearly-allocated state update commands that are linked together. After setting up all draw state with StateUpdateBuilder, the built StateUpdater can then be used in the execution phase to record all of the draw state into the command buffer with almost zero overhead.
2022-11-02 17:46:07 +00:00
Billy Laws
7b4da52445 Add a fast binding sync path for when only one cbuf has changed
SMO implements instanced draws by repeating the same draw just with a different constant buffer bound. Reduce the cost of this significantly by detecting such cases and instead of processing every descriptor, copy the previous descriptor set and update only the ones affected by the bound constant buffer.

Credits to ripinperiperi for the initial idea and making me aware of how SMO does these draws
2022-11-02 17:46:07 +00:00
Billy Laws
89edd9b303 Reset megabuffer binding for disabled vertex buffers 2022-11-02 17:46:07 +00:00
Billy Laws
6a1615a104 Expose color and depth attachments to Draw 2022-11-02 17:46:07 +00:00
Billy Laws
aae957819e Simplify BufferView locking by requiring buffer manager be locked
Avoids the need to repeat the lookup after locking since recreations are impossible if buffer manager is locked.
2022-11-02 17:46:07 +00:00
Billy Laws
9449b52f36 Reduce minimum megabuffer alignment to 128 bytes 2022-11-02 17:46:07 +00:00
Billy Laws
b3cf9c40ba Update megabuffer execution/sequence numbers after updating an allocation 2022-11-02 17:46:07 +00:00
Billy Laws
4b2b6fc6e9 Avoid calling SynchronizeGuest when attempting to megabuffer unless necessary 2022-11-02 17:46:07 +00:00
Billy Laws
e5919e84a1 Pipeline state if statement cleanups 2022-11-02 17:46:07 +00:00
Billy Laws
bf536aa168 Sync pipeline descriptors every draw 2022-11-02 17:46:07 +00:00
Billy Laws
9223d7f524 Fix descriptor initialisation order
They need to be setup before the pipeline is created to avoid passing in garbage data.
2022-11-02 17:46:07 +00:00
Billy Laws
4652cc5a0a Avoid parsing descriptors for disabled shader stages 2022-11-02 17:46:07 +00:00
Billy Laws
3456fb39fa Fix pipeline to shader stage conversion when filling in shader infos
The two vertex pipeline stages need to be both treated as a single stage, and all subsequent stages need to be offset by -1
2022-11-02 17:46:07 +00:00
Billy Laws
a9213debc7 Implement constant buffer reading 2022-11-02 17:46:07 +00:00
Billy Laws
afcfe8a7fa Don't update scissor state >0 unless multiview is supported 2022-11-02 17:46:07 +00:00
Billy Laws
55d77b7eb0 Update user code for new megabuffering 2022-11-02 17:46:07 +00:00
Billy Laws
cc776ae395 Keep track of an 'execution number' in CommandExecutor
Allows users to efficiently cache resources that are valid for only one execution without resorting to callbacks.
2022-11-02 17:46:07 +00:00
Billy Laws
99a34df4cc Avoid trapping frequently synced buffers by using megabuffer copies
When a buffer is trapped nearly every frame, the cost of trapping and synchronising its contents starts to quickly add up. By always using the megabuffer when this is the case, since megabuffer copies are done directly from the guest, we skip the need to synchronise/trap the backing.
2022-11-02 17:46:07 +00:00
Billy Laws
a24aec03a6 Rework per-view megabuffering to cache allocs in the buffer itself
The original intention was to cache on the user side, but especially with shader constant buffers that's difficult and costly. Instead we can cache on the buffer side, with a page-table-like structure to hold variable-sized allocations indexed by the aligned view base address. This avoids most redundant copies from repeated use of the same buffer without updates in between.
2022-11-02 17:46:07 +00:00
Billy Laws
b810470601 Invalidate HLE macro state on macro updates 2022-11-02 17:46:07 +00:00
Billy Laws
2360ca24da Implement constant buffer and storage buffer pipeline descriptor types 2022-11-02 17:46:07 +00:00
Billy Laws
25255b01c7 Keep track of more pipeline descriptor information
This is needed in order to allow allocating per-draw descriptor arrays without recounting their lengths each time
2022-11-02 17:46:07 +00:00
Billy Laws
ad0275dbef Expose active pipeline for access by Maxwell3D class 2022-11-02 17:46:07 +00:00
Billy Laws
6e22373b59 Add array support to AllocateUntracked 2022-11-02 17:46:07 +00:00
Billy Laws
388cff3353 Implement simple pipeline transition cache
Avoids the need to hash PipelineState when we can guess the pipeline that will be used next. This could very easily be optimised in the future with generational, usage-based caching if necessary.
2022-11-02 17:46:07 +00:00
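The transition cache idea above can be sketched roughly as follows. This is only an illustration under assumptions: `PackedState`, `Pipeline`, `PipelineManager`, the four-entry successor list and the `hashedLookups` counter are hypothetical names, not the project's actual types. Each pipeline remembers a few pipelines that followed it, so a lookup first compares against those candidates and only falls back to the hashed map on a miss.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for the packed pipeline state used as the cache key
struct PackedState {
    uint64_t bits;
    bool operator==(const PackedState &other) const { return bits == other.bits; }
};

struct PackedStateHash {
    size_t operator()(const PackedState &state) const { return std::hash<uint64_t>{}(state.bits); }
};

struct Pipeline {
    PackedState state;
    std::array<Pipeline *, 4> transitions{}; // Recently seen successors (size is an assumption)
    size_t nextSlot{};

    // Cheap comparison against the remembered successors, no hashing involved
    Pipeline *FindTransition(const PackedState &wanted) const {
        for (auto *candidate : transitions)
            if (candidate && candidate->state == wanted)
                return candidate;
        return nullptr;
    }

    // Remembers a successor, overwriting the oldest entry round-robin
    void AddTransition(Pipeline *next) {
        transitions[nextSlot] = next;
        nextSlot = (nextSlot + 1) % transitions.size();
    }
};

class PipelineManager {
    std::unordered_map<PackedState, std::unique_ptr<Pipeline>, PackedStateHash> map;

  public:
    size_t hashedLookups{}; // Counts how often the slow (hashing) path was taken

    Pipeline *FindOrCreate(const PackedState &state, Pipeline *previous) {
        if (previous) {
            if (auto *hit{previous->FindTransition(state)})
                return hit; // Guessed correctly, the hash map is never touched
        }

        hashedLookups++;
        auto &slot{map[state]};
        if (!slot)
            slot = std::make_unique<Pipeline>(Pipeline{state});
        if (previous)
            previous->AddTransition(slot.get());
        return slot.get();
    }
};
```

Pipelines are held through `std::unique_ptr` so the raw pointers stored in `transitions` stay valid when the map rehashes.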
Billy Laws
302b2fcc3f Force flush when dirty refresh returns true 2022-11-02 17:46:07 +00:00
Billy Laws
ec4ea5c5d7 Supply dispatcher manually for shader creation 2022-11-02 17:46:07 +00:00
Billy Laws
3404a3abdb Implement macro HLE for instanced draw macros
gm20b performs instanced draws by repeating draw methods for each instance, the code to detect this together with the cost of interpreting macros took up around 6% of GPFIFO time in Metro Kingdom. By detecting these specific macros and performing an instanced draw directly much of that cost can be avoided.
2022-11-02 17:46:07 +00:00
Billy Laws
cf0752f937 Use NCE memory tracking for guest shaders
Prevents needing to hash them for every single pipeline state update, without this just hashing shaders takes up a significant amount of time.
2022-11-02 17:46:07 +00:00
Billy Laws
19a75c3f65 Bind all pipeline states to main pipeline dirty state 2022-11-02 17:46:07 +00:00
Billy Laws
a04d8fb5cf Setup minimal viewport Vulkan pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
fe51db366b Mark all dirty resources as dirty initially 2022-11-02 17:46:07 +00:00
Billy Laws
abfa5929f1 Treat vertex buffers with base addr 0 as disabled 2022-11-02 17:46:07 +00:00
Billy Laws
e71ca05f19 Avoid bitfields for signed enum types in PackedPipelineState 2022-11-02 17:46:07 +00:00
Billy Laws
2f2b615780 Add dynamic state support to VK graphics pipeline cache 2022-11-02 17:46:07 +00:00
Billy Laws
ad045058ee Cleanup some redundant comments and includes in pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
9b05c9c0c3 Introduce a pipeline manager and partial pipeline object
gpu-new will use a monolithic pipeline object for each pipeline to store state, keyed by the PackedPipelineState contents. This allows for a greater level of per-pipeline optimisations and a reduction in the overall number of lookups in a draw compared to the previous system.
2022-11-02 17:46:07 +00:00
Billy Laws
2c1b40c9a8 Add some missed pipeline state members used by Hades 2022-11-02 17:46:07 +00:00
Billy Laws
0c6fa22c6b Introduce pipeline shader stage state
Simple state that generates a hash that can be used in the packed state and spans over the binary for pipeline creation.
2022-11-02 17:46:07 +00:00
Billy Laws
a6bb716123 Move packed pipeline state to a separate file 2022-11-02 17:46:07 +00:00
Billy Laws
4dcbf5c3a0 Drop the caching aspect of shader manager entirely
Caching here was deemed unnecessary since it will be done implicitly by the pipeline cache and creates issues with the legacy attribute conversion pass. It now purely serves as a frontend for Hades.
2022-11-02 17:46:07 +00:00
Billy Laws
e77e4891dc Tidy up Maxwell 3D regs a bit 2022-11-02 17:46:07 +00:00
Billy Laws
405d26fc22 Introduce Maxwell 3D shader state
Simple state holder that hashes the stored shader and reads it into a buffer
2022-11-02 17:46:07 +00:00
Billy Laws
6865f0bdaf Transition color blend state to packed pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
7049a521d2 Use Vulkan types directly in PackedPipelineState where possible 2022-11-02 17:46:07 +00:00
Billy Laws
effeb074b6 Move pipeline cache Key to cpp file and rename to PackedPipelineState
The new name is more indicative of what the struct contains and how it will be used as more than just a key.
2022-11-02 17:46:07 +00:00
Billy Laws
e1512c91a0 Transition depth stencil state to pipeline cache key 2022-11-02 17:46:07 +00:00
Billy Laws
1f844e2c18 Transition rasterization state to pipeline cache key 2022-11-02 17:46:07 +00:00
Billy Laws
9a6efb091c Transition tessellation state to pipeline cache key
Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.
2022-11-02 17:46:07 +00:00
Billy Laws
ae5d419586 Transition input assembly state to pipeline cache key 2022-11-02 17:46:07 +00:00
Billy Laws
3f9161fb74 Transition vertex input state to pipeline cache key
Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.
2022-11-02 17:46:07 +00:00
Billy Laws
ffe24aa075 Introduce a base pipeline cache key starting with RTs
It was determined that a general purpose Vulkan pipeline cache isn't viable for the significant performance reqs of Draw(), by using a Maxwell 3D specific key we can shrink state significantly more than if we used Vulkan structs.
2022-11-02 17:46:07 +00:00
Billy Laws
1e8f7d7fcb Implement dynamic pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
a94040ac7d Add default cases to enum conversions where necessary 2022-11-02 17:46:07 +00:00
Billy Laws
6e55d4dcf4 Implement color blend pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
cb11662ea5 Implement depth stencil pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
2484a2d6b5 Add A1R5G5B5Unorm and S8Uint formats 2022-11-02 17:46:07 +00:00
Billy Laws
690e96bce0 Implement rasterization pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
90cd6adb91 Implement tessellation pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
3649d4c779 Adapt Maxwell 3D engine to new interconnect code
Removes all usage of graphics_context.h from the codebase, exclusively using the new interconnect and its dirty tracking system. While porting the code a number of bugs were discovered such as not respecting the base instance or primitive type override, which have all been fixed. Currently only clears and constant buffer updates are implemented but due to the dirty state system allowing register handling on the interconnect end there shouldn't end up being many more changes.
2022-11-02 17:46:07 +00:00
Billy Laws
ef11900a39 Introduce reworked Maxwell 3D core interconnect
This mainly distributes operations down to activeState and pipelineState, aside from clears which are implemented in-place. The exposed interface is much reduced compared to the previous GraphicsContext system due to the newly introduced dirty system, this should hopefully make the code more maintainable and keep actual rendering operations separate from primitive restart state or whatever. Currently draws are unimplemented and the only fully implemented things are clears and constant buffer operations.
2022-11-02 17:46:07 +00:00
Billy Laws
37b821a4dc Introduce Maxwell 3D interconnect active state
Active state encapsulates all state that isn't part of a pipeline and can be set dynamically with Vulkan calls. This includes both dynamic state like stencil faces, and command buffer state like vertex buffer bindings.
Similarly to the last commit, the main goal of this is to reduce the amount of redundant work done per draw by employing dirty state as much as possible. Without using dirty state for this every active state operation would need to be performed every draw, which gets very expensive when things like buffer lookups end up being required. Code has also been heavily cleaned up as is described in the previous commit.
2022-11-02 17:46:07 +00:00
Billy Laws
5fdda78073 Introduce Maxwell 3D interconnect pipeline state
The main goal of this is to reduce the number of redundant lookups and work done per draw as much as possible, this is mainly achieved through heavy use of dirty tracking though other optimisations like heavily using the linear allocator are also in play. In addition to the goal of performance, the code has been cleaned up and abstracted significantly from its state in graphics_context, hopefully making the GPU interconnect code much more maintainable in the future and reducing the boilerplate needed to add even simple functionality. This commit includes partial pipeline state, enough for implementing clears + a slight bit extra.
2022-11-02 17:46:07 +00:00
Billy Laws
21f5611231 Rewrite constant buffer interconnect code
Adapted from the previous code to use dirty state tracking. The cache has also been removed since with the new buffer view and GMMU optimisations it actually ended up slowing lookups down, another result of the buffer view optimisations is that raw pointers are no longer used for buffer views since destruction is now much cheaper.
2022-11-02 17:46:07 +00:00
Billy Laws
d1e7bbc1d8 Introduce common code for Maxwell 3D interconnect rewrite
This common code will be used across the entirety of the 3D rewrite, it also includes a stub for StateUpdateBuilder, which will be used by active state code to apply state updates.
2022-11-02 17:46:07 +00:00
Billy Laws
a6c49115f9 Rewrite all Maxwell 3D registers up to clears to match Nvidia docs
All the names are directly translated from Nvidia docs, with minimal conversions to enums/structs when appropriate. Not all registers have been rewritten, only those that are needed to implement clears and dynamic state, the rest will be added as they are used in the GPU rework.
2022-11-02 17:46:07 +00:00
Billy Laws
d7eab40f1c Introduce resource based dirty tracking infrastructure
This will be heavily used by the upcoming GPU rework. It provides an intuitive way to track dirtiness based on using the underlying pointers of objects, as opposed to other methods which often need an enum entry per dirty state and don't support overlaps. Wrappers for dirty state objects are also provided to abstract as much of the dirty tracking as possible from user code. The pointer based mechanism also serves to avoid having to handle dirty bindings on the user side of the dirty resources, allowing them to bind things internally instead.
2022-11-02 17:46:07 +00:00
Billy Laws
8471ab754d Introduce a spin lock for resources locked at a very high frequency
Constant buffer updates result in a barrage of std::mutex calls that take a lot of time even under no contention (around 5%). Using a custom spinlock in cases like these allows inlining locking code reducing the cost of locks under no contention to almost 0.
2022-11-02 17:46:07 +00:00
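A minimal sketch of the kind of spin lock described above, assuming a plain `std::atomic_flag` design; the class name `SpinLock` and the yield-based backoff are illustrative choices, not necessarily what the commit implements. The uncontended path is a single `test_and_set` that the compiler can inline at the call site.

```cpp
#include <atomic>
#include <thread>

// Hypothetical minimal spin lock; provides lock()/unlock() so it can be used
// with std::scoped_lock and std::lock_guard
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

  public:
    void lock() {
        // Uncontended case: one atomic exchange, no call into the OS
        while (flag.test_and_set(std::memory_order_acquire))
            std::this_thread::yield(); // Contended: give up the timeslice rather than burn the core
    }

    bool try_lock() {
        // Returns true only when the flag was previously clear
        return !flag.test_and_set(std::memory_order_acquire);
    }

    void unlock() {
        flag.clear(std::memory_order_release);
    }
};
```

A lock like this only pays off when critical sections are very short and contention is rare, which matches the constant buffer update case the commit describes.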
Billy Laws
d810619203 Drop 3D engine method calling fast path in GPFIFO
This ended up actually turning out to be a slow path when Maxwell 3D method handling code was inlined.
2022-11-02 17:46:07 +00:00
Billy Laws
ded02e3eac Small engine.h fixups 2022-11-02 17:46:07 +00:00
Billy Laws
38ba963311 Drop usage of unique_ptr for Maxwell3D
Since graphics context is being replaced and split into cpp files there will no longer be any circular includes that previously prevented this.
2022-11-02 17:46:07 +00:00
Billy Laws
90db743c56 Source AsGpu GMMU page sizes from GMMU class 2022-11-02 17:46:07 +00:00
Billy Laws
e72fe02c15 Add inline fast-path for Buffer::FindOrCreate()
This can be inlined by the compiler much easier which helps perf a fair bit due to the number of times buffers are looked up, also avoids the need for small vector construction that was done in the previous fast-path.
2022-11-02 17:46:07 +00:00
Billy Laws
49478e178a Avoid redundantly syncing buffers before every Write in an execution
This isn't a guarantee provided by actual HW so we don't need to provide it either, the sync can be skipped once the buffer has already been synced at least once within the execution.
2022-11-02 17:46:07 +00:00
Billy Laws
f7a726e452 Allow attempting to write to buffers without passing a GPU copy callback
Constructing the GPU copy callback in `ConstantBuffers::Load()` ended up taking a fair amount of time despite it almost never being used in practice. By making it optional it can be skipped most of the time and only done when it's actually necessary by calling `Write()` again if the initial call returned true.
2022-11-02 17:46:07 +00:00
Billy Laws
5dca5cc10e Redesign buffer view infra to remarkably reduce creation overhead
Buffer view creation was a significant pain point, requiring several layers of caching to reduce the number of creations, which introduced a lot of complexity. By reworking delegates to be per-buffer rather than per-view and then linearly allocating delegates (without ever freeing) views can be reduced to just {delegatePtr, offset, size}, avoiding the need for any allocations or set operations in GetView. The one difficulty with this is the need to support buffer recreation, which is achieved by allowing delegates to be chained - during recreation all source buffers have their delegates modified to point to the newly created buffer's delegate. Upon accessing a view with such a chained delegate the view will be modified to point directly to the end delegate with offset being updated accordingly, skipping the need to traverse the chain for future accesses.
2022-11-02 17:46:07 +00:00
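The delegate chaining described above could look roughly like this. It is a sketch under assumptions: `Buffer`, `Delegate`, `View` and `Resolve()` are hypothetical names, and allocation of the delegates is left out entirely. The point shown is that resolving a view walks the chain once and then rewrites the view so later accesses go straight to the final delegate.

```cpp
#include <cstddef>

// Stand-in for a GPU buffer
struct Buffer {
    int id;
};

// One delegate per buffer; in the described design these are linearly allocated and never freed
struct Delegate {
    Buffer *buffer;
    Delegate *link{nullptr}; // Set on recreation, points at the delegate of the new buffer
    size_t offset{0};        // Where this buffer's contents start inside the linked buffer
};

// A view is just {delegatePtr, offset, size}, so creating one needs no allocation
struct View {
    Delegate *delegate;
    size_t offset;
    size_t size;

    // Follows any chain of recreations and collapses it into the view itself
    Buffer *Resolve() {
        while (delegate->link) {
            offset += delegate->offset;
            delegate = delegate->link;
        }
        return delegate->buffer;
    }
};
```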
Billy Laws
09f376e500 Add const accessors to OffsetMember 2022-11-02 17:46:07 +00:00
Billy Laws
64a9db2e82 Introduce MergeInto helper for simplified construction of arrays of structs
In the upcoming GPU code each state member will hold a reference to its corresponding Maxwell 3D regs, this helper is needed to allow easy transformation from the main 3D register struct into them.

Example:
```c++
struct Regs {
    std::array<View, 10> viewRegs;
    u32 enable;
} regs;

struct ViewState {
    const View &view;
    const u32 &enable;
    size_t index;
};

std::array<ViewState, 10> viewStates{MergeInto<ViewState, 10>(regs.viewRegs, regs.enable, IncrementingT{})};
```
2022-11-02 17:46:07 +00:00
Billy Laws
2c682f19a6 Add untracked linear allocator emplace/allocate functions
Useful for cases where allocations are guaranteed to be unused by the time `Reset()` is called and calling `Free()` would be difficult or add extra performance cost due to how the allocation is used.
2022-11-02 17:46:07 +00:00
Billy Laws
6359852652 Introduce page size constants and replace all usages of PAGE_SIZE
Avoids using macros and results in code which looks slightly cleaner.
2022-11-02 17:46:07 +00:00
Billy Laws
30ec844a1b Use GPFIFO pushbuffer contents in-place if possible
The memcpy within `Read()` was taking up a fair amount of time, avoid this by using the mapped range in-place when the mapping isn't split.
2022-11-02 17:46:07 +00:00
Billy Laws
be825b7aad Utilise SegmentTable for rapid FlatMemoryManager lookups
In some games performing the binary search in `TranslateRange()` ended up taking a fairly large (~8%) proportion of GPFIFO time. By using a segment table for O(1) lookups this is reduced to <2% for non-split mappings at the cost of slightly increased memory usage (2GiB in the absolute worst case but more like 50MiB in real world situations).

In addition to adapting `TranslateRange()` to use the segment table, a new function `LookupBlock()` has been added for cases where only a single mapping would ever be looked up, so the small_vector handling and fallback paths can be skipped and the entire lookup be inlined.
2022-11-02 17:46:07 +00:00
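As a rough illustration of why a segment table gives O(1) lookups, here is a sketch assuming fixed 64KiB segments and segment-aligned mappings; `SegmentTable`, `Block`, `Map()` and `Lookup()` are hypothetical names and do not mirror the real `FlatMemoryManager` interface. Every segment stores the index of the block covering it, so translation is one shift and one array access instead of a binary search.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical mapping of a guest virtual range onto a host offset
struct Block {
    uint64_t virtBase;
    uint64_t size;
    uint64_t hostBase;
};

class SegmentTable {
    static constexpr unsigned SegmentBits{16}; // 64KiB segments (assumed granularity)
    std::vector<int32_t> segments;             // Per-segment index into blocks, -1 when unmapped
    std::vector<Block> blocks;

  public:
    explicit SegmentTable(uint64_t addressSpaceSize)
        : segments(addressSpaceSize >> SegmentBits, -1) {}

    // Maps a segment-aligned range and fills every segment entry it covers
    void Map(uint64_t virtBase, uint64_t size, uint64_t hostBase) {
        auto index{static_cast<int32_t>(blocks.size())};
        blocks.push_back({virtBase, size, hostBase});
        for (uint64_t seg{virtBase >> SegmentBits}; seg < ((virtBase + size) >> SegmentBits); seg++)
            segments[seg] = index;
    }

    // Constant-time translation of a single address
    std::optional<uint64_t> Lookup(uint64_t virt) const {
        uint64_t seg{virt >> SegmentBits};
        if (seg >= segments.size() || segments[seg] < 0)
            return std::nullopt;
        const Block &block{blocks[static_cast<size_t>(segments[seg])]};
        return block.hostBase + (virt - block.virtBase);
    }
};
```

The memory trade-off mentioned in the commit shows up here as the `segments` vector, which grows with the size of the address space rather than with the number of mappings.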
Morph
4ea0b0e1e5 fssrv: IFileSystemProxy: Implement OpenReadOnlySaveDataFileSystem
Forward this function to OpenSaveDataFileSystem for now. A proper implementation should wrap the underlying filesystem with nn::fs::ReadOnlyFileSystem.
2022-10-30 20:04:40 +00:00
Narr the Reg
25b9bb00fd service: hid: Properly clear and set npad devices 2022-10-30 15:52:03 +00:00
german77
cf95cfb056 service: hid: Stub SetPalmaBoostMode 2022-10-30 15:52:03 +00:00
Narr the Reg
4da934579c service: hid: Set the correct maxEntry value and signal on acquire event handle 2022-10-30 15:52:03 +00:00
Dima
baa6b5d5ea check if NpadId is valid when update 2022-10-30 15:51:45 +00:00
Dima
a409f30e91 add GetAvailableLanguageCodeCount for both lists 2022-10-30 15:51:29 +00:00
Dima
51ce3f7c3c Stub IClient::Poll 2022-10-29 19:52:50 +05:30
Billy Laws
1846f533bc Add credits PreferenceCategory 2022-10-25 21:40:28 +01:00
Abandoned Cart
267d25b5a5 Prevent a false positive SecurityException for DocumentsProvider 2022-10-25 20:17:18 +05:30
lynxnb
1ae36cea24 Update ngword2 archive with the correct content 2022-10-25 00:40:46 +02:00
Billy Laws
160c2f3457 Add Ko-Fi credits to settings 2022-10-23 21:14:39 +01:00
PixelyIon
128ea33073 Print NPDM + NACP metadata for title determination
We determined that printing NPDM + NACP metadata is a significantly better way to determine what title is running rather than printing the filename.
2022-10-23 20:20:44 +05:30
PixelyIon
a0539a3edb Trace Scheduler Preemption/Yield in Perfetto 2022-10-22 17:37:03 +05:30
PixelyIon
c874907eb5 Log and flush inside KProcess::Kill
We want to know when the `KProcess` is being killed and flushing log during it is important since it can often result in hangs due to joining not working correctly.
2022-10-22 17:17:04 +05:30
PixelyIon
597a6ff31d Wait on slot to be freed in GraphicBufferProducer::DequeueBuffer
We currently don't wait on a slot to be freed if none are free, this worked prior to async presentation as GBP's slots wouldn't change their state until other commands were called but now slots can be held by the presentation engine. As a result, we now have to wait on the presentation engine to free up slots.

This commit also fixes the behavior of the `async` flag in `DequeueBuffer` as it was treated as a non-blocking flag but isn't supposed to do anything on HOS.
2022-10-22 17:15:33 +05:30
lynxnb
cdb2b85d6c Stub ngword and ngword2
Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>
2022-10-18 20:54:57 +01:00
lynxnb
a7be4fd1e1 Stub pl::RequestLoad and pl::GetSharedFontInOrderOfPriority
Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>
2022-10-18 20:54:57 +01:00
lynxnb
782f9e37ee Add a system region setting
Needed for games such as AC:NH.
The `Auto` option automatically selects a region based on the currently selected system language.

Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>
2022-10-18 20:54:57 +01:00
lynxnb
d4800d13b8 Stub hid::ActivateMouse and hid::ActivateKeyboard
Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>
2022-10-18 20:54:57 +01:00
lynxnb
45830633eb Stub am::SetTerminateResult 2022-10-18 20:54:57 +01:00
lynxnb
bc016aff47 Make the vulkan validation layer toggleable via setting
As part of this commit, a new preference category for debug settings is being introduced. All future settings only relevant for debugging purposes will be put there. The category is hidden on release builds.
2022-10-18 19:47:23 +02:00
lynxnb
5cf14e45e1 Enable frame throttling when triple buffering is disabled 2022-10-18 19:27:14 +02:00
lynxnb
b17364bb92 Introduce a dev app flavor for side-by-side installation 2022-10-01 13:01:46 +02:00
Billy Laws
8d1026d0cc Reduce font shared memory size for compacted fonts 2022-09-22 21:51:13 +01:00
Billy Laws
25fb09800a Compact fonts to only include necessary glyphs 2022-09-22 21:51:13 +01:00
Billy Laws
e31ed6a429 Increase font shared memory size for Noto fonts 2022-09-22 21:34:29 +01:00
Billy Laws
0d4893c448 Cleanup font magic generation code 2022-09-22 21:34:29 +01:00
Billy Laws
5c4cc3d51f Fix font load order to match HOS
Without this games would load an inappropriate font file for their target language.
2022-09-22 21:34:29 +01:00
Billy Laws
272bbf6cd2 Switch to Noto Sans fonts for shared fonts replacement
Provides CJK characters, which the previous replacements lacked entirely.
2022-09-22 21:34:29 +01:00
lynxnb
54172322fe Fix host synchronization for texture with a different guest format
Host synchronization of a guest texture with a different guest format represents a valid use case where the host doesn't support the guest format and conversion to a host-compatible format must be performed. The issue is most evident on Mali GPUs, as they don't support BCn texture formats thus needing manual decoding before submission. It was disabled by mistake in a previous commit, this commit re-enables it.
2022-09-15 15:22:52 +02:00
TheASVigilante
a1ff4e1777 Implement OpenHardwareOpusDecoderEx and GetWorkBufferSizeEx
Implements these 2 functions which were introduced in HOS 12.0.0. Fixes a crash in Xenoblade Chronicles 3.
2022-09-10 22:04:15 +01:00
lynxnb
34bd16426c Fix quads index buffer conversion not accounting for first index
Unindexed quad draws were broken when multiple draw calls were done on the same vertex buffer, with a non-zero `first` index.
Indexed quad draws also suffered from the same issue, but it was never encountered in games.
This commit fixes both cases by accounting for the `first` drawn index when generating conversion index buffers.
2022-09-04 12:42:33 +02:00
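For the unindexed case, the fix above amounts to offsetting the generated indices by `first`. A minimal sketch, assuming each quad is split into two triangles as (0, 1, 2) and (2, 3, 0); the function name `GenerateQuadIndices` and the exact winding are illustrative assumptions rather than the project's code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds a triangle-list index buffer for an unindexed quad draw covering
// vertices [first, first + count)
std::vector<uint32_t> GenerateQuadIndices(uint32_t first, uint32_t count) {
    assert(count % 4 == 0); // A quad list must contain whole quads

    std::vector<uint32_t> indices;
    indices.reserve(count / 4 * 6); // Six indices (two triangles) per quad

    // Starting from `first` rather than 0 is what the fix is about
    for (uint32_t quad{first}; quad < first + count; quad += 4)
        for (uint32_t corner : {0u, 1u, 2u, 2u, 3u, 0u})
            indices.push_back(quad + corner);

    return indices;
}
```

Starting the loop at 0 instead would make a second draw on the same vertex buffer reuse the vertices of the first one, which is the symptom the commit describes.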
Billy Laws
9af5df4bae Increase IPC pointer buffer size
Some services report a pointer buffer size > 0x1000, so up it as to not cause issues when they're implemented.
2022-09-02 23:14:05 +01:00
Billy Laws
51f4e7662e Add support for the TIPC protocol introduced in HOS 12.0.0
TIPC is a much lighter layer on top of the Horizon IPC system than CMIF and is used by SM in 12.0.0+. This implementation is slightly hacky since it doesn't really keep a separation between the underlying kernel IPC stuff and userspace like CMIF/TIPC, this should be fixed eventually, probably together with an IPC dispatch rewrite to avoid the mess of frozen maps.

Tested with Hentai Uni, which now crashes needing 'ldr:ro'.
2022-09-02 23:13:23 +01:00
Billy Laws
a40d7c78ad Always recreate oboe stream on error
This is what's done in oboe examples to avoid spurious errors breaking audio entirely.
2022-08-31 21:26:14 +01:00
Billy Laws
8917ec9c88 Don't set framesPerCallback for main stream as per oboe guidance
It's best to let oboe figure it out on its own
2022-08-31 21:26:14 +01:00
Billy Laws
b00008daf5 Fix identifier release check in AudioTrack::Stop 2022-08-31 21:26:14 +01:00
Billy Laws
d9c8e62d1c Don't warn on GetConfig IOCTL fails 2022-08-31 21:26:14 +01:00
Billy Laws
4aef24ba32 Implement NVGPU_GPU_IOCTL_GET_GPU_TIME in nvdrv 2022-08-31 21:26:14 +01:00
Billy Laws
5841799420 Fix decoding of IOCTLs with padding at the end 2022-08-31 21:26:14 +01:00
Billy Laws
82444f3b0a Don't set push descriptor flag for desc sets
This is redundant and against the spec since we no longer use push descriptors.
2022-08-31 21:26:14 +01:00
PixelyIon
94fdd6aa43 Fix HID touch points not being removed from screen
Tapping anything in titles that supported touch (such as Puyo Puyo Tetris or Sonic Mania) wouldn't work due to the first touch point never being removed from the screen, it is supposed to be removed after a 3 frame delay from the touch ending.

This commit introduces a mechanism to "time-out" touch points which counts down during the shared memory updates and removes them from the screen after a specified timeout duration.
2022-08-31 23:43:02 +05:30
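The timeout mechanism described above might be sketched as below. Everything here is an assumption made for illustration: `TouchScreen`, `TouchPoint` and the method names are hypothetical, and only the 3-update delay is taken from the commit message. Released points stay visible for a fixed number of shared memory updates and are then dropped.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TouchPoint {
    uint32_t id;
    bool held;        // The finger is still down
    uint32_t timeout; // Updates left before removal, only counted after release
};

class TouchScreen {
    static constexpr uint32_t ReleaseTimeout{3}; // The 3 frame delay mentioned in the commit
    std::vector<TouchPoint> points;

  public:
    void Press(uint32_t id) {
        points.push_back({id, true, 0});
    }

    // Marks the point as released and starts its countdown
    void Release(uint32_t id) {
        for (auto &point : points) {
            if (point.id == id && point.held) {
                point.held = false;
                point.timeout = ReleaseTimeout;
            }
        }
    }

    // Intended to be called once per shared memory update
    void Update() {
        for (auto &point : points)
            if (!point.held && point.timeout)
                point.timeout--;

        // Drop every released point whose countdown has expired
        points.erase(std::remove_if(points.begin(), points.end(), [](const TouchPoint &point) {
            return !point.held && point.timeout == 0;
        }), points.end());
    }

    size_t Count() const {
        return points.size();
    }
};
```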
PixelyIon
70ad4498a2 Write HID LIFO entries at fixed intervals
Certain titles depend on HID LIFO entries being written out at a fixed frequency rather than on actual state change, not doing this can lead to applications freezing till the LIFO is filled up to maximum size, this behavior is seen in Super Mario Odyssey. In other cases such as Metroid Dread, the game can run into race conditions that would lead to crashes, these were worked around by smashing a button during loading prior.

This commit introduces a thread which sleeps and wakes up occasionally to write LIFO entries into HID shared memory at the desired frequencies. This alleviates any issues as it fills up the LIFO instantly and correctly emulates HID Shared Memory behavior expected by the guest.

Co-authored-by: Narr the Reg <juangerman-13@hotmail.com>
2022-08-31 22:49:36 +05:30
PixelyIon
7966bfa9f6 Fix PI update KThread::waiterMutex deadlock
It was determined that deadlocks inside `KThread::UpdatePriorityInheritance` would not only arise from the first level of locking with `waitingOn->waiterMutex` but also from the second level of locking with `nextThread->waiterMutex`, which has now also been fixed to fall back when facing contention.
2022-08-28 20:15:08 +05:30
Abandoned Cart
86f6fc510e Remove printing result message from author
There is no purpose in printing the same result twice, so any duplicate messages should instead allow the field to be truncated.
2022-08-27 18:54:27 +05:30
Abandoned Cart
1013857fc4 Refactor subtitle as author to remove subtitle
Subtitle is no longer used, so instances have been rerouted to author. DataItem was also updated to reflect the removal of a subtitle.
2022-08-27 18:54:27 +05:30
Abandoned Cart
04cae942ea Follow typical per-file detail formatting
Format the details in the expected format of individual files (instead of a complete game) and move the code to match the updated placement.
2022-08-27 18:54:27 +05:30
lynxnb
5d6eaee301 Correctly save/restore ROM version to/from game entry cache
PR #1758 introduced a bug where the game list would be entirely loaded every time the app was opened. This commit addresses that issue, which was caused by the `version` member of the cached game list being serialized to file (although incorrectly) but never actually read back when deserializing.
2022-08-21 20:24:36 +02:00
lynxnb
cfa5f0e030 Fix OSC alpha not changing on button press 2022-08-20 17:00:40 +02:00
KikiManjaro
8fb4e62c28 Add version information about rom
Review:
Co-authored-by: Niccolò Betto <niccolo.betto@gmail.com>
2022-08-20 13:48:07 +02:00
KikiManjaro
3407f6d530 Add OSC opacity adjustment 2022-08-20 13:46:17 +02:00
lynxnb
d129fb09cd Android Manifest cleanup
* Remove `package` from manifest and from activity prefixes, gradle `namespace` will be used instead
* Remove deprecated `android.support.PARENT_ACTIVITY` metadata entries
* Make `MainActivity` and `SettingsActivity` launch in `singleTop` mode to avoid unnecessary activity restarts while navigating the app
2022-08-17 15:33:53 +02:00
lynxnb
e9618d9e2c Use pragma pack directions for tightly packing structs containing u128
Using `__attribute__((packed))` doesn't work in new NDKs when a struct contains 128-bit integer members, likely because of an NDK/compiler bug. We now enclose the affected structs in `#pragma pack` directives to tightly pack them.
2022-08-17 12:22:11 +02:00
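For reference, the `#pragma pack` approach looks like this on a struct with a 128-bit member. The struct `PackedEntry` is a made-up example and `unsigned __int128` is a GCC/Clang extension, so this sketch assumes one of those compilers.

```cpp
#include <cstdint>

using u128 = unsigned __int128; // Compiler extension, available on GCC and Clang

// Everything between push and pop is laid out with 1-byte alignment
#pragma pack(push, 1)
struct PackedEntry { // Hypothetical struct, for illustration only
    uint8_t tag;
    u128 id; // Placed directly after tag, with no padding in between
};
#pragma pack(pop)

// 1 byte for tag + 16 bytes for id
static_assert(sizeof(PackedEntry) == 17, "PackedEntry should be tightly packed");
static_assert(alignof(PackedEntry) == 1, "PackedEntry should have byte alignment");
```

The `push`/`pop` pair restores the previous packing afterwards, so structs declared later in the same file keep their natural alignment.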
lynxnb
c4bf92a49f Fix Kotlin compilation errors from incorrect overloading of null-safe types 2022-08-17 12:16:26 +02:00
Billy Laws
bf491f71f9 Simplify blit helper shader vertex order 2022-08-10 15:43:16 +01:00
Billy Laws
c32bec071c Adjust blit src{X,Y} to account for centred sampling before calling into helper shader
Since the blit engine itself samples from pixel corners and the helper shader from pixel centres, the src coordinates need to be adjusted to avoid the helper shader wrapping round on the final column.
2022-08-10 15:39:37 +01:00
Billy Laws
08f36aac33 Enable hades vertex position input workaround for Adreno
Caused crashes in any games using geometry shaders as by default hades uses the position builtin directly.
2022-08-08 18:09:00 +01:00
Billy Laws
04e7b684d2 Enable vertexPipelineStoresAndAtomics, fragmentStoresAndAtomics and shaderStorageImageWriteWithoutFormat Vulkan features
Used by Xenoblade Chronicles DE
2022-08-08 17:43:18 +01:00
Billy Laws
390558c802 Add partial support for legacy attribute conversion
We previously missed the hades pass for attribute conversion leading to crashes when games would attempt to use such an attribute. The hades pass for this isn't a proper fix however as it modifies the IR directly and will break if any of the previous stages in the pipeline change. Enable it to allow for games using them to at least have a chance at working. In the long term the pass will be reworked on the hades side to avoid modifying the IR in a way that can't be undone.
2022-08-08 17:43:18 +01:00
Billy Laws
540437b547 Fixup index buffer view caching
We forgot to set the view size, which would end up forcing a view to be recreated with every call
2022-08-08 17:43:18 +01:00
Billy Laws
c966cd3b26 Prevent runtimeInfo vertex state from leaking into wrong shaders
This vertex state must only be present for the last pipeline stage that touches vertices, if it is present for other stages it could result in incorrect behaviour like performing TFB in the fragment shader or flipping device coordinates twice.
2022-08-08 17:43:13 +01:00
Billy Laws
c52d3195cf Ensure shader stage enable state matches pipeline stage enable state
As the code was before, if we had a shader that was disabled and enabled again after without being invalidated the pipeline stage would stay disabled and break rendering.
2022-08-08 17:40:35 +01:00
Billy Laws
b1c669ba14 Always keep the VertexB shader stage enabled
HW doesn't allow disabling the VertexB stage, enforce this in code.
2022-08-08 17:40:35 +01:00
lynxnb
d5174175d1 Implement indexed quads support
We previously only supported non-indexed quads. Support for this is implemented by converting the index buffer at record time and pushing the result into the megabuffer, which is then used as the index buffer in the final draw command.
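
The index conversion can be sketched roughly as below. This is an illustration only: `ConvertQuadIndices` is a hypothetical name, the output goes into a plain `std::vector` rather than the megabuffer, and splitting each quad into the triangles (0, 1, 2) and (0, 2, 3) is an assumption about the winding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands a quad-list index buffer into a triangle-list index buffer.
// Each quad (a, b, c, d) becomes the triangles (a, b, c) and (a, c, d); this split is assumed.
std::vector<std::uint32_t> ConvertQuadIndices(const std::vector<std::uint32_t> &quads) {
    assert(quads.size() % 4 == 0 && "Quad index count must be a multiple of 4");

    std::vector<std::uint32_t> triangles;
    triangles.reserve(quads.size() / 4 * 6); // 4 quad indices turn into 6 triangle indices

    for (std::size_t i{}; i + 3 < quads.size(); i += 4) {
        const std::uint32_t a{quads[i]}, b{quads[i + 1]}, c{quads[i + 2]}, d{quads[i + 3]};
        triangles.insert(triangles.end(), {a, b, c, a, c, d});
    }

    return triangles;
}
```

The converted buffer is 1.5x the size of the source index buffer, which is why writing it straight into a transient allocation is attractive.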
2022-08-08 17:40:35 +01:00
lynxnb
e6741642ba Split out megabuffer allocation from pushing data
The `Allocate` method allocates the given amount of space in a megabuffer chunk, returning a descriptor of the allocated region. This is useful for situations where you want to write directly to the megabuffer, avoiding the need for an intermediary buffer.
2022-08-08 17:40:35 +01:00
Billy Laws
cdc6a4628a Enable VK uint8 indices feature when supported 2022-08-08 17:40:35 +01:00
Billy Laws
dccc86ea97 Implement transform feedback with VK_EXT_transform_feedback
Tested to work in Xenoblade Chronicles DE, the code handles both hades varying input and buffer setup.
2022-08-08 17:40:35 +01:00
Billy Laws
06053d3caf Rewrite Fermi 2D engine to use the blit helper shader
Entirely rewrites the engine and interconnect code to take advantage of the subpixel and OOB blit support offered by the blit helper shader. The interconnect code is also cleaned up significantly, with the 'context' naming being dropped due to potential conflicts with the 'context' from context lock.
2022-08-08 17:40:35 +01:00
Billy Laws
395f665a13 Implement a system for helper shaders together with a simple blit shader
It is desirable for us to use a shader for blits to allow easily emulating out of bounds blits and blits between different swizzled colour formats. The helper shader infrastructure is designed to be generic so it can be reused by any other helper shaders that we may need in the future.
2022-08-08 17:40:35 +01:00
Billy Laws
f4e58a9238 Remove redundant synchost creating a new buffer 2022-08-08 14:57:44 +01:00
Billy Laws
11a8feb037 Correct nvdrv DMA copy class ID
Was wrongly copy and pasted.
2022-08-08 14:57:44 +01:00
Billy Laws
13e7b54c61 Ensure failed IOCTLs are logged as a warning log 2022-08-08 14:57:44 +01:00
Billy Laws
eeb86a4f8a Calculate renderArea from min(attachments.dimensions...)
Vulkan doesn't support a renderArea larger than that of the smallest attachment
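
As a rough sketch of the calculation, assuming a per-dimension minimum is what is intended. `Extent` and `ClampRenderArea` are hypothetical names and not the actual types used by the renderer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Extent { // Hypothetical stand-in for attachment dimensions
    std::uint32_t width;
    std::uint32_t height;
};

// Returns the largest area that fits inside every attachment
Extent ClampRenderArea(const std::vector<Extent> &attachments) {
    assert(!attachments.empty() && "At least one attachment is required");

    Extent area{attachments.front()};
    for (const Extent &extent : attachments) {
        area.width = std::min(area.width, extent.width);
        area.height = std::min(area.height, extent.height);
    }
    return area;
}
```

Taking the minimum per dimension means the result can be smaller than any single attachment when their aspect ratios differ.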
2022-08-08 14:57:44 +01:00
Billy Laws
9ea658d0ed Don't throw on unsupported TIC formats
These sometimes spuriously occur in games during transitions. To avoid crashing, use the null texture when they occur and log an error.
2022-08-08 14:57:44 +01:00
Billy Laws
856818c8eb Emulate the 'None' mipfilter by adjusting LOD
Borrowed this technique from yuzu since Vulkan has no direct equivalent
2022-08-08 14:57:44 +01:00
Billy Laws
9d50b6d0f7 Avoid locking presentation mutex in GetTransformHint
This caused slowdown in Pokemon as it was being called every frame
2022-08-08 14:57:44 +01:00
Billy Laws
460e6c9c84 Use raw pointers to hold constant buffer views
The constant destruction and creation of `BufferView`s in cbuf-heavy games showed up as a large chunk of the profiler. Fix this by taking advantage of the fact that constant buffer `BufferView`s are never deleted and always kept around in the cache to just return a pointer to them in the cache.
2022-08-08 14:54:57 +01:00
Billy Laws
6b2e84712b Avoid race in nvdrv debug prints
Looking up the device name without locking it could race with map insertions or deletions, so lock it to avoid that
2022-08-08 13:24:23 +01:00
Billy Laws
683cd594ad Use a linear allocator for most per-execution GPU allocations
Currently we heavily thrash the heap each draw, with malloc/free taking up about 10% of GPFIFOs execution time. Using a linear allocator for the main offenders of buffer usage callbacks and index/vertex state helps to reduce this to about 4%
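
For illustration, a linear (bump) allocator can be sketched as follows. This is a simplified, fixed-capacity version under assumed names (`LinearAllocator`, `Allocate`, `Reset`, `Used`) and is not the allocator added by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hands out memory by bumping an offset; individual allocations are never freed,
// instead the whole arena is reset at once (e.g. at the end of an execution)
class LinearAllocator {
  public:
    explicit LinearAllocator(std::size_t capacity) : storage(capacity) {}

    // Returns nullptr when the arena is exhausted.
    // Alignment is applied to the offset, so it is only meaningful up to the
    // alignment of the backing storage (at least alignof(std::max_align_t) here).
    void *Allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // Must be a power of two
        const std::size_t aligned{(offset + alignment - 1) & ~(alignment - 1)};
        if (aligned > storage.size() || size > storage.size() - aligned)
            return nullptr;
        offset = aligned + size;
        return storage.data() + aligned;
    }

    // Frees every allocation at once in O(1)
    void Reset() {
        offset = 0;
    }

    std::size_t Used() const {
        return offset;
    }

  private:
    std::vector<unsigned char> storage;
    std::size_t offset{};
};
```

Allocation is a couple of arithmetic operations and there is no per-object free, which is where the savings over `malloc`/`free` come from.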
2022-08-08 13:24:21 +01:00
Billy Laws
70eec5a414 Store delegate attached state within the delegate itself
Avoids a costly map lookup for every AttachBuffer call, this was a serious bottleneck in SMO
2022-08-08 13:23:26 +01:00
Billy Laws
0268e1d5a0 Force a submit before any i2m engine writes
We need traps to be in place so we don't end up overwriting a resource that's being actively used by the current context without setting it to dirty
2022-08-08 13:22:37 +01:00
Billy Laws
cb0b132486 Allow supplying push constants to GetPipeline 2022-08-08 13:22:37 +01:00
Billy Laws
1c8863ec3b Use const references for holding pipeline state in pipeline cache
Allows passing in constexpr structs to state directly
2022-08-08 13:22:37 +01:00
Billy Laws
b6b04fa6c5 Use small_vector for VMM TranslateRange results
This was the source of a lot of heap allocs, moving to small_vector helps to avoid most of them
2022-08-08 13:22:37 +01:00
Billy Laws
1fe6d92970
Wait on Swapchain Image copy to complete
Certain titles can display frames out of order due to not waiting on the copy from the final RT to the swapchain image to occur. Although `PresentFrame` does wait on the syncpoint, that isn't enough to ensure the source texture is up-to-date due to us signalling syncpoints early.

By waiting on the swapchain texture after the copy is submitted, we now implicitly wait on the source texture's cycle to be signalled thus waiting on the frame to be done which fixes the issue.
2022-08-07 03:12:27 +05:30
PixelyIon
5b7572a8b3
Introduce chunked MegaBuffer allocation
After the introduction of workahead, a system to hold a single large megabuffer per submission was implemented. This worked fine for most cases; however, when many submissions were in flight at the same time, memory usage would increase dramatically due to the number of megabuffers needed. Since only one megabuffer was allowed per execution, it forced the buffer to be fairly large in order to accommodate the upper bound, even further increasing memory usage.

This commit implements a system to fix the memory usage issue described above by allowing multiple megabuffers to be allocated per execution, as well as reuse across executions. Allocations now go through a global allocator object which chooses which chunk to allocate into on a per-allocation basis; if all are in use by the GPU, another chunk will be allocated, which can then be reused for future allocations too. This reduces Hollow Knight megabuffer memory usage by a factor of 4 and SMO by even more.
2022-08-07 03:12:27 +05:30
Billy Laws
99b5fc35c6
Change SegmentTable semantics to respect unset entries
Accesses to unset entries are now clearly defined as returning a 0'd out value. The prior behavior was to optimize sets for border segments to use L2 atomicity when the specific segment had no L1 entries set. This would lead to any future lookups of offsets within the same L2 segment but a different L1 entry returning an inaccurate value, as the only prior guarantee was that lookups after setting a segment would return the same value as was set; there was no guarantee that unset segments would also consistently return unset values.

This could lead to issues in practical usages such as the `BufferManager` lookups returning the existence of a `Buffer` at a location falsely even though the segment was never set to the value, this was problematic as raw pointers were utilized and bound checks would lead to a segmentation fault.

This commit fixes this issue by introducing this guarantee and refactoring the class accordingly. It also deletes the `Set` method for setting a single entry, as its meaning is ambiguous and its functionality was more akin to the past guarantee and no longer makes sense.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
36b8d3c445
Account for SegmentTable insertions entirely within an L2 entry
We would always write all L1 entries that correspond to an L2 entry, even if setting an input range ended before that. This would effectively reduce the atomicity of the segment table to that of the L2 range and lead to breaking API guarantees by returning entirely wrong segment values for a lookup covering a region that was overwritten.
2022-08-06 22:20:54 +05:30
PixelyIon
c72316d9f6
Rename RangeTable to SegmentTable
It was determined that `RangeTable` was too ambiguous of a name as it could be interpreted to be holding ranges rather than looking them up; to avoid confusion the terminology has been changed from `range` to `segment`. "Segment table" is clearer in describing that it is a table comprised of descriptors regarding segments, and it avoids any overlap with terminology concerning "pages", which would be overly specific for this data structure, or the ambiguous "ranges".
2022-08-06 22:20:54 +05:30
Billy Laws
5398eff045
Fix KProcess::MutexUnlock PI CAS
The PI CAS in `MutexUnlock` ends up loading `basePriority` rather than `priority`, which could lead to an infinite CAS loop when `basePriority` doesn't equal `priority` and the `highestPriorityThread`'s priority is lower than `basePriority`.
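
To illustrate the class of bug, here is a minimal sketch of a priority-inheritance style CAS loop. `InheritPriority` is a hypothetical helper, and it assumes that a numerically lower value means a higher priority. The important detail is that the expected value passed to the CAS is loaded from the same atomic that is being exchanged; seeding it from a different variable can make the exchange fail forever.

```cpp
#include <atomic>

// Lowers `priority` to `candidate` when the candidate is a higher priority (numerically lower).
// Returns true if the stored priority was updated.
bool InheritPriority(std::atomic<int> &priority, int candidate) {
    int current{priority.load()}; // Must be read from the atomic we are about to exchange
    while (candidate < current) {
        // On failure, `current` is refreshed with the value that was actually observed,
        // so the loop condition is re-evaluated against up-to-date state
        if (priority.compare_exchange_weak(current, candidate))
            return true;
    }
    return false; // The stored priority was already at least as high as the candidate
}
```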
2022-08-06 22:20:54 +05:30
PixelyIon
850c0f4092
Make Texture::SynchronizeGuest Blocking
It was determined that `Texture::SynchronizeGuest`'s `TextureBufferCopy` had races that were exposed by the introduction of the cycle waiter thread, the synchronization did not take place under a locked context so the texture could be mutated at any point in addition to the destructor not being run during `FenceCycle::Wait` due to `shouldDestroy` being `false`. 

This commit fixes the issue by making `SynchronizeGuest` entirely blocking as all usages of the function required blocking semantics regardless so it would be pointless to retain its async nature while solving any races that may arise from it being async.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
Billy Laws
77d15b02a3
Ensure backing continuity when recreating GPU dirty buffers
Since we don't call `SynchronizeHost` on source buffers which are GPU dirty, their mirrors will be out of date. The backing contents of this source buffer's region in the new buffer will be incorrect. By copying from the backing directly, we can ensure that no writes are lost and that if the newly created buffer needs to turn GPU dirty during recreation no copies need to be done since the backing is as up to date as the mirror at a minimum.
2022-08-06 22:20:54 +05:30
Billy Laws
c1bf5a804a
Extend stateMutex scope inside Buffer::SynchronizeHost
The code is much simpler to reason about as it doesn't require evaluating all the potential edge cases of trap handlers in different states. It should be noted that this should not change behavior in any meaningful way; at most it can prevent a minor race where the protection could be upgraded after being downgraded by the signal handler, leading to a redundant trap.
2022-08-06 22:20:54 +05:30
PixelyIon
c3cf79cb39
Rework KThread::waiterMutex Locking
Two issues exist with locking of `KThread::waiterMutex`:
* It was not always locked when accessing waiter members such as `waitThread`, `waitKey` and `waitTag` which would lead to a race that could end up in a deadlock or most notably a segfault inside `UpdatePriorityInheritance`
* There could be a deadlock from `UpdatePriorityInheritance` locking `waiterMutex` of a thread and waiting to get the owner's `waiterMutex` while on another thread `MutexUnlock` holds the owner's `waiterMutex` and waits on locking the `waiterMutex` held by `UpdatePriorityInheritance`

This commit fixes both issues by adding appropriate locking to all locations where waiter members are accessed in addition to adding a fallback mechanism inside `UpdatePriorityInheritance` that unlocks `waiterMutex` on contention to avoid a deadlock.
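
The fallback can be sketched in general terms as below. This is an assumption about the shape of the mechanism rather than the code in `UpdatePriorityInheritance`, and `LockBothWithBackoff` is a hypothetical helper: the second mutex is only ever acquired with `try_lock`, and on contention the first one is released before retrying so that two threads locking in opposite orders cannot wait on each other indefinitely.

```cpp
#include <mutex>
#include <thread>

// Locks `first` then `second` without ever blocking on `second` while holding `first`.
// Returns the number of attempts that were needed; both mutexes are held on return.
int LockBothWithBackoff(std::mutex &first, std::mutex &second) {
    int attempts{};
    while (true) {
        attempts++;
        first.lock();
        if (second.try_lock())
            return attempts;

        // Contention: release what we hold so the other thread can make progress
        first.unlock();
        std::this_thread::yield();
    }
}
```

The standard library's `std::lock` provides a similar deadlock-avoidance guarantee when both mutexes can be acquired in a single call.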
2022-08-06 22:20:54 +05:30
PixelyIon
68615703c1
Fix KProcess/SetThreadPriority PI CAS
The condition for exiting the CAS loops is incorrect in several places which leads to additional loops, while this doesn't make the behavior incorrect it does lead to redundant iterations. 

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
8fc3cc7a16
Rework Descriptor Set Allocation/Updates
A substantial amount of time would be spent on creation/destruction of `VkDescriptorSet` which scales on titles doing a substantial amount of draws with bindings, this leads to poor performance on those titles as the frametime is dragged down by performing these tasks while they repeatedly create descriptor sets of the same layouts.

This commit fixes it by pooling descriptor sets per-layout in a dynamically resizable pool and keeping them around rather than destroying them after usage which leads to the vast majority of cases not requiring a new descriptor set to even be created. It leads to significantly improved performance where it would otherwise be spent on redundant destruction/recreation or push descriptor updates which took a substantial amount of time themselves.

Additionally, the `BaseDescriptorSizes` were not kept up to date with all of the descriptor types, it led to no crashes on Adreno/Mali as they were purely used for size calculations on either driver but has been corrected to avoid any future issues.
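
The per-layout pooling described above can be sketched as follows. It is a simplified model: `LayoutId` and `SetHandle` are hypothetical integer stand-ins for the Vulkan layout and descriptor set handles, and no Vulkan calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using LayoutId = std::uint32_t;  // Hypothetical stand-in for a descriptor set layout
using SetHandle = std::uint32_t; // Hypothetical stand-in for a descriptor set

class DescriptorSetPool {
  public:
    // Reuses a free set of the same layout when one exists, otherwise creates a new one
    SetHandle Acquire(LayoutId layout) {
        auto &freeList{freeSets[layout]};
        if (!freeList.empty()) {
            SetHandle set{freeList.back()};
            freeList.pop_back();
            return set;
        }
        createdCount++;
        return nextHandle++; // A real pool would allocate a descriptor set here
    }

    // Returns a set to the pool instead of destroying it
    void Release(LayoutId layout, SetHandle set) {
        assert(set < nextHandle && "Handle was not created by this pool");
        freeSets[layout].push_back(set);
    }

    std::size_t CreatedCount() const {
        return createdCount;
    }

  private:
    std::unordered_map<LayoutId, std::vector<SetHandle>> freeSets; // Free sets keyed by layout
    SetHandle nextHandle{};
    std::size_t createdCount{};
};
```

Once the pool has warmed up, steady-state draws only pop and push handles, so no sets are created or destroyed.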
2022-08-06 22:20:54 +05:30
PixelyIon
e1a4325137
Introduce FenceCycle Waiter Thread
A substantial amount of time is spent destroying dependencies for any threads waiting or polling `FenceCycle`s, this is not optimal as it blocks them from moving onto other tasks while destruction is a fundamentally async task and can be delayed.

This commit solves this by introducing a thread that is dedicated to waiting on every `FenceCycle` then signalling and destroying all dependencies which entirely fixes the issue of destruction blocking on more important threads.
2022-08-06 22:20:54 +05:30
PixelyIon
5f8619f791
Optimize Buffer Lookups using Range Tables
Buffer lookups are a fairly expensive operation: we currently spend `O(log n)` even on the simplest and most frequent case, a direct match, which may be insufficient for such a frequent operation. This commit optimizes that case to `O(1)` by utilizing a `RangeTable` at the cost of slightly higher insertion/deletion costs for setting ranges of values, but these are minimal in frequency compared to lookups.
2022-08-06 22:20:54 +05:30
PixelyIon
578ae86cca
Implement Multi-Level Range Table
A data structure that can represent the same value for a range of addresses (pages) is required for fast lookup in certain cases. This commit implements a near optimal data structure for mass insertion and O(1) lookup of range-based data, this is achieved using the host MMU and implementing multiple levels of atomicity for the ranges. 

It should be noted that the table is limited to two levels but can be extended to a variable amount of ranges in the future, it was determined that additional levels of ranges can be beneficial for performance depending on the specific use-case.
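
To illustrate just the O(1) lookup idea, here is a deliberately simplified sketch. It uses a single level backed by a plain `std::vector` instead of multiple levels backed by the host MMU, and `FlatRangeTable` with its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Maps every page of an address space to a value; unset pages hold a value-initialized T
template <typename T, std::size_t PageBits>
class FlatRangeTable {
  public:
    static constexpr std::size_t PageSize{std::size_t{1} << PageBits};

    explicit FlatRangeTable(std::size_t addressSpaceSize)
        : entries((addressSpaceSize + PageSize - 1) >> PageBits) {}

    // Sets every page overlapping [start, end) to `value`; cost is linear in the page count
    void Set(std::size_t start, std::size_t end, const T &value) {
        const std::size_t lastPage{(end + PageSize - 1) >> PageBits};
        assert(start <= end && lastPage <= entries.size());
        for (std::size_t page{start >> PageBits}; page < lastPage; page++)
            entries[page] = value;
    }

    // O(1) lookup: a shift followed by an array index
    const T &operator[](std::size_t address) const {
        assert((address >> PageBits) < entries.size());
        return entries[address >> PageBits];
    }

  private:
    std::vector<T> entries;
};
```

A flat vector commits memory for the whole address space up front, which is one reason a sparse, multi-level design is preferable for large address spaces.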
2022-08-06 22:20:54 +05:30
Billy Laws
38eab80ed8
Disable Vulkan Push Descriptors on Adreno
Adreno drivers have certain errata that break Vulkan Push Descriptors in certain cases, leading to a descriptor set update being swallowed. This has been worked around by disabling push descriptors on Adreno drivers, which may lead to reduced performance on certain titles that frequently bind new descriptors.
2022-08-06 22:20:54 +05:30
Billy Laws
88fd491ed5
Submit after Maxwell3D Semaphore Release
Any semaphore releases are implicit synchronization events that can be utilized by the guest to pick up that the GPU has executed till a certain point and therefore we must submit all prior work accordingly.
2022-08-06 22:20:54 +05:30
Billy Laws
b77da1182f
Don't flush submission on DMA Copies
DMA copies utilized `SubmitWithFlush` instead of `Submit`, this is not required and incurs significant additional synchronization penalties which will no longer be required.
2022-08-06 22:20:54 +05:30
PixelyIon
0992fde028
Don't block on surface creation in GetTransformHint
We want to avoid blocking on surface creation unless necessary, this commit doesn't wait on the creation of the surface as it default initializes the value which'll generally be `Identity` or the transformation of the previous surface if it was lost.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
Billy Laws
35133381b6
Fix V-Sync KEvent construction order
The V-Sync `KEvent` would be used by the presentation thread prior to construction leading to dereferencing an invalid value, this has been fixed by changing the order of construction to move the construction of the presentation thread after the V-Sync event.
2022-08-06 22:20:54 +05:30
PixelyIon
ffad246d67
Split NCE Trap page-out functionality from TrapRegions
The `TrapRegions` function performed a page-out on any regions that were trapped as read-only. This wasn't optimal as it tied both into the same operation, while Buffers/Textures need to protect, then synchronize, and only then page out. The trap was being moved to after the synchronize to get around this limitation, but that can cause a potential race, as certain writes done after the synchronization but prior to the trap would be lost. This commit fixes these issues by splitting paging out into `PageOutRegions` which can be called after `TrapRegions` by any API users.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
da464d84bc
Consolidate NCE::TrapRegions functionality into CreateTrap
`NCE::TrapRegions` was a bit too overloaded as a method as it implicitly trapped which was unnecessary in all current usage cases, this has now been made more explicit by consolidating the functionality into `NCE::CreateTrap` which handles just creation of the trap and nothing past that, `RetrapRegions` has been renamed to `TrapRegions` and handles all trapping now.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
8a62f8d37b
Rework Texture Synchronization API + Locking
Similar to `Buffer`s, `Texture`s suffered from suboptimal behavior due to using atomics for `DirtyState` and had certain bugs with replacement of the variable at times where it shouldn't be possible, which have now been fixed by moving to a mutex instead of atomics. This commit also updates the API to more closely match what is expected of it now and removes any functions that weren't utilized such as `SynchronizeGuestWithBuffer`.
2022-08-06 22:20:54 +05:30
Billy Laws
04bcd7e580
Rework Buffer DirtyState with BackingImmutability
Having a single variable denoting the exact state of a buffer and the operations that could be performed on it was found to be too restrictive, so it has now been expanded with an additional `BackingImmutability` variable. With two variables we can no longer use atomics without significant additional complexity, so all accesses to the state are now mediated through `stateMutex`, a mutex specifically designed for tracking the state.

While designing the system around `stateMutex` it was determined to be more efficient than atomics, as it enforces blocking far less than the regular atomic fallback of locking the main resource lock, which is generally held for significantly longer.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
1af781c0a5
Add Perfetto Tracing to NCE Trapping API
As a performance sensitive part of code, the NCE Trapping API benefits from having tracing and it helps us better determine where guest code is spending its time for more targeted optimizations.
2022-08-06 22:20:54 +05:30
PixelyIon
9d294b9ccc
Use weak_ptr for TrapHandler Callbacks
The lifetime of the `this` pointer in the trap callbacks could be invalid as the lifetime of the underlying `Buffer`/`Texture` object wasn't guaranteed, this commit fixes that by passing a `weak_ptr` of the objects into the callbacks which is locked during the callbacks and ensures that a destroyed object isn't accessed.
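
The pattern can be sketched as below. `Resource` and `MakeTrapCallback` are hypothetical names used for illustration, and the object is assumed to be owned by a `std::shared_ptr`.

```cpp
#include <functional>
#include <memory>

struct Resource : std::enable_shared_from_this<Resource> {
    int trapHits{};

    // Builds a callback that is safe to invoke after the resource has been destroyed.
    // Must only be called on an object that is owned by a std::shared_ptr.
    std::function<bool()> MakeTrapCallback() {
        std::weak_ptr<Resource> weakThis{shared_from_this()};
        return [weakThis] {
            auto self{weakThis.lock()}; // Keeps the object alive for the duration of the callback
            if (!self)
                return false; // The object is gone, there is nothing to do
            self->trapHits++;
            return true;
        };
    }
};
```

Capturing a `weak_ptr` rather than `this` or a `shared_ptr` avoids both dangling accesses and the callback extending the lifetime of the object indefinitely.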

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
Billy Laws
96d8676d5b
Fix SubmitWithFlush not updating MegaBuffer cycle
The `CommandExecutor`'s `MegaBuffer` was not being updated with the latest `FenceCycle` on being flushed in `SubmitWithFlush`; this led to the megabuffer being overwritten prior to its GPU-side usage being complete. This commit fixes that by replacing the cycle with the latest cycle, which prevents the races that occurred prior.
2022-08-06 22:20:54 +05:30
PixelyIon
3e9d84b0c3
Split FindOrCreate functionality across BufferManager
`FindOrCreate` ended up being a monolithic function with poor readability. This commit addresses those concerns by refactoring the function to split it up into multiple member functions of `BufferManager`; while some of these member functions may only have a single call-site, they are important to logically categorize tasks into individual functions. The end result is far neater logic which is far more readable and slightly better optimized by virtue of being abstracted better.
2022-08-06 22:20:54 +05:30
PixelyIon
d2a34b5f7a
Implement ContextLock Move-assignment operator
In certain cases the move constructor may not suffice and the move assignment operator is required, this commit implements that and moves to using a pointer for storing the `resource` member rather than a reference as its semantics matched what we desired more and allowed for assignment of the `resource`.
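
A rough sketch of why the pointer member matters: a reference member cannot be rebound, so move assignment is only expressible once the resource is held by pointer. `MovableLock` and `CountingLock` are hypothetical types for illustration and do not mirror the real `ContextLock` interface.

```cpp
#include <utility>

// Hypothetical lockable that simply counts how many times it is currently held
struct CountingLock {
    int depth{};
    void lock() { depth++; }
    void unlock() { depth--; }
};

template <typename T>
class MovableLock {
  public:
    explicit MovableLock(T &target) : resource{&target}, ownsLock{true} {
        resource->lock();
    }

    MovableLock(const MovableLock &) = delete;
    MovableLock &operator=(const MovableLock &) = delete;

    MovableLock(MovableLock &&other) noexcept
        : resource{other.resource}, ownsLock{std::exchange(other.ownsLock, false)} {}

    // Releases the currently held lock (if any) before taking over the other lock
    MovableLock &operator=(MovableLock &&other) noexcept {
        if (this != &other) {
            if (ownsLock)
                resource->unlock();
            resource = other.resource; // Only possible because this is a pointer, not a reference
            ownsLock = std::exchange(other.ownsLock, false);
        }
        return *this;
    }

    ~MovableLock() {
        if (ownsLock)
            resource->unlock();
    }

    bool OwnsLock() const {
        return ownsLock;
    }

  private:
    T *resource;
    bool ownsLock;
};
```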
2022-08-06 22:20:54 +05:30
PixelyIon
38d3ff4300
Fix BufferManager::FindOrCreate Recreation Bugs
It was determined that `FindOrCreate` has several issues which this commit fixes:
* It wouldn't correctly handle locking of the newly created `Buffer` as the constructor would set up traps prior to being able to lock it which could lead to UB
* It wouldn't propagate the `usedByContext`/`everHadInlineUpdate` flags correctly
* It wouldn't correctly set the `dirtyState` of the buffer according to that of its source buffers
2022-08-06 22:20:54 +05:30
Billy Laws
d1a682eace
Fix setDirty behavior in Buffer::SynchronizeGuest
The condition for `setDirty` in the dirty state CAS was inverted from what it should've been resulting in synchronizing incorrectly, this commit fixes the condition to correct synchronization.
2022-08-06 22:20:54 +05:30
Billy Laws
00d434efdc
Remove Texture::CopyFrom format check
The formats of the textures involved in a copy were checked for equality, this broke certain copies as the presentation engine would invoke copies between textures of different yet compatible formats.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2022-08-06 22:20:54 +05:30
Billy Laws
58174f255f
Improve ContextLock semantics
`ContextLock` had suboptimal semantics in the form of direct access to the `isFirst` member, which wasn't clearly defined; it's now been broken up into the function calls `IsFirstUsage` and `OwnsLock`, with explicit move semantics and a function for releasing the lock.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2022-08-06 22:20:54 +05:30
Billy Laws
561103d3da
Submit GPFIFO work prior to CircularQueue waiting
The position at which we call submit is a significant factor in performance and we did so at the end of PBs (PushBuffers), this isn't optimal as there could be multiple PBs queued up that would benefit from being in the same submission. We now delay the submission of the workload till we run out of PBs.
2022-08-06 22:20:54 +05:30
PixelyIon
3ac5ed8c06
Attach coalesced Buffer if any source Buffer is attached
A buffer that's attached to a context could be coalesced into a larger buffer which isn't attached, this would break as it wouldn't keep the buffer alive till the end of the associated context. To fix this if any source buffers are attached then the resulting coalesced buffer is also attached now.
2022-08-06 22:20:54 +05:30
PixelyIon
284ac53d88
Fix KThread Priority Inheritance CAS
The CAS condition for KThread PI was inverted, which led to entirely incorrect behavior; while it might work in the vast majority of cases, it would lead to significantly inaccurate behavior.
2022-08-06 22:20:54 +05:30
PixelyIon
45cb8388cc
Fix NCE Trap API Lock Callback
The lock callback would `continue` which would end up skipping over the current item as it applied to the inner loop rather than the outer loop as intended. This has now been fixed by using `break` and a check instead.
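
The control-flow pitfall can be shown with a small, self-contained sketch. `SelectUnblocked` and its parameters are hypothetical and unrelated to the actual trap API; only the `break` plus flag structure is the point.

```cpp
#include <vector>

// Returns every item that does not appear in `blocked`
std::vector<int> SelectUnblocked(const std::vector<int> &items, const std::vector<int> &blocked) {
    std::vector<int> result;
    for (int item : items) {
        bool skip{false};
        for (int blocker : blocked) {
            if (item == blocker) {
                // A `continue` here would only advance the inner loop and the item
                // would still be processed below, so flag it and leave the inner loop
                skip = true;
                break;
            }
        }

        if (skip)
            continue; // Applies to the outer loop, as intended

        result.push_back(item);
    }
    return result;
}
```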
2022-08-06 22:20:54 +05:30
PixelyIon
745d809e07
Fix Buffer::SynchronizeGuest Non-Blocking Behavior
The buffer's non-blocking behavior could lead to an invalid state where the dirty state doesn't adequately represent the buffer's true state, the check has now been moved inside the CAS loop as its behavior changes depending on the dirty state. In addition, `SynchronizeGuest` returns a boolean denoting if the synchronization was successful now to make code flows depending on non-blocking synchronization cleaner.
2022-08-06 22:20:54 +05:30
PixelyIon
c1f2445772
Set state to CpuDirty directly in SynchronizeGuest
`SynchronizeGuest` could only set the dirty state to `Clean` which was redundant since calls to it from inside the write trap handler would set it to `CpuDirty` directly after, this fixes that by doing it inside the function when necessary.
2022-08-06 22:20:54 +05:30