skyline

mirror of https://github.com/skyline-emu/skyline.git synced 2024-11-18 23:49:15 +01:00

Author	SHA1	Message	Date
Billy Laws	37b821a4dc	Introduce Maxwell 3D interconnect active state Active state encapsulates all state that isn't part of a pipeline and can be set dynamically with Vulkan calls. This includes both dynamic state like stencil faces, and command buffer state like vertex buffer bindings. Similarly to the last commit, the main goal of this is to reduce the amount of redundant work done per draw by employing dirty state as much as possible. Without using dirty state for this every active state operation would need to be performed every draw, which gets very expensive when things like buffer lookups end up being required. Code has also been heavily cleaned up as is described in the previous commit.	2022-11-02 17:46:07 +00:00
Billy Laws	5fdda78073	Introduce Maxwell 3D interconnect pipeline state The main goal of this is to reduce the number of redundant lookups and work done per draw as much as possible, this is mainly achieved through heavy use of dirty tracking though other optimisations like heavily using the linear allocator are also in play. In addition to the goal of performance, the code has been cleaned up and abstracted significantly from its state in graphics_context, hopefully making the GPU interconnect code much more maintainable in the future and reducing the boilerplate needed to add even simple functionality. This commit includes partial pipeline state, enough for implementing clears + a slight bit extra.	2022-11-02 17:46:07 +00:00
Billy Laws	21f5611231	Rewrite constant buffer interconnect code Adapted from the previous code to use dirty state tracking. The cache has also been removed since with the new buffer view and GMMU optimisations it actually ended up slowing lookups down, another result of the buffer view optimisations is that raw pointers are no longer used for buffer views since destruction is now much cheaper.	2022-11-02 17:46:07 +00:00
Billy Laws	d1e7bbc1d8	Introduce common code for Maxwell 3D interconnect rewrite This common code will be used across the entirety of the 3D rewrite, it also includes a stub for StateUpdateBuilder, which will be used by active state code to apply state updates.	2022-11-02 17:46:07 +00:00
Billy Laws	a6c49115f9	Rewrite all Maxwell 3D registers up to clears to match Nvidia docs All the names are directly translated from Nvidia docs, with minimal conversions to enums/structs when appropriate. Not all registers have been rewritten, only those that are needed to implement clears and dynamic state, the rest will be added as they are used in the GPU rework.	2022-11-02 17:46:07 +00:00
Billy Laws	d7eab40f1c	Introduce resource based dirty tracking infrastructure This will be heavily used by the upcoming GPU rework. It provides an intuitive way to track dirtiness based on using the underlying pointers of objects, as opposed to other methods which often need an enum entry per dirty state and don't support overlaps. Wrappers for dirty state objects are also provided to abstract as much of the dirty tracking as possible from user code. The pointer based mechanism also serves to avoid having to handle dirty bindings on the user side of the dirty resources, allowing them to bind things internally instead.	2022-11-02 17:46:07 +00:00
Billy Laws	8471ab754d	Introduce a spin lock for resources locked at a very high frequency Constant buffer updates result in a barrage of std::mutex calls that take a lot of time even under no contention (around 5%). Using a custom spinlock in cases like these allows inlining locking code reducing the cost of locks under no contention to almost 0.	2022-11-02 17:46:07 +00:00
Billy Laws	d810619203	Drop 3D engine method calling fast path in GPFIFO This ended up actually turning out to be a slow path when Maxwell 3D method handling code was inlined.	2022-11-02 17:46:07 +00:00
Billy Laws	ded02e3eac	Small engine.h fixups	2022-11-02 17:46:07 +00:00
Billy Laws	38ba963311	Drop usage of unique_ptr for Maxwell3D Since graphics context is being replaced and split into cpp files there will no longer be any circular includes that previously prevented this.	2022-11-02 17:46:07 +00:00
Billy Laws	90db743c56	Source AsGpu GMMU page sizes from GMMU class	2022-11-02 17:46:07 +00:00
Billy Laws	e72fe02c15	Add inline fast-path for `Buffer::FindOrCreate()` This can be inlined by the compiler much easier which helps perf a fair bit due to the number of times buffers are looked up, also avoids the need for small vector construction that was done in the previous fast-path.	2022-11-02 17:46:07 +00:00
Billy Laws	49478e178a	Avoid redundantly syncing buffers before every Write in an execution This isn't a guarantee provided by actual HW so we don't need to provide it either, the sync can be skipped once the buffer has already been synced at least once within the execution.	2022-11-02 17:46:07 +00:00
Billy Laws	f7a726e452	Allow attempting to write to buffers without passing a GPU copy callback Constructing the GPU copy callback in `ConstantBuffers::Load()` ended up taking a fair amount of time despite it almost never being used in practice. By making it optional it can be skipped most of the time and only done when it's actually necessary by calling `Write()` again if the initial call returned true.	2022-11-02 17:46:07 +00:00
Billy Laws	5dca5cc10e	Redesign buffer view infra to remarkably reduce creation overhead Buffer view creation was a significant pain point, requiring several layers of caching to reduce the number of creations that introduced a lot of complexity. By reworking delegates to be per-buffer rather than per-view and then linearly allocating delegates (without ever freeing) views can be reduced to just {delegatePtr, offset, size}, avoiding the need for any allocations or set operations in GetView. The one difficulty with this is the need to support buffer recreation, which is achieved by allowing delegates to be chained - during recreation all source buffers have their delegates modified to point to the newly created buffer's delegate. Upon accessing a view with such a chained delegate the view will be modified to point directly to the end delegate with offset being updated accordingly, skipping the need to traverse the chain for future accesses.	2022-11-02 17:46:07 +00:00
Billy Laws	09f376e500	Add const accessors to OffsetMember	2022-11-02 17:46:07 +00:00
Billy Laws	64a9db2e82	Introduce `MergeInto` helper for simplified construction of arrays of structs In the upcoming GPU code each state member will hold a reference to its corresponding Maxwell 3D regs, this helper is needed to allow easy transformation from the main 3D register struct into them. Example: ```c++ struct Regs { std::array<View, 10> viewRegs; u32 enable; } regs; struct ViewState { const View &view; const u32 &enable; size_t index; }; std::array<ViewState, 10> viewStates{MergeInto<ViewState, 10>(regs.viewRegs, regs.enable, IncrementingT{})}; ```	2022-11-02 17:46:07 +00:00
Billy Laws	2c682f19a6	Add untracked linear allocator emplace/allocate functions Useful for cases where allocations are guaranteed to be unused by the time `Reset()` is called and calling `Free()` would be difficult or add extra performance cost due to how the allocation is used.	2022-11-02 17:46:07 +00:00
Billy Laws	6359852652	Introduce page size constants and replace all usages of PAGE_SIZE Avoids using macros and results in code which looks slightly cleaner.	2022-11-02 17:46:07 +00:00
Billy Laws	30ec844a1b	Use GPFIFO pushbuffer contents in-place if possible The memcpy within `Read()` was taking up a fair amount of time, avoid this by using the mapped range in-place when the mapping isn't split.	2022-11-02 17:46:07 +00:00
Billy Laws	be825b7aad	Utilise SegmentTable for rapid FlatMemoryManager lookups In some games performing the binary search in `TranslateRange()` ended up taking a fairly large (~8%) proportion of GPFIFO time. By using a segment table for O(1) lookups this is reduced to <2% for non-split mappings at the cost of slightly increased memory usage (2GiB in the absolute worst case but more like 50MiB in real world situations). In addition to adapting `TranslateRange()` to use the segment table, a new function `LookupBlock()` is added for cases where only a single mapping would ever be looked up, so the small_vector handling and fallback paths can be skipped and the entire lookup be inlined.	2022-11-02 17:46:07 +00:00
Morph	4ea0b0e1e5	fssrv: IFileSystemProxy: Implement OpenReadOnlySaveDataFileSystem Forward this function to OpenSaveDataFileSystem for now. A proper implementation should wrap the underlying filesystem with nn::fs::ReadOnlyFileSystem.	2022-10-30 20:04:40 +00:00
Narr the Reg	25b9bb00fd	service: hid: Properly clear and set npad devices	2022-10-30 15:52:03 +00:00
german77	cf95cfb056	service: hid: Stub SetPalmaBoostMode	2022-10-30 15:52:03 +00:00
Narr the Reg	4da934579c	service: hid: Set the correct maxEntry value and signal on acquire event handle	2022-10-30 15:52:03 +00:00
Dima	baa6b5d5ea	check if NpadId is valid when update	2022-10-30 15:51:45 +00:00
Dima	a409f30e91	add GetAvailableLanguageCodeCount for both lists	2022-10-30 15:51:29 +00:00
Dima	51ce3f7c3c	Stub IClient::Poll	2022-10-29 19:52:50 +05:30
Billy Laws	1846f533bc	Add credits PreferenceCategory	2022-10-25 21:40:28 +01:00
Abandoned Cart	267d25b5a5	Prevent a false positive `SecurityException` for DocumentsProvider	2022-10-25 20:17:18 +05:30
lynxnb	1ae36cea24	Update `ngword2` archive with the correct content	2022-10-25 00:40:46 +02:00
Billy Laws	160c2f3457	Add Ko-Fi credits to settings	2022-10-23 21:14:39 +01:00
PixelyIon	128ea33073	Print NPDM + NACP metadata for title determination We determined that printing NPDM + NACP metadata is a significantly better way to determine what title is running rather than printing the filename.	2022-10-23 20:20:44 +05:30
PixelyIon	a0539a3edb	Trace Scheduler Preemption/Yield in Perfetto	2022-10-22 17:37:03 +05:30
PixelyIon	c874907eb5	Log and flush inside `KProcess::Kill` We want to know when the `KProcess` is being killed and flushing log during it is important since it can often result in hangs due to joining not working correctly.	2022-10-22 17:17:04 +05:30
PixelyIon	597a6ff31d	Wait on slot to be freed in `GraphicBufferProducer::DequeueBuffer` We currently don't wait on a slot to be freed if none are free, this worked prior to async presentation as GBP's slots wouldn't change their state until other commands were called but now slots can be held by the presentation engine. As a result, we now have to wait on the presentation engine to free up slots. This commit also fixes the behavior of the `async` flag in `DequeueBuffer` as it was treated as a non-blocking flag but isn't supposed to do anything on HOS.	2022-10-22 17:15:33 +05:30
lynxnb	cdb2b85d6c	Stub `ngword` and `ngword2` Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>	2022-10-18 20:54:57 +01:00
lynxnb	a7be4fd1e1	Stub `pl::RequestLoad` and `pl::GetSharedFontInOrderOfPriority` Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>	2022-10-18 20:54:57 +01:00
lynxnb	782f9e37ee	Add a system region setting Needed for games such as AC:NH. The `Auto` option automatically selects a region based on the currently selected system language. Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>	2022-10-18 20:54:57 +01:00
lynxnb	d4800d13b8	Stub `hid::ActivateMouse` and `hid::ActivateKeyboard` Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>	2022-10-18 20:54:57 +01:00
lynxnb	45830633eb	Stub `am::SetTerminateResult`	2022-10-18 20:54:57 +01:00
lynxnb	bc016aff47	Make the vulkan validation layer toggleable via setting As part of this commit, a new preference category for debug settings is being introduced. All future settings only relevant for debugging purposes will be put there. The category is hidden on release builds.	2022-10-18 19:47:23 +02:00
lynxnb	5cf14e45e1	Enable frame throttling when triple buffering is disabled	2022-10-18 19:27:14 +02:00
lynxnb	6b76c61cd1	Introduce a `releasedebug` build variant	2022-10-17 18:39:32 +02:00
lynxnb	b17364bb92	Introduce a `dev` app flavor for side-by-side installation	2022-10-01 13:01:46 +02:00
Billy Laws	8d1026d0cc	Reduce font shared memory size for compacted fonts	2022-09-22 21:51:13 +01:00
Billy Laws	25fb09800a	Compact fonts to only include necessary glyphs	2022-09-22 21:51:13 +01:00
Billy Laws	e31ed6a429	Increase font shared memory size for Noto fonts	2022-09-22 21:34:29 +01:00
Billy Laws	0d4893c448	Cleanup font magic generation code	2022-09-22 21:34:29 +01:00
Billy Laws	5c4cc3d51f	Fix font load order to match HOS Without this games would load an inappropriate font file for their target language.	2022-09-22 21:34:29 +01:00
Billy Laws	272bbf6cd2	Switch to Noto Sans fonts for shared fonts replacement Provides CJK characters, which the previous replacements lacked entirely.	2022-09-22 21:34:29 +01:00
lynxnb	54172322fe	Fix host synchronization for texture with a different guest format Host synchronization of a guest texture with a different guest format represents a valid use case where the host doesn't support the guest format and conversion to a host-compatible format must be performed. The issue is most evident on Mali GPUs, as they don't support BCn texture formats thus needing manual decoding before submission. It was disabled by mistake in a previous commit, this commit re-enables it.	2022-09-15 15:22:52 +02:00
TheASVigilante	a1ff4e1777	Implement OpenHardwareOpusDecoderEx and GetWorkBufferSizeEx Implements these 2 functions which were introduced in HOS 12.0.0. Fixes a crash in Xenoblade Chronicles 3.	2022-09-10 22:04:15 +01:00
lynxnb	34bd16426c	Fix quads index buffer conversion not accounting for first index Unindexed quad draws were broken when multiple draw calls were done on the same vertex buffer, with a non-zero `first` index. Indexed quad draws also suffered from the same issue, but it was never encountered in games. This commit fixes both cases by accounting for the `first` drawn index when generating conversion index buffers.	2022-09-04 12:42:33 +02:00
Billy Laws	9af5df4bae	Increase IPC pointer buffer size Some services report a pointer buffer size > 0x1000, so up it as to not cause issues when they're implemented.	2022-09-02 23:14:05 +01:00
Billy Laws	51f4e7662e	Add support for the TIPC protocol introduced in HOS 12.0.0 TIPC is a much lighter layer on top of the Horizon IPC system than CMIF and is used by SM in 12.0.0+. This implementation is slightly hacky since it doesn't really keep a separation between the underlying kernel IPC stuff and userspace like CMIF/TIPC, this should be fixed eventually, probably together with an IPC dispatch rewrite to avoid the mess of frozen maps. Tested with Hentai Uni, which now crashes needing 'ldr:ro'.	2022-09-02 23:13:23 +01:00
Billy Laws	a40d7c78ad	Always recreate oboe stream on error This is what's done in oboe examples to avoid spurious errors breaking audio entirely.	2022-08-31 21:26:14 +01:00
Billy Laws	8917ec9c88	Don't set framesPerCallback for main stream as per oboe guidance It's best to let oboe figure it out on its own	2022-08-31 21:26:14 +01:00
Billy Laws	b00008daf5	Fix identifier release check in AudioTrack::Stop	2022-08-31 21:26:14 +01:00
Billy Laws	d9c8e62d1c	Don't warn on GetConfig IOCTL fails	2022-08-31 21:26:14 +01:00
Billy Laws	4aef24ba32	Implement NVGPU_GPU_IOCTL_GET_GPU_TIME in nvdrv	2022-08-31 21:26:14 +01:00
Billy Laws	5841799420	Fix decoding of IOCTLs with padding at the end	2022-08-31 21:26:14 +01:00
Billy Laws	82444f3b0a	Don't set push descriptor flag for desc sets This is redundant and against the spec since we no longer use push descriptors.	2022-08-31 21:26:14 +01:00
PixelyIon	94fdd6aa43	Fix HID touch points not being removed from screen Tapping anything in titles that supported touch (such as Puyo Puyo Tetris or Sonic Mania) wouldn't work due to the first touch point never being removed from the screen, it is supposed to be removed after a 3 frame delay from the touch ending. This commit introduces a mechanism to "time-out" touch points which counts down during the shared memory updates and removes them from the screen after a specified timeout duration.	2022-08-31 23:43:02 +05:30
PixelyIon	70ad4498a2	Write HID LIFO entries at fixed intervals Certain titles depend on HID LIFO entries being written out at a fixed frequency rather than on actual state change, not doing this can lead to applications freezing till the LIFO is filled up to maximum size, this behavior is seen in Super Mario Odyssey. In other cases such as Metroid Dread, the game can run into race conditions that would lead to crashes, these were worked around by smashing a button during loading prior. This commit introduces a thread which sleeps and wakes up occasionally to write LIFO entries into HID shared memory at the desired frequencies. This alleviates any issues as it fills up the LIFO instantly and correctly emulates HID Shared Memory behavior expected by the guest. Co-authored-by: Narr the Reg <juangerman-13@hotmail.com>	2022-08-31 22:49:36 +05:30
PixelyIon	7966bfa9f6	Fix PI update `KThread::waiterMutex` deadlock It was determined that deadlocks inside `KThread::UpdatePriorityInheritance` would not only arise from the first level of locking with `waitingOn->waiterMutex` but also the second level of locking with `nextThread->waiterMutex` which has now also been fixed to fallback when facing contention.	2022-08-28 20:15:08 +05:30
Abandoned Cart	86f6fc510e	Remove printing result message from author There is no purpose in printing the same result twice, so any duplicate messages should instead allow the field to be truncated.	2022-08-27 18:54:27 +05:30
Abandoned Cart	1013857fc4	Refactor subtitle as author to remove subtitle Subtitle is no longer used, so instances have been rerouted to author. DataItem was also updated to reflect the removal of a subtitle.	2022-08-27 18:54:27 +05:30
Abandoned Cart	04cae942ea	Follow typical per-file detail formatting Format the details in the expected format of individual files (instead of a complete game) and move the code to match the updated placement.	2022-08-27 18:54:27 +05:30
lynxnb	5d6eaee301	Correctly save/restore ROM version to/from game entry cache PR #1758 introduced a bug where the game list would be entirely loaded every time the app was opened. This commit addresses that issue, which was caused by the `version` member of the cached game list being serialized to file (although incorrectly) but never actually read back when deserializing.	2022-08-21 20:24:36 +02:00
lynxnb	cfa5f0e030	Fix OSC alpha not changing on button press	2022-08-20 17:00:40 +02:00
KikiManjaro	8fb4e62c28	Add version information about rom Review: Co-authored-by: Niccolò Betto <niccolo.betto@gmail.com>	2022-08-20 13:48:07 +02:00
KikiManjaro	3407f6d530	Add OSC opacity adjustment	2022-08-20 13:46:17 +02:00
lynxnb	d129fb09cd	Android Manifest cleanup * Remove `package` from manifest and from activity prefixes, gradle `namespace` will be used instead * Removed deprecated `android.support.PARENT_ACTIVITY` metadata entries * Make `MainActivity` and `SettingsActivity` launched in `singleTop` mode to avoid unnecessary activity restarts while navigating the app	2022-08-17 15:33:53 +02:00
lynxnb	d128856c7d	Update target SDK version to Android 13	2022-08-17 12:29:46 +02:00
lynxnb	39f398f76b	Update Kotlin (1.7.10), NDK (25.0.8775105), AGP (7.2.2) and Kotlin deps	2022-08-17 12:28:31 +02:00
lynxnb	e9618d9e2c	Use pragma pack directions for tightly packing structs containing `u128` Using `__attribute__((packed))` doesn't work in new NDKs when a struct contains 128-bit integer members, likely because of an NDK/compiler bug. We now enclose the requiring structs in `#pragma pack` directives to tightly pack them.	2022-08-17 12:22:11 +02:00
lynxnb	c4bf92a49f	Fix Kotlin compilation errors from incorrect overloading of null-safe types	2022-08-17 12:16:26 +02:00
Billy Laws	bf491f71f9	Simplify blit helper shader vertex order	2022-08-10 15:43:16 +01:00
Billy Laws	c32bec071c	Adjust blit src{X,Y} to account for centred sampling before calling into helper shader Since the blit engine itself samples from pixel corners and the helper shader from pixel centres, the src coordinates need to be adjusted to avoid the helper shader wrapping round on the final column.	2022-08-10 15:39:37 +01:00
Billy Laws	08f36aac33	Enable hades vertex position input workaround for Adreno Caused crashes in any games using geometry shaders as by default hades uses the position builtin directly.	2022-08-08 18:09:00 +01:00
Billy Laws	04e7b684d2	Enable vertexPipelineStoresAndAtomics, fragmentStoresAndAtomics and shaderStorageImageWriteWithoutFormat Vulkan features Used by Xenoblade Chronicles DE	2022-08-08 17:43:18 +01:00
Billy Laws	390558c802	Add partial support for legacy attribute conversion We previously missed the hades pass for attribute conversion leading to crashes when games would attempt to use such an attribute. The hades pass for this isn't a proper fix however as it modifies the IR directly and will break if any of the previous stages in the pipeline change. Enable it to allow for games using them to at least have a chance at working. In the long term the pass will be reworked on the hades side to avoid modifying the IR in a way that can't be undone.	2022-08-08 17:43:18 +01:00
Billy Laws	540437b547	Fixup index buffer view caching We forgot to set the view size, which would end up forcing a view to be recreated with every call	2022-08-08 17:43:18 +01:00
Billy Laws	c966cd3b26	Prevent runtimeInfo vertex state from leaking into wrong shaders This vertex state must only be present for the last pipeline stage that touches vertices, if it is present for other stages it could result in incorrect behaviour like performing TFB in the fragment shader or flipping device coordinates twice.	2022-08-08 17:43:13 +01:00
Billy Laws	c52d3195cf	Ensure shader stage enable state matches pipeline stage enable state As the code was before, if we had a shader that was disabled and enabled again after without being invalidated the pipeline stage would stay disabled and break rendering.	2022-08-08 17:40:35 +01:00
Billy Laws	b1c669ba14	Always keep the VertexB shader stage enabled HW doesn't allow disabling the VertexB stage, enforce this in code.	2022-08-08 17:40:35 +01:00
lynxnb	d5174175d1	Implement indexed quads support We previously only supported non-indexed quads. Support for this is implemented by converting the index buffer at record time and pushing the result into the megabuffer, which is then used as the index buffer in the final draw command.	2022-08-08 17:40:35 +01:00
lynxnb	e6741642ba	Split out megabuffer allocation from pushing data The `Allocate` method allocates the given amount of space in a megabuffer chunk, returning a descriptor of the allocated region. This is useful for situations where you want to write directly to the megabuffer, avoiding the need for an intermediary buffer.	2022-08-08 17:40:35 +01:00
Billy Laws	cdc6a4628a	Enable VK uint8 indices feature when supported	2022-08-08 17:40:35 +01:00
Billy Laws	dccc86ea97	Implement transform feedback with VK_EXT_transform_feedback Tested to work in Xenoblade Chronicles DE, the code handles both hades varying input and buffer setup.	2022-08-08 17:40:35 +01:00
Billy Laws	06053d3caf	Rewrite Fermi 2D engine to use the blit helper shader Entirely rewrites the engine and interconnect code to take advantage of the subpixel and OOB blit support offered by the blit helper shader. The interconnect code is also cleaned up significantly with the 'context' naming being dropped due to potential conflicts with the 'context' from context lock	2022-08-08 17:40:35 +01:00
Billy Laws	395f665a13	Implement a system for helper shaders together with a simple blit shader It is desirable for us to use a shader for blits to allow easily emulating out of bounds blits and blits between different swizzled colour formats. The helper shader infrastructure is designed to be generic so it can be reused by any other helper shaders that we may need in the future.	2022-08-08 17:40:35 +01:00
Billy Laws	1da1698f90	Disable unused Vulkan HPP setters and smart handles	2022-08-08 14:57:44 +01:00
Billy Laws	f4e58a9238	Remove redundant synchost creating a new buffer	2022-08-08 14:57:44 +01:00
Billy Laws	11a8feb037	Correct nvdrv DMA copy class ID Was wrongly copy and pasted.	2022-08-08 14:57:44 +01:00
Billy Laws	13e7b54c61	Ensure failed IOCTLs are logged as a warning log	2022-08-08 14:57:44 +01:00
Billy Laws	eeb86a4f8a	Calculate renderArea from min(attachments.dimensions...) Vulkan doesn't support a renderArea larger than that of the smallest attachment	2022-08-08 14:57:44 +01:00
Billy Laws	9ea658d0ed	Don't throw on unsupported TIC formats These sometimes spuriously occur in games during transitions, to avoid crashing during them just use the null texture if they occur and log an error log	2022-08-08 14:57:44 +01:00
Billy Laws	856818c8eb	Emulate the 'None' mipfilter by adjusting LOD Borrowed this technique from yuzu since Vulkan has no direct equivalent	2022-08-08 14:57:44 +01:00
Billy Laws	9d50b6d0f7	Avoid locking presentation mutex in GetTransformHint This caused slowdown in Pokemon as it was being called every frame	2022-08-08 14:57:44 +01:00
Billy Laws	460e6c9c84	Use raw pointers to hold constant buffer views The constant destruction and creation of `BufferView`s in cbuf-heavy games showed up as a large chunk of the profiler. Fix this by taking advantage of the fact that constant buffer `BufferView`s are never deleted and always kept around in the cache to just return a pointer to them in the cache.	2022-08-08 14:54:57 +01:00
Billy Laws	6b2e84712b	Avoid race in nvdrv debug prints Looking up the device name without locking it could race with map insertions or deletions, so lock it to avoid that	2022-08-08 13:24:23 +01:00
Billy Laws	683cd594ad	Use a linear allocator for most per-execution GPU allocations Currently we heavily thrash the heap each draw, with malloc/free taking up about 10% of GPFIFO's execution time. Using a linear allocator for the main offenders of buffer usage callbacks and index/vertex state helps to reduce this to about 4%	2022-08-08 13:24:21 +01:00
Billy Laws	70eec5a414	Store delegate attached state within the delegate itself Avoids a costly map lookup for every AttachBuffer call, this was a serious bottleneck in SMO	2022-08-08 13:23:26 +01:00
Billy Laws	0268e1d5a0	Force a submit before any i2m engine writes We need traps to be in place so we don't end up overwriting a resource that's being actively used by the current context without setting it to dirty	2022-08-08 13:22:37 +01:00
Billy Laws	cb0b132486	Allow supplying push constants to GetPipeline	2022-08-08 13:22:37 +01:00
Billy Laws	1c8863ec3b	Use const references for holding pipeline state in pipeline cache Allows passing in constexpr structs to state directly	2022-08-08 13:22:37 +01:00
Billy Laws	b6b04fa6c5	Use small_vector for VMM TranslateRange results This was the source of a lot of heap allocs, moving to small_vector helps to avoid most of them	2022-08-08 13:22:37 +01:00
Billy Laws	1fe6d92970	Wait on Swapchain Image copy to complete Certain titles can display frames out of order due to not waiting on the copy from the final RT to the swapchain image to occur. Although `PresentFrame` does wait on the syncpoint, that isn't enough to ensure the source texture is up-to-date due to us signalling syncpoints early. By waiting on the swapchain texture after the copy is submitted, we now implicitly wait on the source texture's cycle to be signalled thus waiting on the frame to be done which fixes the issue.	2022-08-07 03:12:27 +05:30
PixelyIon	5b7572a8b3	Introduce chunked `MegaBuffer` allocation After the introduction of workahead a system to hold a single large megabuffer per submission was implemented, this worked fine for most cases however when many submissions were in flight at the same time memory usage would increase dramatically due to the amount of megabuffers needed. Since only one megabuffer was allowed per execution, it forced the buffer to be fairly large in order to accommodate the upper-bound, even further increasing memory usage. This commit implements a system to fix the memory usage issue described above by allowing multiple megabuffers to be allocated per execution, as well as reuse across executions. Allocations now go through a global allocator object which chooses which chunk to allocate into on a per-allocation scale, if all are in use by the GPU another chunk will be allocated, that can then be reused for future allocations too. This reduces Hollow Knight megabuffer memory usage by a factor of 4 and SMO by even more.	2022-08-07 03:12:27 +05:30
Billy Laws	99b5fc35c6	Change `SegmentTable` semantics to respect unset entries Access to unset entries is now clearly defined as returning a 0'd out value, the prior behavior would be to optimize sets for border segments to use L2 atomicity when the specific segment had no L1 entries set. This would lead to any future lookups of offsets within the same L2 segment but a different L1 entry to incorrectly return an inaccurate value as the only prior guarantee was that lookups after setting a segment would return the same value as was set but lacked the guarantee for unset segments to also consistently return unset values. This could lead to issues in practical usages such as the `BufferManager` lookups returning the existence of a `Buffer` at a location falsely even though the segment was never set to the value, this was problematic as raw pointers were utilized and bound checks would lead to a segmentation fault. This commit fixes this issue by introducing this guarantee and refactoring the class accordingly, it also deletes the `Set` method for setting a single entry as the meaning is ambiguous and its functionality was more akin to the past guarantee and no longer makes sense. Co-authored-by: PixelyIon <pixelyion@protonmail.com>	2022-08-06 22:20:54 +05:30
PixelyIon	36b8d3c445	Account for `SegmentTable` insertions entirely within an L2 entry We would always write all L1 entries that correspond to an L2 entry, even if setting an input range ended before that. This would effectively reduce the atomicity of the segment table to that of the L2 range and lead to breaking API guarantees by returning entirely wrong segment values for a lookup covering a region that was overwritten.	2022-08-06 22:20:54 +05:30
PixelyIon	c72316d9f6	Rename `RangeTable` to `SegmentTable` It was determined that `RangeTable` was too ambiguous of a name as it could be interpreted to be holding ranges rather than looking them up, to avoid confusion the terminology has been changed from `range` to `segment`. "Segment table" is clearer in describing that it is a table comprised of descriptors regarding segments, and it avoids any overlaps with terminology concerning "pages", which would be overly specific for this data structure, or the ambiguous "ranges".	2022-08-06 22:20:54 +05:30
Billy Laws	5398eff045	Fix `KProcess::MutexUnlock` PI CAS The PI CAS in `MutexUnlock` ends up loading `basePriority` rather than `priority` which could lead to an infinite CAS loop when `basePriority` doesn't equal `priority` and the `highestPriorityThread`'s priority is lower than `basePriority`.	2022-08-06 22:20:54 +05:30
PixelyIon	850c0f4092	Make `Texture::SynchronizeGuest` Blocking It was determined that `Texture::SynchronizeGuest`'s `TextureBufferCopy` had races that were exposed by the introduction of the cycle waiter thread, the synchronization did not take place under a locked context so the texture could be mutated at any point in addition to the destructor not being run during `FenceCycle::Wait` due to `shouldDestroy` being `false`. This commit fixes the issue by making `SynchronizeGuest` entirely blocking as all usages of the function required blocking semantics regardless so it would be pointless to retain its async nature while solving any races that may arise from it being async. Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-08-06 22:20:54 +05:30
Billy Laws	77d15b02a3	Ensure backing continuity when recreating GPU dirty buffers Since we don't call `SynchronizeHost` on source buffers which are GPU dirty, their mirrors will be out of date. The backing contents of this source buffer's region in the new buffer will be incorrect. By copying from the backing directly, we can ensure that no writes are lost and that if the newly created buffer needs to turn GPU dirty during recreation no copies need to be done since the backing is as up to date as the mirror at a minimum.	2022-08-06 22:20:54 +05:30
Billy Laws	c1bf5a804a	Extend `stateMutex` scope inside `Buffer::SynchronizeHost` The code is much simpler to reason about as it doesn't require evaluating all the potential edge cases of trap handlers in different states. It should be noted that this should not change behavior in any meaningful way; at most it can prevent a minor race where the protection could be upgraded after being downgraded by the signal handler, leading to a redundant trap.	2022-08-06 22:20:54 +05:30
PixelyIon	c3cf79cb39	Rework `KThread::waiterMutex` Locking Two issues exist with locking of `KThread::waiterMutex`: * It was not always locked when accessing waiter members such as `waitThread`, `waitKey` and `waitTag` which would lead to a race that could end up in a deadlock or most notably a segfault inside `UpdatePriorityInheritance` * There could be a deadlock from `UpdatePriorityInheritance` locking `waiterMutex` of a thread and waiting to get the owner's `waiterMutex` while on another thread `MutexUnlock` holds the owner's `waiterMutex` and waits on locking the `waiterMutex` held by `UpdatePriorityInheritance` This commit fixes both issues by adding appropriate locking to all locations where waiter members are accessed in addition to adding a fallback mechanism inside `UpdatePriorityInheritance` that unlocks `waiterMutex` on contention to avoid a deadlock.	2022-08-06 22:20:54 +05:30
PixelyIon	68615703c1	Fix `KProcess`/`SetThreadPriority` PI CAS The condition for exiting the CAS loops is incorrect in several places which leads to additional loops, while this doesn't make the behavior incorrect it does lead to redundant iterations. Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-08-06 22:20:54 +05:30
PixelyIon	8fc3cc7a16	Rework Descriptor Set Allocation/Updates A substantial amount of time would be spent on creation/destruction of `VkDescriptorSet` which scales on titles doing a substantial amount of draws with bindings, this leads to poor performance on those titles as the frametime is dragged down by performing these tasks while they repeatedly create descriptor sets of the same layouts. This commit fixes it by pooling descriptor sets per-layout in a dynamically resizable pool and keeping them around rather than destroying them after usage which leads to the vast majority of cases not requiring a new descriptor set to even be created. It leads to significantly improved performance where it would otherwise be spent on redundant destruction/recreation or push descriptor updates which took a substantial amount of time themselves. Additionally, the `BaseDescriptorSizes` were not kept up to date with all of the descriptor types, it led to no crashes on Adreno/Mali as they were purely used for size calculations on either driver but has been corrected to avoid any future issues.	2022-08-06 22:20:54 +05:30
PixelyIon	e1a4325137	Introduce `FenceCycle` Waiter Thread A substantial amount of time is spent destroying dependencies for any threads waiting or polling `FenceCycle`s, this is not optimal as it blocks them from moving onto other tasks while destruction is a fundamentally async task and can be delayed. This commit solves this by introducing a thread that is dedicated to waiting on every `FenceCycle` then signalling and destroying all dependencies which entirely fixes the issue of destruction blocking on more important threads.	2022-08-06 22:20:54 +05:30
PixelyIon	5f8619f791	Optimize Buffer Lookups using Range Tables Buffer lookups are a fairly expensive operation: we currently spend `O(log n)` even on the simplest and most frequent case, a direct match, which may be insufficient for such a frequent operation. This commit optimizes that case to `O(1)` by utilizing a `RangeTable` at the cost of slightly higher insertion/deletion costs for setting ranges of values, but these are minimal in frequency compared to lookups.	2022-08-06 22:20:54 +05:30
PixelyIon	578ae86cca	Implement Multi-Level Range Table A data structure that can represent the same value for a range of addresses (pages) is required for fast lookup in certain cases. This commit implements a near optimal data structure for mass insertion and O(1) lookup of range-based data, this is achieved using the host MMU and implementing multiple levels of atomicity for the ranges. It should be noted that the table is limited to two levels but can be extended to a variable amount of ranges in the future, it was determined that additional levels of ranges can be beneficial for performance depending on the specific use-case.	2022-08-06 22:20:54 +05:30
Billy Laws	38eab80ed8	Disable Vulkan Push Descriptors on Adreno Adreno drivers have certain errata which lead to Vulkan Push Descriptors being broken on them in certain cases, resulting in a descriptor set update being swallowed. This has been worked around by disabling push descriptors on Adreno drivers; this may lead to reduced performance on certain titles which frequently bind new descriptors.	2022-08-06 22:20:54 +05:30
Billy Laws	88fd491ed5	Submit after Maxwell3D Semaphore Release Any semaphore releases are implicit synchronization events that can be utilized by the guest to pick up that the GPU has executed till a certain point and therefore we must submit all prior work accordingly.	2022-08-06 22:20:54 +05:30
Billy Laws	b77da1182f	Don't flush submission on DMA Copies DMA copies utilized `SubmitWithFlush` instead of `Submit`, this is not required and incurs significant additional synchronization penalties which will no longer be required.	2022-08-06 22:20:54 +05:30
PixelyIon	0992fde028	Don't block on surface creation in `GetTransformHint` We want to avoid blocking on surface creation unless necessary, this commit doesn't wait on the creation of the surface as it default initializes the value which'll generally be `Identity` or the transformation of the previous surface if it was lost. Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-08-06 22:20:54 +05:30
Billy Laws	35133381b6	Fix V-Sync `KEvent` construction order The V-Sync `KEvent` would be used by the presentation thread prior to construction leading to dereferencing an invalid value, this has been fixed by changing the order of construction to move the construction of the presentation thread after the V-Sync event.	2022-08-06 22:20:54 +05:30
PixelyIon	ffad246d67	Split NCE Trap page-out functionality from `TrapRegions` The `TrapRegions` function performed a page-out on any regions that were trapped as read-only; this wasn't optimal as it tied both into the same operation while Buffers/Textures need to protect, then synchronize and page out. The trap was being moved to after the synchronize to get around this limitation, but that can cause a potential race as certain writes done after the synchronization but prior to the trap would be lost. This commit fixes these issues by splitting paging out into `PageOutRegions`, which can be called after `TrapRegions` by any API users. Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-08-06 22:20:54 +05:30
PixelyIon	da464d84bc	Consolidate `NCE::TrapRegions` functionality into `CreateTrap` `NCE::TrapRegions` was a bit too overloaded as a method as it implicitly trapped which was unnecessary in all current usage cases, this has now been made more explicit by consolidating the functionality into `NCE::CreateTrap` which handles just creation of the trap and nothing past that, `RetrapRegions` has been renamed to `TrapRegions` and handles all trapping now. Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-08-06 22:20:54 +05:30
PixelyIon	8a62f8d37b	Rework Texture Synchronization API + Locking Similar to `Buffer`s, `Texture`s suffered from suboptimal behavior due to using atomics for `DirtyState` and had certain bugs where the variable was replaced at times when it shouldn't be possible; these have now been fixed by moving to a mutex instead of atomics. This commit also updates the API to more closely match what is expected of it now and removes any functions that weren't utilized, such as `SynchronizeGuestWithBuffer`.	2022-08-06 22:20:54 +05:30
Billy Laws	04bcd7e580	Rework Buffer `DirtyState` with `BackingImmutability` Having a single variable denoting the exact state of a buffer and the operations that could be performed on it was found to be too restrictive, so it has now been expanded with an additional `BackingImmutability` variable. With two variables we can no longer use atomics without significant additional complexity, so all accesses to the state are now mediated through `stateMutex`, a mutex specifically designed for tracking the state. While designing the system around `stateMutex` it was determined to be more efficient than atomics, as it enforces blocking far less than the usual atomic fallback of locking the main resource lock, which is generally held for significantly longer. Co-authored-by: PixelyIon <pixelyion@protonmail.com>	2022-08-06 22:20:54 +05:30
PixelyIon	1af781c0a5	Add Perfetto Tracing to NCE Trapping API As a performance sensitive part of code, the NCE Trapping API benefits from having tracing and it helps us better determine where guest code is spending its time for more targeted optimizations.	2022-08-06 22:20:54 +05:30
PixelyIon	9d294b9ccc	Use `weak_ptr` for `TrapHandler` Callbacks The lifetime of the `this` pointer in the trap callbacks could be invalid as the lifetime of the underlying `Buffer`/`Texture` object wasn't guaranteed, this commit fixes that by passing a `weak_ptr` of the objects into the callbacks which is locked during the callbacks and ensures that a destroyed object isn't accessed. Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-08-06 22:20:54 +05:30
Billy Laws	96d8676d5b	Fix `SubmitWithFlush` not updating `MegaBuffer` cycle The `CommandExecutor`'s `MegaBuffer` was not being updated with the latest `FenceCycle` on being flushed in `SubmitWithFlush`; this led to the megabuffer being overwritten prior to its GPU-side usage being complete. This commit fixes that by replacing the cycle with the latest cycle, which prevents the races that occurred prior.	2022-08-06 22:20:54 +05:30
PixelyIon	3e9d84b0c3	Split `FindOrCreate` functionality across `BufferManager` `FindOrCreate` ended up being a monolithic function with poor readability; this commit addresses those concerns by splitting it up into multiple member functions of `BufferManager`. While some of these member functions may only have a single call-site, they are important for logically categorizing tasks into individual functions. The end result is far neater logic which is far more readable and slightly better optimized by virtue of being abstracted better.	2022-08-06 22:20:54 +05:30
PixelyIon	d2a34b5f7a	Implement `ContextLock` Move-assignment operator In certain cases the move constructor may not suffice and the move assignment operator is required, this commit implements that and moves to using a pointer for storing the `resource` member rather than a reference as its semantics matched what we desired more and allowed for assignment of the `resource`.	2022-08-06 22:20:54 +05:30
PixelyIon	38d3ff4300	Fix `BufferManager::FindOrCreate` Recreation Bugs It was determined that `FindOrCreate` has several issues which this commit fixes: * It wouldn't correctly handle locking of the newly created `Buffer` as the constructor would set up traps prior to being able to lock it, which could lead to UB * It wouldn't propagate the `usedByContext`/`everHadInlineUpdate` flags correctly * It wouldn't correctly set the `dirtyState` of the buffer according to that of its source buffers	2022-08-06 22:20:54 +05:30
Billy Laws	d1a682eace	Fix `setDirty` behavior in `Buffer::SynchronizeGuest` The condition for `setDirty` in the dirty state CAS was inverted from what it should've been resulting in synchronizing incorrectly, this commit fixes the condition to correct synchronization.	2022-08-06 22:20:54 +05:30
Billy Laws	00d434efdc	Remove `Texture::CopyFrom` format check The formats of the textures involved in a texture copy were checked for equality; this broke certain copies as the presentation engine would invoke copies between textures of different yet compatible formats. Co-authored-by: PixelyIon <pixelyion@protonmail.com>	2022-08-06 22:20:54 +05:30
Billy Laws	58174f255f	Improve `ContextLock` semantics `ContextLock` had suboptimal semantics in the form of direct access to the `isFirst` member, which wasn't clearly defined; it has now been broken up into the function calls `IsFirstUsage` and `OwnsLock`, with explicit move semantics and a function for releasing the lock. Co-authored-by: PixelyIon <pixelyion@protonmail.com>	2022-08-06 22:20:54 +05:30
Billy Laws	561103d3da	Submit GPFIFO work prior to `CircularQueue` waiting The position at which we call submit is a significant factor in performance and we did so at the end of PBs (PushBuffers), this isn't optimal as there could be multiple PBs queued up that would benefit from being in the same submission. We now delay the submission of the workload till we run out of PBs.	2022-08-06 22:20:54 +05:30
PixelyIon	3ac5ed8c06	Attach coalesced `Buffer` if any source `Buffer` is attached A buffer that's attached to a context could be coalesced into a larger buffer which isn't attached, this would break as it wouldn't keep the buffer alive till the end of the associated context. To fix this if any source buffers are attached then the resulting coalesced buffer is also attached now.	2022-08-06 22:20:54 +05:30
PixelyIon	284ac53d88	Fix KThread Priority Inheritance CAS The CAS condition for KThread PI was inverted, which led to entirely incorrect behavior for CAS conditions; while it might work in the vast majority of cases, it would lead to significantly inaccurate behavior.	2022-08-06 22:20:54 +05:30
PixelyIon	45cb8388cc	Fix NCE Trap API Lock Callback The lock callback would `continue` which would end up skipping over the current item as it applied to the inner loop rather than the outer loop as intended. This has now been fixed by using `break` and a check instead.	2022-08-06 22:20:54 +05:30
PixelyIon	745d809e07	Fix `Buffer::SynchronizeGuest` Non-Blocking Behavior The buffer's non-blocking behavior could lead to an invalid state where the dirty state doesn't adequately represent the buffer's true state, the check has now been moved inside the CAS loop as its behavior changes depending on the dirty state. In addition, `SynchronizeGuest` returns a boolean denoting if the synchronization was successful now to make code flows depending on non-blocking synchronization cleaner.	2022-08-06 22:20:54 +05:30
PixelyIon	c1f2445772	Set state to `CpuDirty` directly in `SynchronizeGuest` `SynchronizeGuest` could only set the dirty state to `Clean` which was redundant since calls to it from inside the write trap handler would set it to `CpuDirty` directly after, this fixes that by doing it inside the function when necessary.	2022-08-06 22:20:54 +05:30
PixelyIon	4f6a67af36	Fix `Texture` Trap Data Race The trap callbacks did not wait on the `Texture` to complete synchronization to the guest, this resulted in races where the contents written to the texture would be overwritten by the synced content. This commit fixes that by waiting on the fences at the end of the trap callback.	2022-08-06 22:20:54 +05:30
PixelyIon	cb7c3602e7	Attach `TextureView` to `FenceCycle` The lifetime of `TextureView` objects wasn't correctly managed as they weren't being attached to the `FenceCycle` in `AttachTexture`; this led to them getting deleted and causing all sorts of UB.	2022-08-06 22:20:54 +05:30

1364 Commits