Commit Graph

1901 Commits

Author SHA1 Message Date
Billy Laws
388cff3353 Implement simple pipeline transition cache
Avoids the need to hash PipelineState when we can guess the pipeline that will be used next. This could very easily be optimised in the future with generational, usage-based caching if necessary.
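A minimal sketch of the idea, assuming each pipeline simply remembers which pipelines were bound directly after it. Every name here (`PackedStateSketch`, `PipelineSketch`, `PipelineManagerSketch`, `hashLookups`) is hypothetical and the single `bits` field stands in for the real packed state; this is not the code from the commit:

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// All names below are hypothetical; this is not the emulator's pipeline manager
struct PackedStateSketch {
    uint64_t bits; // Stand-in for the full packed pipeline state

    bool operator==(const PackedStateSketch &other) const { return bits == other.bits; }
};

struct PipelineSketch {
    PackedStateSketch state;
    std::vector<PipelineSketch *> transitions; // Pipelines that were previously bound right after this one
};

class PipelineManagerSketch {
  public:
    size_t hashLookups{}; // Counts slow-path lookups, for illustration only

    PipelineSketch *FindOrCreate(PipelineSketch *current, const PackedStateSketch &state) {
        // Fast path: compare against the pipelines that followed `current` before, no hashing needed
        if (current) {
            for (PipelineSketch *candidate : current->transitions)
                if (candidate->state == state)
                    return candidate;
        }

        // Slow path: hash the state, then record the transition so it can be guessed next time
        hashLookups++;
        auto &slot = pipelines[state.bits];
        if (!slot)
            slot = std::make_unique<PipelineSketch>(PipelineSketch{state, {}});
        if (current)
            current->transitions.push_back(slot.get());
        return slot.get();
    }

  private:
    std::unordered_map<uint64_t, std::unique_ptr<PipelineSketch>> pipelines;
};
```

In this sketch the transition list grows without bound, which is where the generational, usage-based caching mentioned above could come in.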
2022-11-02 17:46:07 +00:00
Billy Laws
302b2fcc3f Force flush when dirty refresh returns true 2022-11-02 17:46:07 +00:00
Billy Laws
ec4ea5c5d7 Supply dispatcher manually for shader creation 2022-11-02 17:46:07 +00:00
Billy Laws
3404a3abdb Implement macro HLE for instanced draw macros
gm20b performs instanced draws by repeating draw methods for each instance, the code to detect this together with the cost of interpreting macros took up around 6% of GPFIFO time in Metro Kingdom. By detecting these specific macros and performing an instanced draw directly much of that cost can be avoided.
2022-11-02 17:46:07 +00:00
Billy Laws
cf0752f937 Use NCE memory tracking for guest shaders
Prevents needing to hash them for every single pipeline state update, without this just hashing shaders takes up a significant amount of time.
2022-11-02 17:46:07 +00:00
Billy Laws
19a75c3f65 Bind all pipeline states to main pipeline dirty state 2022-11-02 17:46:07 +00:00
Billy Laws
a04d8fb5cf Setup minimal viewport Vulkan pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
fe51db366b Mark all dirty resources as dirty initially 2022-11-02 17:46:07 +00:00
Billy Laws
abfa5929f1 Treat vertex buffers with base addr 0 as disabled 2022-11-02 17:46:07 +00:00
Billy Laws
e71ca05f19 Avoid bitfields for signed enum types in PackedPipelineState 2022-11-02 17:46:07 +00:00
Billy Laws
2f2b615780 Add dynamic state support to VK graphics pipeline cache 2022-11-02 17:46:07 +00:00
Billy Laws
ad045058ee Cleanup some redundant comments and includes in pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
9b05c9c0c3 Introduce a pipeline manager and partial pipeline object
gpu-new will use a monolithic pipeline object for each pipeline to store state, keyed by the PackedPipelineState contents. This allows for a greater level of per-pipeline optimisations and a reduction in the overall number of lookups in a draw compared to the previous system.
2022-11-02 17:46:07 +00:00
Billy Laws
2c1b40c9a8 Add some missed pipeline state members used by Hades 2022-11-02 17:46:07 +00:00
Billy Laws
0c6fa22c6b Introduce pipeline shader stage state
Simple state that generates a hash that can be used in the packed state and spans over the binary for pipeline creation.
2022-11-02 17:46:07 +00:00
Billy Laws
a6bb716123 Move packed pipeline state to a separate file 2022-11-02 17:46:07 +00:00
Billy Laws
4dcbf5c3a0 Drop the caching aspect of shader manager entirely
Caching here was deemed unnecessary since it will be done implicitly by the pipeline cache and creates issues with the legacy attribute conversion pass. It now purely serves as a frontend for Hades.
2022-11-02 17:46:07 +00:00
Billy Laws
e77e4891dc Tidy up Maxwell 3D regs a bit 2022-11-02 17:46:07 +00:00
Billy Laws
405d26fc22 Introduce Maxwell 3D shader state
Simple state holder that hashes the stored shader and reads it into a buffer
2022-11-02 17:46:07 +00:00
Billy Laws
6865f0bdaf Transition color blend state to packed pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
7049a521d2 Use Vulkan types directly in PackedPipelineState where possible 2022-11-02 17:46:07 +00:00
Billy Laws
effeb074b6 Move pipeline cache Key to cpp file and rename to PackedPipelineState
The new name is more indicative of what the struct contains and how it will be used as more than just a key.
2022-11-02 17:46:07 +00:00
Billy Laws
e1512c91a0 Transition depth stencil state to pipeline cache key 2022-11-02 17:46:07 +00:00
Billy Laws
1f844e2c18 Transition rasterization state to pipeline cache key 2022-11-02 17:46:07 +00:00
Billy Laws
9a6efb091c Transition tessellation state to pipeline cache key
Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.
2022-11-02 17:46:07 +00:00
Billy Laws
ae5d419586 Transition input assembly state to pipeline cache key 2022-11-02 17:46:07 +00:00
Billy Laws
3f9161fb74 Transition vertex input state to pipeline cache key
Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.
2022-11-02 17:46:07 +00:00
Billy Laws
ffe24aa075 Introduce a base pipeline cache key starting with RTs
It was determined that a general purpose Vulkan pipeline cache isn't viable for the significant performance reqs of Draw(), by using a Maxwell 3D specific key we can shrink state significantly more than if we used Vulkan structs.
2022-11-02 17:46:07 +00:00
Billy Laws
1e8f7d7fcb Implement dynamic pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
a94040ac7d Add default cases to enum conversions where necessary 2022-11-02 17:46:07 +00:00
Billy Laws
6e55d4dcf4 Implement color blend pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
cb11662ea5 Implement depth stencil pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
2484a2d6b5 Add A1R5G5B5Unorm and S8Uint formats 2022-11-02 17:46:07 +00:00
Billy Laws
690e96bce0 Implement rasterization pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
90cd6adb91 Implement tessellation pipeline state 2022-11-02 17:46:07 +00:00
Billy Laws
3649d4c779 Adapt Maxwell 3D engine to new interconnect code
Removes all usage of graphics_context.h from the codebase, exclusively using the new interconnect and its dirty tracking system. While porting the code a number of bugs were discovered such as not respecting the base instance or primitive type override, which have all been fixed. Currently only clears and constant buffer updates are implemented but due to the dirty state system allowing register handling on the interconnect end there shouldn't end up being many more changes.
2022-11-02 17:46:07 +00:00
Billy Laws
ef11900a39 Introduce reworked Maxwell 3D core interconnect
This mainly distributes operations down to activeState and pipelineState, aside from clears which are implemented in-place. The exposed interface is much reduced as opposed to the previous GraphicsContext system due to the newly introduced dirty system, this should hopefully make the code more maintainable and keep actual rendering operations separate from primitive restart state or whatever. Currently draws are unimplemented and the only fully implemented things are clears and constant buffer operations.
2022-11-02 17:46:07 +00:00
Billy Laws
37b821a4dc Introduce Maxwell 3D interconnect active state
Active state encapsulates all state that isn't part of a pipeline and can be set dynamically with Vulkan calls. This includes both dynamic state like stencil faces, and command buffer state like vertex buffer bindings.
Similarly to the last commit, the main goal of this is to reduce the amount of redundant work done per draw by employing dirty state as much as possible. Without using dirty state for this every active state operation would need to be performed every draw, which gets very expensive when things like buffer lookups end up being required. Code has also been heavily cleaned up as is described in the previous commit.
2022-11-02 17:46:07 +00:00
Billy Laws
5fdda78073 Introduce Maxwell 3D interconnect pipeline state
The main goal of this is to reduce the number of redundant lookups and work done per draw as much as possible, this is mainly achieved through heavy use of dirty tracking though other optimisations like heavily using the linear allocator are also in play. In addition to the goal of performance, the code has been cleaned up and abstracted significantly from its state in graphics_context, hopefully making the GPU interconnect code much more maintainable in the future and reducing the boilerplate needed to add even simple functionality. This commit includes partial pipeline state, enough for implementing clears + a slight bit extra.
2022-11-02 17:46:07 +00:00
Billy Laws
21f5611231 Rewrite constant buffer interconnect code
Adapted from the previous code to use dirty state tracking. The cache has also been removed since with the new buffer view and GMMU optimisations it actually ended up slowing lookups down, another result of the buffer view optimisations is that raw pointers are no longer used for buffer views since destruction is now much cheaper.
2022-11-02 17:46:07 +00:00
Billy Laws
d1e7bbc1d8 Introduce common code for Maxwell 3D interconnect rewrite
This common code will be used across the entirety of the 3D rewrite, it also includes a stub for StateUpdateBuilder, which will be used by active state code to apply state updates.
2022-11-02 17:46:07 +00:00
Billy Laws
a6c49115f9 Rewrite all Maxwell 3D registers up to clears to match Nvidia docs
All the names are directly translated from Nvidia docs, with minimal conversions to enums/structs when appropriate. Not all registers have been rewritten, only those that are needed to implement clears and dynamic state, the rest will be added as they are used in the GPU rework.
2022-11-02 17:46:07 +00:00
Billy Laws
d7eab40f1c Introduce resource based dirty tracking infrastructure
This will be heavily used by the upcoming GPU rework. It provides an intuitive way to track dirtiness based on using the underlying pointers of objects, as opposed to other methods which often need an enum entry per dirty state and don't support overlaps. Wrappers for dirty state objects are also provided to abstract as much of the dirty tracking as possible from user code. The pointer based mechanism also serves to avoid having to handle dirty bindings on the user side of the dirty resources, allowing them to bind things internally instead.
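One way to picture pointer-based dirty tracking is the sketch below. It assumes a register block made of 32-bit words and a manager that turns the address of a register member into word indices; `DirtyManagerSketch`, `Bind` and `MarkDirty` are hypothetical names and the real infrastructure may be organised quite differently:

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: dirty flags are bound to register members by address rather than by enum entry
class DirtyManagerSketch {
  public:
    template<typename Regs>
    explicit DirtyManagerSketch(const Regs &regs)
        : base{reinterpret_cast<const uint8_t *>(&regs)}, bindings(sizeof(Regs) / sizeof(uint32_t)) {}

    // Binds `dirtyFlag` to every 32-bit word covered by `reg`, which must live inside the register block
    template<typename T>
    void Bind(bool &dirtyFlag, const T &reg) {
        auto start = static_cast<size_t>(reinterpret_cast<const uint8_t *>(&reg) - base);
        assert(start + sizeof(T) <= bindings.size() * sizeof(uint32_t));
        for (size_t word = start / sizeof(uint32_t); word * sizeof(uint32_t) < start + sizeof(T); word++)
            bindings[word].push_back(&dirtyFlag);
    }

    // Called when the register word at `wordIndex` is written; several flags may overlap on one word
    void MarkDirty(size_t wordIndex) {
        for (bool *flag : bindings[wordIndex])
            *flag = true;
    }

  private:
    const uint8_t *base;
    std::vector<std::vector<bool *>> bindings; // One list of bound flags per register word
};
```

Because bindings are keyed by address, two pieces of state can watch the same register without any shared enum, which is the overlap support described above.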
2022-11-02 17:46:07 +00:00
Billy Laws
8471ab754d Introduce a spin lock for resources locked at a very high frequency
Constant buffer updates result in a barrage of std::mutex calls that take a lot of time even under no contention (around 5%). Using a custom spinlock in cases like these allows inlining locking code reducing the cost of locks under no contention to almost 0.
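For illustration, a minimal spin lock along these lines could look as follows. This is a generic test-and-test-and-set lock built on `std::atomic<bool>`, not the class added by this commit, and `SpinLockSketch` and `IncrementLocked` are hypothetical names:

```c++
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical minimal spin lock, small enough for the compiler to inline at every call site
class SpinLockSketch {
  public:
    void lock() {
        // Uncontended case: a single atomic exchange acquires the lock
        while (locked.exchange(true, std::memory_order_acquire)) {
            // Contended case: spin on a plain load until the holder releases the lock
            while (locked.load(std::memory_order_relaxed)) {}
        }
    }

    bool try_lock() {
        return !locked.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        assert(locked.load(std::memory_order_relaxed)); // Unlocking a lock that isn't held is a bug
        locked.store(false, std::memory_order_release);
    }

  private:
    std::atomic<bool> locked{false};
};

// The lock/unlock naming lets standard RAII guards work with it
inline int IncrementLocked(SpinLockSketch &lock, int &counter) {
    std::lock_guard<SpinLockSketch> guard{lock};
    return ++counter;
}
```

A lock like this never sleeps, so it only suits very short critical sections such as the constant buffer updates mentioned above.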
2022-11-02 17:46:07 +00:00
Billy Laws
d810619203 Drop 3D engine method calling fast path in GPFIFO
This ended up actually turning out to be a slow path when Maxwell 3D method handling code was inlined.
2022-11-02 17:46:07 +00:00
Billy Laws
ded02e3eac Small engine.h fixups 2022-11-02 17:46:07 +00:00
Billy Laws
38ba963311 Drop usage of unique_ptr for Maxwell3D
Since graphics context is being replaced and split into cpp files there will no longer be any circular includes that previously prevented this.
2022-11-02 17:46:07 +00:00
Billy Laws
90db743c56 Source AsGpu GMMU page sizes from GMMU class 2022-11-02 17:46:07 +00:00
Billy Laws
e72fe02c15 Add inline fast-path for Buffer::FindOrCreate()
This can be inlined by the compiler much easier which helps perf a fair bit due to the number of times buffers are looked up, also avoids the need for small vector construction that was done in the previous fast-path.
2022-11-02 17:46:07 +00:00
Billy Laws
49478e178a Avoid redundantly syncing buffers before every Write in an execution
This isn't a guarantee provided by actual HW so we don't need to provide it either, the sync can be skipped once the buffer has already been synced at least once within the execution.
2022-11-02 17:46:07 +00:00
Billy Laws
f7a726e452 Allow attempting to write to buffers without passing a GPU copy callback
Constructing the GPU copy callback in `ConstantBuffers::Load()` ended up taking a fair amount of time despite it almost never being used in practice. By making it optional it can be skipped most of the time and only done when it's actually necessary by calling `Write()` again if the initial call returned true.
2022-11-02 17:46:07 +00:00
Billy Laws
5dca5cc10e Redesign buffer view infra to remarkably reduce creation overhead
Buffer view creation was a significant pain point, requiring several layers of caching to reduce the number of creations that introduced a lot of complexity. By reworking delegates to be per-buffer rather than per-view and then linearly allocating delegates (without ever freeing) views can be reduced to just {delegatePtr, offset, size}, avoiding the need for any allocations or set operations in GetView. The one difficulty with this is the need to support buffer recreation, which is achieved by allowing delegates to be chained - during recreation all source buffers have their delegates modified to point to the newly created buffer's delegate. Upon accessing a view with such a chained delegate the view will be modified to point directly to the end delegate with offset being updated accordingly, skipping the need to traverse the chain for future accesses.
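The chaining described above can be sketched roughly as below. The struct layouts and the names `BufferSketch`, `DelegateSketch`, `ViewSketch`, `Resolve` and `ChainDelegate` are assumptions made for illustration, not the actual buffer view code:

```c++
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the real buffer, delegate and view types
struct BufferSketch {
    size_t size;
};

struct DelegateSketch {
    BufferSketch *buffer;
    DelegateSketch *link; // Non-null once the buffer has been recreated into a newer one
    size_t offset;        // Offset of this buffer's contents inside the linked delegate's buffer
};

struct ViewSketch {
    DelegateSketch *delegate;
    size_t offset;
    size_t size;

    // Walks the chain once and repoints the view at the end delegate so later accesses skip the walk
    BufferSketch *Resolve() {
        while (delegate->link) {
            offset += delegate->offset;
            delegate = delegate->link;
        }
        return delegate->buffer;
    }
};

// Called during recreation: `source` now lives at `offsetInNew` inside the buffer owned by `target`
inline void ChainDelegate(DelegateSketch &source, DelegateSketch &target, size_t offsetInNew) {
    source.link = &target;
    source.offset = offsetInNew;
}
```

A view stays a plain {delegatePtr, offset, size} triple throughout, so creating one needs no allocation.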
2022-11-02 17:46:07 +00:00
Billy Laws
09f376e500 Add const accessors to OffsetMember 2022-11-02 17:46:07 +00:00
Billy Laws
64a9db2e82 Introduce MergeInto helper for simplified construction of arrays of structs
In the upcoming GPU code each state member will hold a reference to its corresponding Maxwell 3D regs, this helper is needed to allow easy transformation from the main 3D register struct into them.

Example:
```c++
struct Regs {
    std::array<View, 10> viewRegs;
    u32 enable;
} regs;

struct ViewState {
    const View &view;
    const u32 &enable;
    size_t index;
};

std::array<ViewState, 10> viewStates{MergeInto<ViewState, 10>(regs.viewRegs, regs.enable, IncrementingT{})};
```
2022-11-02 17:46:07 +00:00
Billy Laws
2c682f19a6 Add untracked linear allocator emplace/allocate functions
Useful for cases where allocations are guaranteed to be unused by the time `Reset()` is called and calling `Free()` would be difficult or add extra performance cost due to how the allocation is used.
2022-11-02 17:46:07 +00:00
Billy Laws
6359852652 Introduce page size constants and replace all usages of PAGE_SIZE
Avoids using macros and results in code which looks slightly cleaner.
2022-11-02 17:46:07 +00:00
Billy Laws
30ec844a1b Use GPFIFO pushbuffer contents in-place if possible
The memcpy within `Read()` was taking up a fair amount of time, avoid this by using the mapped range in-place when the mapping isn't split.
2022-11-02 17:46:07 +00:00
Billy Laws
be825b7aad Utilise SegmentTable for rapid FlatMemoryManager lookups
In some games performing the binary search in `TranslateRange()` ended up taking a fairly large (~8%) proportion of GPFIFO time. By using a segment table for O(1) lookups this is reduced to <2% for non-split mappings at the cost of slightly increased memory usage (2GiB in the absolute worst case but more like 50MiB in real world situations).

In addition to adapting `TranslateRange()` to use the segment table, a new function `LookupBlock()` has been added for cases where only a single mapping would ever be looked up so the small_vector handling and fallback paths can be skipped and the entire lookup be inlined.
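As a rough illustration of why a segment table gives O(1) lookups, the sketch below keeps one entry per fixed-size segment of the address space and indexes it directly with the upper address bits. The 64KiB segment size, the segment-aligned mappings and the names `MappingSketch`, `SegmentTableSketch`, `Map` and `Translate` are all assumptions, not the actual `SegmentTable`:

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned SegmentBits{16}; // Assumed 64KiB segments
constexpr uint64_t SegmentMask{(uint64_t{1} << SegmentBits) - 1};

struct MappingSketch {
    uint64_t virtBase;
    uint64_t physBase;
    uint64_t size; // 0 marks an unmapped segment
};

class SegmentTableSketch {
  public:
    explicit SegmentTableSketch(uint64_t addressSpaceSize)
        : segments(static_cast<size_t>(addressSpaceSize >> SegmentBits), MappingSketch{0, 0, 0}) {}

    // Records the mapping in every segment it covers, trading memory for lookup speed
    void Map(uint64_t virtBase, uint64_t physBase, uint64_t size) {
        assert((virtBase & SegmentMask) == 0 && (size & SegmentMask) == 0);
        for (uint64_t segment = virtBase >> SegmentBits; segment < ((virtBase + size) >> SegmentBits); segment++)
            segments[static_cast<size_t>(segment)] = MappingSketch{virtBase, physBase, size};
    }

    // A single array index replaces the binary search over the sorted mapping list
    bool Translate(uint64_t virt, uint64_t &phys) const {
        uint64_t segment = virt >> SegmentBits;
        if (segment >= segments.size() || segments[static_cast<size_t>(segment)].size == 0)
            return false;
        const MappingSketch &mapping = segments[static_cast<size_t>(segment)];
        phys = mapping.physBase + (virt - mapping.virtBase);
        return true;
    }

  private:
    std::vector<MappingSketch> segments;
};
```

The table size scales with the address space rather than with the number of mappings, which matches the memory trade-off noted above.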
2022-11-02 17:46:07 +00:00
Morph
4ea0b0e1e5 fssrv: IFileSystemProxy: Implement OpenReadOnlySaveDataFileSystem
Forward this function to OpenSaveDataFileSystem for now. A proper implementation should wrap the underlying filesystem with nn::fs::ReadOnlyFileSystem.
2022-10-30 20:04:40 +00:00
Narr the Reg
25b9bb00fd service: hid: Properly clear and set npad devices 2022-10-30 15:52:03 +00:00
german77
cf95cfb056 service: hid: Stub SetPalmaBoostMode 2022-10-30 15:52:03 +00:00
Narr the Reg
4da934579c service: hid: Set the correct maxEntry value and signal on acquire event handle 2022-10-30 15:52:03 +00:00
Dima
baa6b5d5ea check if NpadId is valid when update 2022-10-30 15:51:45 +00:00
Dima
a409f30e91 add GetAvailableLanguageCodeCount for both lists 2022-10-30 15:51:29 +00:00
Dima
51ce3f7c3c Stub IClient::Poll 2022-10-29 19:52:50 +05:30
Billy Laws
1846f533bc Add credits PreferenceCategory 2022-10-25 21:40:28 +01:00
Abandoned Cart
267d25b5a5 Prevent a false positive SecurityException for DocumentsProvider 2022-10-25 20:17:18 +05:30
lynxnb
1ae36cea24 Update ngword2 archive with the correct content 2022-10-25 00:40:46 +02:00
Billy Laws
160c2f3457 Add Ko-Fi credits to settings 2022-10-23 21:14:39 +01:00
PixelyIon
128ea33073 Print NPDM + NACP metadata for title determination
We determined that printing NPDM + NACP metadata is a significantly better way to determine what title is running rather than printing the filename.
2022-10-23 20:20:44 +05:30
PixelyIon
a0539a3edb Trace Scheduler Preemption/Yield in Perfetto 2022-10-22 17:37:03 +05:30
PixelyIon
c874907eb5 Log and flush inside KProcess::Kill
We want to know when the `KProcess` is being killed and flushing log during it is important since it can often result in hangs due to joining not working correctly.
2022-10-22 17:17:04 +05:30
PixelyIon
597a6ff31d Wait on slot to be freed in GraphicBufferProducer::DequeueBuffer
We currently don't wait on a slot to be freed if none are free, this worked prior to async presentation as GBP's slots wouldn't change their state until other commands were called but now slots can be held by the presentation engine. As a result, we now have to wait on the presentation engine to free up slots.

This commit also fixes the behavior of the `async` flag in `DequeueBuffer` as it was treated as a non-blocking flag but isn't supposed to do anything on HOS.
2022-10-22 17:15:33 +05:30
lynxnb
cdb2b85d6c Stub ngword and ngword2
Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>
2022-10-18 20:54:57 +01:00
lynxnb
a7be4fd1e1 Stub pl::RequestLoad and pl::GetSharedFontInOrderOfPriority
Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>
2022-10-18 20:54:57 +01:00
lynxnb
782f9e37ee Add a system region setting
Needed for games such as AC:NH.
The `Auto` option automatically selects a region based on the currently selected system language.

Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>
2022-10-18 20:54:57 +01:00
lynxnb
d4800d13b8 Stub hid::ActivateMouse and hid::ActivateKeyboard
Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>
2022-10-18 20:54:57 +01:00
lynxnb
45830633eb Stub am::SetTerminateResult 2022-10-18 20:54:57 +01:00
lynxnb
bc016aff47 Make the vulkan validation layer toggleable via setting
As part of this commit, a new preference category for debug settings is being introduced. All future settings only relevant for debugging purposes will be put there. The category is hidden on release builds.
2022-10-18 19:47:23 +02:00
lynxnb
5cf14e45e1 Enable frame throttling when triple buffering is disabled 2022-10-18 19:27:14 +02:00
lynxnb
6b76c61cd1 Introduce a releasedebug build variant 2022-10-17 18:39:32 +02:00
lynxnb
b17364bb92 Introduce a dev app flavor for side-by-side installation 2022-10-01 13:01:46 +02:00
Billy Laws
8d1026d0cc Reduce font shared memory size for compacted fonts 2022-09-22 21:51:13 +01:00
Billy Laws
25fb09800a Compact fonts to only include necessary glyphs 2022-09-22 21:51:13 +01:00
Billy Laws
e31ed6a429 Increase font shared memory size for Noto fonts 2022-09-22 21:34:29 +01:00
Billy Laws
0d4893c448 Cleanup font magic generation code 2022-09-22 21:34:29 +01:00
Billy Laws
5c4cc3d51f Fix font load order to match HOS
Without this games would load an inappropriate font file for their target language.
2022-09-22 21:34:29 +01:00
Billy Laws
272bbf6cd2 Switch to Noto Sans fonts for shared fonts replacement
Provides CJK characters, which the previous replacements lacked entirely.
2022-09-22 21:34:29 +01:00
lynxnb
54172322fe Fix host synchronization for texture with a different guest format
Host synchronization of a guest texture with a different guest format represents a valid use case where the host doesn't support the guest format and conversion to a host-compatible format must be performed. The issue is most evident on Mali GPUs, as they don't support BCn texture formats thus needing manual decoding before submission. It was disabled by mistake in a previous commit, this commit re-enables it.
2022-09-15 15:22:52 +02:00
TheASVigilante
a1ff4e1777 Implement OpenHardwareOpusDecoderEx and GetWorkBufferSizeEx
Implements these 2 functions which were introduced in HOS 12.0.0. Fixes a crash in Xenoblade Chronicles 3.
2022-09-10 22:04:15 +01:00
lynxnb
34bd16426c Fix quads index buffer conversion not accounting for first index
Unindexed quad draws were broken when multiple draw calls were done on the same vertex buffer, with a non-zero `first` index.
Indexed quad draws also suffered from the same issue, but it was never encountered in games.
This commit fixes both cases by accounting for the `first` drawn index when generating conversion index buffers.
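A small sketch of what accounting for `first` can mean for the unindexed case: every generated index is offset by the first drawn vertex. The function name `GenerateQuadConversionIndices` and the 0-1-2 / 2-3-0 triangle split are assumptions for illustration, not necessarily what the commit uses:

```c++
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts `count` quad-list vertices starting at vertex `first`
// into triangle-list indices, two triangles per quad
inline std::vector<uint32_t> GenerateQuadConversionIndices(uint32_t first, uint32_t count) {
    std::vector<uint32_t> indices;
    indices.reserve((count / 4) * 6);
    for (uint32_t quad = 0; quad + 4 <= count; quad += 4) {
        uint32_t base = first + quad; // Leaving out `first` here reproduces the bug described above
        for (uint32_t corner : {0u, 1u, 2u, 2u, 3u, 0u})
            indices.push_back(base + corner);
    }
    return indices;
}
```

For the indexed case the same pattern would presumably select entries from the guest index buffer, starting at `first`, rather than producing vertex numbers directly.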
2022-09-04 12:42:33 +02:00
Billy Laws
9af5df4bae Increase IPC pointer buffer size
Some services report a pointer buffer size > 0x1000, so up it as to not cause issues when they're implemented.
2022-09-02 23:14:05 +01:00
Billy Laws
51f4e7662e Add support for the TIPC protocol introduced in HOS 12.0.0
TIPC is a much lighter layer on top of the Horizon IPC system than CMIF and is used by SM in 12.0.0+. This implementation is slightly hacky since it doesn't really keep a separation between the underlying kernel IPC stuff and userspace like CMIF/TIPC, this should be fixed eventually, probably together with an IPC dispatch rewrite to avoid the mess of frozen maps.

Tested with Hentai Uni, which now crashes needing 'ldr:ro'.
2022-09-02 23:13:23 +01:00
Billy Laws
a40d7c78ad Always recreate oboe stream on error
This is what's done in oboe examples to avoid spurious errors breaking audio entirely.
2022-08-31 21:26:14 +01:00
Billy Laws
8917ec9c88 Don't set framesPerCallback for main stream as per oboe guidance
It's best to let oboe figure it out on its own
2022-08-31 21:26:14 +01:00
Billy Laws
b00008daf5 Fix identifier release check in AudioTrack::Stop 2022-08-31 21:26:14 +01:00
Billy Laws
d9c8e62d1c Don't warn on GetConfig IOCTL fails 2022-08-31 21:26:14 +01:00
Billy Laws
4aef24ba32 Implement NVGPU_GPU_IOCTL_GET_GPU_TIME in nvdrv 2022-08-31 21:26:14 +01:00
Billy Laws
5841799420 Fix decoding of IOCTLs with padding at the end 2022-08-31 21:26:14 +01:00
Billy Laws
82444f3b0a Don't set push descriptor flag for desc sets
This is redundant and against the spec since we no longer use push descriptors.
2022-08-31 21:26:14 +01:00
PixelyIon
94fdd6aa43 Fix HID touch points not being removed from screen
Tapping anything in titles that supported touch (such as Puyo Puyo Tetris or Sonic Mania) wouldn't work due to the first touch point never being removed from the screen, it is supposed to be removed after a 3 frame delay from the touch ending.

This commit introduces a mechanism to "time-out" touch points which counts down during the shared memory updates and removes them from the screen after a specified timeout duration.
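The time-out mechanism could be sketched like this, assuming `Update()` is called once per shared memory update and that a released point is dropped on the third update after release. `TouchPointSketch`, `TouchScreenSketch` and `ReleaseTimeoutUpdates` are hypothetical names and the exact frame accounting in the commit may differ:

```c++
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t ReleaseTimeoutUpdates{3}; // Assumed to correspond to the 3 frame delay

struct TouchPointSketch {
    uint32_t id;
    bool held;
    uint32_t timeout; // Remaining updates before removal, only meaningful once released
};

class TouchScreenSketch {
  public:
    std::vector<TouchPointSketch> points;

    void Press(uint32_t id) {
        points.push_back(TouchPointSketch{id, true, 0});
    }

    // Releasing doesn't remove the point immediately, it only starts the countdown
    void Release(uint32_t id) {
        for (auto &point : points) {
            if (point.id == id && point.held) {
                point.held = false;
                point.timeout = ReleaseTimeoutUpdates;
            }
        }
    }

    // Counts released points down and removes those whose timeout has expired
    void Update() {
        for (auto &point : points)
            if (!point.held && point.timeout > 0)
                point.timeout--;

        points.erase(std::remove_if(points.begin(), points.end(), [](const TouchPointSketch &point) {
            return !point.held && point.timeout == 0;
        }), points.end());
    }
};
```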
2022-08-31 23:43:02 +05:30
PixelyIon
70ad4498a2 Write HID LIFO entries at fixed intervals
Certain titles depend on HID LIFO entries being written out at a fixed frequency rather than on actual state change, not doing this can lead to applications freezing till the LIFO is filled up to maximum size, this behavior is seen in Super Mario Odyssey. In other cases such as Metroid Dread, the game can run into race conditions that would lead to crashes, these were worked around by smashing a button during loading prior.

This commit introduces a thread which sleeps and wakes up occasionally to write LIFO entries into HID shared memory at the desired frequencies. This alleviates any issues as it fills up the LIFO instantly and correctly emulates HID Shared Memory behavior expected by the guest.

Co-authored-by: Narr the Reg <juangerman-13@hotmail.com>
2022-08-31 22:49:36 +05:30
PixelyIon
7966bfa9f6 Fix PI update KThread::waiterMutex deadlock
It was determined that deadlocks inside `KThread::UpdatePriorityInheritance` would not only arise from the first level of locking with `waitingOn->waiterMutex` but also the second level of locking with `nextThread->waiterMutex` which has now also been fixed to fallback when facing contention.
2022-08-28 20:15:08 +05:30
Abandoned Cart
86f6fc510e Remove printing result message from author
There is no purpose in printing the same result twice, so any duplicate messages should instead allow the field to be truncated.
2022-08-27 18:54:27 +05:30
Abandoned Cart
1013857fc4 Refactor subtitle as author to remove subtitle
Subtitle is no longer used, so instances have been rerouted to author. DataItem was also updated to reflect the removal of a subtitle.
2022-08-27 18:54:27 +05:30
Abandoned Cart
04cae942ea Follow typical per-file detail formatting
Format the details in the expected format of individual files (instead of a complete game) and move the code to match the updated placement.
2022-08-27 18:54:27 +05:30
lynxnb
5d6eaee301 Correctly save/restore ROM version to/from game entry cache
PR #1758 introduced a bug where the game list would be entirely loaded every time the app was opened. This commit addresses that issue, which was caused by the `version` member of the cached game list being serialized to file (although incorrectly) but never actually read back when deserializing.
2022-08-21 20:24:36 +02:00
lynxnb
cfa5f0e030 Fix OSC alpha not changing on button press 2022-08-20 17:00:40 +02:00
KikiManjaro
8fb4e62c28 Add version information about rom
Review:
Co-authored-by: Niccolò Betto <niccolo.betto@gmail.com>
2022-08-20 13:48:07 +02:00
KikiManjaro
3407f6d530 Add OSC opacity adjustment 2022-08-20 13:46:17 +02:00
lynxnb
d129fb09cd Android Manifest cleanup
* Remove `package` from manifest and from activity prefixes, gradle `namespace` will be used instead
* Removed deprecated `android.support.PARENT_ACTIVITY` metadata entries
* Make `MainActivity` and `SettingsActivity` launched in `singleTop` mode to avoid unnecessary activity restarts while navigating the app
2022-08-17 15:33:53 +02:00
lynxnb
d128856c7d Update target SDK version to Android 13 2022-08-17 12:29:46 +02:00
lynxnb
39f398f76b Update Kotlin (1.7.10), NDK (25.0.8775105), AGP (7.2.2) and Kotlin deps 2022-08-17 12:28:31 +02:00
lynxnb
e9618d9e2c Use pragma pack directions for tightly packing structs containing u128
Using `__attribute__((packed))` doesn't work in new NDKs when a struct contains 128-bit integer members, likely because of a ndk/compiler bug. We now enclose the requiring structs in `#pragma pack` directives to tightly pack them.
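For reference, wrapping a struct in `#pragma pack` directives looks like the sketch below. The struct names are hypothetical and `unsigned __int128` is a GCC/Clang extension; the size assertions assume a typical 64-bit target where 128-bit integers are 16-byte aligned:

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

using u128 = unsigned __int128; // GCC/Clang extension, assumed available on the target

#pragma pack(push, 1) // Pack every struct until the matching pop with 1-byte alignment
struct PackedEntrySketch {
    uint8_t tag;
    u128 value; // Placed directly after `tag` with no padding
};
#pragma pack(pop)

// The same members without packing, for comparison
struct PaddedEntrySketch {
    uint8_t tag;
    u128 value;
};

static_assert(sizeof(PackedEntrySketch) == 17, "packed struct should have no padding");
static_assert(offsetof(PackedEntrySketch, value) == 1, "value should directly follow tag");
static_assert(sizeof(PaddedEntrySketch) > sizeof(PackedEntrySketch), "unpacked struct is padded");
```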
2022-08-17 12:22:11 +02:00
lynxnb
c4bf92a49f Fix Kotlin compilation errors from incorrect overloading of null-safe types 2022-08-17 12:16:26 +02:00
Billy Laws
bf491f71f9 Simplify blit helper shader vertex order 2022-08-10 15:43:16 +01:00
Billy Laws
c32bec071c Adjust blit src{X,Y} to account for centred sampling before calling into helper shader
Since the blit engine itself samples from pixel corners and the helper shader from pixel centres the src coordinates need to be adjusted to avoid the helper shader wrapping round on the final column.
2022-08-10 15:39:37 +01:00
Billy Laws
08f36aac33 Enable hades vertex position input workaround for Adreno
Caused crashes in any games using geometry shaders as by default hades uses the position builtin directly.
2022-08-08 18:09:00 +01:00
Billy Laws
04e7b684d2 Enable vertexPipelineStoresAndAtomics, fragmentStoresAndAtomics and shaderStorageImageWriteWithoutFormat Vulkan features
Used by Xenoblade Chronicles DE
2022-08-08 17:43:18 +01:00
Billy Laws
390558c802 Add partial support for legacy attribute conversion
We previously missed the hades pass for attribute conversion leading to crashes when games would attempt to use such an attribute. The hades pass for this isn't a proper fix however as it modifies the IR directly and will break if any of the previous stages in the pipeline change. Enable it to allow for games using them to at least have a chance at working. In the long term the pass will be reworked on the hades side to avoid modifying the IR in a way that can't be undone.
2022-08-08 17:43:18 +01:00
Billy Laws
540437b547 Fixup index buffer view caching
We forgot to set the view size, which would end up forcing a view to be recreated with every call
2022-08-08 17:43:18 +01:00
Billy Laws
c966cd3b26 Prevent runtimeInfo vertex state from leaking into wrong shaders
This vertex state must only be present for the last pipeline stage that touches vertices, if it is present for other stages it could result in incorrect behaviour like performing TFB in the fragment shader or flipping device coordinates twice.
2022-08-08 17:43:13 +01:00
Billy Laws
c52d3195cf Ensure shader stage enable state matches pipeline stage enable state
As the code was before, if we had a shader that was disabled and enabled again after without being invalidated the pipeline stage would stay disabled and break rendering.
2022-08-08 17:40:35 +01:00
Billy Laws
b1c669ba14 Always keep the VertexB shader stage enabled
HW doesn't allow disabling the VertexB stage, enforce this in code.
2022-08-08 17:40:35 +01:00
lynxnb
d5174175d1 Implement indexed quads support
We previously only supported non-indexed quads. Support for this is implemented by converting the index buffer at record time and pushing the result into the megabuffer, which is then used as the index buffer in the final draw command.
2022-08-08 17:40:35 +01:00
lynxnb
e6741642ba Split out megabuffer allocation from pushing data
The `Allocate` method allocates the given amount of space in a megabuffer chunk, returning a descriptor of the allocated region. This is useful for situations where you want to write directly to the megabuffer, avoiding the need for an intermediary buffer.
2022-08-08 17:40:35 +01:00
Billy Laws
cdc6a4628a Enable VK uint8 indices feature when supported 2022-08-08 17:40:35 +01:00
Billy Laws
dccc86ea97 Implement transform feedback with VK_EXT_transform_feedback
Tested to work in Xenoblade Chronicles DE, the code handles both hades varying input and buffer setup.
2022-08-08 17:40:35 +01:00
Billy Laws
06053d3caf Rewrite Fermi 2D engine to use the blit helper shader
Entirely rewrites the engine and interconnect code to take advantage of the subpixel and OOB blit support offered by the blit helper shader. The interconnect code is also cleaned up significantly with the 'context' naming being dropped due to potential conflicts with the 'context' from context lock
2022-08-08 17:40:35 +01:00
Billy Laws
395f665a13 Implement a system for helper shaders together with a simple blit shader
It is desirable for us to use a shader for blits to allow easily emulating out of bounds blits and blits between different swizzled colour formats. The helper shader infrastructure is designed to be generic so it can be reused by any other helper shaders that we may need in the future.
2022-08-08 17:40:35 +01:00
Billy Laws
1da1698f90 Disable unused Vulkan HPP setters and smart handles 2022-08-08 14:57:44 +01:00
Billy Laws
f4e58a9238 Remove redundant synchost creating a new buffer 2022-08-08 14:57:44 +01:00
Billy Laws
11a8feb037 Correct nvdrv DMA copy class ID
Was wrongly copy and pasted.
2022-08-08 14:57:44 +01:00
Billy Laws
13e7b54c61 Ensure failed IOCTLs are logged as a warning log 2022-08-08 14:57:44 +01:00
Billy Laws
eeb86a4f8a Calculate renderArea from min(attachments.dimensions...)
Vulkan doesn't support a renderArea larger than that of the smallest attachment
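The calculation amounts to a component-wise minimum over the attachment dimensions. A minimal sketch, using a hypothetical `ExtentSketch` struct in place of the Vulkan extent type and a hypothetical `CalculateRenderArea` helper:

```c++
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 2D extent
struct ExtentSketch {
    uint32_t width;
    uint32_t height;
};

// Shrinks the render area to the smallest width and height found across all attachments
inline ExtentSketch CalculateRenderArea(const std::vector<ExtentSketch> &attachments) {
    assert(!attachments.empty());
    ExtentSketch area = attachments.front();
    for (const ExtentSketch &dimensions : attachments) {
        area.width = std::min(area.width, dimensions.width);
        area.height = std::min(area.height, dimensions.height);
    }
    return area;
}
```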
2022-08-08 14:57:44 +01:00
Billy Laws
9ea658d0ed Don't throw on unsupported TIC formats
These sometimes spuriously occur in games during transitions, to avoid crashing during them just use the null texture if they occur and log an error log
2022-08-08 14:57:44 +01:00
Billy Laws
856818c8eb Emulate the 'None' mipfilter by adjusting LOD
Borrowed this technique from yuzu since Vulkan has no direct equivalent
2022-08-08 14:57:44 +01:00
Billy Laws
9d50b6d0f7 Avoid locking presentation mutex in GetTransformHint
This caused slowdown in Pokemon as it was being called every frame
2022-08-08 14:57:44 +01:00
Billy Laws
460e6c9c84 Use raw pointers to hold constant buffer views
The constant destruction and creation of `BufferView`s in cbuf-heavy games showed up as a large chunk of the profiler. Fix this by taking advantage of the fact that constant buffer `BufferView`s are never deleted and always kept around in the cache to just return a pointer to them in the cache.
2022-08-08 14:54:57 +01:00
Billy Laws
6b2e84712b Avoid race in nvdrv debug prints
Looking up the device name without locking it could race with map insertions or deletions, so lock it to avoid that
2022-08-08 13:24:23 +01:00
Billy Laws
683cd594ad Use a linear allocator for most per-execution GPU allocations
Currently we heavily thrash the heap each draw, with malloc/free taking up about 10% of GPFIFOs execution time. Using a linear allocator for the main offenders of buffer usage callbacks and index/vertex state helps to reduce this to about 4%
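
For readers unfamiliar with the technique, a linear (bump) allocator can be sketched roughly as follows. `LinearAllocator` here is a hypothetical illustration and not the allocator added by this commit, which may differ in growth policy and interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bump allocator: allocations are carved out of one block and all released together
class LinearAllocator {
  private:
    std::vector<std::uint8_t> storage;
    std::size_t offset{};

  public:
    explicit LinearAllocator(std::size_t capacity) : storage(capacity) {}

    // Returns nullptr when the block is exhausted
    // The alignment must be a power of two no larger than alignof(std::max_align_t)
    void *Allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t)) {
        std::size_t aligned = (offset + alignment - 1) & ~(alignment - 1);
        if (aligned + size > storage.size())
            return nullptr;
        offset = aligned + size;
        return storage.data() + aligned;
    }

    // Releases every allocation at once, intended to be called at the end of an execution
    void Reset() {
        offset = 0;
    }

    std::size_t Used() const {
        return offset;
    }
};
```

Allocation is a pointer bump and freeing is a single store, which is why this avoids the per-draw malloc/free cost as long as everything allocated shares the lifetime of one execution.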
2022-08-08 13:24:21 +01:00
Billy Laws
70eec5a414 Store delegate attached state within the delegate itself
Avoids a costly map lookup for every AttachBuffer call, this was a serious bottleneck in SMO
2022-08-08 13:23:26 +01:00
Billy Laws
0268e1d5a0 Force a submit before any i2m engine writes
We need traps to be in place so we don't end up overwriting a resource that's being actively used by the current context without setting it to dirty
2022-08-08 13:22:37 +01:00
Billy Laws
cb0b132486 Allow supplying push constants to GetPipeline 2022-08-08 13:22:37 +01:00
Billy Laws
1c8863ec3b Use const references for holding pipeline state in pipeline cache
Allows passing in constexpr structs to state directly
2022-08-08 13:22:37 +01:00
Billy Laws
b6b04fa6c5 Use small_vector for VMM TranslateRange results
This was the source of a lot of heap allocs, moving to small_vector helps to avoid most of them
2022-08-08 13:22:37 +01:00
Billy Laws
1fe6d92970
Wait on Swapchain Image copy to complete
Certain titles can display frames out of order due to not waiting on the copy from the final RT to the swapchain image to complete. Although `PresentFrame` does wait on the syncpoint, that isn't enough to ensure the source texture is up-to-date due to us signalling syncpoints early.

By waiting on the swapchain texture after the copy is submitted, we now implicitly wait on the source texture's cycle to be signalled thus waiting on the frame to be done which fixes the issue.
2022-08-07 03:12:27 +05:30
PixelyIon
5b7572a8b3
Introduce chunked MegaBuffer allocation
After the introduction of workahead, a system to hold a single large megabuffer per submission was implemented. This worked fine for most cases, however when many submissions were in flight at the same time, memory usage would increase dramatically due to the number of megabuffers needed. Since only one megabuffer was allowed per execution, it forced the buffer to be fairly large in order to accommodate the upper bound, increasing memory usage even further.

This commit implements a system to fix the memory usage issue described above by allowing multiple megabuffers to be allocated per execution, as well as reuse across executions. Allocations now go through a global allocator object which chooses which chunk to allocate into on a per-allocation basis; if all are in use by the GPU, another chunk will be allocated, which can then be reused for future allocations too. This reduces Hollow Knight megabuffer memory usage by a factor of 4 and SMO by even more.
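
The allocation policy described above could be sketched as follows. All names (`ChunkedAllocator`, `Chunk`, `Submit`, `Retire`) and the fixed `ChunkSize` are assumptions made for illustration, and the `inUseByGpu` flag stands in for whatever fence tracking the real allocator uses.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

constexpr std::size_t ChunkSize = 1024; // Hypothetical chunk size in bytes

// Hypothetical chunk: a fixed-size region plus a flag standing in for "the GPU is still using this"
struct Chunk {
    std::size_t used = 0;
    bool inUseByGpu = false;
};

struct Allocation {
    std::size_t chunkIndex;
    std::size_t offset;
};

class ChunkedAllocator {
  private:
    std::deque<Chunk> chunks;

  public:
    // Picks the first chunk that is idle on the GPU and has room, otherwise grows the pool
    Allocation Allocate(std::size_t size) {
        assert(size <= ChunkSize);
        for (std::size_t index = 0; index < chunks.size(); index++) {
            Chunk &chunk = chunks[index];
            if (!chunk.inUseByGpu && chunk.used + size <= ChunkSize) {
                Allocation allocation{index, chunk.used};
                chunk.used += size;
                return allocation;
            }
        }
        chunks.emplace_back();
        chunks.back().used = size;
        return Allocation{chunks.size() - 1, 0};
    }

    // Marks every chunk written to so far as busy until the GPU is done with it
    void Submit() {
        for (auto &chunk : chunks)
            if (chunk.used != 0)
                chunk.inUseByGpu = true;
    }

    // Called once the GPU is done with a chunk so that later executions can reuse it
    void Retire(std::size_t index) {
        chunks[index] = Chunk{};
    }

    std::size_t ChunkCount() const {
        return chunks.size();
    }
};
```

The pool only grows when every existing chunk is either full or busy, so its size tracks the peak number of chunks in flight rather than the number of submissions.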
2022-08-07 03:12:27 +05:30
Billy Laws
99b5fc35c6
Change SegmentTable semantics to respect unset entries
Accesses to unset entries are now clearly defined as returning a 0'd out value. The prior behavior would be to optimize sets for border segments to use L2 atomicity when the specific segment had no L1 entries set. This would lead to any future lookups of offsets within the same L2 segment but a different L1 entry returning an inaccurate value, as the only prior guarantee was that lookups after setting a segment would return the same value as was set; there was no guarantee that unset segments would also consistently return unset values.

This could lead to issues in practical usages such as the `BufferManager` lookups returning the existence of a `Buffer` at a location falsely even though the segment was never set to the value, this was problematic as raw pointers were utilized and bound checks would lead to a segmentation fault.

This commit fixes this issue by introducing this guarantee and refactoring the class accordingly. It also deletes the `Set` method for setting a single entry, as its meaning is ambiguous and its functionality was more akin to the past guarantee and no longer makes sense.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
36b8d3c445
Account for SegmentTable insertions entirely within an L2 entry
We would always write all L1 entries that correspond to an L2 entry, even if setting an input range ended before that. This would effectively reduce the atomicity of the segment table to that of the L2 range and lead to breaking API guarantees by returning entirely wrong segment values for a lookup covering a region that was overwritten.
2022-08-06 22:20:54 +05:30
PixelyIon
c72316d9f6
Rename RangeTable to SegmentTable
It was determined that `RangeTable` was too ambiguous of a name as it could be interpreted to be holding ranges rather than looking them up. To avoid confusion the terminology has been changed from `range` to `segment`, as "segment table" is clearer in describing that it is a table comprised of descriptors regarding segments, and it avoids any overlap with terminology concerning "pages", which would be overly specific for this data structure, or the ambiguous "ranges".
2022-08-06 22:20:54 +05:30
Billy Laws
5398eff045
Fix KProcess::MutexUnlock PI CAS
The PI CAS in `MutexUnlock` ends up loading `basePriority` rather than `priority` which could lead to an infinite CAS loop when `basePriority` doesn't equal `priority` and the `highestPriorityThread`'s priority is lower than `basePriority`.
2022-08-06 22:20:54 +05:30
PixelyIon
850c0f4092
Make Texture::SynchronizeGuest Blocking
It was determined that `Texture::SynchronizeGuest`'s `TextureBufferCopy` had races that were exposed by the introduction of the cycle waiter thread, the synchronization did not take place under a locked context so the texture could be mutated at any point in addition to the destructor not being run during `FenceCycle::Wait` due to `shouldDestroy` being `false`. 

This commit fixes the issue by making `SynchronizeGuest` entirely blocking as all usages of the function required blocking semantics regardless so it would be pointless to retain its async nature while solving any races that may arise from it being async.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
Billy Laws
77d15b02a3
Ensure backing continuity when recreating GPU dirty buffers
Since we don't call `SynchronizeHost` on source buffers which are GPU dirty, their mirrors will be out of date. The backing contents of this source buffer's region in the new buffer will be incorrect. By copying from the backing directly, we can ensure that no writes are lost and that if the newly created buffer needs to turn GPU dirty during recreation no copies need to be done since the backing is as up to date as the mirror at a minimum.
2022-08-06 22:20:54 +05:30
Billy Laws
c1bf5a804a
Extend stateMutex scope inside Buffer::SynchronizeHost
The code is much simpler to reason about as it doesn't require evaluating all the potential edge cases of trap handlers in different states. It should be noted that this should not change behavior in any meaningful way, at most it can prevent a minor race where the protection could be upgraded after being downgraded by the signal handler leading to a redundant trap.
2022-08-06 22:20:54 +05:30
PixelyIon
c3cf79cb39
Rework KThread::waiterMutex Locking
Two issues exist with locking of `KThread::waiterMutex`:
* It was not always locked when accessing waiter members such as `waitThread`, `waitKey` and `waitTag` which would lead to a race that could end up in a deadlock or most notably a segfault inside `UpdatePriorityInheritance`
* There could be a deadlock from `UpdatePriorityInheritance` locking `waiterMutex` of a thread and waiting to get the owner's `waiterMutex` while on another thread `MutexUnlock` holds the owner's `waiterMutex` and waits on locking the `waiterMutex` held by `UpdatePriorityInheritance`

This commit fixes both issues by adding appropriate locking to all locations where waiter members are accessed in addition to adding a fallback mechanism inside `UpdatePriorityInheritance` that unlocks `waiterMutex` on contention to avoid a deadlock.
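
The fallback can be illustrated with a generic back-off pattern: hold the first mutex, only try to take the second, and release everything on contention before retrying. The `Thread` struct and `PropagatePriority` function below are hypothetical simplifications, and the sketch assumes that a numerically lower priority value means a higher priority.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical thread object, only the waiter mutex and a priority are modelled
struct Thread {
    std::mutex waiterMutex;
    int priority{};
};

// Locks the waiter's mutex and then tries the owner's, backing off entirely when the owner's mutex
// is contended so that a thread locking in the opposite order is never blocked forever
void PropagatePriority(Thread &waiter, Thread &owner) {
    while (true) {
        std::unique_lock<std::mutex> waiterLock(waiter.waiterMutex);
        std::unique_lock<std::mutex> ownerLock(owner.waiterMutex, std::try_to_lock);
        if (!ownerLock.owns_lock()) {
            // Drop our own lock so that the contending thread can make progress, then retry
            waiterLock.unlock();
            std::this_thread::yield();
            continue;
        }

        // Assumption: a numerically lower value denotes a higher priority
        owner.priority = std::min(owner.priority, waiter.priority);
        return;
    }
}
```

Because neither lock is ever waited on while the other is held, the circular wait required for a deadlock can't form, at the cost of occasional retries under contention.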
2022-08-06 22:20:54 +05:30
PixelyIon
68615703c1
Fix KProcess/SetThreadPriority PI CAS
The condition for exiting the CAS loops is incorrect in several places which leads to additional loops, while this doesn't make the behavior incorrect it does lead to redundant iterations. 

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
8fc3cc7a16
Rework Descriptor Set Allocation/Updates
A substantial amount of time would be spent on creation/destruction of `VkDescriptorSet` which scales on titles doing a substantial amount of draws with bindings, this leads to poor performance on those titles as the frametime is dragged down by performing these tasks while they repeatedly create descriptor sets of the same layouts.

This commit fixes it by pooling descriptor sets per-layout in a dynamically resizable pool and keeping them around rather than destroying them after usage which leads to the vast majority of cases not requiring a new descriptor set to even be created. It leads to significantly improved performance where it would otherwise be spent on redundant destruction/recreation or push descriptor updates which took a substantial amount of time themselves.
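
A minimal sketch of per-layout pooling, using plain integers in place of Vulkan handles. `DescriptorSetPool`, `LayoutHandle` and `SetHandle` are hypothetical names, and the point where a real implementation would allocate from a `VkDescriptorPool` is only marked with a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using LayoutHandle = std::uint64_t; // Hypothetical stand-in for VkDescriptorSetLayout
using SetHandle = std::uint32_t;    // Hypothetical stand-in for VkDescriptorSet

class DescriptorSetPool {
  private:
    std::unordered_map<LayoutHandle, std::vector<SetHandle>> freeSets; // Idle sets keyed by their layout
    SetHandle nextHandle{1};
    std::size_t created{};

  public:
    // Reuses an idle set with a matching layout when one exists, otherwise creates a new one
    SetHandle Acquire(LayoutHandle layout) {
        auto &idle = freeSets[layout];
        if (!idle.empty()) {
            SetHandle set = idle.back();
            idle.pop_back();
            return set;
        }
        created++;
        return nextHandle++; // A real implementation would allocate a descriptor set here
    }

    // Returns a set to the pool instead of destroying it
    void Release(LayoutHandle layout, SetHandle set) {
        freeSets[layout].push_back(set);
    }

    std::size_t CreatedCount() const {
        return created;
    }
};
```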

Additionally, the `BaseDescriptorSizes` were not kept up to date with all of the descriptor types, it led to no crashes on Adreno/Mali as they were purely used for size calculations on either driver but has been corrected to avoid any future issues.
2022-08-06 22:20:54 +05:30
PixelyIon
e1a4325137
Introduce FenceCycle Waiter Thread
A substantial amount of time is spent destroying dependencies for any threads waiting or polling `FenceCycle`s, this is not optimal as it blocks them from moving onto other tasks while destruction is a fundamentally async task and can be delayed.

This commit solves this by introducing a thread that is dedicated to waiting on every `FenceCycle` then signalling and destroying all dependencies which entirely fixes the issue of destruction blocking on more important threads.
2022-08-06 22:20:54 +05:30
PixelyIon
5f8619f791
Optimize Buffer Lookups using Range Tables
Buffer lookups are a fairly expensive and very frequent operation, and we currently spend `O(log n)` even on the simplest and most common case, a direct match, which may be insufficient. This commit optimizes that case to `O(1)` by utilizing a `RangeTable`, at the cost of slightly higher insertion/deletion costs for setting ranges of values, but these are far less frequent than lookups.
2022-08-06 22:20:54 +05:30
PixelyIon
578ae86cca
Implement Multi-Level Range Table
A data structure that can represent the same value for a range of addresses (pages) is required for fast lookup in certain cases. This commit implements a near optimal data structure for mass insertion and O(1) lookup of range-based data, this is achieved using the host MMU and implementing multiple levels of atomicity for the ranges. 

It should be noted that the table is limited to two levels but can be extended to a variable amount of ranges in the future, it was determined that additional levels of ranges can be beneficial for performance depending on the specific use-case.
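
To illustrate the two-level idea only, the sketch below keeps a fine-grained L1 array and a coarse L2 array in ordinary vectors. It does not reproduce the host-MMU based approach described above, and `TwoLevelTable`, `L1PerL2` and the `int` value type are all assumptions. A range that covers a whole L2 group is recorded with a single write, partial ranges fall back to per-segment L1 writes, and unset segments read back as 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t L1PerL2 = 16; // Hypothetical number of L1 segments covered by one L2 entry

class TwoLevelTable {
  private:
    struct L2Entry {
        bool valid = false; // When set, `value` applies to every L1 segment in the group
        int value = 0;
    };

    std::vector<int> level1;     // One value per fine-grained segment, 0 when unset
    std::vector<L2Entry> level2; // One entry per group of L1PerL2 segments

  public:
    explicit TwoLevelTable(std::size_t groups) : level1(groups * L1PerL2), level2(groups) {}

    // Sets every segment in [start, end) to the supplied value
    void SetRange(std::size_t start, std::size_t end, int value) {
        assert(start <= end && end <= level1.size());
        std::size_t index = start;
        while (index < end) {
            std::size_t group = index / L1PerL2;
            std::size_t groupStart = group * L1PerL2;
            std::size_t groupEnd = groupStart + L1PerL2;
            if (index == groupStart && groupEnd <= end) {
                // The range covers the whole group so a single L2 write suffices
                level2[group].valid = true;
                level2[group].value = value;
                index = groupEnd;
            } else {
                // Partial coverage: push any group-wide value down to L1 first so that
                // segments outside of the range keep reading the value they had before
                if (level2[group].valid) {
                    std::fill(level1.begin() + groupStart, level1.begin() + groupEnd, level2[group].value);
                    level2[group].valid = false;
                }
                std::size_t stop = std::min(end, groupEnd);
                for (; index < stop; index++)
                    level1[index] = value;
            }
        }
    }

    // O(1) lookup: one L2 read and at most one L1 read
    int Get(std::size_t index) const {
        const L2Entry &entry = level2[index / L1PerL2];
        return entry.valid ? entry.value : level1[index];
    }
};
```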
2022-08-06 22:20:54 +05:30
Billy Laws
38eab80ed8
Disable Vulkan Push Descriptors on Adreno
Adreno drivers have certain errata that break Vulkan Push Descriptors in certain cases, causing a descriptor set update to be swallowed. This has been worked around by disabling push descriptors on Adreno drivers, which may reduce performance in titles that frequently bind new descriptors.
2022-08-06 22:20:54 +05:30
Billy Laws
88fd491ed5
Submit after Maxwell3D Semaphore Release
Any semaphore releases are implicit synchronization events that can be utilized by the guest to pick up that the GPU has executed till a certain point and therefore we must submit all prior work accordingly.
2022-08-06 22:20:54 +05:30
Billy Laws
b77da1182f
Don't flush submission on DMA Copies
DMA copies utilized `SubmitWithFlush` instead of `Submit`, this is not required and incurs significant additional synchronization penalties which will no longer be incurred.
2022-08-06 22:20:54 +05:30
PixelyIon
0992fde028
Don't block on surface creation in GetTransformHint
We want to avoid blocking on surface creation unless necessary, this commit doesn't wait on the creation of the surface as it default initializes the value which'll generally be `Identity` or the transformation of the previous surface if it was lost.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
Billy Laws
35133381b6
Fix V-Sync KEvent construction order
The V-Sync `KEvent` would be used by the presentation thread prior to construction leading to dereferencing an invalid value, this has been fixed by changing the order of construction to move the construction of the presentation thread after the V-Sync event.
2022-08-06 22:20:54 +05:30
PixelyIon
ffad246d67
Split NCE Trap page-out functionality from TrapRegions
The `TrapRegions` function performed a page-out on any regions that were trapped as read-only. This wasn't optimal as it tied both into the same operation, while Buffers/Textures need to protect, then synchronize, and only then page out. The trap was being moved to after the synchronize to get around this limitation, but that can cause a potential race, as certain writes done after the synchronization but prior to the trap would be lost. This commit fixes these issues by splitting paging out into `PageOutRegions`, which can be called after `TrapRegions` by any API users.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
da464d84bc
Consolidate NCE::TrapRegions functionality into CreateTrap
`NCE::TrapRegions` was a bit too overloaded as a method as it implicitly trapped which was unnecessary in all current usage cases, this has now been made more explicit by consolidating the functionality into `NCE::CreateTrap` which handles just creation of the trap and nothing past that, `RetrapRegions` has been renamed to `TrapRegions` and handles all trapping now.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
8a62f8d37b
Rework Texture Synchronization API + Locking
Similar to `Buffer`s, `Texture`s suffered from suboptimal behavior due to using atomics for `DirtyState` and had certain bugs where the variable could be replaced at times when it shouldn't have been possible; these have now been fixed by moving to a mutex instead of atomics. This commit also updates the API to more closely match what is expected of it now and removes any functions that weren't utilized such as `SynchronizeGuestWithBuffer`.
2022-08-06 22:20:54 +05:30
Billy Laws
04bcd7e580
Rework Buffer DirtyState with BackingImmutability
Having a single variable denoting the exact state of a buffer and the operations that could be performed on it was found to be too restrictive, so it's now been expanded with an additional `BackingImmutability` variable. With two variables, we can no longer use atomics without significant additional complexity, so all accesses to the state are now mediated through `stateMutex`, a mutex specifically designed for tracking the state.

While designing the system around `stateMutex`, it was determined to be more efficient than atomics: it blocks far less than the regular fallback for atomics, which is locking the main resource lock, as that lock is generally held for significantly longer.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
1af781c0a5
Add Perfetto Tracing to NCE Trapping API
As a performance sensitive part of code, the NCE Trapping API benefits from having tracing and it helps us better determine where guest code is spending its time for more targeted optimizations.
2022-08-06 22:20:54 +05:30
PixelyIon
9d294b9ccc
Use weak_ptr for TrapHandler Callbacks
The lifetime of the `this` pointer in the trap callbacks could be invalid as the lifetime of the underlying `Buffer`/`Texture` object wasn't guaranteed, this commit fixes that by passing a `weak_ptr` of the objects into the callbacks which is locked during the callbacks and ensures that a destroyed object isn't accessed.
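
The pattern can be sketched with a simplified `Buffer` type. The `MakeTrapCallback` method and `trapCount` member are hypothetical; the actual callbacks and their signatures are not shown in this message.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical resource whose callbacks may outlive it
struct Buffer : std::enable_shared_from_this<Buffer> {
    int trapCount{};

    // Captures a weak_ptr rather than `this` so that the callback can detect destruction
    // The object must be owned by a shared_ptr when this is called
    std::function<bool()> MakeTrapCallback() {
        std::weak_ptr<Buffer> weakThis(shared_from_this());
        return [weakThis]() {
            std::shared_ptr<Buffer> buffer = weakThis.lock();
            if (!buffer)
                return false; // The object has been destroyed, there is nothing to access

            // Holding the shared_ptr keeps the object alive for the rest of the callback
            buffer->trapCount++;
            return true;
        };
    }
};
```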

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
Billy Laws
96d8676d5b
Fix SubmitWithFlush not updating MegaBuffer cycle
The `CommandExecutor`'s `MegaBuffer` was not being updated with the latest `FenceCycle` on being flushed in `SubmitWithFlush`, this led to the megabuffer being overwritten prior to its GPU-side usage being complete. This commit fixes that by replacing the cycle to the latest cycle and prevents any races that occurred prior.
2022-08-06 22:20:54 +05:30
PixelyIon
3e9d84b0c3
Split FindOrCreate functionality across BufferManager
`FindOrCreate` ended up being a monolithic function with poor readability, this commit addresses those concerns by refactoring the function to split it up into multiple member functions of `BufferManager`, while some of these member functions may only have a single call-site they are important to logically categorize tasks into individual functions. The end result is far neater logic which is far more readable and slightly better optimized by virtue of being abstracted better.
2022-08-06 22:20:54 +05:30
PixelyIon
d2a34b5f7a
Implement ContextLock Move-assignment operator
In certain cases the move constructor may not suffice and the move assignment operator is required, this commit implements that and moves to using a pointer for storing the `resource` member rather than a reference as its semantics matched what we desired more and allowed for assignment of the `resource`.
2022-08-06 22:20:54 +05:30
PixelyIon
38d3ff4300
Fix BufferManager::FindOrCreate Recreation Bugs
It was determined that `FindOrCreate` has several issues which this commit fixes:
* It wouldn't correctly handle locking of the newly created `Buffer` as the constructor would setup traps prior to being able to lock it which could lead to UB
* It wouldn't propagate the `usedByContext`/`everHadInlineUpdate` flags correctly
* It wouldn't correctly set the `dirtyState` of the buffer according to that of its source buffers
2022-08-06 22:20:54 +05:30
Billy Laws
d1a682eace
Fix setDirty behavior in Buffer::SynchronizeGuest
The condition for `setDirty` in the dirty state CAS was inverted from what it should've been resulting in synchronizing incorrectly, this commit fixes the condition to correct synchronization.
2022-08-06 22:20:54 +05:30
Billy Laws
00d434efdc
Remove Texture::CopyFrom format check
The formats of the textures involved in a copy were checked for equality, this broke certain copies as the presentation engine would invoke copies between textures of different yet compatible formats.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2022-08-06 22:20:54 +05:30
Billy Laws
58174f255f
Improve ContextLock semantics
`ContextLock` had suboptimal semantics in the form of direct access to the `isFirst` member which wasn't clearly defined, it's now been broken up into function calls `IsFirstUsage` and `OwnsLock` with explicit move semantics and a function for releasing the lock.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2022-08-06 22:20:54 +05:30
Billy Laws
561103d3da
Submit GPFIFO work prior to CircularQueue waiting
The position at which we call submit is a significant factor in performance and we did so at the end of PBs (PushBuffers), this isn't optimal as there could be multiple PBs queued up that would benefit from being in the same submission. We now delay the submission of the workload till we run out of PBs.
2022-08-06 22:20:54 +05:30
PixelyIon
3ac5ed8c06
Attach coalesced Buffer if any source Buffer is attached
A buffer that's attached to a context could be coalesced into a larger buffer which isn't attached, this would break as it wouldn't keep the buffer alive till the end of the associated context. To fix this if any source buffers are attached then the resulting coalesced buffer is also attached now.
2022-08-06 22:20:54 +05:30
PixelyIon
284ac53d88
Fix KThread Priority Inheritance CAS
The CAS condition for KThread PI was inverted, which led to entirely incorrect behavior in the CAS loops. While this might work in the vast majority of cases, it would lead to significantly inaccurate behavior.
2022-08-06 22:20:54 +05:30
PixelyIon
45cb8388cc
Fix NCE Trap API Lock Callback
The lock callback would `continue` which would end up skipping over the current item as it applied to the inner loop rather than the outer loop as intended. This has now been fixed by using `break` and a check instead.
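
The general shape of this kind of fix, a `break` out of the inner loop plus a flag checked in the outer loop, can be sketched as follows. `Region`, `Callback` and `HandleRegions` are hypothetical and don't mirror the real trap-handling code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types: a region is only handled when all of its callbacks can be locked
struct Callback {
    bool lockable;
};

struct Region {
    std::vector<Callback> callbacks;
    bool handled = false;
};

// Returns the number of regions that were handled
int HandleRegions(std::vector<Region> &regions) {
    int handled = 0;
    for (auto &region : regions) {
        bool skipRegion = false;
        for (const auto &callback : region.callbacks) {
            if (!callback.lockable) {
                skipRegion = true;
                break; // A `continue` here would only advance the inner loop
            }
        }

        if (skipRegion)
            continue; // This applies to the outer loop as intended

        region.handled = true;
        handled++;
    }
    return handled;
}
```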
2022-08-06 22:20:54 +05:30
PixelyIon
745d809e07
Fix Buffer::SynchronizeGuest Non-Blocking Behavior
The buffer's non-blocking behavior could lead to an invalid state where the dirty state doesn't adequately represent the buffer's true state, the check has now been moved inside the CAS loop as its behavior changes depending on the dirty state. In addition, `SynchronizeGuest` returns a boolean denoting if the synchronization was successful now to make code flows depending on non-blocking synchronization cleaner.
2022-08-06 22:20:54 +05:30
PixelyIon
c1f2445772
Set state to CpuDirty directly in SynchronizeGuest
`SynchronizeGuest` could only set the dirty state to `Clean` which was redundant since calls to it from inside the write trap handler would set it to `CpuDirty` directly after, this fixes that by doing it inside the function when necessary.
2022-08-06 22:20:54 +05:30
PixelyIon
4f6a67af36
Fix Texture Trap Data Race
The trap callbacks did not wait on the `Texture` to complete synchronization to the guest, this resulted in races where the contents written to the texture would be overwritten by the synced content. This commit fixes that by waiting on the fences at the end of the trap callback.
2022-08-06 22:20:54 +05:30
PixelyIon
cb7c3602e7
Attach TextureView to FenceCycle
The lifetime of `TextureView` objects wasn't correctly managed as they weren't being attached to the `FenceCycle` in `AttachTexture`, this led to them getting deleted and causing all sorts of UB.
2022-08-06 22:20:54 +05:30
PixelyIon
ffaefc82d3
Call all flush callbacks prior to CommandExecutor submission
The flush callbacks inside `CommandExecutor` weren't being called prior to submission as they should've been, this fixes that by calling them. It additionally removes the requirement to manually flush Maxwell3D at the end of `ChannelGpfifo` pushbuffers as it's a flush callback and will automatically be called by `Submit`.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
e65707cd9d
Handle CommandExecutor submission at end of ChannelGpfifo PB
Any work that was done in a `ChannelGpfifo` pushbuffer needs to be submitted at the end of it, if it isn't done then the work might incorrectly be not done till the next submission. This commit fixes it by calling `CommandExecutor::Submit` at the end of a pushbuffer, submitting any buffers that would've been left over.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-08-06 22:20:54 +05:30
PixelyIon
7b209c54a2
Only reallocate MegaBuffer on usage
Certain submissions might not utilize megabuffering but reserve a `MegaBuffer` regardless, this is not optimal since it can inflate the allocations and waste memory. This commit addresses the issue by eliding the allocation given the current submission doesn't utilize them.
2022-08-06 22:20:54 +05:30
PixelyIon
2366f81443
Fix Buffer::PollFence incorrectly handling null-FenceCycle
If a `FenceCycle` isn't attached then `PollFence` returned `false` while it should return if the buffer has any concurrent GPU usages in flight, this has now been fixed by returning `true` in those cases.
2022-08-06 22:20:54 +05:30
PixelyIon
34e1e39d1c
Always reset all attached resources on Submit
Certain resources can be attached to an empty `Submit` with no nodes, this can cause it to become a false dependency and not be removed till the next non-empty submission. This has now been fixed by doing a reset regardless of if any nodes exist.
2022-08-06 22:20:54 +05:30
PixelyIon
47db8e8cbc
Fix GPU inline copy callback for Buffer::Write
The GPU inline copy callback was broken for `Buffer::Write` as it wasn't always called when it needed to be and didn't handle attaching of the buffer to the executor which would cause it to be unlocked. This commit addresses both of these issues, it introduces an `AttachLockedBuffer` method to attach an already locked buffer to the executor.
2022-08-06 22:20:54 +05:30
PixelyIon
2636a37b31
Introduce alternative FPS measurement for disabled frame throttling
The FPS is implicitly bound to the refresh rate due to the timestamp being that of the presentation time, this leads to a misleading FPS figure for disabled frame throttling. It has now been fixed by using the frame submission time rather than the presentation time when frame throttling is disabled and to make this more apparent the color of the OSD FPS has been changed.
2022-08-06 22:20:54 +05:30
PixelyIon
0f56d01e58
Fix Packed format component ordering in IsAdrenoAliasCompatible
All `Packed` formats have their components stored in the opposite ordering to the label, this was not followed for `IsAdrenoAliasCompatible` prior and the ordering has now been flipped.
2022-08-06 22:18:42 +05:30
PixelyIon
3ca56ef578
Fix NCE Trapping API Deadlock
A deadlock was caused by holding `trapMutex` while waiting on the lock of a resource inside a callback while another thread holding the resource's mutex waits on `trapMutex`. This has been fixed by no longer allowing blocking locks inside the callbacks and introducing a separate callback for locking the resource which is done after unlocking the `trapMutex` which can then be locked by any contending threads.
2022-08-06 22:18:42 +05:30
PixelyIon
a6599c30b4
Correct IntervalMap insertion end calculation
The `end` pointer for `interval` was incorrectly calculated as `interval.data() + interval.size_bytes()` which would be incorrect when the interval span type is not `u8` as the pointer derived from `interval.data()` would be a pointer to the span type rather than a byte pointer and be subject to arithmetic of that object's size rather than in terms of a byte.
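
The scaling issue can be demonstrated with a minimal span type. `Span`, `EndAddress` and `BuggyAdvanceInBytes` are hypothetical names for this illustration, and the buggy case is computed numerically rather than by forming an out-of-bounds pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical minimal span: a typed pointer plus an element count
template<typename T>
struct Span {
    T *ptr;
    std::size_t count;

    T *data() const { return ptr; }
    std::size_t size() const { return count; }
    std::size_t size_bytes() const { return count * sizeof(T); }
};

// Correct: convert to a byte pointer first so that the byte size is applied in units of bytes
template<typename T>
const std::uint8_t *EndAddress(Span<T> interval) {
    return reinterpret_cast<const std::uint8_t *>(interval.data()) + interval.size_bytes();
}

// The distance in bytes that `interval.data() + interval.size_bytes()` would advance by,
// as arithmetic on a T pointer is scaled by sizeof(T)
template<typename T>
std::size_t BuggyAdvanceInBytes(Span<T> interval) {
    return interval.size_bytes() * sizeof(T);
}
```

For a span of four `uint32_t` values the correct end lies 16 bytes past the start, whereas the buggy expression would advance by 64 bytes.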
2022-08-06 22:18:42 +05:30
PixelyIon
b0910e7b1a
Avoid locking Texture/Buffer in trap handler
We generally don't need to lock the `Texture`/`Buffer` in the trap handler, this is particularly problematic now as we hold the lock for the duration of a submission of any workloads. This leads to a large amount of contention for the lock and stalling in the signal handler when the resource may be `Clean` and can simply be switched over to `CpuDirty` without locking and utilizing atomics which is what this commit addresses.
2022-08-06 22:18:42 +05:30
PixelyIon
a60d6ec58f
Replace host immutability FenceCycle with GPU usage tracking
We utilized a `FenceCycle` to keep track of whether the buffer was mutable, and introduced another cycle to track GPU-side requirements, only on fulfillment of which could the buffer be utilized on the host. Due to the recent change in behavior, this system ended up being suboptimal.

This commit replaces the cycle with a boolean tracking if there are any usages of the resource on the GPU within the current context that may prevent it from being mutated on the CPU. The fence of the context is simply attached to the buffer based off this which was allowed as the new behavior of buffer fences matches all the requirements for this.
2022-08-06 22:18:42 +05:30
PixelyIon
217d484cba
Abstract TextureView/BufferDelegate locking into LockableSharedPtr
An atomic transactional loop was performed on the backing `std::shared_ptr` inside `BufferView`/`TextureView`'s `lock`/`LockWithTag`/`try_lock` functions, these locks utilized `std::atomic_load` for atomically loading the value from the `shared_ptr` recursively till it was the same value pre/post-locking. 

This commit abstracts the locking functionality of `TextureView`/`BufferDelegate` into `LockableSharedPtr` to avoid code duplication and removes the usage of `std::atomic_load` in either case as it is not necessary due to the implicit memory barrier provided by locking a mutex.
2022-08-06 22:18:42 +05:30
PixelyIon
2d08886e4e
Utilize TextureView rather than Texture for presentation
`PresentationEngine` and `GraphicBufferProducer` methods that utilized textures for the surface utilized the `Texture` type rather than the `TextureView` type, this was never correct but at the time of authoring this code `TextureView` was not finalized and in a major flux which is why it was not utilized and `Texture` was utilized instead. Now that it is far more stable, it has been replaced with `TextureView`.
2022-08-06 22:18:42 +05:30
PixelyIon
d7399e33c1
Avoid waiting on mutex in PresentationEngine::Present
We want to block on the host thread during presentation while the host surface isn't present to implicitly pause the game, this can end up being fairly costly as it involves locking the `PresentationEngine` mutex which can lead to a lot of contention with the presentation thread. This fixes the issue by polling if there is a surface and only if there isn't then doing the wait as it isn't mandatory to wait always, we'll eventually run into the guest thread stalling.
2022-08-06 22:18:42 +05:30
PixelyIon
30475ffc43
Fix queueBuffer GraphicBuffer Compatibility Check
Newer versions of the Deko3D homebrew were crashing due to this check and it was discovered that the check was incorrect and rather than comparing the `NvSurface` what had to be compared was the `GraphicBuffer` associated with the slot directly.

Co-authored-by: lynxnb <niccolo.betto@gmail.com>
2022-08-06 22:18:42 +05:30
PixelyIon
c2685d5f5c
Fix consistency issues with external project copyright headers
The copyright headers for external projects such as yuzu/Ryujinx were inconsistent in ordering, Skyline should always be the first item in the list. In addition, they didn't always link to the project's GitHub which has also been fixed.
2022-08-06 22:18:42 +05:30
PixelyIon
0ac5f4ce27
Lock TextureManager/BufferManager during submission
Multiple threads concurrently accessing the `TextureManager`/`BufferManager` (Referred to as "resource managers") has a potential deadlock with a resource being locked while acquiring the resource manager lock while the thread owning it tries to acquire a lock on the resource resulting in a deadlock.

This has been fixed with locking of the resource manager now being externally handled, which ensures it can be locked prior to locking any resources. `CommandExecutor` provides accessors for retrieving the resource manager which automatically handle locking, alongside doing so on attachment of resources.
2022-08-06 22:18:42 +05:30
PixelyIon
1239907ce8
Rework Texture & Buffer for Context and FenceCycle Chaining
GPU resources have been designed with locking by fences in mind, fences were treated as implicit locks on a GPU, design paradigms such as `GraphicsContext` simply unlocking the texture mutex after attaching it which would set the fence cycle were considered fine prior but are suboptimal as it enforces that a `FenceCycle` effectively ensures exclusivity. This conflates the function of a mutex which is mutual exclusion and that of the fence which is to track GPU-side completion and led to tying if it was acceptable to use a GPU resource to GPU completion rather than simply if it was not currently being used by the CPU which is the function of the mutex.

This rework fixes this with the groundwork that has been laid with previous commits, as `Context` semantics are utilized to move back to using mutexes for locking of resources and tracking the usage on the GPU in a cleaner way rather than arbitrary fence comparisons. This also leads to cleaning up a lot of methods that involved usage of fences that no longer require it and therefore can be entirely removed, further cleaning up the codebase. It also opens the door for future improvements such as the removal of `hostImmutableCycle` and replacing them with better solutions, the implementation of which is broken at the moment regardless.

While moving to `Context`-based locking the question of multiple GPU workloads being in-flight while using overlapping resources came up which brought a fundamental limitation of `FenceCycle` to light which was that only one resource could be concurrently attached to a cycle and it could not adequately represent multi-cycle dependencies. `FenceCycle` chaining was designed to fix this inadequacy and allows for several different GPU workloads to be in-flight concurrently while utilizing the same resources as long as they can ensure GPU-GPU synchronization.
2022-08-06 22:18:42 +05:30
PixelyIon
07d45ee504
Introduce FenceCycle Chaining
If we want to allow submitting multiple pieces of work to the GPU at once while still requiring CPU synchronization, we'll need to track all past fence cycles associated with a resource alongside the current one. To solve this the concept of chaining fences has been introduced, fences from past usages can be chained to the latest fence which'll then recursively forward operations to chained fences.

This change also ends up mandating a move away from `FenceCycleDependency` as it would prevent fences from concurrently locking the same resources which is required for chaining to work as two fences being chained fundamentally means they're locking the same resources. The `AtomicForwardList` is therefore used as the new container.
2022-08-06 22:18:42 +05:30
PixelyIon
cf9e31c1eb
Implement Atomic Forward List
An implementation of a singly-linked list with atomic access to allow for lock-free access semantics. It eliminates the requirement for a mutex, which can introduce additional synchronization considerations.
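
A minimal sketch of the idea, assuming a list that only needs prepending and iteration. The class and method names are hypothetical and this is not the actual `AtomicForwardList` code:

```cpp
#include <atomic>
#include <utility>

// Hypothetical lock-free singly-linked list that supports prepending and iteration only
template<typename T>
class AtomicForwardListSketch {
    struct Node {
        T value;
        Node *next;
    };

    std::atomic<Node *> head{nullptr};

  public:
    AtomicForwardListSketch() = default;
    AtomicForwardListSketch(const AtomicForwardListSketch &) = delete;
    AtomicForwardListSketch &operator=(const AtomicForwardListSketch &) = delete;

    // Not thread-safe: the list must no longer be shared when it is destroyed
    ~AtomicForwardListSketch() {
        Node *node{head.load(std::memory_order_acquire)};
        while (node) {
            Node *next{node->next};
            delete node;
            node = next;
        }
    }

    // Prepends a value without a mutex: the CAS is retried until no other thread has raced us,
    // on failure `node->next` is refreshed with the current head
    void Append(T value) {
        Node *node{new Node{std::move(value), head.load(std::memory_order_relaxed)}};
        while (!head.compare_exchange_weak(node->next, node, std::memory_order_release, std::memory_order_relaxed)) {}
    }

    // Calls the supplied function on every element, newest first
    template<typename F>
    void Iterate(F &&function) const {
        for (Node *node{head.load(std::memory_order_acquire)}; node; node = node->next)
            function(node->value);
    }
};
```

Nodes are never removed while the list is alive, which sidesteps the ABA problem that a lock-free pop would have to deal with.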
2022-08-06 22:18:42 +05:30
PixelyIon
6b9269b88e
Introduce Context semantics to GPU resource locking
Resources on the GPU can be fairly convoluted and involve overlaps, which can lead to the same GPU resources being utilized with different views. We previously utilized fences to lock resources to prevent concurrent access, but this was overly harsh as it would block usage of resources till GPU completion of the commands associated with a resource.

Fences have now been replaced with locks, but locks run into the issue of being per-view. To add a common object for tracking usage, the concept of "tags" was introduced: a tag tracks a single context so that locks can be skipped if they're from the same context. This is important to prevent a deadlock when locking a resource which has already been locked from the current context with a different view.
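
A rough sketch of how a tag can let a context skip a lock it already holds. `TaggedResource`, `LockWithTag` and the integer tag type are illustrative assumptions, not the actual interface:

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical tag identifying a context, 0 is reserved for "no context"
using ContextTag = std::uintptr_t;

class TaggedResource {
    std::mutex mutex;
    std::atomic<ContextTag> tag{0}; // Tag of the context currently holding the mutex

  public:
    // Returns true if the lock was taken, false if the same context already holds it
    // (e.g. through a different view of the same resource) and locking was skipped
    bool LockWithTag(ContextTag newTag) {
        if (newTag != 0 && newTag == tag.load())
            return false;

        mutex.lock();
        tag.store(newTag);
        return true;
    }

    // Must only be called by the context for which LockWithTag returned true
    void Unlock() {
        tag.store(0);
        mutex.unlock();
    }
};
```

The caller has to remember whether `LockWithTag` returned `true`, as only then is it responsible for unlocking.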
2022-08-06 22:18:42 +05:30
PixelyIon
d913f29662
Only set hasFragileUserData for signed builds
We do not want to allow saving of user data on unsigned builds as they don't have a stable signature and will not properly handle reinstallation. This can lead to a situation where the user has to resort to complex techniques to completely uninstall the package such as ADB or calling into PM directly.
2022-08-06 22:18:42 +05:30
PixelyIon
3139889a09
Implement Asynchronous Presentation
We currently present all frames synchronously on the thread that calls into SurfaceFlinger functions. This is suboptimal as it doesn't match guest behavior, which can delay the guest from working on the next frame. This commit makes queuing up frames non-blocking and handles all waiting and then presenting of the frame on a dedicated thread.
2022-08-06 22:18:42 +05:30
PixelyIon
6e09dc5204
Fix thread name setting
We utilize `pthread_setname_np` to set the thread names but didn't check for any errors, which resulted in the `Skyline-Choreographer` and `ChannelCmdFifo` threads not having proper names as they exceeded the 16 character limit on thread names for the pthread function. This has now been fixed by changing the names and introducing error checking to invocations of this function.
2022-08-06 22:18:42 +05:30
PixelyIon
7a0cfb484c
Add NPOT AlignUp utility
All our normal alignment functions are designed to only handle power of 2 (`POT`) multiples as we only align or check alignment to `POT` multiples but there are cases where this is not possible and we deal with `NPOT` multiples which is why this function is required.
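
As an illustration of the difference, assuming hypothetical names for both helpers: the `POT` variant can use a bit mask while the `NPOT` variant has to fall back to division.

```cpp
#include <cstddef>

// Power-of-two multiples only: relies on (multiple - 1) being a valid bit mask
constexpr std::size_t AlignUpPot(std::size_t value, std::size_t multiple) {
    return (value + multiple - 1) & ~(multiple - 1);
}

// Any non-zero multiple: rounds up using division rather than masking
constexpr std::size_t AlignUpNpot(std::size_t value, std::size_t multiple) {
    return ((value + multiple - 1) / multiple) * multiple;
}

static_assert(AlignUpPot(13, 8) == 16, "POT alignment");
static_assert(AlignUpNpot(13, 8) == 16, "Agrees with the mask version for POT multiples");
static_assert(AlignUpNpot(13, 12) == 24, "12 is not a power of two");
static_assert(AlignUpNpot(24, 12) == 24, "Already aligned values are unchanged");
```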
2022-08-06 22:18:42 +05:30
PixelyIon
662ea532d8
Skip waiting on host GPU after command buffer submission
We waited on the host GPU after `Execute` but this isn't optimal as it causes a major stall on the CPU which can lead to several adverse effects such as downclocking by the governor and losing the opportunity to work in parallel with the GPU.

This has now been fixed by splitting `Execute`'s functionality into two functions: `Submit` and `SubmitWithFlush` which both execute all nodes and submit the resulting command buffer to the GPU but flushing will wait on the GPU to complete while the non-flush variant will not wait and work ahead of the GPU.
2022-08-06 22:18:42 +05:30
PixelyIon
5129d2ae78
Add move-assignment semantics to ActiveCommandBuffer/MegaBuffer
We need move-assignment semantics to viably utilize these objects as class members, they cannot be replaced without move-assign (or copy-assign but that is undesirable here). This commit fixes that by introducing a move assignment operator to them while making the `slot` a pointer which has the necessary nullability semantics.
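
A simplified sketch of the pattern, with a hypothetical `Slot` and handle type standing in for the real classes. Holding the slot as a pointer allows a moved-from handle to be left empty:

```cpp
#include <utility>

// Hypothetical stand-in for a command buffer or megabuffer slot
struct Slot {
    bool active{false};
};

class ActiveSlotHandle {
    Slot *slot; // A pointer rather than a reference so that it can be null after a move

    void Release() {
        if (slot) {
            slot->active = false;
            slot = nullptr;
        }
    }

  public:
    explicit ActiveSlotHandle(Slot &target) : slot{&target} {
        target.active = true;
    }

    ActiveSlotHandle(ActiveSlotHandle &&other) noexcept : slot{std::exchange(other.slot, nullptr)} {}

    // Releases the slot currently held (if any) before taking over the other handle's slot
    ActiveSlotHandle &operator=(ActiveSlotHandle &&other) noexcept {
        if (this != &other) {
            Release();
            slot = std::exchange(other.slot, nullptr);
        }
        return *this;
    }

    ActiveSlotHandle(const ActiveSlotHandle &) = delete;
    ActiveSlotHandle &operator=(const ActiveSlotHandle &) = delete;

    ~ActiveSlotHandle() {
        Release();
    }
};
```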
2022-08-06 22:18:42 +05:30
lynxnb
8991ccac65 Pass ViewHolder on bind to RecyclerView items instead of ViewBinding
This change lets items get the updated position of their view holder in the adapter. Fixes an issue where the position of items was not updated after being removed from a `SelectableGenericAdapter`.
2022-08-06 22:00:19 +05:30
lynxnb
bb922100cb Improve rendering for Right-To-Left layouts 2022-08-06 22:00:19 +05:30
lynxnb
240e7033d7 Support loading a user-selected driver during vulkan initialization 2022-08-06 22:00:19 +05:30
lynxnb
c812de48ea Show an undo button after deleting a gpu driver
After a driver has been deleted, a snackbar will be shown confirming the deletion, with a button to undo it.
2022-08-06 22:00:19 +05:30
lynxnb
59c60df993 Add GPU Driver Configuration preference
This preference launches `GpuDriverActivity` for managing custom gpu drivers. When the device has an incompatible GPU, the preference will be disabled and greyed out.
2022-08-06 22:00:19 +05:30
lynxnb
48cf1263bc Add a custom GPU driver configuration activity
The activity adds the following functionalities:
* Lists installed drivers
* Allows the user to install new drivers, or remove installed ones
* Allows the user to select the driver that will be used by the emulator
2022-08-06 22:00:19 +05:30
lynxnb
e9f609b923 Add a gpuDriver preference setting
This setting represents the GPU driver selected by the user to be used by the emulator.
2022-08-06 22:00:19 +05:30
lynxnb
1815199d2b Add utilities for reading and installing gpu driver packages 2022-08-06 22:00:19 +05:30
lynxnb
f3dd3e53c1 Miscellaneous imports cleanup in preference package 2022-08-06 22:00:19 +05:30
lynxnb
1dfea9ef6f Create an ItemDecorations file for all RecyclerView item decorations
All item decorations are now placed in one file so that any `RecyclerView` in the app can use the same ones.
2022-08-06 22:00:19 +05:30
lynxnb
a59f2baa3a Add a SelectableGenericAdapter as subclass of GenericAdapter
`SelectableGenericAdapter` extends `GenericAdapter` with support for marking one item as selected.
2022-08-06 22:00:19 +05:30
lynxnb
e93fdce845 Add support for removal of items from GenericAdapter 2022-08-06 22:00:19 +05:30
lynxnb
0d1c7965df Add a ZipUtils class for unpacking zip files 2022-08-06 22:00:19 +05:30
lynxnb
b03f624191 Add kotlinx.serialization-json dependencies 2022-08-06 22:00:19 +05:30
Billy Laws
f52ea7bddb Make deferred draw and constant buffer updates reentrant-safe
At some point we will call Submit within draws or constant buffer updates. To avoid any infinite recursion, mark draw/cbuf pending as false before performing any operation.
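
A small sketch of the ordering, using a hypothetical `DeferredOperation` class. Clearing the pending flag before running the operation turns a nested flush into a no-op:

```cpp
#include <functional>
#include <utility>

// Hypothetical holder for a single deferred operation such as a draw or constant buffer update
class DeferredOperation {
    bool pending{false};
    std::function<void()> operation;

  public:
    void Defer(std::function<void()> function) {
        operation = std::move(function);
        pending = true;
    }

    void Flush() {
        if (!pending)
            return;

        pending = false; // Cleared first so that a Flush() from within the operation returns early
        auto local = std::move(operation);
        local();
    }
};
```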
2022-07-29 20:07:14 +01:00
Billy Laws
dbb684835f Fix depthClampDisable register offset in Maxwell 3D 2022-07-29 20:07:14 +01:00
Billy Laws
7fd9d347e3 Use per-RT blend enable registers even when independent blend is disabled
The common blend enable register seems to be used for something else. This is required for blending to work correctly in OpenGL games
2022-07-29 20:07:14 +01:00
Billy Laws
048c2fdd29 Fix Vulkan framebuffer dimensions calculations
The framebuffer needs to be large enough to contain both the render area extent and offset
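
For illustration, with plain structs standing in for the Vulkan `VkRect2D` family and assuming a non-negative offset, the minimum size could be computed along these lines:

```cpp
#include <cstdint>

// Hypothetical stand-ins for VkOffset2D, VkExtent2D and VkRect2D
struct Offset2D {
    std::int32_t x, y;
};

struct Extent2D {
    std::uint32_t width, height;
};

struct Rect2D {
    Offset2D offset;
    Extent2D extent;
};

// The framebuffer has to reach the far corner of the render area, not only cover its extent
// Assumes the offset is non-negative
constexpr Extent2D MinimumFramebufferExtent(Rect2D renderArea) {
    return {
        static_cast<std::uint32_t>(renderArea.offset.x) + renderArea.extent.width,
        static_cast<std::uint32_t>(renderArea.offset.y) + renderArea.extent.height,
    };
}
```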
2022-07-29 20:07:14 +01:00
Billy Laws
0e1aa765fc Prevent CNTVCT_EL0 reads from being optimised out by the compiler
Without this the compiler will assume the read always produces the same value, causing issues when the register is used to time function execution
2022-07-29 20:07:14 +01:00
Billy Laws
1df98ba57f Enable fwrapv for defined signed integer overflow behaviour
Nintendo enables this for HOS so we should do the same to avoid any cases where it's relied on.
2022-07-29 20:07:14 +01:00
lynxnb
d183d14e2a Make accesses to setting values thread-safe 2022-07-26 20:16:24 +05:30
lynxnb
30667a0899 Remove unused Compact Logs settings
Since we don't have a log viewer in the app anymore, the setting was left unused and can be safely removed.
2022-07-26 20:16:24 +05:30
lynxnb
5aa2a4cd1c Rename SettingsValues to NativeSettings
The previous name was chosen as an afterthought and didn't clearly indicate what the purpose of the class is. We needed a separate, simple class without delegate members (unlike PreferenceSettings), so that its fields can be easily accessed via JNI to get settings values from native code.
2022-07-26 20:16:24 +05:30
lynxnb
f734c4d145 Make log level setting changes immediately active 2022-07-26 20:16:24 +05:30
lynxnb
bb4937121f Remove settings from SharedPreference if they are of the wrong type 2022-07-26 20:16:24 +05:30
lynxnb
2840a126dd Introduce AndroidSettings class and use inheritance
The `Settings` class now has a pure virtual `Update` method, and uses inheritance over template specialization for platform-specific behavior override.
2022-07-26 20:16:24 +05:30
lynxnb
3905728447 Make every setting observable individually
A `Setting` delegate class has been introduced, holding the raw value of the setting and adding support for registering callbacks to that setting. Callbacks will then be called when the value of that setting changes.
As a result of this, raw setting values have been made accessible through pointer dereference semantics.
2022-07-26 20:16:24 +05:30
lynxnb
2d70be60d1 Remove PugiXML submodule
`PugiXML` was only used for parsing the SharedPreferences settings file and is not needed anymore.
2022-07-26 20:16:24 +05:30
lynxnb
5b4ca79dc8 Rename Settings Kotlin class to PreferenceSettings
SharedPreferences will be partially swapped out in the future to support per-game settings. In the meantime, make it clear which class settings are coming from.
2022-07-26 20:16:24 +05:30
lynxnb
3b27540250 Rename operationMode setting to isDocked 2022-07-26 20:16:24 +05:30
lynxnb
69cf25b1a7 Initial support for updating settings during emulation + observing settings changes 2022-07-26 20:16:24 +05:30
lynxnb
c5dde5953a Rework how settings are shared between Kotlin and native side
Settings are now shared with the native side by passing an instance of Kotlin's `Settings` class. This way the C++ `Settings` class doesn't need to parse the SharedPreferences XML anymore.
2022-07-26 20:16:24 +05:30
lynxnb
4be8b4cf66 Add missing SPDX licence header 2022-07-26 20:16:24 +05:30
lynxnb
365ca66b1b Make integer settings use IntegerListPreference
Avoids unnecessary type casting of setting values and duplication in resource files.
2022-07-26 20:16:24 +05:30
lynxnb
cbc896c8f8 Fix waitForFences crash on Mali drivers
Mali GPU drivers utilize the `ppoll()` syscall inside `waitForFences` which isn't correctly restarted after a signal, which we can receive at any time on a guest thread. This commit fixes that by recursively calling the function on failure till it succeeds or returns an unexpected error.
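
A sketch of the retry logic, written as a loop rather than recursion. The `WaitResult` values are hypothetical as the exact result returned by the driver is not named here:

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical result codes standing in for the actual Vulkan results
enum class WaitResult {
    Success,
    Interrupted, // The wait was cut short by a signal and should be retried
    DeviceLost,
};

// Retries the wait for as long as it was interrupted, any other failure is propagated
inline void WaitUntilSignalled(const std::function<WaitResult()> &wait) {
    while (true) {
        switch (wait()) {
            case WaitResult::Success:
                return;
            case WaitResult::Interrupted:
                continue; // Applies to the enclosing loop, the wait is attempted again
            default:
                throw std::runtime_error("Unexpected error while waiting on fences");
        }
    }
}
```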

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-07-14 20:34:16 +02:00
MCredstoner2004
942e22f275 Write ApplicationErrorArg ErrorApplets to log
These applets are used by applications to display a custom error message to the user. Both the error message and the detailed error message are printed to the error log.


Co-authored-by: lynxnb <niccolo.betto@gmail.com>
2022-07-02 09:48:59 +05:30
MCredstoner2004
f9a0394577
Implement Software Keyboard applet
This implements the non-inline version of the Software Keyboard (swkbd) applet, which games use to get text input from the user.
2022-07-01 15:19:53 -05:00
MCredstoner2004
a9ee06914d Add ByteBufferSerializable
This allows sending C-like structs between Kotlin and C++ without struct-specific code
2022-06-30 01:17:32 +05:30
Billy Laws
a0275418d6 Add a single-header linear allocator implementation
This conforms to the C++ 'Allocator' named requirement allowing it to be used with any STL type and allows drastically reducing allocation times in cases which are suited for linear allocation.
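
To illustrate what conforming to the 'Allocator' named requirement involves, here is a reduced sketch. `LinearArena` and `LinearAllocatorSketch` are hypothetical names and the fixed-size arena is an assumption made for brevity:

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical fixed-size arena that hands out memory by bumping an offset
class LinearArena {
    std::unique_ptr<unsigned char[]> storage;
    std::size_t capacity;
    std::size_t offset{0};

  public:
    explicit LinearArena(std::size_t size) : storage{new unsigned char[size]}, capacity{size} {}

    // Assumes the alignment doesn't exceed that of the storage returned by new[]
    void *Allocate(std::size_t size, std::size_t alignment) {
        std::size_t aligned{(offset + alignment - 1) / alignment * alignment};
        if (aligned + size > capacity)
            throw std::bad_alloc{};
        offset = aligned + size;
        return storage.get() + aligned;
    }

    void Reset() { offset = 0; } // Frees every allocation at once

    std::size_t Used() const { return offset; }
};

// The minimal interface the STL needs: value_type, allocate, deallocate, rebinding and comparison
template<typename T>
struct LinearAllocatorSketch {
    using value_type = T;

    LinearArena *arena;

    explicit LinearAllocatorSketch(LinearArena &target) : arena{&target} {}

    template<typename U>
    LinearAllocatorSketch(const LinearAllocatorSketch<U> &other) : arena{other.arena} {}

    T *allocate(std::size_t count) {
        return static_cast<T *>(arena->Allocate(count * sizeof(T), alignof(T)));
    }

    void deallocate(T *, std::size_t) {} // Individual frees are no-ops for a linear allocator

    template<typename U>
    bool operator==(const LinearAllocatorSketch<U> &other) const { return arena == other.arena; }

    template<typename U>
    bool operator!=(const LinearAllocatorSketch<U> &other) const { return arena != other.arena; }
};
```

A container such as `std::vector<int, LinearAllocatorSketch<int>>` can then be constructed from an allocator instance, provided the arena outlives the container.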
2022-06-28 21:33:04 +01:00
Billy Laws
e816256220 Add blend, scissor, viewport and vertex state to shader hash
These caused a ton of additional comparisons in Zelda Link's Awakening as many shaders would have the same hash.
2022-06-28 21:32:59 +01:00
lynxnb
e6cfdeb06a Fix non-indexed quad draws
Certain non-indexed quad draws would mistakenly take the indexed quad path because of the assumption that they would not have a bound index buffer. This resulted in a crash for most games using quads due to a faulty exception `Indexed quad conversion is not supported`, when in fact they were not using indexed quads.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-06-23 10:57:11 +02:00
lynxnb
8fc3bc75f4 Allow providing an index type to calculate quad conversion buffer size 2022-06-23 00:15:44 +02:00
Billy Laws
7709dc8cf6 Rewrite buffer megabuffering to be per view and more efficient
This commit implements several key optimisations in megabuffering that are all inherently interlinked.
- Megabuffering is moved from per-buffer to per-view copies, which makes megabuffering possible for small views into larger underlying buffers, which is often the case with even the simplest of games.
- Megabuffering is no longer the default option; it is only enabled for buffer views that have had inline GPU writes applied to them in the past, as that is the only case where it is beneficial. In any other case the cost of copying, even with a 128KiB limit, can be significant.
- With both of these changes, there is now the possibility of overlapping views where one uses megabuffering and one does not. In order to allow GPU inline writes to work consistently in such cases, a system of 'host immutability' has been implemented: when a buffer is marked as host immutable for a given cycle, all writes to the buffer from that point to the point the cycle is signalled will be performed on the GPU, ensuring that the backing contents are correctly sequenced.
2022-06-11 17:05:39 +05:30
MCredstoner2004
2e356b8f0b Use spans instead of ptr and size in kernel memory 2022-06-11 17:05:39 +05:30
PixelyIon
e3e92ce1d4 Handle unsigned builds on CI
We don't always have access to CI secrets. For example, when a CI action is triggered by a PR from an external repository, it won't have access to secrets and the build won't be signed. While we will likely allow for this in the future as all workflows do have to be approved, it is still important to not crash when keys are unavailable and to have a graceful fallback for those situations.
2022-06-11 17:05:39 +05:30
Billy Laws
8689886bbb
Update build tools to 33.0.0 to fix CI
We're never switching to RC build tools again
2022-06-10 00:11:12 +01:00
Billy Laws
22039df301 Transition to std::unordered_set for buffer view tracking
Has the same guarantees of pointer stability while also being significantly faster in cases where a buffer has thousands of views. This is the case in RE4 and this change leads to an almost 1000% performance improvement in that game.
2022-06-09 23:52:13 +01:00
Billy Laws
b75a06af1b Support forcing 60Hz display on Xiaomi MIUI
Uses an API found through RE since none of the AOSP APIs work. Additionally, the code for setting RR was consolidated into a single function that can be run after all display updates.
2022-06-09 19:29:18 +01:00
Billy Laws
42c365fe70 Automatically exclude llvm and boost submodules in gradle project
There is a god in this world... his name is bylaws
2022-06-06 23:11:56 +01:00
PixelyIon
a5ca370c36 Implement thread-safe MegaBuffer pool
We currently have a global `MegaBuffer` instance that is shared across all channels, this is very problematic as `MegaBuffer` fundamentally works like a state machine with allocations (especially resetting/freeing) and is thread-specific. Therefore, we now have a pool of several `MegaBuffer`s which is allocated from by the `CommandExecutor` and kept channel specific as a result which also limits its usage to a single thread, this allows for individually resetting or freeing any allocations.
2022-06-05 13:04:40 +05:30
PixelyIon
3e08494146 Minor CommandScheduler refactor
There was a lot of redundant code in the `CommandScheduler` when the same functionality could be achieved with much shorter and cleaner code which this commit fixes. This includes no changes to the user-facing API and does not require any changes on the user side as a result.
2022-06-05 13:04:40 +05:30
Billy Laws
bd99d79b51 OsFileSystem: Close directory after file listing is finished 2022-06-04 21:46:23 +01:00
Billy Laws
4888919515 Stub GetFriendInvitationStorageChannelEvent (0x8C) 2022-06-04 21:45:53 +01:00
Billy Laws
d9f6540831 Fix VFS CreateFile directory creation 2022-06-04 19:19:30 +01:00
Billy Laws
f5bcb40c41 Return number of audio outs in ListAudioOuts 2022-06-04 19:12:37 +01:00
Billy Laws
5d6902b3f8 Stub audin:u 2022-06-04 19:11:57 +01:00
Billy Laws
54999957a2 Remove RGB565 format workaround
Will soon be redundant with new texture manager and is quite hacky so drop it.
2022-06-04 17:49:13 +01:00
Billy Laws
d79832091d Force append slash to directory path in OsFilesystem::CreateDirectory
The recursive path creation algorithm requires this to be the case
2022-06-04 17:44:49 +01:00
Billy Laws
616f7b7826 Correct instanced draw topology changed warning location
Before it would trigger even when the draw had the instanceNext flag set and thus wasn't part of the instanced draw at all.
2022-06-04 17:43:03 +01:00
Billy Laws
deb7a0e22a Implement 5x5 and 10x10 ASTC texture formats 2022-06-04 17:42:37 +01:00
Billy Laws
cc5a3f99c1 Reformat format description file 2022-06-04 17:42:13 +01:00
Billy Laws
a476bbaf4d Add 11_11_10 vertex buffer format 2022-06-04 17:41:10 +01:00
Billy Laws
71c37dd6c4 Add D24X8Unorm depth RT format support 2022-06-04 17:40:49 +01:00
Billy Laws
d3af629b83 Support R32G32B32A32 int RT formats 2022-06-04 17:38:57 +01:00
Billy Laws
0f5f04ade3 Set default surfaceflinger parameters based off of preallocated buffers
Required by Resident Evil 4, as otherwise Dequeue would fail due to it using BGRA buffers while the default was RGBA.
2022-06-04 16:55:08 +01:00
Billy Laws
106ad597db Support BGRA8888 surfaceflinger format
A swizzle is applied to R8G8B8A8 to transform it to BGRA since BGRA isn't a commonly supported swapchain format on Android.
2022-06-04 16:49:26 +01:00
Billy Laws
2bbeb6b08f Fix OsFileSystem initial directory creation
By passing basePath as an argument, the CreateDirectory function did mkdir(basePath+basePath), which is obviously not the intended behaviour. Fix this.
2022-06-03 19:33:31 +01:00
Billy Laws
84dec7561c Dont cache rendertarget mappings
Some games remap rendertargets or map them late which would lead to weird graphical bugs or crashes. Drop the caching since VMM lookup is fairly cheap anyway.
2022-06-03 19:31:52 +01:00
Billy Laws
581a016991 Add GuestTexture::GetSize helper function
This code was getting duplicated a bit so commonise into a helper function.
2022-06-03 19:30:54 +01:00
Billy Laws
31d418ad54 Fix 3D semaphore counter type 0 handling
Counter type 0 actually releases the semaphore payload rather than a constant zero as was previously thought. This is required by Skyrim.
2022-06-02 22:03:19 +01:00
Billy Laws
0202bf5531 Add semaphore release debug logs 2022-06-02 22:02:59 +01:00
Billy Laws
55cddc7a66 Update hades 2022-06-02 21:58:30 +01:00
Billy Laws
3736d36b75 Fix KPrivateMemory remap permissions 2022-06-02 18:10:35 +01:00
Billy Laws
389ab0fb50 Add {Map,Unmap}Physical memory debug logs 2022-06-02 18:10:10 +01:00
PixelyIon
2712b3276b Fix incorrect VkBufferImageCopy offset calculations
The `VkBufferImageCopy` offset calculations were wrong inside `CopyIntoStagingBuffer` as it multiplied the mip level's linear size by `levelCount` rather than `layerCount`. This led to substantial UB in games which called this function as it led to an overflow and resulted in writing to other areas of the buffer which caused major issues such as vertex/index buffer corruption and corresponding graphical glitches alongside likely being the cause of some crashes.
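
A sketch of the corrected calculation, assuming mip levels are stored consecutively in the staging buffer with all layers of a level packed together. The `MipLevel` struct and `LevelOffsets` function are illustrative only:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of a mip level within a linear staging buffer
struct MipLevel {
    std::size_t linearSize; // Size in bytes of a single layer at this level
};

// Returns the staging buffer offset at which each mip level begins
inline std::vector<std::size_t> LevelOffsets(const std::vector<MipLevel> &levels, std::size_t layerCount) {
    std::vector<std::size_t> offsets;
    std::size_t offset{0};
    for (const auto &level : levels) {
        offsets.push_back(offset);
        // A level occupies its linear size once per layer, scaling by the level count
        // instead would move later offsets past the end of the buffer
        offset += level.linearSize * layerCount;
    }
    return offsets;
}
```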
2022-06-02 22:14:22 +05:30
PixelyIon
06901ef22a Fix BC7 output swizzling from BGRA to RGBA
BC7 CPU decoding had the red and blue channels swapped around as it outputted a BGRA image after decoding while we expected an RGBA image to be produced. This should fix the colors of certain textures in titles such as Cuphead or Sonic Forces.
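
The difference between the two layouts is only the position of the red and blue channels. A minimal illustration, assuming tightly packed pixels with 8 bits per channel (the helper name is hypothetical):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Swaps the red and blue channels of every pixel in place, turning BGRA into RGBA
// The operation is its own inverse, so it also turns RGBA back into BGRA
inline void SwapRedBlue(std::vector<std::uint8_t> &pixels) {
    for (std::size_t i{0}; i + 3 < pixels.size(); i += 4)
        std::swap(pixels[i], pixels[i + 2]);
}
```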
2022-06-02 19:48:55 +05:30
Billy Laws
9cb68c31e1 Stub nfp IUser::AttachAvailabilityChangeEvent 2022-06-02 00:04:01 +01:00
Billy Laws
33c9731eca Implement IFileSystem::CreateDirectory 2022-06-02 00:04:01 +01:00
Billy Laws
a09414424b Fix broken VFS directory creation 2022-06-02 00:04:01 +01:00
Billy Laws
3518e04a18 Correct Directory EntryType to be u8 rather than u32 2022-06-02 00:04:01 +01:00
Billy Laws
0c11d9e294 Implement IDirectory::GetEntryCount 2022-06-02 00:04:01 +01:00
MCredstoner2004
c15b3a8d40
Make Applet accesses to the data queues lock
Avoids potential races when the guest accesses the same applet from more than one thread.
2022-06-02 03:47:38 +05:30
Billy Laws
91b2c47991 Fix potential nvdrv submission race
The syncpoint maximum value represents the maximum possible syncpt value at a given time, however due to PBs being submitted before max was incremented, for a brief moment of time this is not the case which could lead to crashes or other such behaviour if a game waits on the fence at the right moment.
2022-06-01 17:15:25 +01:00
PixelyIon
37453ed7fa Use DocumentsProvider for log sharing
We used a `FileProvider` for log sharing prior, this is no longer necessary since it comes under the `DocumentsProvider` now which can be utilized to share the log document directly.
2022-06-01 21:41:14 +05:30
PixelyIon
8efa9298f9 Fix name conflict resolution for copyDocument
Copying a document into a directory that already contained a document with the same name would cause an exception. This fixes that by handling conflict resolution in those cases and automatically determining a file name that avoids the conflict.
2022-06-01 21:41:14 +05:30
Billy Laws
c4bd9c47e4 Stub NVGPU_GPU_IOCTL_ZBC_SET_TABLE nvdrv ioctl
This was missed in the original implementation and caused crashes in some games.
2022-06-01 16:59:14 +01:00
Billy Laws
c639fdcf06 Fixup NFP service stub state handling
Previously a broken state value was returned from GetState that caused crashes in games using newer SDKs and NFP, correctly handle state now by updating it after initialisation.
2022-06-01 15:00:26 +01:00
Billy Laws
c745e0e02b Move image type logic to GuestTexture, allowing 2D array views for 3D RTs
We can't render to a 3D texture through a 3D view; we instead have to create a 2D array view into it and render to that. The texture manager previously didn't support having a different view type/layer count between a guest texture view and the underlying storage texture, which is required to support this, so that was also implemented by reading the view layer count from the depth of the dimensions instead if the underlying texture is 3D (and the view type is 2D array). Additionally, move away from our own view type enum to Vulkan's, in line with other guest texture member types.
2022-05-31 22:09:53 +01:00
Billy Laws
22695c4feb Stub nim services used for eShop communication
We obviously don't need to implement these, so add a simple set of stubs to satisfy games using them (mainly demos such as DQXII).
2022-05-31 22:07:01 +01:00
Billy Laws
ff12dc9c10 Add R32_SFLOAT to adreno validation layer format filtering 2022-05-31 22:03:53 +01:00
Billy Laws
6cc925c2d3 Reset RT mappings on dimension and format changes 2022-05-31 17:49:16 +01:00
Billy Laws
8180bf852e Lock textures before attaching in BlitContext 2022-05-31 16:54:13 +01:00
Billy Laws
cb2b36e3ab Allow providing a callback that's called after every CPU access in GMMU
Required for the planned implementation of GPU side memory trapping.
2022-05-31 16:04:27 +01:00
Billy Laws
46ee18c3e3 Require depthBiasClamp Vulkan device feature
Used in some UE4 games and supported by 95% of devices so skip implementing a fallback path.
2022-05-31 14:46:45 +01:00
PixelyIon
e592b11039 Drop samplerAnisotropy as a required GPU feature
Sampler anisotropy was made a required feature in an earlier commit due to its widespread availability, but this was determined to be incorrect as certain Mali GPUs that can otherwise run 2D games in Skyline do not have this feature. While they are still not officially supported, this was the only roadblock to supporting them, so it has now been made an optional feature.
2022-05-31 01:37:40 +05:30
PixelyIon
4336134b07 Reintroduce android:hasFragileUserData due to stable signature
`android:hasFragileUserData` was added in an earlier commit but then removed due to it not functioning because of signature checks. Now that signatures are consistent across builds, it has been re-added and should now allow carrying data across CI and developer builds.
2022-05-31 01:37:07 +05:30
PixelyIon
b91ce939a2 Introduce CI build signing
We've done no signing of any Skyline APKs to date which causes issues regarding authenticity of any APKs as they could be entirely unofficial builds which have not been vetted by the team. Additionally, the different keys remove the ability to reinstall a different build successively as Android checks for matching signatures before installing an APK.
2022-05-31 01:25:18 +05:30
PixelyIon
e1cc8676cf Add option to view internal directory
With the Skyline document provider, easy access to the internal directory is required which may be hard to navigate to through the system file manager. This adds an option in settings to directly open up the directory in the system file manager.
2022-05-31 01:25:18 +05:30
PixelyIon
ba97985b55 Revise DocumentsProvider URI structure
The URIs (Document ID + Root) of the Skyline `DocumentsProvider` were suboptimal as they weren't relative to a base directory. This is required for opening a root without knowledge of the full path in advance; it is therefore cleaner to provide a uniform `ROOT_ID` in a companion class.
2022-05-31 01:25:18 +05:30
Mylah Dee
ee7da31fc6 Add DocumentProvider for accessing internal files
On Android 12 and above, files from an application's external storage directory cannot be accessed by the user. The only proper SAF-compliant way to solve this is to create a `DocumentProvider` which proxies access to internal storage accordingly.
2022-05-30 15:09:30 +05:30
Narr the Reg
7aa6a5c4ca
Add HID touch attribute and index reporting
Adds the missing TouchAttribute parameter and correctly emulates the touch point index. Both changes are necessary in Voez to keep track of each finger.
2022-05-29 10:28:51 +01:00
PixelyIon
80c8fb8791 Implement CPU BCn Texture Decoding
Certain GPU vendors such as ARM's Mali do not have support for BCn textures whatsoever while other vendors such as AMD only have partial support (BC1-BC3). Most titles on the guest utilize BC textures and to address this on host GPUs without support for BCn, we need to decompress the texture on the CPU. This commit implements a CPU BCn texture decoder based off Swiftshader's BC decoder, it also adds the necessary infrastructure to have different formats for the `GuestTexture` and `Texture` objects.
2022-05-28 21:22:24 +05:30
PixelyIon
fe615b1e03 Clarify texture swizzling inner-loop iteration count
The iteration count of the inner loop for sector deswizzling was miscalculated as `SectorWidth * SectorHeight`. While the result was correct at `32`, it should be determined by the number of sector lines within a GOB, i.e. `(GobWidth / SectorWidth) * GobHeight`.
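
The coincidence can be checked with the commonly documented block-linear constants, which are assumed here rather than taken from this commit:

```cpp
#include <cstddef>

// Assumed block-linear layout constants: a GOB is 64 bytes wide and 8 lines tall,
// a sector is 16 bytes wide and 2 lines tall
constexpr std::size_t SectorWidth{16};
constexpr std::size_t SectorHeight{2};
constexpr std::size_t GobWidth{64};
constexpr std::size_t GobHeight{8};

// The number of 16-byte sector lines that make up a single GOB
constexpr std::size_t SectorLinesInGob{(GobWidth / SectorWidth) * GobHeight};

static_assert(SectorLinesInGob == 32, "4 sector lines per GOB line across 8 lines");
static_assert(SectorWidth * SectorHeight == 32, "Equal in value but describes the bytes in a sector");
```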
2022-05-28 21:22:24 +05:30
PixelyIon
7d4e0a7844 Implement Mipmapped Texture Support
Support for mipmapped textures was not implemented which is fairly crucial to proper rendering of games as the only level that would load is the first level (highest resolution), that might result in a lot more memory bandwidth being utilized. Mipmapping also has associated benefits regarding aliasing as it has a minor anti-aliasing effect on distant textures. 

This commit entirely implements mipmapping support but it does not extend to full support for views into specific mipmap levels due to the texture manager implementation being incomplete.
2022-05-28 21:22:24 +05:30
PixelyIon
da7e6a7df7 Replace Maxwell DMA GuestTexture usage with new swizzling API
Maxwell DMA requires swizzled copies to/from textures and earlier it had to construct an arbitrary `GuestTexture` to do so but with the introduction of the cleaner API, this has become redundant which this commit cleans up and replaces with direct calls to the API with all the necessary values.
2022-05-28 21:22:24 +05:30
PixelyIon
de300bfdbe Refactor Texture Swizzling
The API for texture swizzling is now more concrete and abstracted out from `GuestTexture`, this allows for neater usage in certain areas such as MaxwellDMA while having a `GuestTexture` wrapper as well allowing for neater usage in those cases. 

The code itself has also been cleaned up slightly with all usage of `u32`s being upgraded to `size_t` as this is simply more efficient due to the compiler not needing to emulate wraparound behavior for integer types smaller than the processor word size.
2022-05-19 17:13:55 +05:30
Billy Laws
72473369b6 Account for OOB copyOffsets in CircularBuffer::Read
Caused crashes in Pokemon
2022-05-14 15:30:59 +01:00
Robin Kertels
0a3cf25823 Implement the Fermi 2D blitting engine
The Fermi 2D engine implements both image blit and resolve operations, supporting subpixel sampling with both linear and point filtering.

Resolve operations are performed by sampling from the center of each pixel in order to resolve the final image from the MSAA samples.
MSAA images are stored in memory like regular images but each pixel's dimensions are scaled, e.g. for 2x2 MSAA:
```
112233
112233
445566
445566
```
These would be sampled with both duDx and duDy as 2 (integer part), resolving to the following:
```
123
456
```
Blit operations are performed by sampling from the corner of each pixel, scaling the image as one would expect.

This implementation isn't fully complete as Vulkan blit doesn't support some combinations which Fermi does, most notably between colour and depth stencil. These will be implemented properly at a later date, likely after the texture manager rework.
Out of Bounds Blit, used by some OpenGL games is also missing since supporting it requires texture aliasing, this will also be supported after the texture manager rework.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-05-13 22:37:37 +01:00
Billy Laws
be2546138d Move IOVA class to GMMU so it can be used for other engines 2022-05-13 22:37:37 +01:00
Billy Laws
3ad640fcbc Fix accidental graphics context member/parameter duplication 2022-05-13 22:37:37 +01:00
PixelyIon
7a6f27a19a Fix texture swizzling OOB writes
Certain writes during swizzling went out of bounds due to incorrect `blockExtentY` calculation, the previous commit to fix this ended up breaking it further. This commit returns to the original commit's calculations with the proper addendum of a check for exact alignment with a GOB which is the case that was broken earlier.
2022-05-13 14:52:41 +05:30
PixelyIon
168e51e7ad Always use GetLayerStride for layer stride in Texture
The `GuestTexture::GetLayerStride` function was not always being utilized to retrieve the layer stride inside `Texture`, it would instead directly access the `guestTexture::layerStride` member. This is problematic as it may not be initialized and return `0` which would lead to a broken image copy.
2022-05-13 14:21:37 +05:30
Billy Laws
b81d5bc865 Implement and cleanup semaphore operations in all engines
Most engines have the capability to release a semaphore payload (or reduce in the case of GPFIFO) when a method is called or action is complete. Semaphores are used by games for both timing how long things take on GPU and waiting on resources so missing them can cause deadlocks or other related issues.
2022-05-12 19:40:24 +01:00
Billy Laws
bca88685bd Stub nvdrv {Get,Dump}Status 2022-05-12 17:38:22 +01:00
Billy Laws
97e740c986 Fix slight locking bug with nvmap handle duplication 2022-05-12 17:38:22 +01:00
Billy Laws
57378457dc Treat symbol file paths without slashes as filenames
Prevents crashes printing backtrace if this occurs
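One plausible shape for this handling, as a minimal sketch rather than the code from this commit: when no slash is found, the whole path is returned instead of indexing from an invalid position. `SymbolFileName` is a hypothetical name.

```cpp
#include <string_view>

// Hypothetical helper: returns the component after the last '/', or the whole path if there is none
std::string_view SymbolFileName(std::string_view path) {
    auto lastSlash{path.find_last_of('/')};
    if (lastSlash == std::string_view::npos)
        return path; // No slash: the path is already a bare filename
    return path.substr(lastSlash + 1);
}
```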
2022-05-12 17:38:22 +01:00
Billy Laws
d08ac63bbf Use TIC maximum index over TSC when tscIndexLinked is set 2022-05-12 17:38:22 +01:00
Billy Laws
8e021a9f1f Load custom drivers from app private data dir
Required since /sdcard doesn't have exec perm support
2022-05-12 17:38:21 +01:00
Billy Laws
dcef597345 Introduce TrivialObject concept and use where appropriate
Simplifies type checking and handles excluding container types that are trivially copyable but contain pointers
2022-05-12 17:38:21 +01:00
PixelyIon
f2cc25ee9f Implement Array Texture Swizzling
Textures can have more than one layer which we previously didn't handle; all layers past the initial one would be filled with random data or 0s, leading to incorrect rendering. This has now been implemented, which fixes any titles which utilize array textures, such as "Super Mario Odyssey" or "Hatsune Miku: Project DIVA MegaMix".
2022-05-12 18:23:45 +05:30
PixelyIon
2a99e1784d Fix Maxwell3D RT Depth/Layer Count Logic
The Maxwell3D RT layer count wasn't being set correctly as it has the same register as the depth values and is toggled between the two based on another register value.
2022-05-12 18:23:05 +05:30
Billy Laws
543ac3042e Cleanup account services and stub StoreSaveDataThumbnail 2022-05-11 23:24:35 +01:00
Billy Laws
7d30ac0cd8 Add additional nifm stubs 2022-05-11 23:24:35 +01:00
Billy Laws
a164635f32 Stub LibraryAppletPlayerSelect 2022-05-11 23:24:35 +01:00
PixelyIon
4ec1cc7086 Update Build Tools to 33.0.0-rc4
Google has removed `33.0.0-rc3` from their servers and it can no longer be downloaded by the CI, the build tools version has been updated as a result.
2022-05-12 02:53:01 +05:30
Billy Laws
dd0004e208 Set Host1x log tag correctly 2022-05-11 22:11:16 +01:00
Billy Laws
f89bacf8ae Fixup Host1x syncpoint locking 2022-05-11 22:04:02 +01:00
Billy Laws
d8ff318a1a Prevent infinite VFS read loop on EOF 2022-05-11 22:03:39 +01:00
shutterbug2000
f078a5d1ec Stub bt and btm:u
Stub BT services which are required by titles such as Pokémon Let's GO Pikachu and Eevee (non-Demo versions).
2022-05-11 20:44:09 +05:30
PixelyIon
588b4529ee Implement 3D Texture Swizzling
The Maxwell GPU supports 3D textures which are tiled with the block-linear layout, but our swizzling didn't handle 3D textures correctly till now. This commit addresses that by implementing proper swizzling for 3D textures. Titles such as Cluster Truck and Super Mario Odyssey utilize 3D textures alongside a vast majority of other titles.
2022-05-11 14:06:04 +05:30
Billy Laws
601d67e369 Use resource size rather than allocation size for staging buffer size
As per VMA docs: 'Allocation size returned in this variable may be greater than the size requested for the resource e.g. as VkBufferCreateInfo::size. Whole size of the allocation is accessible for operations on memory e.g. using a pointer after mapping with vmaMapMemory(), but operations on the resource e.g. using vkCmdCopyBuffer must be limited to the size of the resource.'
2022-05-10 18:48:20 +01:00
Billy Laws
d2acec24f5 Handle VFS reads into trapped memory regions
pread will refuse to read into any trapped regions so implement a manual path with a staging buffer and memcpy for such cases
2022-05-10 18:33:55 +01:00
Billy Laws
1609fd2a32 Account for layerCount in SynchronizeGuestWithBuffer staging buffer size 2022-05-10 18:33:31 +01:00
Billy Laws
5b97b87503 Restore previous cullMode when cullFace is enabled 2022-05-10 18:31:32 +01:00
Billy Laws
15e9fa1c80 Fix FillRandomBytes
There were two issues here:
- If a skyline span was passed as a param then the 'T &object' version would be called, filling the span itself with random values rather than its contents
- Random numbers were repeated every call since independent_bits_engine copied generator state and thus it was never actually updated
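The second issue is an instance of a general pitfall: standard random engines are copyable value types, and an engine adaptor stores its own copy of the engine it is constructed from. The sketch below is a simplified illustration of that pitfall which does not use `independent_bits_engine` itself; both function names are hypothetical.

```cpp
#include <cstdint>
#include <random>

// Takes the generator by value: the caller's generator never advances, so every call repeats
std::uint64_t NextFromCopy(std::mt19937_64 generator) {
    return generator();
}

// Takes the generator by reference: each call advances the shared state
std::uint64_t NextFromReference(std::mt19937_64 &generator) {
    return generator();
}
```

Calling `NextFromCopy` twice with the same generator returns the same value both times, while `NextFromReference` advances the generator between calls.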
2022-05-10 18:28:15 +01:00
Billy Laws
622ff2a8f1 Correctly track 5.1 audio channel sample count
Size needs to be adjusted for 5.1 buffers since they're downsampled to stereo.
2022-05-10 18:26:20 +01:00
PixelyIon
56c9b03843 Fix incorrect swizzling Y extent calculation
This calculation for the amount of lines on the Y axis relative to the start of the last block was wrong and would instead determine the amount of lines to the last Y-axis GOB which wasn't accurate when padding was considered, this resulted in titles like Celeste having broken texture decoding (on a 1922x1082 texture) for the last ROB as most pixels would be masked out.
2022-05-09 20:25:43 +05:30
Billy Laws
018df355f0 Replace some VFS exceptions with warnings
These errors aren't necessarily fatal so tone them down.
2022-05-08 19:37:10 +01:00
Billy Laws
e1c13bbc08 Update hades 2022-05-08 19:37:10 +01:00
PixelyIon
b307fca115 Fix attachment reuse within the same subpass
Certain titles such as BOTW reuse an attachment within the same subpass, which caused an exception inside `RenderPassNode::AddAttachment` as it could not find a corresponding subpass for the attachment. To fix this issue, we now assume that when no subpass can be found for an existing attachment, it is attached to the latest subpass and return the attachment.
2022-05-08 18:26:40 +05:30
PixelyIon
e027555796 Handle Y-axis GOB non-alignment for swizzling
Certain textures may be unaligned with a GOB's height of 8 lines, we already handle the case of being unaligned with a GOB's width of 64-bytes. This case occurs on titles such as SMO when going in-game.
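To make the two alignment cases concrete, here is a minimal sketch that computes how many GOBs cover a surface and how much of the final GOB holds real data on each axis. The GOB dimensions come from the description above; `GetGobCoverage` and `GobCoverage` are hypothetical names and this is not the swizzling code itself.

```cpp
#include <cstddef>

constexpr std::size_t GobWidth{64}; // Bytes per GOB row, as stated above
constexpr std::size_t GobHeight{8}; // Lines per GOB, as stated above

constexpr std::size_t DivideCeil(std::size_t value, std::size_t divisor) {
    return (value + divisor - 1) / divisor;
}

struct GobCoverage {
    std::size_t gobsX; // GOBs needed to cover one line
    std::size_t gobsY; // GOBs needed to cover every line
    std::size_t lastGobBytes; // Bytes of real data in the final GOB of each line
    std::size_t lastGobLines; // Lines of real data in the final row of GOBs
};

// Hypothetical helper; widthBytes is the line width in bytes and both arguments must be non-zero
constexpr GobCoverage GetGobCoverage(std::size_t widthBytes, std::size_t heightLines) {
    return {
        DivideCeil(widthBytes, GobWidth),
        DivideCeil(heightLines, GobHeight),
        // An exactly aligned extent fills its final GOB entirely rather than leaving a remainder of 0
        widthBytes % GobWidth == 0 ? GobWidth : widthBytes % GobWidth,
        heightLines % GobHeight == 0 ? GobHeight : heightLines % GobHeight,
    };
}
```

For example, a surface with 1082 lines needs 136 rows of GOBs, and only 2 lines of the final row hold texture data; the rest is padding that must not be copied to or from the linear image.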
2022-05-07 18:37:22 +05:30
PixelyIon
c910e29168 Extend HostSignalHandler's SIGSEGV debugger path
The function now returns from a segmentation fault when a debugger is present, this allows the entire context to be intact which can allow the debugger to correctly pick up variables from all stack frames while it could not extrapolate most variables when trapped inside the signal handler without the values of all registers.
2022-05-07 18:37:22 +05:30
Billy Laws
4149ab1067 Implement Maxwell 3D instanced draw support
In the Maxwell 3D engine, instanced draws are implemented by repeating the exact same draw in sequence with a special flag set in vertexBeginGl. This flag allows either incrementing the instance counter or resetting it; since we need to supply an instance count to the host API, we defer all draws until state changes occur. If there are no state changes between draws we can skip them and count the occurrences to get the number of instances to draw.
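The deferral described here can be sketched roughly as follows. This is an illustration under assumptions, not the engine code: `Draw`, `DeferredDraws` and the boolean `incrementInstance` flag are hypothetical stand-ins, and the `submitted` vector takes the place of real host draw calls.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

struct Draw {
    std::uint32_t first; // First vertex
    std::uint32_t count; // Vertex count

    bool operator==(const Draw &other) const {
        return first == other.first && count == other.count;
    }
};

// Hypothetical sketch: collapses repeated guest draws into (draw, instanceCount) pairs
class DeferredDraws {
  private:
    bool hasPending{};
    Draw pending{};
    std::uint32_t instanceCount{};

  public:
    std::vector<std::pair<Draw, std::uint32_t>> submitted; // Stand-in for host draw calls

    // incrementInstance models the flag that increments the instance counter instead of resetting it
    void Submit(const Draw &draw, bool incrementInstance) {
        if (hasPending && incrementInstance && pending == draw) {
            instanceCount++; // Same draw repeated for the next instance, nothing to record yet
            return;
        }

        Flush();
        pending = draw;
        instanceCount = 1;
        hasPending = true;
    }

    // Must also be called on any state change so that draws aren't merged across it
    void Flush() {
        if (hasPending)
            submitted.emplace_back(pending, instanceCount);
        hasPending = false;
    }
};
```

Three identical submissions, the first resetting and the next two incrementing, end up as a single entry with an instance count of 3 once `Flush` is called.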
2022-05-07 13:56:09 +01:00
Billy Laws
03594a081c Ensure correct flushing for batched constant buffer updates
Cbufs could be read by non-maxwell3D engines so force a flush when switching to them or before Execute.
2022-05-07 13:56:09 +01:00
PixelyIon
ad989750fc Implement Maxwell3D Point Sprite Size
Implements register state that corresponds to the size of a single point sprite in Maxwell 3D, this is emitted by the shader compiler in the preamble but needs to be only applied if the input topology is a point primitive and it is invalid to set the point size in any other case.
2022-05-07 03:46:25 +05:30
PixelyIon
874a6a2a6c Fix getTextureType enum conversion formatting 2022-05-07 03:46:25 +05:30
PixelyIon
ae5bcbdb5c Fix Depth RT lock to be in scope
Earlier texture locking design required the lock to be retained but since the introduction of `AttachTexture`, this no longer needs to be done. This being done caused deadlocks when the depth texture is sampled by the fragment shader while being bound as an RT since it would attempt to lock the texture again.
2022-05-07 02:37:48 +05:30
shutterbug2000
1c8d994161 Basic bcat:u implementation
A basic `bcat:u` implementation to prevent titles such as "Kirby and the Forgotten Land" dependent on BCAT support from crashing due to the lack of an implementation.
2022-05-06 15:41:48 +05:30
PixelyIon
4fd64a53e0 Require Vulkan samplerAnisotropy feature
This is a widely supported feature that games may require conditionally but due to it being supported on effectively all target devices, it was made mandatory. This is used by titles such as ARMS.
2022-05-06 15:41:48 +05:30
PixelyIon
1d9b4a865a Add additional formats to Adreno filter
`VK_FORMAT_R32G32B32A32_SFLOAT` and `D32_SFLOAT` have their capabilities misreported as well, this spams the logs in titles such as ARMS.
2022-05-06 15:41:48 +05:30
PixelyIon
b87295374e Improve Controller Applet log
Improves the readability of the log and replaces the previously uninformative prefix of `operator()` due to being in a lambda with `Controller support`.
2022-05-06 15:41:48 +05:30
PixelyIon
98c730a644 Implement linked TIC/TSC handle in Maxwell3D
Maxwell3D has a register for linking the TIC/TSC index in bindless texture handles, this is used by games to implement bindless combined texture-sampler handles.
2022-05-06 14:58:20 +05:30
PixelyIon
23a091100d Implement ReadCbufValue + ReadTextureType
Implements `GraphicsEnvironment::ReadCbufValue` & `GraphicsEnvironment::ReadTextureType` with a framework of heterogeneous lookups for caching and callbacks for querying constant buffer or TIC values with validation checks for successive draws to ensure unique IR is generated.
2022-05-06 14:39:36 +05:30
PixelyIon
765c3f4e1f Allow draws with no descriptor set resources
The `descriptorSetWrites` being filled is now optional and the case of it being empty is handled correctly, this is done by certain titles such as ARMS and is entirely valid behavior. It should be noted that not doing this leads to errors in the guest due to invalid GPU state while working on the host GPU.
2022-05-06 10:33:47 +05:30
PixelyIon
37327f1955 Fix and refactor SVC SignalToAddress/WaitForAddress
SVC `SignalToAddress` had a bug with the behavior of `SignalAndModifyBasedOnWaitingThreadCountIfEqual` which was entirely incorrect and led to deadlocks in titles such as ARMS that were dependent on it. This commit corrects the behavior and refactors both SVCs and moves their arbitration/waiting to inside the corresponding `KProcess` function rather than the SVC to avoid redundancies and improve code readability.
2022-05-05 19:15:37 +05:30
PixelyIon
396979e897 Extend Adreno format-based filtering for Validation Layer
Filtering of validation logs is now extended beyond BCn formats and now covers other formats which have their feature set misreported by the driver, this significantly drives down the amount of logs depending on the title.
2022-05-05 19:15:37 +05:30
PixelyIon
62ea2a6da5 Avoid format aliasing warnings on Adreno
Implements an algorithm to determine formats that can be aliased as views without needing `VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT`, this avoids spamming warning logs on view creation when the aliased formats will function in practice.
2022-05-05 19:15:37 +05:30
PixelyIon
7206ab4c67 Fix exclusiveSubpass by finishing render pass at end
There was an oversight with exclusive subpasses: the render pass was not finished at the end of `AddSubpass`, so RPs with more than one subpass could be created even though one pass was exclusive. A future subpass could be added to the end of that RP even though it was intended to exclusively have a single subpass.

This case occurs in titles such as Celeste (in-game) and breaks rendering on GPUs that may require exclusive subpasses for proper functionality.
2022-05-05 11:14:38 +05:30
PixelyIon
96fe5f0a0e Set initial subpassCount value to 1 rather than 0 2022-05-05 11:07:43 +05:30
PixelyIon
5d08d6e06f Disable unnecessary Khronos Validation Layer logs
The Khronos Validation Layer can often generate warning/error logs due to our intentional breakage from Vulkan specification, these can occur several times a frame resulting in the logs being spammed and making it difficult to extract useful information out of logs. The scope of these logs has now been reduced with more general filtering and the introduction of specialized filtering to handle complex cases such as BCn hacks with `libadrenotools` on Adreno devices.
2022-05-04 13:20:59 +05:30
PixelyIon
23c9388caf Fix VK_KHR_push_descriptor-less path for descriptor set updates
Descriptor set updates were broken on the non-push-descriptor path due to lifetime issues with VkDescriptorSetLayout's usage during the execution phase which entirely broke rendering on AMD/Mali GPUs due to them not supporting `VK_KHR_push_descriptor`. 

This commit addresses that by moving the allocation of a descriptor set to outside the lambda and into the recording phase, it also simplifies the semantics and resources passed into the lambda by removing redundancies.
2022-05-04 00:49:21 +05:30
PixelyIon
47bc3b4d99 Fix Render Pass Cache
The Vulkan render pass cache was fundamentally broken since it was designed around the Render Pass Compatibility clause due to being designed for framebuffer compatibility initially. As this scope was extended to a general render pass cache, the amount of data in the key was not extended to include everything it should have. This commit introduces the missing pieces in the RP cache and simplifies the underlying code in the process.
2022-05-01 20:31:36 +05:30
PixelyIon
25a29f9044 Skip zero-initializing shader bytecode backing
The backing for shader data would implicitly be zero-initialized due to a `resize` on every shader parse, this was entirely unnecessary as we would overwrite the entire range regardless. 

We avoid this by using statically allocated storage and a span over it containing the shader bytecode which avoids any unnecessary clear semantics without resorting to more complex solutions such as a custom allocator.
2022-05-01 18:27:27 +05:30
PixelyIon
42573170c6 Implement Framebuffer Cache
Implements a cache for storing `VkFramebuffer` objects with a special path on devices with `VK_KHR_imageless_framebuffer` to allow for more cache hits due to an abstract image rather than a specific one. 

Caching framebuffers is a fairly crucial optimization due to the cost of creating framebuffers on TBDRs since it involves calculating tiling memory allocations and in the case of Adreno's proprietary driver involves several kernel calls for mapping and allocating the corresponding framebuffer memory.
2022-05-01 18:27:27 +05:30
PixelyIon
af7f0c301e Avoid redundant VkImageView recreation
There are a lot of cases of `VkImageView` being recreated arbitrarily due to it being tied to the ephemeral object `TextureView` rather than `Texture`, this commit flips that by storing all `VkImageView`s inside `Texture` with `TextureView` simply holding a copy of the handle to them. Additionally, this change results in stable `VkImageView` handles and helps in paving the path for framebuffer caching when `VK_KHR_imageless_framebuffer` is unavailable.
2022-05-01 18:27:27 +05:30
PixelyIon
41b2c2dc7b Add profileable attribute to AndroidManifest.xml
As we desire more accurate profiling data in certain circumstances, making the app explicitly profilable will allow for this, it will also remove the (annoying) prompt to do this in the Android Studio profiler.
2022-05-01 18:27:27 +05:30
PixelyIon
da931cf07b Implement Render Pass Cache
Implements a cache for storing `VkRenderPass` objects which are often reused, they are not extremely expensive to create generally but this is a required step to build up to a framebuffer cache which is an extremely expensive object to create on TBDRs generally since it involves calculating tiling memory allocations and in the case of Adreno's proprietary driver involves several kernel calls for mapping and allocating the corresponding memory.
2022-05-01 18:16:53 +05:30
Billy Laws
ae77bde171 Fixup audio device name writing in services
Games expect the output buffer to be entirely zero filled past the device name.
2022-04-30 16:00:33 +01:00
Billy Laws
194cbe6c7c Stub several HID functions 2022-04-30 16:00:33 +01:00
Billy Laws
112c20cef2 Stub QueryAudioDevice{Input,Output}Event
Used in many 3.0.0+ games
2022-04-30 16:00:33 +01:00
Billy Laws
8d7dbe2c4e Add a way to get a readonly span of Buffer contents
Avoids the need to redundantly copy data when it is being directly processed on the CPU (e.g. quad conversion)
2022-04-30 16:00:33 +01:00
MK73DS
4c71ef5c31 Fix American English language code 2022-04-30 18:43:22 +05:30
PixelyIon
4ec0f62e30 Update Kotlin, AGP, Gradle and Build Tools
Kotlin was updated to 1.6.21, AGP to 7.1.3, Gradle to 7.4.2 and Build Tools to 33.0.0-rc3.
2022-04-27 14:00:36 +05:30
PixelyIon
90c635bf78 Coalesce subpasses with compatible attachments together
We run into a lot of successive subpasses with the exact same framebuffer configuration which we now exploit to avoid the creation of a new subpass due to the overhead involved with this. This provides significant performance boosts in certain cases due to the magnitude of difference in the amount of subpasses being created while providing next to no benefit in other cases.
2022-04-27 13:22:34 +05:30
PixelyIon
a947933bf0 Fix Buffer cycle check being inverted
The check for the fence cycle being the same as the current cycle was incorrectly inverted to be the opposite of what it should have been, leading to bugs.
2022-04-27 13:07:36 +05:30
PixelyIon
54794f4b71 Move Texture locking and synchronization to PresentationEngine
The responsibility for synchronizing a texture and locking it is now on the `PresentationEngine` rather than the API-user as this'll allow more fine grained locking and delay waiting until necessary.
2022-04-25 21:01:16 +05:30
Billy Laws
1dd230afde Refactor all std::lock_guard usages to std::scoped_lock 2022-04-25 15:00:30 +01:00
PixelyIon
94e6f3cfa0 Add quirk for relaxed render pass compatibility
As we require a relaxed version of the Vulkan render pass compatibility clause for caching multi-subpass render passes, we now utilize a quirk to determine if this is supported. It is supported on Nvidia/Adreno, while on AMD/Mali, where it isn't supported, we force single-subpass render passes.
2022-04-24 16:18:36 +05:30
PixelyIon
44615c8dd2 Implement per-vendor VkQueue maximum global priority
We found out that certain vendors such as Nvidia had a limitation on the global priority of a queue and requesting `VK_QUEUE_GLOBAL_PRIORITY_HIGH_EXT` would result in `VK_ERROR_NOT_PERMITTED_EXT`. A quirk has been introduced to supply the maximum supported global priority which is currently set on a per-vendor basis to avoid future crashes.
2022-04-24 16:15:01 +05:30
PixelyIon
7ef4959060 Implement Graphics Pipeline Cache
Implements a cache for storing `VkPipeline` objects which are fairly expensive to create and doing so on a per-frame basis was rather wasteful and consumed a significant part of frametime. It should be noted that this is **not** compliant with the Vulkan specification and **will** break unless the driver supports a relaxed version of the Vulkan specification's Render Pass Compatibility clause.
2022-04-24 14:31:00 +05:30
PixelyIon
50a8b69f7b Optimize descriptor set writes using push descriptors
We can use inline push descriptors for writing to descriptor rather than allocating a descriptor set for a one time write and freeing it as this is rather inefficient while an inline push descriptor generally ends up being a direct `memcpy` on the driver side designed for this use-case.
2022-04-24 13:45:09 +05:30
PixelyIon
5adafbff04 Set VkQueue's global priority to high
We want Skyline to have the most favorable GPU scheduling possible due to low latency and high throughput requirements, we request high priority scheduling due to this reason.
2022-04-24 13:34:09 +05:30
PixelyIon
f9c052d1b7 Implement Maxwell3D Tessellation State
This implements all Maxwell3D registers and HLE Vulkan state for Tessellation including invalidation of the TCS (Tessellation Control Shader) state during state changes.
2022-04-24 13:23:00 +05:30
Billy Laws
de796cd2cd Implement overhead-free sequenced buffer updates with megabuffers
Previously constant buffer updates would be handled on the CPU and only the end result would be synced to the GPU before execute. This caused issues as if the constant buffer contents was changed between each draw in a renderpass (e.g. text rendering) the draws themselves would only see the final resulting constant buffer.

We had earlier tried to fix this by using vkCmdUpdateBuffer however this caused significant performance loss due to an oversight in Adreno drivers. We could have worked around this simply by using vkCmdCopyBuffer however there would still be a performance loss due to renderpasses being split up with copies in between.

To avoid this we introduce 'megabuffers', a brand new technique not done before in any other Switch emulators. Rather than replaying the copies in sequence on the GPU, we take advantage of the fact that buffers are generally small in order to replay buffers on the GPU instead. Each write and subsequent usage of a buffer will cause a copy of the buffer with that write, and all prior writes applied, to be pushed into the megabuffer; this way at the start of execute the megabuffer will hold all used states of the buffer simultaneously. Draws then reference these individual states in sequence to allow everything to work without any copies. In order to support this, buffers have been moved to an immediate sync model, with synchronisation being done at usage-time rather than execute (in order to keep contents properly sequenced), and GPU-side writes now need to be explicitly marked (since they prevent megabuffering). It should also be noted that a fallback path using cmdCopyBuffer exists for the cases where buffers are too large or GPU dirty.
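The core idea can be sketched on the CPU as below. This is only an illustration under assumptions: `MegaBuffer`, `Push` and `At` are hypothetical names, the backing store is a plain vector rather than a host GPU buffer, and the offset alignment a real uniform buffer binding would need is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: a CPU-side stand-in for the megabuffer, real code would target a host GPU buffer
class MegaBuffer {
  private:
    std::vector<std::uint8_t> backing;

  public:
    // Appends a full snapshot of a buffer and returns the offset that a draw should bind
    std::size_t Push(const std::vector<std::uint8_t> &contents) {
        std::size_t offset{backing.size()};
        backing.insert(backing.end(), contents.begin(), contents.end());
        return offset;
    }

    // Reads back a byte of a previously pushed snapshot
    std::uint8_t At(std::size_t offset) const {
        return backing.at(offset);
    }
};
```

If a constant buffer is pushed, modified and pushed again, both snapshots stay resident at their own offsets, so the first draw can keep referencing the old contents while the second draw references the new ones.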
2022-04-23 22:48:28 +01:00
lynxnb
0d9992cb8e Implement QuadList support for non-indexed draws 2022-04-20 18:17:10 +02:00
lynxnb
bcaf7dfe1c Make GetVertexBuffer return a pointer to the requested buffer
This avoids a redundancy in the `Draw` function and makes code easier to read
2022-04-20 18:16:45 +02:00
Billy Laws
5c3559e888 Revert "Implement support for GPU-side constant buffer updating"
This reverts commit d79635772f.
2022-04-18 13:28:58 +01:00
Billy Laws
7bf3580031 Revert "Allow external synchronization for buffers"
This reverts commit 372ab8befa.
2022-04-18 13:28:58 +01:00
PixelyIon
ddc9622b90 Fix Shader Module Cache
As bindings weren't correctly handled due to the fact that `EmitSPIRV` would change the bindings, the shader module cache would not correctly function and have no cache hits in `find` and rather have them in `try_emplace` which negated any performance benefit of it. This has now been fixed by retaining the initial cache key for insertion into the cache while also storing the post-emit bindings and restoring them during a cache hit.
2022-04-18 12:18:15 +05:30
Billy Laws
32fe01e145 Implement batch constant buffer updates
Avoids spamming the driver with hundreds of cbuf updates per frame by batching all consecutive updates into one.
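A minimal sketch of this kind of batching, assuming updates arrive as single 32-bit words at byte offsets. `CbufUpdateBatcher` is a hypothetical name and the `flushed` vector stands in for the actual driver calls.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: merges word writes to consecutive offsets into a single update
class CbufUpdateBatcher {
  private:
    std::size_t startOffset{}; // Byte offset of the first word in the current batch
    std::vector<std::uint32_t> words;

  public:
    std::vector<std::pair<std::size_t, std::vector<std::uint32_t>>> flushed; // Stand-in for driver updates

    void Write(std::size_t offset, std::uint32_t word) {
        bool consecutive{!words.empty() && offset == startOffset + words.size() * sizeof(std::uint32_t)};
        if (!consecutive) {
            Flush();
            startOffset = offset;
        }
        words.push_back(word);
    }

    void Flush() {
        if (!words.empty())
            flushed.emplace_back(startOffset, std::move(words));
        words.clear(); // Leaves the moved-from vector in a known empty state
    }
};
```

Writes to offsets 0, 4 and 8 followed by a write to offset 32 produce two updates after a final `Flush`: one of three words at offset 0 and one of a single word at offset 32.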
2022-04-17 00:35:00 +01:00
PixelyIon
02f99273ac Implement Shader Module Cache
Implements caching of the compiled shader module (`VkShaderModule`) in an associative map based on the supplied IR, bindings and runtime state to avoid constant recompilation of shaders. This doesn't entirely address shader compilation as an issue since host shader compilation is tied to Vulkan pipeline objects rather than Vulkan shader modules, they need to be cached to prevent costly host shader recompilation.
2022-04-16 18:45:56 +05:30
PixelyIon
76d8172a35 Implement Shader IR Cache
This implements the first step of a full shader cache with caching any IR by treating the shared pointer as a handle and key for an associative map alongside hashing the Maxwell shader bytecode, it supports both single shader program and dual vertex program caching.
2022-04-16 18:45:56 +05:30
PixelyIon
0baa90d641 Implement SpanEqual and SpanHash
We desire the ability to hash and check equality of data across spans to use associative containers such as `std::unordered_map` with spans. The implemented functions provide an easy way to do that.
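The general technique looks roughly like the sketch below, which hashes and compares the viewed contents rather than the pointer. It makes several assumptions: `ByteView`, `ViewHash`, `ViewEqual` and `ViewMap` are hypothetical stand-ins for the span type and the functions added here, and FNV-1a is used only for brevity, not because this commit uses it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Hypothetical stand-in for a span: a non-owning view over contiguous bytes
struct ByteView {
    const std::uint8_t *data;
    std::size_t size;
};

// Hashes the viewed bytes (FNV-1a here, chosen only for brevity) rather than the pointer
struct ViewHash {
    std::size_t operator()(const ByteView &view) const {
        std::uint64_t hash{0xcbf29ce484222325ULL};
        for (std::size_t i{}; i < view.size; i++) {
            hash ^= view.data[i];
            hash *= 0x100000001b3ULL;
        }
        return static_cast<std::size_t>(hash);
    }
};

// Compares the viewed bytes, two views over different memory with equal contents are equal
struct ViewEqual {
    bool operator()(const ByteView &lhs, const ByteView &rhs) const {
        return lhs.size == rhs.size && (lhs.size == 0 || std::memcmp(lhs.data, rhs.data, lhs.size) == 0);
    }
};

using ViewMap = std::unordered_map<ByteView, int, ViewHash, ViewEqual>;
```

A key inserted from one array can then be found through a view over a different array with the same contents. The viewed memory has to outlive the map entry, since the key does not own it.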
2022-04-16 18:45:56 +05:30
Billy Laws
df5d1256c2 Implement an object backed IStorage backing
This is more convenient and efficient to use when passing structured data out of applets
2022-04-16 18:45:56 +05:30
Billy Laws
d115ce3c05 Stub the controller applet
Mostly based off of yuzu's implementation, this will need to be extended in the future to open up a UI for configuring controllers according to the application's requirements.
2022-04-16 18:45:56 +05:30
Billy Laws
9a8e39cba1 Slightly refactor controller code in HID
Now uses ranges where possible and a function to get the number of connected controllers has been added.
2022-04-16 18:45:56 +05:30
Billy Laws
2873f11baa Pass shared pointers by value in applet infrastructure
This is more optimal than crefs when used together with std::move
2022-04-16 18:45:56 +05:30
PixelyIon
8ccef733ff Fix UB with guest-less Texture/Buffers in MarkGpuDirty
As there was no check for the lack of a `GuestTexture`/`GuestBuffer`, marking a texture/buffer that had no guest (such as the `zeroTexture` from `GraphicsContext`) as dirty would lead to UB: it would call `NCE::RetrapRegions` with a `nullptr` handle that would be dereferenced and cause a segmentation fault.
2022-04-16 18:45:56 +05:30
PixelyIon
372ab8befa Allow external synchronization for buffers
In certain situations such as constant buffer updates, we desire to use the guest buffer as a shadow buffer forwarding all writes directly to it while we update the host using inline buffer updates so they happen in-sequence. This requires special behavior as we cannot let any synchronization operations take place as they would break the shadow buffer, as a result, an external synchronization flag has been added to prevent this from happening. 

It should be noted that this flag is not respected for buffer recreation which will lead to UB, this can and will break updates in certain cases and this change isn't complete without buffer manager support.
2022-04-16 18:44:53 +05:30
PixelyIon
c0c4db68a8 Fix BufferView offset not being added in vkCmdUpdateBuffer
The offset of the view wasn't added to the `vkCmdUpdateBuffer`, this would cause the offset to be incorrect given the buffer was a view of a larger buffer that wasn't the start of it. This commit fixes that by adding the offset of the view to the buffer update.
2022-04-14 18:06:15 +05:30
PixelyIon
a1c06e0401 Mark GPU resources as dirty before GPU usage
We didn't call `MarkGpuDirty` on textures/buffers prior to GPU usage, this would cause them to not be R/W protected when they should be and provide outdated copies if there were any read accesses from the CPU (which are not possible at the moment since we assume all accesses are writes). This has now been fixed by calling it after synchronizing the resource.
2022-04-14 17:20:05 +05:30
PixelyIon
41a6afed01 Fix GraphicsContext code formatting for auto formatter 2022-04-14 15:27:22 +05:30
PixelyIon
624df92616 Change AddNonGraphicsPass to AddOutsideRpCommand
The terminology "Non-Graphics pass" was deemed to be fairly inaccurate since it simply covered all Vulkan commands (not "passes") outside the render-pass scope, these may be graphical operations such as blits and therefore it is more accurate to use the new terminology of "Outside-RenderPass command" due to the lack of such an implication while being consistent with the Vulkan specification.
2022-04-14 15:20:22 +05:30
Billy Laws
a31332e35f Align Maxwell 3D macro newline slashes 2022-04-14 14:14:52 +05:30
Billy Laws
d79635772f Implement support for GPU-side constant buffer updating
Previously constant buffer updates would be handled on the CPU and only the end result would be synced to the GPU before execute. This caused issues as if the constant buffer contents was changed between each draw in a renderpass (e.g. text rendering) the draws themselves would only see the final resulting constant buffer. Fix this by updating cbufs on the GPU/CPU separately, only ever syncing them back at the start or after a guest side CPU write. At the moment only a single word is updated at a time, however this can be optimised in the future to batch all consecutive updates into one large one.
2022-04-14 14:14:52 +05:30
Robin Kertels
036faedabd Implement a way to run non-graphics passes with command executor
These commands will end the current renderpass and run on their own, this is useful for compute, blits etc.
2022-04-14 14:14:52 +05:30
Billy Laws
feb179fcff Implement primitive restart support
Maxwell3D also supports using an arbitrary restart index value however no games are known to use this so leave it for now.
2022-04-14 14:14:52 +05:30
Billy Laws
3f3acc31d8 Rework swizzle infrastructure to support arbitrary format swizzles
This is required to support R4G4B4A4 which has no directly corresponding Vulkan format.

Co-authored-by: Lunar-Pixel <lunarn452@gmail.com>
2022-04-14 14:14:52 +05:30
PixelyIon
6f85a66151 Implement host-only Buffers
We require certain buffers to only be on the host while being accessible through the same abstractions as a guest buffer as they must be interchangeable in usage.
2022-04-14 14:14:52 +05:30
Billy Laws
2c697ec36a Determine depth/stencil texture aspect based off of image swizzle
Required since we can't have a non-rt image with both a depth/stencil aspect at the same time according to vk spec.
2022-04-14 14:14:52 +05:30
PixelyIon
1878e582ad Add ScopedStackBlocker to RomFile.populate
We needed to block stack frame lookups past JNI code as Java doesn't follow the ARMv8 frame pointer ABI which leads to invalid pointer dereferences. Any JNI function that throws or handles exceptions must do this now or it may lead to a `SIGSEGV`.
2022-04-14 14:14:52 +05:30
Billy Laws
68e693d9f4 Fix DMA Engine debug logs to not crash emu
Address causes some type issues when printing directly so explicitly cast to u64 first to prevent them.
2022-04-14 14:14:52 +05:30
Billy Laws
8eaca87de8 Use an empty host texture in place of invalid TIC entries on guest
Some games may pass empty TICs as inputs to shaders while not actually using them within the shader. Create an empty texture and pass this in instead when we hit this case, the nullDescriptor feature could be used but it's not supported by all devices so we chose to do it this way instead.
2022-04-14 14:14:52 +05:30
PixelyIon
41b98c7daa Add stack tracing to skyline::exception
Skyline's `exception` class now stores a list of all stack frames during the invocation of the exception. These can later be parsed by the exception handler to generate a human-readable stack trace. To assist with more complete stack traces, `-fno-omit-frame-pointer` is now passed on debug builds which forces the inclusion of frames on function calls.
2022-04-14 14:14:52 +05:30
PixelyIon
cd8fa66326 Fix NCE Destruction
NCE is implicitly depended on by the `GPU` class due to the NCE Memory Trapping API so the destruction of it must take place after the destruction of the `GPU` class. Additionally, to prevent bugs the NCE destructor must set `staticNce` to `nullptr` as the signal handler will potentially access a destroyed instance of NCE otherwise.
2022-04-14 14:14:52 +05:30
Billy Laws
815f1f4067 Add support for sRGB TIC textures
Without this sRGB textures would be interpreted as RGB leading to colours being slightly off. The sRGB flag isn't stored as part of format word so we reuse the _pad_ field of it to store the flag for the switch case.
2022-04-14 14:14:52 +05:30
Billy Laws
1ba4abf950 Add Astc{6x6,8x8} and R4G4B4A4 image formats 2022-04-14 14:14:52 +05:30
MCredstoner2004
dec0571eee Infrastructure for applets to be implemented
This removes a stub for an applet and implements several applet related service calls.
2022-04-14 14:14:52 +05:30
PixelyIon
164d4852fa Sleep-loop rather than abort during termination
We don't want to actually exit the process as it'll automatically be restarted gracefully due to a timeout after being unable to exit within a fixed duration so we just want to infinite sleep during termination. This should fix issues where exiting any game would cause the app to force close after some time as exception signal handling would fail in the background, the app should stay open now and automatically restart itself when another game is loaded in.
2022-04-14 14:14:52 +05:30
PixelyIon
ea00f1bb82 Flush emulation logs after exceptions
A lot of logs are incomplete due to being unable to flush inside the signal handler, now we flush after any exceptions so that there is a guarantee of any exceptions being logged as this is crucial for proper debugging.
2022-04-14 14:14:52 +05:30
PixelyIon
62ba180550 Use R5G6B5 as Vulkan swapchain format rather than B5G6R5
B5G6R5 isn't generally supported by the swapchain and the format is used for R5G6B5 with swapped R/B channels to avoid aliasing so we reverse that by using R5G6B5 as the underlying Vulkan format for the swapchain which should be automatically handled by the driver for any copies from B5G6R5 textures and the data representation should be the same as B5G6R5 with swapped R/B channels so not reporting the correct texture::Format should be fine.
2022-04-14 14:14:52 +05:30
MK73DS
e54f86e923 Fix IApplicationFunctions::GetDisplayVersion id
(https://switchbrew.org/wiki/Applet_Manager_services#IApplicationFunctions)
2022-04-14 14:14:52 +05:30
Billy Laws
77cf33b643 Trigger command executor before DMA copies
DMA copies can use textures currently in active use on the GPU as dst/src so Execute before to prevent a deadlock
2022-04-14 14:14:52 +05:30
Billy Laws
dbbc5704d2 Implement DMA engine Block Linear->Linear copies 2022-04-14 14:14:52 +05:30
Billy Laws
3e4e8de1d2 Implement primitive Linear->Block Linear DMA engine copies
Slightly inaccurate and misses some features but good enough for most games, should be revisited later.
2022-04-14 14:14:52 +05:30
Billy Laws
3c26921d54 Implement the Maxwell DMA engine
The DMA engine is used to perform DMA buffer/texture copies directly on the GPU. It can deswizzle arbitrary regions of input textures, perform component remapping and swizzle into output textures.
This impl only supports 1D buffer copies, 2D ones will come later.
2022-04-14 14:14:52 +05:30
Billy Laws
3df76e84c3 Stub IRequest::GetAppletInfo in nifm 2022-04-14 14:14:52 +05:30
Billy Laws
6c5f9941ad Stub additional IAddOnContentManager functions
Used mainly by UE4 games
2022-04-14 14:14:52 +05:30
Billy Laws
486a835d0a Use guest texture view type to determine the underlying image type
If we have a Nx1x1 image then determining the type from dimensions will result in a 1D image being created thus preventing us from creating a 2D view. By using the image view type we can avoid this for textures from TICs since we know in advance how they will be used
2022-04-14 14:14:52 +05:30
Billy Laws
05966f34e5 Stub a pair of ISelfController functions
Both used by SMO, SetScreenShotPermission and SetAlbumImageOrientation
2022-04-14 14:14:52 +05:30
Billy Laws
fe37d7c9be Implement ICommonStateGetter::SetRequestExitToLibraryAppletAtExecuteNextProgramEnabled 2022-04-14 14:14:52 +05:30
Billy Laws
9813f9f8dc Implement ICommonStateGetter::GetDefaultDisplayResolutionChangeEvent 2022-04-14 14:14:52 +05:30
Billy Laws
7e7c0252ca Implement IApplicationFunctions::GetDisplayVersion 2022-04-14 14:14:52 +05:30
Billy Laws
b1f10865a0 Attach depth RT to command executor before draws
This enforces that the depth RT outlives the draw, without this the depth RT could be freed while in active use by command executor leading to UAFs and crashes.
2022-04-14 14:14:52 +05:30
Billy Laws
0182fabc50 Stub {Set,Get}NpadHandheldActivationMode in HID 2022-04-14 14:14:52 +05:30
Billy Laws
2e197cead5 Support D32S8_Float_Uint_Unorm_Unorm depth/stencil format 2022-04-14 14:14:52 +05:30
Billy Laws
7717a86fb1 Implement VMM region->region copies
Required by the DMA engine, a simple memcpy doesn't work since the buffers could span multiple blocks.
2022-04-14 14:14:52 +05:30
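A minimal sketch of why a single `memcpy` is not enough for a region->region copy when the backing memory is split into blocks. The `BlockedMemory` class, its `BlockSize` and the `Translate`/`Copy` names are hypothetical stand-ins for illustration and are not taken from the VMM code; the sketch also assumes that the two regions do not overlap.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-in for a VMM: the address space is backed by separately
// allocated fixed-size blocks, so neighbouring addresses are not necessarily
// neighbours in host memory.
class BlockedMemory {
  public:
    static constexpr std::size_t BlockSize = 16;

    explicit BlockedMemory(std::size_t blockCount) {
        for (std::size_t i = 0; i < blockCount; i++)
            blocks.push_back(std::make_unique<std::uint8_t[]>(BlockSize));
    }

    // Translates an address into a host pointer inside the owning block
    std::uint8_t *Translate(std::size_t address) {
        return blocks[address / BlockSize].get() + (address % BlockSize);
    }

    // Copies `size` bytes between two non-overlapping regions, splitting the
    // copy wherever the source or the destination crosses a block boundary
    void Copy(std::size_t dst, std::size_t src, std::size_t size) {
        while (size != 0) {
            std::size_t chunk = std::min({size,
                                          BlockSize - (dst % BlockSize),
                                          BlockSize - (src % BlockSize)});
            std::memcpy(Translate(dst), Translate(src), chunk);
            dst += chunk;
            src += chunk;
            size -= chunk;
        }
    }

  private:
    std::vector<std::unique_ptr<std::uint8_t[]>> blocks;
};
```

Each iteration copies the largest chunk that stays inside one source block and one destination block, so every `memcpy` call only ever touches contiguous host memory.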
Billy Laws
af90d4f977 Implement audren Surround->Stereo downmixing 2022-04-14 14:14:52 +05:30
PixelyIon
ad0005f398 Remove guard-page from main thread stack
This was erroneously included while migrating from older code where stack creation was entirely handled with host constructs such as `mmap` directly to using `KPrivateMemory` to manage it, we would create a guard page with `mprotect` that the guest was unaware about and would cause a segfault when a guest accessed the extents of the stack as reported to the guest.
2022-04-14 14:14:52 +05:30
PixelyIon
de81d28b1d Implement SVC GetThreadContext3
A partial implementation of the `GetThreadContext3` SVC, we cannot return the whole thread context as the kernel only stores the registers we need according to the ARMv8 ABI convention and so far usages of this SVC do not require the unavailable registers but all future usage must be monitored and potentially require extending the amount of saved registers.
2022-04-14 14:14:52 +05:30
PixelyIon
b706aa3463 Implement SVC SetThreadActivity
This SVC can pause/resume a thread, it is used by engines like Unity to pause a thread during a GC world stop.
2022-04-14 14:14:52 +05:30
PixelyIon
36a7ad06bd Use built-in vibrator by default for controller #0
The vibration device had to be set manually prior which led to it generally not being set at all even though a user might want vibration, this commit fixes that by making controller #0 use the built-in vibrator by default.
2022-04-14 14:14:52 +05:30
lynxnb
69ba4f8abb Swap out boostorg/boost for skyline-emu/boost 2022-04-14 14:14:52 +05:30
PixelyIon
b45437b78b Move Skyline internal files to external directory
Any Skyline files that should have been user-accessible were moved from `/data/data/skyline.emu/files` to `/sdcard/Android/data/skyline.emu/files` as the former directory is entirely private and cannot be accessed without either adb or root. This made retrieving certain data such as saves or loading custom driver shared objects extremely hard to do while this can be trivially done now.
2022-04-14 14:14:52 +05:30
Billy Laws
e5e20f39c9 Implement a simple constant buffer cache
In some games such as SMO thousands of constant buffers are bound per frame which was causing an unreasonable number of lookups in both vmm and the buffer manager. Work around this by introducing a simple hashmap based cache, eviction is currently unsupported but not really necessary yet due to the small size of the buffers in the cache.
2022-04-14 14:14:52 +05:30
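A rough sketch of a hashmap-based cache without eviction, in the spirit of the description above. `ConstantBufferKey`, `ConstantBufferCache` and the `create` callback are assumptions made for this example and are not the actual types; the callback stands in for the expensive lookups in vmm and the buffer manager.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical cache key: the guest address and size of a constant buffer
struct ConstantBufferKey {
    std::uint64_t address;
    std::uint32_t size;

    bool operator==(const ConstantBufferKey &other) const {
        return address == other.address && size == other.size;
    }
};

struct ConstantBufferKeyHash {
    std::size_t operator()(const ConstantBufferKey &key) const {
        return std::hash<std::uint64_t>{}(key.address) ^
               (std::hash<std::uint32_t>{}(key.size) << 1);
    }
};

// Maps keys to already-resolved views; entries are never evicted
template<typename View>
class ConstantBufferCache {
  public:
    // Returns the cached view, calling `create` only on a cache miss
    template<typename Create>
    View &Get(std::uint64_t address, std::uint32_t size, Create &&create) {
        ConstantBufferKey key{address, size};
        auto it = entries.find(key);
        if (it == entries.end())
            it = entries.emplace(key, create(address, size)).first;
        return it->second;
    }

  private:
    std::unordered_map<ConstantBufferKey, View, ConstantBufferKeyHash> entries;
};
```

Repeated binds of the same address and size then cost one hash lookup instead of a full resolution each time.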
PixelyIon
cb2614f80e Handle host accesses for NCE Memory Trapping API
We cannot ignore accesses from the host to a region protected by the NCE Memory Trapping API, there's often access to regions which have overlap with a protected region unintentionally and those accesses need to be handled correctly rather than leading to a crash. This is done by implementing an additional signal handler `NCE::HostSignalHandler` to lookup any potential traps on a `SIGSEGV` and handle them correctly or when there isn't a corresponding trap raise a `SIGTRAP` when debugger is connected or delegate to `signal::ExceptionalSignalHandler` when it isn't.
2022-04-14 14:14:52 +05:30
PixelyIon
b04a0c386a Page out RW-trapped memory in NCE Memory Trapping
To cut down memory usage we now page out memory that is RW trapped via the NCE memory trapping API, the callbacks are supposed to page in the memory. This behavior is backed up by Texture/Buffer syncing which would read the host copies of data and write it to the guest, by paging the corresponding data on the guest we're avoiding redundant memory usage.
2022-04-14 14:14:52 +05:30
PixelyIon
344c5f2a62 Implement RAII wrapper over file descriptors
The `FileDescriptor` class is a RAII wrapper over FDs which handles their lifetimes alongside other C++ semantics such as moving and copying. It has been used in `skyline::kernel::MemoryManager` to handle the lifetime of the ashmem FD correctly, it wasn't being destroyed earlier which can result in leaking FDs across runs.
2022-04-14 14:14:52 +05:30
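For illustration, a minimal RAII wrapper over a POSIX file descriptor could look like the sketch below. It is not the `FileDescriptor` class from this commit: the member names `Get` and `Valid`, and the choice to implement copying with `dup`, are assumptions of the example.

```cpp
#include <unistd.h>
#include <utility>

// Sketch of an owning file descriptor wrapper (hypothetical interface)
class FileDescriptor {
  public:
    FileDescriptor() = default;

    explicit FileDescriptor(int fd) : fd{fd} {}

    // Copying duplicates the descriptor so that each object owns its own FD
    FileDescriptor(const FileDescriptor &other)
        : fd{other.fd >= 0 ? dup(other.fd) : -1} {}

    // Moving transfers ownership and leaves the source empty
    FileDescriptor(FileDescriptor &&other) noexcept
        : fd{std::exchange(other.fd, -1)} {}

    // Copy-and-swap: the by-value parameter closes the previous FD on exit
    FileDescriptor &operator=(FileDescriptor other) noexcept {
        std::swap(fd, other.fd);
        return *this;
    }

    ~FileDescriptor() {
        if (fd >= 0)
            close(fd);
    }

    int Get() const { return fd; }

    bool Valid() const { return fd >= 0; }

  private:
    int fd{-1};
};
```

With a wrapper like this, a descriptor held as a member is closed whenever its owner is destroyed, which is the kind of leak across runs that the commit message describes.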
PixelyIon
7ce2a903a1 Update LLVM + Oboe
Initially this commit was only intended to update LLVM, but the latest LLVM libcxx no longer transitively includes the C++ stdlib header `<algorithm>` (as of https://reviews.llvm.org/D119667), which caused a compilation error. This required changes in Skyline and Oboe which were done in https://github.com/google/oboe/pull/1521 and the submodule has been updated to include those changes.
2022-04-14 14:14:52 +05:30
Billy Laws
c549788377 Update shader compiler 2022-04-14 14:14:52 +05:30
Billy Laws
01c027b9f6 Fix GetBlockLinearLayerSize to avoid incorrectly calculating a zero size 2022-04-14 14:14:52 +05:30
PixelyIon
c84badb498 Update NDK (25.0.8221429) + Gradle (7.4.1) + Build Tools (33.0.0) 2022-04-14 14:14:52 +05:30
lynxnb
08e24915d8 Add support for drawing inside the display cutout areas 2022-04-14 14:14:52 +05:30
MK73DS
6e929e6f6a Stub ICommonStateGetter::SetCpuBoostMode
This makes Metroid Dread boot
2022-04-14 14:14:52 +05:30
Billy Laws
d033ff2478 Fix draws when no colour RTs and only depth is bound 2022-04-14 14:14:52 +05:30
Billy Laws
d137051833 Add basic support for 3d/cubemap textures
These are mostly used in 3D games like SMO, support is still quite basic and synchronising block linear 3D texture will crash in most cases due to them being unimplemented.
2022-04-14 14:14:52 +05:30
Billy Laws
bcc00216b7 Fix incorrect Bc2/3 block sizes 2022-04-14 14:14:52 +05:30
PixelyIon
7e9b0fec77 Increase reported audren revision to 11
Some games crash due to requiring an `audren` version greater than 7. The `audren` version can be increased without any issues as `audren` is stubbed and therefore the reported version doesn't matter.
2022-04-14 14:14:52 +05:30
PixelyIon
e294fa8c91 Add subpass limit quirk to fix Adreno driver bug
Older Adreno proprietary drivers (5xx and below) will segfault while destroying the renderpass and associated objects if more than 64 subpasses are within a renderpass due to internal driver implementation details. This commit introduces checks to automatically break up a renderpass when that limit is hit.
2022-04-14 14:14:52 +05:30
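The splitting logic can be sketched as a counter that starts a new renderpass once the limit is reached. `RenderPassSplitter` and its members are hypothetical names used only for this example; the real checks live in the command executor and operate on Vulkan objects rather than plain counters.

```cpp
#include <cstddef>
#include <vector>

// Tracks how many subpasses each renderpass holds and begins a new renderpass
// when the current one is full (hypothetical helper for illustration)
class RenderPassSplitter {
  public:
    explicit RenderPassSplitter(std::size_t maxSubpasses = 64)
        : maxSubpasses{maxSubpasses} {}

    void AddSubpass() {
        // Break up the renderpass before it would exceed the limit
        if (subpassCounts.empty() || subpassCounts.back() >= maxSubpasses)
            subpassCounts.push_back(0);
        subpassCounts.back()++;
    }

    // Number of subpasses recorded in each renderpass, in submission order
    const std::vector<std::size_t> &SubpassCounts() const {
        return subpassCounts;
    }

  private:
    std::size_t maxSubpasses;
    std::vector<std::size_t> subpassCounts;
};
```

On devices without the quirk, the limit would simply be set high enough that no split ever happens.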
PixelyIon
65d5a3bce5 Align all Buffers to page boundary
We have support for overlapping buffers which allows us to merge a lot of smaller buffers located on a single page into a single larger buffer which allows for better performance. It additionally ensures that all host buffers match the alignment guarantees of the guest and adequately fulfill host alignment requirements.
2022-04-14 14:14:52 +05:30
PixelyIon
cb1ec9a7f4 Rework BufferManager, Buffer and BufferView
This commit encapsulates a complex sequence of cascading changes in the process of supporting overlaps for buffers:
* We determined that it is impossible to resolve overlaps with multiple intervals per buffer within the constraints of each overlap being a contiguous view, support for multiple intervals was therefore dropped. The older buffer manager code was entirely reworked to be simpler due to only handling one interval per buffer with code now being based off `IntervalMap` but tailored specifically for buffers.
* During overlap resolution, there was the problem of how existing views into the buffer being recreated would be updated: the buffer had to be replaced with a larger buffer that could contain all overlaps and all existing views would need to be repointed to it. This was addressed by a buffer owning all views to itself, so that we could automatically recalculate the offset of all views and update the buffers with it.
* We still needed to update usage of existing views which was done by handling all access (such as inside a recorded draw) to buffer view properties via `BufferView::RegisterUsage` which dispatches a callback with the view and the corresponding backing buffer. This callback can be stored and called during overlap resolution with the new buffer.
* We had issues with lifetime of the buffer with the handle-like semantics of `BufferView` introduced in the last buffer-related commit, if we updated the view to be owned by a new buffer we'd need to extend the lifetime of the new buffer not the older one and the only way to do this was a proxy owner object `BufferDelegate` which holds a shared pointer to the real `Buffer` which in-turn holds a pointer to all `BufferDelegate` objects to update on repointing. A `BufferView` is effectively just a wrapper around `std::shared_ptr<BufferDelegate>` with more favorable semantics but generally just forwarding calls.
It should be additionally noted that to support usage of `RegisterUsage` the code around buffers in `GraphicsContext` was refactored to defer truly binding till the recording phase.
2022-04-14 14:14:52 +05:30
PixelyIon
a6781b38f4 Clear syncBuffers after CommandExecutor execution
Due to an oversight, we weren't clearing the list of buffers that needed to be synced after every execution which led to them building up. Due to the relatively cheap synchronization of buffers and only doing so on faults this wasn't caught until now, it does depress the framerate significantly over time due to the size of the list growing to be in the range of 100k buffer views depending on the title.
2022-04-14 14:14:52 +05:30
kaikecarlos
49c0ba1207 Implement IAccountServiceForApplication::IsUserRegistrationRequestPermitted 2022-04-14 14:14:52 +05:30
kaikecarlos
e8cc760b10 Implement IHidServer Functions
Add GetVibrationDeviceInfo and StartSixAxisSensor
2022-04-14 14:14:52 +05:30
kaikecarlos
9f51664b1d Stub IRS Service 2022-04-14 14:14:52 +05:30
lynxnb
707c0cc0af Stub aocsrv::IAddOnContentManager::ListAddOnContent 2022-04-14 14:14:52 +05:30
lynxnb
873ed641ea Stub nfp::IUser::ListDevices and nfp::IUser::GetState 2022-04-14 14:14:52 +05:30
lynxnb
7d518cba2b Stub am::ICommonStateGetter::IsVrModeEnabled 2022-04-14 14:14:52 +05:30
Billy Laws
c55e1a135e Update adrenotools 2022-04-14 14:14:52 +05:30
Robin Kertels
594f061b21 Implement SSBOs
Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-04-14 14:14:52 +05:30
Billy Laws
82d2a9ab56 Unify engine related macros to avoid excessive code duplication 2022-04-14 14:14:52 +05:30
Billy Laws
ae41ddf4f0 Implement a skeleton compute engine
The Kepler compute engine is used to run compute jobs encapsulated into QMDs on the GPU, this commit doesn't implement compute itself but adds the register and QMD structs that will be needed for it in the future.
2022-04-14 14:14:52 +05:30
Billy Laws
0298a7b1f6 Implement the actual inline to memory engine on subch 2
Used mostly by OGL games for copying stuff around.
2022-04-14 14:14:52 +05:30
Billy Laws
ba7111d33a Add maxwell3d I2M support 2022-04-14 14:14:52 +05:30
Billy Laws
8c73b62b2c Implement basic inline2memory engine support
Not currently used by anything but will be used by both compute, 3D and its own engine in the future. Block linear copies are currently unsupported.
2022-04-14 14:14:52 +05:30
Billy Laws
5c387f5c5a Fixup depth mode init value to allow ignoring redundant calls 2022-04-14 14:14:52 +05:30
PixelyIon
7a5c771f44 Rework GPU BufferView to have handle-like semantics
We wanted views to extend the lifetime of the underlying buffers and at the same time preserve all views until the destruction of the buffer to prevent recreation which might be costly in the future when we need `VkBufferView`s of the buffer but also require a centralized list of all views for recreation of the buffer. It also removes the inconsistency between `BufferView*` being returned in `GetXView` in `GraphicsContext`.
2022-04-14 14:14:52 +05:30
Billy Laws
fae5332f20 Disable descriptor aliasing on Adreno to workaround shader compiler bug
Aliased descriptor sets are incorrectly interpreted by the shader compiler causing it to bugger up LLVM function argument types and crash

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2022-04-14 14:14:52 +05:30
Billy Laws
fc2c123ae2 Implement GPU depthMode register
This controls the depth range used by the shader, hades already has support for the necessary patching so we only need to pass the current mode over to it and it'll do the necessary work.
2022-04-14 14:14:52 +05:30
Billy Laws
7e088ca465 Fix constbuf updates to actually increment the write offset
Uses the register directly now as when we modify it we want the changes to be visible from macros too.
2022-04-14 14:14:52 +05:30
PixelyIon
d2f3479610 Use eB5G6R5UnormPack16 VkFormat for B5G6R5Unorm and R5G6B5Unorm
Using `eB5G6R5UnormPack16` (with a swizzle for `R5G6B5Unorm`) removes the need for `VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT` when those formats are aliased which happens in Sonic Mania among other titles.
2022-04-14 14:14:52 +05:30
PixelyIon
24d7066d8b Add quirk to avoid VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT on Adreno GPUs
Adreno GPUs have significant performance penalties from usage of `VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT` which require disabling UBWC and on Turnip, forces linear tiling. As a result, it's been made an optional quirk which doesn't supply the flag in `VkImageCreateInfo` and logs a warning if a view with a different Vulkan format from the original image is created.
2022-04-14 14:14:52 +05:30
PixelyIon
731d06010d Set eMutableFormat in Texture Image Creation
We often need to alias the underlying data as multiple Vulkan formats which requires the `eMutableFormat` bit to be set in `VkImageCreateInfo`, without doing this there'll be validation layer errors and potentially GPU bugs.
2022-04-14 14:14:52 +05:30
PixelyIon
dafcfa68ca Transition texture layout to eGeneral after creation
We no longer set the layout to general inside the Texture constructor, yet we need it to be set prior to the image being used as an attachment, so we transition the layout to `eGeneral` after creation of the texture object.
2022-04-14 14:14:52 +05:30
PixelyIon
5dea15632c Add Controller Setup Guide
A setup guide for controllers that goes through every available button/stick sequentially and opens up a corresponding dialog to map them.
2022-04-14 14:14:52 +05:30
PixelyIon
e2cae74425 Fix RecyclerView height in CoordinatorLayout for non touch-mode
Any `RecyclerView`s with an app bar in a `CoordinatorLayout` would end up going off-screen due to the layout behavior implementing an offset by using a transform which would not correctly handle focusing on off-screen objects. This has now been fixed by manually adjusting height to be clipped to what is visible on the screen.
2022-04-14 14:14:52 +05:30
PixelyIon
3ae62c7fcc Collapse appBarLayout on appList focus
We collapse the app bar when the focus is on the app list which only occurs while using a controller, this is required as the app bar will never be collapsed otherwise. It also removes the older code to work around the limitation on `View.FOCUS_DOWN` by collapsing only when the end of the list was reached.
2022-04-14 14:14:52 +05:30
PixelyIon
3e4ec7323b Tweak grid compact items
Removes card elevation as it visually conflicts with the scrim, this also makes the scrim a bit darker to emphasize the text and slightly reduces the border radius.
2022-04-14 14:14:52 +05:30
PixelyIon
1d984b6de3 Add padding to end of app_list
A small amount of padding at the end of `app_list` to signify that the end of the list has been reached was added.
2022-04-14 14:14:52 +05:30
PixelyIon
bac7c526ef Make layout selectable for grid items
The entire layout is now selectable for grid items rather than just the card, this greatly increases the visibility of the selection when not in touch mode as the contrast of a darken effect on the icon can be minimal depending on how dark the icon already is.
2022-04-14 14:14:52 +05:30
PixelyIon
1d070e6332 Close InputStream after usage in KeyReader
The `InputStream` would not be closed after reading the key file in `KeyReader#import`, it's now wrapped with `use{ }` which handles closing the stream after usage.
2022-04-14 14:14:52 +05:30
MK73DS
647cb07dc8 Stub functions in IAccountServiceForApplication:
- GetUserCount
- InitializeApplicationInfo
- IsUserAccountSwitchLocked
2022-04-14 14:14:52 +05:30
PixelyIon
b41d8b7997 Use Surface#setFrameRate for suggesting display refresh rate
Setting the refresh rate via the Display API's `preferredDisplayModeId` is an outdated method to do it on Android 11 and above, we now use `Surface#setFrameRate` alongside it to suggest a refresh rate for the display.
2022-04-14 14:14:52 +05:30
PixelyIon
730bf504f8 Correct Adreno texture binding quirk
We incorrectly determined an Adreno driver bug to require padding between binding slots but the real issue was not supporting consecutive binding writes for `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` and was fixed by the padding slot unintentionally requiring individual writes. The quirk has now been corrected to explicitly specify this as the bug and the solution is more apt.
2022-04-14 14:14:52 +05:30
PixelyIon
da8cb48933 Fix Interval Map GetAlignedRecursiveRange lookup bug
Any lookups done using `GetAlignedRecursiveRange` incorrectly added intervals in the exclusive interval entry lookups as the condition for adding them was the reverse of what it should've been due to a last minute refactor, it led to graphical glitches and crashes. This has been fixed and the lookups should return the correct results.
2022-04-14 14:14:52 +05:30
PixelyIon
f2faa74707 Fix crashes due to SEGV_ACCERR check
On certain devices, accesses to a protected memory region can return `si_code` as non-`SEGV_ACCERR` values. As we only passed access violations to the trap handler, such accesses were not passed to it on those devices and went to the crash handler instead, which led to a crash.
2022-04-14 14:14:52 +05:30
PixelyIon
62c16fa73e Upgrade Gradle (7.4), AGP (7.1.2) and Kotlin Dependencies 2022-04-14 14:14:52 +05:30
PixelyIon
77e2797219 Delete expired weak_ptrs for Texture/Buffer views
A large amount of Texture/Buffer views would expire before reuse could occur in `Texture::GetView`/`Buffer::GetView`. These can lead to a substantial memory allocation given enough time and they are now deleted during the lookup while iterating on all entries.

It should be noted that there are a lot of duplicate views that don't live long enough to be reused and the ultimate solution here is to make those views live long enough to be reused.
2022-04-14 14:14:52 +05:30
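Pruning expired entries while iterating during the lookup can be sketched as follows. `View`, `ViewCache` and `EntryCount` are simplified, hypothetical stand-ins; the real `Texture::GetView`/`Buffer::GetView` functions take different parameters.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical view description used only for this example
struct View {
    std::size_t offset;
    std::size_t size;
};

class ViewCache {
  public:
    // Returns a matching live view if one exists, erasing any expired
    // entries that are encountered while iterating over the list
    std::shared_ptr<View> GetView(std::size_t offset, std::size_t size) {
        for (auto it = views.begin(); it != views.end();) {
            if (auto view = it->lock()) {
                if (view->offset == offset && view->size == size)
                    return view;
                ++it;
            } else {
                it = views.erase(it);  // the view expired, drop its entry
            }
        }

        auto view = std::make_shared<View>(View{offset, size});
        views.push_back(view);
        return view;
    }

    std::size_t EntryCount() const { return views.size(); }

  private:
    std::vector<std::weak_ptr<View>> views;
};
```

The pruning costs no extra pass because the lookup already visits every entry, but as noted above it does not make short-lived views any more reusable.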
PixelyIon
881bb969c4 Implement access-driven Buffer synchronization
Similar to constant redundant synchronization for textures, there is a lot of redundant synchronization of buffers. Although buffer synchronization is far cheaper than texture synchronization, it still has associated costs which have now been reduced by only synchronizing on access.
2022-04-14 14:14:52 +05:30
PixelyIon
7532eaf050 Attach Texture to Cycle in Texture::TransitionLayout
Not doing so could result in the texture being destroyed before the completion of a transition and lead to undefined behavior.
2022-04-14 14:14:52 +05:30
PixelyIon
3268b3779a Implement access-driven Texture synchronization
There was a lot of redundant synchronization of textures to and from host constantly as we were not aware of guest memory access, this has now been averted by tracking any memory accesses to the texture memory using the NCE Memory Trapping API and synchronizing only when required.
2022-04-14 14:14:52 +05:30
PixelyIon
3e33d49faf Implement NCE Memory Trapping API
An API for trapping accesses to guest memory and performing callbacks based on those accesses alongside managing protection of the memory. This is a fundamental building block for avoiding redundant synchronization of resources from the guest and host.

Note: All accesses are treated as write accesses at the moment, support for picking up read accesses will be implemented later
2022-04-14 14:14:52 +05:30
PixelyIon
08510d75b0 Implement Interval Map
An interval map is a crucial piece of infrastructure required for memory faulting to track any regions that have an associated callback and their protection. Additionally, efficient page-aligned lookups with semantics optimal for memory faulting are also a requirement and the ability to associate multiple regions with a single callback/protection entry rather than doing so on a per-region basis as we deal with split-mapping resources.
2022-04-14 14:14:52 +05:30
PixelyIon
5c9e42e384 Use mirror mappings for Textures and Buffers
This is a prerequisite to memory trapping as we need to write to the mirror to avoid a race condition with external threads writing to a texture/buffer while we do so ourselves for the sync on a read/write, it also avoids an additional `mprotect` to `-WX`/`RWX` on a read access.

An additional advantage for textures especially is that we now support split-mapping textures due to laying them out in a contiguous mirror and they will not require costly algorithmic changes. Buffers should also benefit from not needing to iterate over every region when they are split into multiple mappings.
2022-04-14 14:14:52 +05:30
PixelyIon
577a67babd Support mirrors of multiple non-contiguous memory regions
`CreateMirror` is limited to creating a mirror of a single contiguous region which does not work when creating a contiguous mirror of multiple non-contiguous regions. To support this functionality, `CreateMirrors` was added, which expects a list of page-aligned regions and maps them into a contiguous mirror.
2022-04-14 14:14:52 +05:30
PixelyIon
e35ab6d1e0 Move to mapping guest AS as shared memory
We want to create arbitrary mirrors in the guest address space and to make this possible, we map the entire address space as a shared memory file. A mirror is mapped by using `mmap` with the offset into the guest address space.
2022-04-14 14:14:52 +05:30
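The idea of mirroring through a shared memory file can be sketched on Linux as below. This is only an illustration under stated assumptions: `memfd_create` stands in for whatever shared memory mechanism is actually used (ashmem is mentioned elsewhere in this log), and `AddressSpace`, `CreateAddressSpace` and this `CreateMirror` signature are hypothetical names for the example.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Hypothetical handle for an address space backed by a shared memory file
struct AddressSpace {
    int fd{-1};
    std::uint8_t *base{nullptr};
    std::size_t size{0};
};

// Creates the backing file and maps all of it once as the primary mapping
inline AddressSpace CreateAddressSpace(std::size_t size) {
    AddressSpace space;
    int fd = memfd_create("guest-as", 0);  // assumption: Linux with memfd support
    if (fd < 0)
        return space;

    void *base = MAP_FAILED;
    if (ftruncate(fd, static_cast<off_t>(size)) == 0)
        base = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return space;
    }

    space.fd = fd;
    space.base = static_cast<std::uint8_t *>(base);
    space.size = size;
    return space;
}

// Maps a second view of [offset, offset + size); offset must be page-aligned
inline std::uint8_t *CreateMirror(const AddressSpace &space, std::size_t offset, std::size_t size) {
    void *mirror = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                        space.fd, static_cast<off_t>(offset));
    return mirror == MAP_FAILED ? nullptr : static_cast<std::uint8_t *>(mirror);
}
```

Because both mappings refer to the same pages of the file, a write through either one is visible through the other, and the protection of the primary mapping can be changed without affecting the mirror.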
Billy Laws
a5dd961f01 Add support for batched method sending
Important for constbuf updates which would be very slow if done one at a
time.
2022-04-14 14:14:52 +05:30
Robin Kertels
43879e2476 Round up when calculating size of compressed texture in bytes 2022-04-14 14:14:52 +05:30
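As a sketch of the rounding involved: a block-compressed texture stores whole blocks, so a dimension that is not a multiple of the block size still occupies a full block. `BlockFormat`, `DivideCeil` and `CompressedSize` are hypothetical names for this example, not the functions touched by the commit.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical description of a block-compressed format
struct BlockFormat {
    std::uint32_t blockWidth;    // texels per block horizontally
    std::uint32_t blockHeight;   // texels per block vertically
    std::uint32_t bytesPerBlock; // size of one compressed block
};

// Integer division that rounds up rather than truncating
constexpr std::uint32_t DivideCeil(std::uint32_t value, std::uint32_t divisor) {
    return (value + divisor - 1) / divisor;
}

// Size in bytes of a single layer, counting partially covered blocks as whole
constexpr std::size_t CompressedSize(const BlockFormat &format, std::uint32_t width, std::uint32_t height) {
    return static_cast<std::size_t>(DivideCeil(width, format.blockWidth)) *
           DivideCeil(height, format.blockHeight) * format.bytesPerBlock;
}
```

For a format with 4x4 blocks of 8 bytes, a 5x5 texture needs 2x2 blocks (32 bytes), whereas truncating division would count a single block and report only 8 bytes.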
Robin Kertels
d889550e84 Don't set COLOR_ATTACHMENT_BIT for compressed formats.
The better solution would be to only set this for formats that support
it on original HW but this will get rid of the validation errors for now.
2022-04-14 14:14:52 +05:30
Robin Kertels
82296ac5b8 Use buffer size instead of allocation size for Buffer constructor
Fixes a validation error.
2022-04-14 14:14:52 +05:30
Robin Kertels
752245c3c8 Enable provoking vertex feature 2022-04-14 14:14:52 +05:30
Robin Kertels
dd45d054e7 Enable shaderDrawParameters 2022-04-14 14:14:52 +05:30
Billy Laws
737ff840a5 Update adrenotools for BTI 2022-04-14 14:14:52 +05:30
Billy Laws
7e16c1f989 Heavily optimise GPFIFO command dispatch to reduce redundant checks
Previously for methods with count > 1 the subchannel and engine would be looked up for each part of the method rather than only doing so at the start. Each call also needed to be looked up to see if it touched a macro or GPFIFO method. Fix this by doing checks outside of the main dispatch loop with templated helper lambdas to avoid needing to repeat lots of code. Maxwell3D is the only subchannel with a fast path for now but more can be added later if needed.
2022-04-14 14:14:52 +05:30
Billy Laws
b4927d0138 Add support for turnip and driver file redirection via libadrenotools 2022-04-14 14:14:52 +05:30
Billy Laws
dd91d063a5 Pass native library dir to OS + reorder OS init order so paths are first
This is required for integrating libadrenotools, which needs access to
library and app directories in the GPU class constructor.
2022-04-14 14:14:52 +05:30
Billy Laws
900d00a876 Update tzcode with missed bugfix 2022-04-14 14:14:52 +05:30
Billy Laws
011de98940 Rework formats to support passing through guest swizzle values
Almost every Maxwell format now directly corresponds to a Vulkan format. This allows formats to be passed through and the swizzle used directly from guest (with some extra swizzle handling for edge cases) thus saving the need to explicitly support each swizzle combination which adds a lot of code bloat. The format header is additionally reordered with line breaks to separate formats by their bits-per-block.
2022-04-14 14:14:52 +05:30
Billy Laws
6f17d1351f Fixup ordering for B10G11R11Float texture format 2022-04-14 14:14:52 +05:30
Billy Laws
78238d550a Add 6 channel downmixing support for Audout
The specific attenuations used for each channel are taken from Ryujinx.
2022-04-14 14:14:52 +05:30
Billy Laws
2e1a1a965d Fixup AudioTrack locking 2022-04-14 14:14:52 +05:30
PixelyIon
727f83e969 Fix Incorrect Vertex Binding Divisor State Submission
We always submit pipeline divisor descriptions regardless of binding input rate being vertex rather than instance. This is invalid behavior and has been fixed by only submitting binding descriptors when the input rate is per-instance.
2022-04-14 14:14:52 +05:30
PixelyIon
9f7e80cf8f Fix Adreno Texture Sampler Binding Bug
Adreno proprietary drivers suffer from a bug where `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` requires 2 descriptor slots rather than one, we add a padding slot to fix this issue. `QuirkManager` was introduced to handle per-vendor/per-device errata and allow enabling this on Adreno proprietary drivers specifically as to not affect the performance of other devices.
2022-04-14 14:14:52 +05:30
PixelyIon
ddb2ba8a1b Rename QuirkManager to TraitManager
Quirk terminology was deemed to be inappropriate for describing the features/extensions of a device. It has been replaced with traits which is far more fitting but quirks will be used as a terminology for errata in devices.
2022-04-14 14:14:52 +05:30
PixelyIon
0b2ce6a8f3 Fix Texture Handle Offset Calculation
The texture handle offset calculation involved an incorrect shift by descriptor size which was found to be unnecessary and would result in an invalid handle that had the wrong TIC/TSC index and caused broken rendering.
2022-04-14 14:14:52 +05:30
PixelyIon
aa57ec6d55 Destroy CommandExecutor Nodes Before Waiting on Execution
`nodes` and `syncTextures` were cleared after waiting on the `CommandExecutor` fence rather than before, this wasted execution time after the wait for something that could be performed prior to the wait.
2022-04-14 14:14:52 +05:30
PixelyIon
90a1b3348c Implement D24S8 + R11G11B10 Formats 2022-04-14 14:14:52 +05:30
PixelyIon
bd718175ce Enable VK_KHR_uniform_buffer_standard_layout when available
We now attempt to enable `VK_KHR_uniform_buffer_standard_layout` when present as lax UBO layout significantly reduces complexity. If a device doesn't support this extension, we still assume that the device supports it implicitly as this has proven to be true across all major mobile GPU vendors regardless of the driver version but enabling this prevents validation layer errors.
2022-04-14 14:14:52 +05:30
PixelyIon
22ce531e6f Force Memory Barrier at VkRenderPass Start
We depend on past commands to have completed execution in a renderpass, a subpass dependency on all graphics stages from `VK_SUBPASS_EXTERNAL` to subpass #0 is used to enforce this. Nvidia and Adreno proprietary drivers implicitly do this but Turnip or Mali drivers require this or they execute out of order.
2022-04-14 14:14:52 +05:30
PixelyIon
35fde2cd0b Rework Blocklinear Texture Deswizzling
Blocklinear texture decoding was broken for padding blocks and would incorrectly decode them resulting in major texture corruption for any textures with their widths not aligned to 64 bytes. This has now been fixed with neater code which avoids redundant repetition of any code using lambdas and functions where necessary.
2022-04-14 14:14:52 +05:30
PixelyIon
043be4d8f7 Implement Maxwell3D Two-Side Stencil Toggle
Stencil operations are configurable to be the same for both sides or have independent stencil state for both sides. It is controlled via the previously unimplemented `stencilTwoSideEnable`.
2022-04-14 14:14:52 +05:30
PixelyIon
80ae7b255a Implement Maxwell3D Front Face Flip 2022-04-14 14:14:52 +05:30
PixelyIon
40a3887695 Implement Maxwell3D Viewport Y Swizzle & Lower-Left Origin 2022-04-14 14:14:52 +05:30
Billy Laws
3be30e68c3 Add D16 depth format and ZF32 TIC format
Used by One Piece Unlimited World Red
2022-04-14 14:14:52 +05:30
Billy Laws
be007c4ccc Fixup texture swizzling to actually function
Before this we were not applying the supplied swizzles; this will be
superseded in the future by using guest swizzle values.
2022-04-14 14:14:52 +05:30
Billy Laws
6e48460c0d Add BC2/3 format support 2022-04-14 14:14:52 +05:30
Billy Laws
2253bc3151 Reorder GPU quirks member to prevent it constructing after device init 2022-04-14 14:14:52 +05:30
Billy Laws
62db21fb78 Rework GPFIFO method distribution and macros to support multiple engines
Fermi2D supports macros in addition to Maxwell3D, these both share code memory. To support this we rework the macro interpreter to support passing in a target engine and abstract the communications out into an interface that can be implemented by applicable engines.

```
GPFIFO <-> MME <-> Maxwell3D
    ^        ^---> Fermi2D
    X------------> I2M
    X------------> MaxwellComputeB
    X--Flush-----> MaxwellDMA
```
2022-04-14 14:14:52 +05:30
Billy Laws
8d5463ef28 Drop engine base class usage from GPFIFO
This class does nothing since we stopped GPFIFO submits from using
virtual functions, so it can be dropped.
2022-04-14 14:14:52 +05:30
Billy Laws
4378658cbc Update BCeNabler to support ---X .text devices 2022-04-14 14:14:52 +05:30
PixelyIon
41aad83c33 Tie Shader ObjectPool Lifetime to Shader Program
Shader programs allocate instructions and blocks within an `ObjectPool`, there was a global pool prior that was never reaped aside from on destruction. This led to a leak where the pool would contain resources from shader programs that had been deleted, to avert this the pools are now tied to shader programs.
2022-04-14 14:14:52 +05:30
PixelyIon
e747de37cf Implement Blocklinear TIC Type 2022-04-14 14:14:52 +05:30
PixelyIon
723189a948 Calculate Blocklinear Texture Aligned Size Correctly
The size of blocklinear textures did not consider alignment to Block/ROB boundaries before, it is aligned to them now. Incorrect sizes led to textures not being aliased correctly due to different size calculations for GraphicBufferProducer surfaces and Maxwell3D color RTs.
2022-04-14 14:14:52 +05:30
Billy Laws
95685b8207 Avoid iterator invalidation segfault when unregistering a syncpt waiter
`erase` invalidated `it`, leading to a potential segfault if the GPU was very far behind. Bail out early to avoid that, since there can only be one occurrence at most in the buffer anyway.
2022-04-14 14:14:52 +05:30
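The pattern behind 95685b8207 is a common one and can be sketched as follows. `Waiter` and `UnregisterWaiter` are hypothetical names chosen for illustration, and a `std::vector` is assumed as the container; the actual syncpoint code is not shown in this log.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical waiter record, the real waiter type is not shown in the log
struct Waiter {
    std::uint64_t id;
    std::uint32_t threshold;
};

// Removes the waiter with the given id and returns true if one was found.
// The function returns right after erase() so the invalidated iterator is never incremented or dereferenced.
bool UnregisterWaiter(std::vector<Waiter> &waiters, std::uint64_t id) {
    for (auto it{waiters.begin()}; it != waiters.end(); ++it) {
        if (it->id == id) {
            waiters.erase(it); // `it` must not be used past this point
            return true;       // at most one match is assumed, so bail out early
        }
    }
    return false;
}
```

If multiple matches were possible, the usual alternative would be `it = waiters.erase(it)` without the increment, but an early return is simpler when a single occurrence is guaranteed.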
Billy Laws
e7bfd93541 Implement BC7 format support
Used by ARMS
2022-04-14 14:14:52 +05:30
Billy Laws
99652c5eda Support partially mapped cbufs
Buggy games sometimes supply an incorrect cbuf size so limit buffers to the first unmapped region.
2022-04-14 14:14:52 +05:30
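One way to picture the clamping described in 99652c5eda is the sketch below. `MappedRange` and `ClampToMapped` are hypothetical, and a sorted list of non-overlapping mapped ranges is assumed in place of the real GPU address space manager.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical half-open mapped range [start, end) in GPU virtual address space
struct MappedRange {
    std::uint64_t start;
    std::uint64_t end;
};

// Returns how many bytes starting at `address` are contiguously mapped, capped at `size`.
// `ranges` is assumed to be sorted by start address and non-overlapping.
std::uint64_t ClampToMapped(const std::vector<MappedRange> &ranges, std::uint64_t address, std::uint64_t size) {
    std::uint64_t cursor{address};
    const std::uint64_t limit{address + size};
    for (const auto &range : ranges) {
        if (range.end <= cursor)
            continue; // range lies entirely before the cursor
        if (range.start > cursor)
            break; // first unmapped hole reached
        cursor = std::min(range.end, limit);
        if (cursor == limit)
            break; // the whole requested size is mapped
    }
    return cursor - address;
}
```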
PixelyIon
6a6f51ea84 Implement Maxwell3D Depth/Stencil State
Implements the entirety of Maxwell3D Depth/Stencil state for both faces including compare/write masks and reference value. Maxwell3D register `stencilTwoSideEnable` is ignored as its behavior is unknown: it could mean the same behavior for both stencils or the back-facing stencil being disabled. As a result, it is left unimplemented.
2022-04-14 14:14:52 +05:30
PixelyIon
9f5c3d8ecd Force Textures to be Optimal on Host GPU
We don't respect the host subresource layout in synchronizing linear textures from the guest to the host when mapped to memory directly, this leads to texture corruption and while the real fix would involve respecting the host subresource layout, this has been deferred for later as real world performance advantages/disadvantages associated with this change can be observed more carefully to determine if it's worth it.
2022-04-14 14:14:52 +05:30
Billy Laws
ab4962c4e4 Implement additional texture formats, including BCn
BCeNabler is required for BCn textures, the pre-swizzled formats will be removed when arbitrary swizzle support is added later.
2022-04-14 14:14:52 +05:30
Billy Laws
600b94505c Fix A2R10G10B10 render target format
This was wrongly described as R10G10B10A2 in the enum when it's actually A2R10G10B10, a format natively supported in Vulkan with just a swizzle.
2022-04-14 14:14:52 +05:30
Billy Laws
175ba11f07 Integrate BCeNabler support into QuirkManager
Allows using BCn format textures on devices where they are unsupported by the driver.
2022-04-14 14:14:52 +05:30
Billy Laws
47d920d91e Make GPU private static functions file-local 2022-04-14 14:14:52 +05:30
PixelyIon
edd51c3dfa Fix Color RT Disabling Bug
Color RTs are disabled by setting their format as `None`, it was removed while transitioning to macros and resulted in a missing format exception. It has been readded as several applications depend on this behavior.
2022-04-14 14:14:52 +05:30
PixelyIon
a2285669b3 Use static vector for shader bytecode to prevent constant reallocation
Using `std::vector` for shader bytecode led to a lot of reallocation due to constant resizing, switching over the static vector allows for a single static allocation of the maximum possible guest shader size (1 MiB) to be done for every stage resulting in a 6 MiB preallocation which is unnoticeable given the total memory overhead of running a Switch application.
2022-04-14 14:14:52 +05:30
PixelyIon
21a6866def Fix Maxwell3D Blend Enum Conversion Bugs
The `OneMinusSourceAlpha` blending factor was converted to `eOneMinusSrcColor` rather than `eOneMinusSrcAlpha`, leading to incorrect blending behavior in certain titles. A similar issue existed with the order of `MinimumGL`/`MaximumGL` and `SubtractGL`/`ReverseSubtractGL` being the opposite of what it should have been. Both of these issues have been fixed.
2022-04-14 14:14:52 +05:30
PixelyIon
0a506088f4 Fix NextSubpassNode Subpass Index Bug
`NextSubpassNode` didn't increment `subpassIndex` which runs commands with the wrong subpass index resulting in them accessing invalid attachments or other bugs that may arise from using the wrong subpass.
2022-04-14 14:14:52 +05:30
PixelyIon
defbfe8f78 Serialize Maxwell3D Draw State for Subpass
All Maxwell3D state was passed by reference to the draw command lambda, this would break if there was more than one pass or the state was changed in any way before execution. All state has now been serialized by value into the draw command lambda capture, retaining state regardless of mutations of the class state.
2022-04-14 14:14:52 +05:30
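The difference that defbfe8f78 relies on is the capture mode of the deferred lambda. The sketch below uses a hypothetical `DrawState` struct and `RecordDraw` function; it only demonstrates that a by-value capture keeps the state as it was at record time.

```cpp
#include <functional>

// Hypothetical stand-in for the subset of draw state a deferred command needs
struct DrawState {
    int vertexCount;
    int firstVertex;
};

// Records a deferred command, the state is copied into the closure at record time
// so later mutations of the caller's state cannot affect it
std::function<int()> RecordDraw(const DrawState &state) {
    return [state]() { return state.firstVertex + state.vertexCount; };
}
```

Had the lambda captured `state` by reference, the result would depend on whatever the state held at execution time, and the reference could also dangle if the state object was destroyed first.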
PixelyIon
934130b3e6 Remove Implicit Command Executor Resource Attachment
Any usage of a resource in a command now requires attaching that resource externally and will not be implicitly attached on usage, this makes attaching of resources consistent and allows for more lax locking requirements on resources as they can be locked while attaching and don't need to be for any commands, it also avoids redundantly attaching a resource in certain cases.
2022-04-14 14:14:52 +05:30
PixelyIon
f0e9c42097 Fix Fence Cycle Double Insertion Lifetime Bug
If an object is attached to a `FenceCycle` twice then it would cause `FenceCycleDependency::next` to be overwritten and lead to destruction of dependencies prior to the fence being signaled causing usage of deleted resources. This commit fixes this by tracking what fence cycle a dependency is currently attached to and doesn't reattach if it's already attached to the current fence cycle.
2022-04-14 14:14:52 +05:30
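The guard described in f0e9c42097 can be sketched with a simplified intrusive list. The `Dependency` and `Cycle` types below are hypothetical and use raw pointers for brevity, so ownership and signaling are deliberately left out; only the "already attached to this cycle" check is shown.

```cpp
#include <cstddef>

struct Cycle;

// Hypothetical dependency node that remembers which cycle it is attached to
struct Dependency {
    Dependency *next{nullptr};
    const Cycle *attachedTo{nullptr};
};

// Hypothetical fence cycle holding an intrusive singly linked list of dependencies
struct Cycle {
    Dependency *head{nullptr};
    std::size_t count{0};

    // Attaching twice is a no-op, without this check `next` would be overwritten
    // and the chain of dependencies would be corrupted
    void Attach(Dependency &dependency) {
        if (dependency.attachedTo == this)
            return;
        dependency.next = head;
        dependency.attachedTo = this;
        head = &dependency;
        ++count;
    }
};
```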
PixelyIon
6a831f6ed7 Add VK_EXT_shader_demote_to_helper_invocation Quirk
An assumption was hardcoded into `Shader::Profile` regarding devices supporting demotion of shader invocations to helpers. This assumption wasn't backed by enabling the `VK_EXT_shader_demote_to_helper_invocation` extension via a quirk leading to assertions when it was used by the shader compiler, a quirk has now been added for the extension and is supplied to the shader compiler accordingly.
2022-04-14 14:14:52 +05:30
lynxnb
58c871ed9a Correctly hide system bars in EmulationActivity on Android >= 11 2022-04-14 14:14:52 +05:30
Billy Laws
3ff8075151 Move vertex and RT format conv to macros and fill them fully in
Makes the format conversions easier to read and shorter, and adds in
some new formats needed to complete the RT table properly.
2022-04-14 14:14:52 +05:30
PixelyIon
8f0db18624 Fix ControllerActivity Controller Type Change Crash
If the controller type was changed from a type with a larger amount of buttons/axes to one with a fewer amount, a crash would occur due to the transition animation retaining those elements as children yet returning `NO_POSITION` from `getChildAdapterPosition` in `DividerItemDecoration` which was an unhandled case and led to an OOB array access.
2022-04-14 14:14:52 +05:30
PixelyIon
2c46709064 Fix ControllerPreference's index not being passed to Activity
A bug caused by not passing the index argument to `ControllerActivity` led to all preferences opening the activity that pertained to Controller #1. This was fixed by passing the `index` argument in the activity launch intent.
2022-04-14 14:14:52 +05:30
PixelyIon
270ee4a7a6 Update Gradle + AGP + Kotlin Dependencies
Gradle was updated to `7.3.3` and AGP to `7.1.0-rc01` from `beta04` with all other dependencies being updated to the latest available versions.
2022-04-14 14:14:52 +05:30
PixelyIon
98b366c1f5 Fix Texture Synchronization Bug
Fixes texture corruption due to incorrect synchronization, the barrier would not enforce waiting till the texture was entirely rendered, causing an incomplete texture to be downloaded which led to rendering bugs for certain GPUs including ARM's Mali GPUs.
2022-04-14 14:14:52 +05:30
PixelyIon
aea40e6496 Fix enabledFeature2 Unlinking Assertion Bug
A bug caused an assertion if both `VK_EXT_custom_border_color` and `VK_EXT_vertex_attribute_divisor` were unsupported, due to mistakenly unlinking `PhysicalDeviceVertexAttributeDivisorFeaturesEXT` instead of `PhysicalDeviceCustomBorderColorFeaturesEXT` when `VK_EXT_custom_border_color` isn't supported, which would potentially lead to unlinking the same structure twice and cause the assertion.
2022-04-14 14:14:52 +05:30
Billy Laws
68f31c3688 Use macros for defining texture formats and their conversions
Avoids the need to repeat all the possible component types for each texture format while also making them simpler to add and easier to read.
2022-04-14 14:14:52 +05:30
lynxnb
a9d4e6bb1a Add screen orientation setting 2022-04-14 14:14:52 +05:30
PixelyIon
bc29b23972 Implement CPU-only Maxwell3D Inline Constant Buffer Updates
Implements inline constant buffer updates that are written to the CPU copy of the buffer rather than generating an actual inline buffer write, this works for TIC/TSC index updates but won't work when the buffer is expected to actually be updated inline with regard to sequence rather than just as a buffer upload prior to rendering. 

GPU-sided constant buffer updates will be implemented later with optimizations for updating an entire range by handling GPFIFO `Inc`/`NonInc` directly and submitting it as a host inline buffer update.
2022-04-14 14:14:52 +05:30
PixelyIon
08f29f7da4 Make ActiveDescriptorSet movable and non-copyable
There should only ever be a single instance of an `ActiveDescriptorSet` that tracks the lifetime of a descriptor set, as the destructor is responsible for freeing the descriptor set.

There are cases where a new object inheriting the descriptor set needs to be created. In these cases we need move semantics that make the destructor of the prior object inert, which allows moving to the new object without any side effects. If the copy constructor was used in these cases, the older object would free the set on its destruction, which would leave existing instances holding an invalid set; this is incorrect behavior and would likely lead to driver crashes.
2022-04-14 14:14:52 +05:30
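The ownership rule from 08f29f7da4 is the standard move-only RAII pattern. The sketch below is an assumption-laden illustration: `Pool` and `ActiveSet` are hypothetical and the pool merely counts live sets instead of talking to Vulkan.

```cpp
#include <utility>

// Hypothetical pool that only counts outstanding sets
struct Pool {
    int live{0};
};

// Move-only handle, exactly one instance is responsible for releasing the set
class ActiveSet {
  private:
    Pool *pool;

    void Release() {
        if (pool) {
            --pool->live;
            pool = nullptr;
        }
    }

  public:
    explicit ActiveSet(Pool &owner) : pool{&owner} { ++pool->live; }

    ActiveSet(const ActiveSet &) = delete;
    ActiveSet &operator=(const ActiveSet &) = delete;

    // Moving leaves the source with a null pool, which makes its destructor inert
    ActiveSet(ActiveSet &&other) noexcept : pool{std::exchange(other.pool, nullptr)} {}

    ActiveSet &operator=(ActiveSet &&other) noexcept {
        if (this != &other) {
            Release();
            pool = std::exchange(other.pool, nullptr);
        }
        return *this;
    }

    ~ActiveSet() { Release(); }
};
```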
PixelyIon
bb14af4f7a Implement Maxwell3D Sampled Textures
The descriptor sets should now contain a combined image and sampler handle for any sampled textures in the guest shader from the supplied offset into the texture constant buffer.

Note: Games tend to rely on inline constant buffer updates for writing the texture constant buffer and due to it not being implemented, the value will be read as 0 which is incorrect.
2022-04-14 14:14:52 +05:30
PixelyIon
d9a9e52350 Use ConstantBuffer instead of BufferView for Shader Constant Buffers
We want read semantics inside the constant buffer object via the mappings to avoid a pointless GPU VMM mapping lookup. It is a fairly frequent operation so this is necessary, the ability to write directly will be added in the future as well.
2022-04-14 14:14:52 +05:30
PixelyIon
adb0a16873 Implement Maxwell 3D Textures
Implements parsing for the Maxwell 3D TIC pool and conversion of a TIC into a `GuestTexture`, support is limited to pitch-linear RGB565/A8R8G8B8 textures at the moment but will be extended as games utilize more formats and layouts. Support for 1D buffers is also omitted at the moment since they need special handling with them effectively being treated as buffers in Vulkan rather than images.
2022-04-14 14:14:52 +05:30
PixelyIon
a7b90e7825 Change Texture Pitch Unit to Bytes from Pixels
The pitch of the texture should always be supplied in terms of bytes as it denotes alignment on a byte boundary rather than a pixel one, it is also always utilized in terms of bytes rather than pixels so this avoids an unnecessary conversion.

Note: GBP stride unit was assumed to be pixels earlier but is likely bytes which is why there are no changes to the supplied value there, if this is not the case it'll be fixed in the future
2022-04-14 14:14:52 +05:30
PixelyIon
a9aa16798f Add -fsigned-bitfields for defined bitfield int behavior
We want consistent behavior between signed `int`s in bitfields and outside of bitfields, the `-fsigned-bitfields` flag enforces this behavior.
2022-04-14 14:14:52 +05:30
PixelyIon
87c8dc94d2 Implement Maxwell3D Samplers
Maxwell3D `TextureSamplerControl` (TSC) are fully converted into Vulkan samplers with extension backing for all aspects that require them (border color/reduction mode) and approximations where Vulkan doesn't support certain functionality (sampler address mode) alongside cases where extensions may not be present (border color).
2022-04-14 14:14:52 +05:30
PixelyIon
e48a7d7009 Fix Mapping Caching For Maxwell 3D Buffers
Code involving caching of mappings was copied from `RenderTarget` without much consideration for applicability in buffers, the reason for caching mappings in RTs was that the view may be invalidated by more than the IOVA/Size being changed but this doesn't hold true for buffers generally so invalidation can only be on the view level with the mappings being looked up every time since the invalidation would likely change them.
2022-04-14 14:14:52 +05:30
PixelyIon
ff27dce24c Implement ObjectHash for hashing trivial objects in maps
`std::hash` doesn't have a generic template where it can be utilized for arbitrary trivial objects and implementing this might result in conflicts with other types. To fix this a generic templated hash is now provided as a utility structure, that can be utilized directly in hash-based containers such as `unordered_map`.
2022-04-14 14:14:52 +05:30
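A generic byte-wise hash in the spirit of ff27dce24c might look like the sketch below. The name `ByteHash`, the example `Key` type, and the choice of 64-bit FNV-1a are assumptions made for illustration; the log does not say which hash function the real `ObjectHash` uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <unordered_map>

// Hashes the raw bytes of an object with 64-bit FNV-1a (an assumed choice of hash).
// Restricted to types without padding so that equal objects always have equal bytes.
template <typename T>
struct ByteHash {
    static_assert(std::has_unique_object_representations_v<T>, "padding bytes would make the hash unstable");

    std::size_t operator()(const T &object) const noexcept {
        unsigned char bytes[sizeof(T)];
        std::memcpy(bytes, &object, sizeof(T));
        std::uint64_t hash{14695981039346656037ULL}; // FNV offset basis
        for (unsigned char byte : bytes) {
            hash ^= byte;
            hash *= 1099511628211ULL; // FNV prime
        }
        return static_cast<std::size_t>(hash);
    }
};

// Hypothetical trivial key type used to demonstrate the hash in an unordered_map
struct Key {
    std::uint32_t a;
    std::uint32_t b;

    bool operator==(const Key &other) const { return a == other.a && b == other.b; }
};

using KeyMap = std::unordered_map<Key, int, ByteHash<Key>>;
```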
PixelyIon
97cfcba0da Add Nullability for Optional Semantics to span
Nullability allows for optional semantics where a span may be explicitly invalidated with `nullptr` being used as a sentinel value for it and a boolean operator that allows trivial checking for if the span is valid or not.
2022-04-14 14:14:52 +05:30
PixelyIon
c11962e8e4 Implement Maxwell3D Bindless Texture Constant Buffer Index
The index of the constant buffer with bindless texture descriptors is now retrieved from Maxwell3D register state and passed to the shader compiler.
2022-04-14 14:14:52 +05:30
PixelyIon
1c3f62b7b4 Implement Maxwell3D Indexed Drawing 2022-04-14 14:14:52 +05:30
PixelyIon
23cdfe2139 Implement Maxwell3D Index Buffers
Adds support for index buffers including U8 index buffers via the `VK_EXT_index_type_uint8` extension which has been added as an optional quirk but an exception will be thrown if the guest utilizes it but the host doesn't support it.
2022-04-14 14:14:52 +05:30
PixelyIon
a4041364e1 Address CR comments
Note: CR comments regarding `ShaderSet` and `PipelineStages` will be addressed at a later date with a common class for associative enum arrays.
2022-04-14 14:14:52 +05:30
PixelyIon
e1e14e781f Support Dual Vertex Shader Programs
Add support for parsing and combining `VertexA` and `VertexB` programs into a single vertex pipeline program prior to compilation, atomic reparsing and combining is supported to only reparse the stage that was modified and recombine once at most within a single pipeline compilation.
2022-04-14 14:14:52 +05:30
PixelyIon
974cf03c18 Add Atomic Pipeline Stage Invalidation
Atomically invalidate pipeline stages as runtime information that pertains to them changes rather than never recompiling pipelines on runtime information being updated resulting in out of date pipelines or recompiling all pipelines on any runtime information updates.
2022-04-14 14:14:52 +05:30
PixelyIon
5414db8411 Rework Maxwell3D Shader/Pipeline Stages Compilation with UBO support
Shader compilation is now broken into shader program parsing and pipeline shader compilation which will allow for supporting dual vertex shaders and more atomic invalidation depending on runtime state limiting the amount of work that is redone. 

Bindings are now properly handled allowing for bound UBOs to be converted to the appropriate host UBO as designated by the shader compiler by creating Vulkan Descriptor Sets that match it.
2022-04-14 14:14:52 +05:30
PixelyIon
055d315048 Separate Maxwell3D Stages into Shader/Pipeline
We need this to make the distinction between a shader and pipeline stage, as shader programs are bound at a different rate than that of pipeline stage resources such as UBO.
2022-04-14 14:14:52 +05:30
PixelyIon
492dd47218 Implement Vulkan Descriptor Set Allocator
A fixed descriptor set allocator which manages the size of the pool with automatic reallocations when any allocations run out of descriptors.
2022-04-14 14:14:52 +05:30
PixelyIon
9af9f1d41a Implement Maxwell3D Constant Buffer Selector
The Constant Buffer Selector is used to point to a constant buffer that will be bound to a shader stage or updated with inline data.
2022-04-14 14:14:52 +05:30
PixelyIon
afa34e320a Retain Shader Binding State Across Stages
An instance of `Shader::Backend::Bindings` must be retained across all stages for correct emission of bindings, which is now done inside `GraphicsContext::GetShaderStages`.
2022-04-14 14:14:52 +05:30
PixelyIon
550d12b7fa Set Shader Runtime Generic Vertex Attribute Types Correctly
The vertex attribute types supplied prior were just the default which is `Float`, this works for some cases but will entirely break if the attribute type isn't a float. The attribute types are now set correctly.
2022-04-14 14:14:52 +05:30
PixelyIon
a2de6b9255 Fix Maxwell3D vertexEndGl Register Offset
The offset was set to 0x586 which is the location of `vertexBeginGl`, it's been corrected now and set to 0x585.
2022-04-14 14:14:52 +05:30
Billy Laws
5815cda7a7 Update Vulkan-Hpp to v1.2.202 2022-04-14 14:14:52 +05:30
PixelyIon
bd6cd0056c Support Multi-Aspect Copy in Texture::CopyIntoStagingBuffer
Only copying a single aspect was supported by `CopyIntoStagingBuffer` earlier due to not supplying a `VkBufferImageCopy` for each aspect separately, this has now been done with Color/Depth/Stencil aspects having their own `VkBufferImageCopy` for the `VkCmdCopyImageToBuffer` command.
2022-04-14 14:14:52 +05:30
PixelyIon
daff17c776 Order TextureView Definition Correctly
The definition of the `TextureView` class was spread across `texture.cpp` and has now been moved to the top of the file above the other half of the definition.
2022-04-14 14:14:52 +05:30
PixelyIon
189b9533f2 Disable Vertex Buffers With 0 as IOVAs
A buffer with 0 as the start/end IOVA should be invalid as there shouldn't be any mappings at 0 in GPU VA, titles such as Puyo Puyo Tetris configure the Vertex Buffer with 0 IOVAs which leads to a segmentation fault without this exception.
2022-04-14 14:14:52 +05:30
PixelyIon
cfeb8098db Attach TextureView/BufferView Lifetime to FenceCycle
The lifetime of a texture and buffer view is now bound by the `FenceCycle` in `CommandExecutor`, this ensures that a `VkImageView` isn't destroyed prior to usage leading to UB.
2022-04-14 14:14:52 +05:30
PixelyIon
34fc1e32b8 Remove Textures from RenderPassNode::Storage
The lifetime of all textures bound to a RenderPass alongside syncing of textures is already handled by `CommandExecutor` and doesn't need to be redundantly handled by `RenderPassNode`. It's been removed as a result of this.
2022-04-14 14:14:52 +05:30
PixelyIon
45c7a89fc3 Cleanup BufferView/TextureView Locking Code
Renames the variable to be neater and less confusing alongside adding comments for `try_lock()` to make the goal of the function more apparent.
2022-04-14 14:14:52 +05:30
PixelyIon
7776ef2cd0 Support Depth/Stencil RT in Draw
Adds the depth/stencil RT as an attachment for the draw but with `VkPipelineDepthStencilStateCreateInfo` stubbed out, it'll not function correctly and the contents will not be what the guest expects them to be.
2022-04-14 14:14:52 +05:30
PixelyIon
525850ae09 Stub VkPipelineDepthStencilStateCreateInfo
Maxwell3D Depth State is composed of several registers and will be implemented at a later date, for the time being it's been stubbed.
2022-04-14 14:14:52 +05:30
PixelyIon
9e63ecf05d Implement Maxwell3D Depth/Stencil Clears
Support for clearing the depth/stencil RT has been added as its own function via either optimized `VkAttachmentLoadOp`-based clears or `vkCmdClearAttachments`. A bit of cleanup has also been done for color RT clears with the lambda for the slow-path purely calling the command rather than creating the parameter structures.
2022-04-14 14:14:52 +05:30
PixelyIon
bf89f96bf5 Implement Optimized LoadOp Clears for Depth/Stencil Attachments
Implements `AddClearDepthStencilSubpass` in `CommandExecutor` which is similar to `ClearColorAttachment` in that it uses `VK_ATTACHMENT_LOAD_OP_CLEAR` for the clear which is far more efficient than using `VK_ATTACHMENT_LOAD_OP_LOAD` then doing the clear.
2022-04-14 14:14:52 +05:30
PixelyIon
6f6413f02d Fix VkSubpassDependency for Depth/Stencil Attachments
The stage/access mask for `VkSubpassDependency` were hardcoded to only be valid for color attachments earlier, this has now been fixed by branching based on the format aspect.
2022-04-14 14:14:52 +05:30
PixelyIon
aa32f6b017 Add Depth/Stencil Format Support to Texture
Sets `VkImageUsageFlags` correctly rather than hardcoding it for color attachments and adds multiple `VkBufferImageCopy` to `VkCmdCopyBufferToImage` for Color/Depth/Stencil aspects of an image.
2022-04-14 14:14:52 +05:30
PixelyIon
68c990c041 Implement Maxwell3D Depth/Stencil Render Target
Support the Maxwell3D Depth RT for Z-buffering, this just creates an equivalent `RenderTarget` object with no support on the API-user side (IE: `Draw` and `ClearBuffers`).
2022-04-14 14:14:52 +05:30
PixelyIon
2a8bcc60c7 Make Render Targets Abstract for Color/Depth RTs
This prefixes all RT functions that deal with color RTs with `Color` and abstracts out common functions that will be used for both color and depth RTs. All common Maxwell3D structures are also moved out of the `ColorRenderTarget` (`RenderTarget` previously) structure.
2022-04-14 14:14:52 +05:30
PixelyIon
b0f084ae32 Implement Shader Compiler Input Topology
Sets the input topology in the runtime information for the shader compiler correctly based on the Maxwell3D input topology.
2022-04-14 14:14:52 +05:30
PixelyIon
7a63ad7d3d Implement VkPipelineCache for host pipeline caching
To allow for caching of pipelines on the host a `VkPipelineCache` has been added, it is entirely in-memory and is not flushed to the disk which'll be done in the future alongside caching guest shaders to further avoid translation where possible.
2022-04-14 14:14:52 +05:30
PixelyIon
4dcf12c4c0 Implement Maxwell3D Draws
Uses all Maxwell3D state converted into Vulkan state to do an equivalent draw on the host GPU, it sets up RT/Vertex Buffer/Vertex Attribute/Shader state and creates a stubbed out `VkPipelineLayout` for the draw. Any descriptor state isn't currently handled and is yet to be implemented, currently there's no Vulkan pipeline cache supplied which will be implemented subsequently.
2022-04-14 14:14:52 +05:30
PixelyIon
57b0d6a2fb Stub VkPipelineMultisampleStateCreateInfo
Multisampling will be worked on later and for the time being is being safely stubbed by setting the sample count to 1.
2022-04-14 14:14:52 +05:30
PixelyIon
56b3a01a59 Track VkRenderPass and Subpass Index for Subpass Function Nodes
We require a handle to the current renderpass and the index of the subpass in certain cases, this is now tracked by the `CommandExecutor` and passed in as a parameter to `NextSubpassFunctionNode` and the newly-introduced `SubpassFunctionNode`.
2022-04-14 14:14:52 +05:30
PixelyIon
cb7f68b98d Allow Attaching Texture/Buffers to CommandExecutor
Switch from `SubmitWithCycle` to manually allocating the active command buffer to tag dependencies with the `FenceCycle` that prevents them from being mutated prior to execution. This new paradigm could also allow eager recording of commands with only submission being deferred.
2022-04-14 14:14:52 +05:30
PixelyIon
aeea3e6f66 Allow manual allocation of ActiveCommandBuffer
`CommandScheduler` API users can now directly allocate an active command buffer that they need to manage alongside its fence, this can allow for more efficient recording as it doesn't need to be immediately submitted after, it can also allow attaching objects to a `FenceCycle` prior to submission that can be useful for locking resources.
2022-04-14 14:14:52 +05:30
PixelyIon
8989305637 Implement Host Vertex Buffer Translation
Uses the buffer cache to retrieve an equivalent host vertex buffer for a corresponding guest vertex buffer.
2022-04-14 14:14:52 +05:30
PixelyIon
b6ba770a27 Implement Maxwell3D Shader Compilation
Compiles shaders supplied by the guest with caching and automatic invalidation, the size of the shader is also automatically determined by looking for `BRA $` instructions which cause an infloop, it should be noted that we have a maximum shader bytecode size, any shader above this size will not be supported.
2022-04-14 14:14:52 +05:30
PixelyIon
08afda6ac4 Implement Graphics Shader Compilation in ShaderManager
Graphics shaders can now be compiled using the shader compiler and emit SPIR-V that can be used on the host. The binding state isn't currently handled alongside constant buffers and textures support in `GraphicsEnvironment` yet.
2022-04-14 14:14:52 +05:30
PixelyIon
353ca8ec84 Fix Viewport X/Y Translation
The operands of the subtraction in the X/Y translation calculation were the wrong way around which led to negative translations that would translate the viewport off the screen.
2022-04-14 14:14:52 +05:30
PixelyIon
f06a12170f Set Default Color Write Mask to RGBA
The default color write mask should mask no channels and write all of them and should be mutated to mask out certain channels as required by the guest.
2022-04-14 14:14:52 +05:30
PixelyIon
23faf1370c Use Static Arrays for Vertex Buffer Bindings & Attributes
We cannot statically construct the vertex buffer/attribute arrays for Vulkan due to inactive attributes or buffers which isn't possible on Vulkan, we also cannot just change the count dynamically as there might be disabled buffers or attributes in the middle. We just have a `static_array` which should dynamically be filled in with buffer binding/attribute Vulkan structures before submission.
2022-04-14 14:14:52 +05:30
PixelyIon
8652edb07b Make GuestBuffer format-less
Buffers generally don't have formats that are fundamentally associated with them unless they're texel buffers, if that is the case it can be manually set in `BufferView`.
2022-04-14 14:14:52 +05:30
PixelyIon
03314ec7d2 Introduce BufferManager
The Buffer Manager handles mapping of guest buffers to host buffer views with automatic handling of sub-buffers and eventually supporting recreation of overlapping buffers to create a single larger buffer.
2022-04-14 14:14:52 +05:30
PixelyIon
bde61d72cc Introduce Buffer and BufferView
Implements infrastructure for using guest buffers on the host for rendering, a `BufferManager` is still missing which'd handle mapping from guest buffers to host buffers and will be subsequently committed. It should be noted that `BufferView` is also disconnected from `Buffer` and shared for every instance with the same properties like `TextureView` is now.
2022-04-14 14:14:52 +05:30
PixelyIon
6eda1777c5 Rework TextureView to be disconnected from Texture
We want `TextureView`(s) to be disconnected from the backing on the host and instead represent a specific texture on the guest with a backing that can change depending on mapping of new textures which'd invalidate the backing but should now be automatically repointed to an appropriate new backing. This approach also requires locking of the backing to function as it is mutable till it has been locked or the backing has an attached `FenceCycle` that hasn't been signaled which will be added for `CommandExecutor` in a subsequent commit.
2022-04-14 14:14:52 +05:30
PixelyIon
82916657fb Only Enable Shader Compiler Debug Mode in Debug Builds
Sets properties that relate to debugging in `Shader::Settings` to `true` only for debug builds while leaving them disabled for release builds.
2022-04-14 14:14:52 +05:30
PixelyIon
b09f28c0ba Implement Missing Shader Compiler Quirks
Introduces the `supportsShaderViewportIndexLayer` quirk and sets `Shader::Profile::support_int64_atomics` depending on if the `supportsAtomicInt64` quirk is available.
2022-04-14 14:14:52 +05:30
PixelyIon
f3e81094a2 Implement Shader Compiler Property Quirks
Introduces the `floatControls`, `supportsSubgroupVote` and `subgroupSize` quirks for the shader compiler which are based on Vulkan `PhysicalDevice` properties.
2022-04-14 14:14:52 +05:30
PixelyIon
51c4df24b5 Switch from VK_VERSION_* to VK_API_VERSION_* macros
Vulkan has officially deprecated `VK_VERSION_*` macros for versioning as it has introduced the variant into the version. It should however be `0` for the Vulkan API and doesn't need to be printed.
2022-04-14 14:14:52 +05:30
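The variant field mentioned in the `VK_API_VERSION_*` commit above can be illustrated with a minimal, header-free sketch. The bit layout (3-bit variant, 7-bit major, 10-bit minor, 12-bit patch) follows the Vulkan specification; the function names `MakeApiVersion`, `ApiVersionMajor`, `LegacyVersionMajor`, `FormatApiVersion` and so on are hypothetical stand-ins for the real macros, not code from this repository.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Packs a version using the layout: variant (3 bits) | major (7) | minor (10) | patch (12)
constexpr std::uint32_t MakeApiVersion(std::uint32_t variant, std::uint32_t major, std::uint32_t minor, std::uint32_t patch) {
    return (variant << 29U) | (major << 22U) | (minor << 12U) | patch;
}

// Equivalents of the VK_API_VERSION_* accessors, which mask out the variant bits
constexpr std::uint32_t ApiVersionVariant(std::uint32_t version) { return version >> 29U; }
constexpr std::uint32_t ApiVersionMajor(std::uint32_t version) { return (version >> 22U) & 0x7FU; }
constexpr std::uint32_t ApiVersionMinor(std::uint32_t version) { return (version >> 12U) & 0x3FFU; }
constexpr std::uint32_t ApiVersionPatch(std::uint32_t version) { return version & 0xFFFU; }

// Equivalent of the deprecated VK_VERSION_MAJOR, which does not mask and so leaks the variant bits
constexpr std::uint32_t LegacyVersionMajor(std::uint32_t version) { return version >> 22U; }

// Formats "major.minor.patch"; the variant is assumed to be 0 (the Vulkan API) and is not printed
std::string FormatApiVersion(std::uint32_t version) {
    assert(ApiVersionVariant(version) == 0 && "only the Vulkan API variant is expected here");
    return std::to_string(ApiVersionMajor(version)) + "." +
           std::to_string(ApiVersionMinor(version)) + "." +
           std::to_string(ApiVersionPatch(version));
}
```

With a non-zero variant the legacy accessor returns a major version polluted by the variant bits, while the masked accessor still returns the intended value.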
PixelyIon
0588a525b4 Implement Shader Compiler Extension/Feature Quirks
Introduces several quirks for optional features used by the shader compiler, which are now reported in the `Shader::HostTranslateInfo` and `Shader::Profile` structures. There are still property-related quirks for the shader compiler which haven't been implemented in this commit.
2022-04-14 14:14:52 +05:30
PixelyIon
8f3887c56a Create memory::Buffer & Implement StagingBuffer as derivative
A `Buffer` class was created to hold any generic Vulkan buffer object with `span` semantics. `StagingBuffer` was implemented atop it as a wrapper for `Buffer` that inherits from `FenceCycleDependency` and can be used as such.
2022-04-14 14:14:52 +05:30
PixelyIon
a55aca76c6 Rename TextureView::backing to TextureView::texture
It was determined that `backing` wasn't a very descriptive name and that it conflicted with the texture's own backing. The name was changed to `texture` to make it more apparent that it was specifically the `Texture` object backing the view.
2022-04-14 14:14:52 +05:30
PixelyIon
482c573b81 Introduce FlatMemoryManager::ReadTill for scanning semantics
Adds a memory manager function that reads into a vector until the data satisfies the supplied function or an early stop condition is hit, such as reaching the end of the vector or an unmapped region. This can be used to efficiently scan for values in GPU VA.
2022-04-14 14:14:52 +05:30
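The scanning semantics described in the `ReadTill` commit above could look roughly like the sketch below. It is not the actual `FlatMemoryManager` code: `GuestMemory` is a hypothetical stand-in that models the address space as a list of words which are either mapped or unmapped, and the signature, the word granularity, and the choice to include the matching word in the output are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical address space model: each slot is a mapped word or unmapped (std::nullopt)
using GuestMemory = std::vector<std::optional<std::uint32_t>>;

// Copies words starting at `address` into `destination` until one of these happens:
//  - `shouldStop` accepts a word (that word is still written, by assumption)
//  - `destination` is full
//  - an unmapped slot or the end of `memory` is reached
// Returns the number of words written to `destination`.
std::size_t ReadTill(const GuestMemory &memory, std::size_t address,
                     std::vector<std::uint32_t> &destination,
                     const std::function<bool(std::uint32_t)> &shouldStop) {
    assert(shouldStop && "a stop predicate is required");

    std::size_t written = 0;
    while (written < destination.size() && address + written < memory.size()) {
        const std::optional<std::uint32_t> &slot = memory[address + written];
        if (!slot)
            break; // Unmapped region: stop early

        destination[written++] = *slot;
        if (shouldStop(*slot))
            break; // The supplied function was satisfied
    }
    return written;
}
```

A caller would size `destination` to the maximum amount it is willing to scan and then use the returned count to tell how much of it is valid.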
PixelyIon
31c4f1ca4e Unlink VkPhysicalDeviceVertexAttributeDivisorFeaturesEXT when disabled
When `VK_EXT_vertex_attribute_divisor` is not available, `VkPhysicalDeviceVertexAttributeDivisorFeaturesEXT` is unlinked from the device enabled feature list as it is undefined behavior to link a structure provided by an extension without enabling that extension.
2022-04-14 14:14:52 +05:30
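Unlinking a structure from a feature list, as the `VkPhysicalDeviceVertexAttributeDivisorFeaturesEXT` commit above describes, amounts to removing a node from a singly linked `pNext` chain. The sketch below shows that idea without the Vulkan headers: `BaseStructure` is a hypothetical mock of the `sType`/`pNext` header shared by extensible Vulkan structures (compare `VkBaseOutStructure`), and `UnlinkStructure` is an illustrative helper rather than the project's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mock of the header that every extensible Vulkan structure begins with
struct BaseStructure {
    std::uint32_t sType;
    BaseStructure *pNext;
};

// Removes the first structure with a matching `sType` from the chain following `head`.
// `head` itself is never removed. Returns true if a structure was unlinked.
bool UnlinkStructure(BaseStructure *head, std::uint32_t sType) {
    assert(head != nullptr);

    for (BaseStructure *current = head; current->pNext != nullptr; current = current->pNext) {
        if (current->pNext->sType == sType) {
            // Skip over the matching structure so the chain no longer references it
            current->pNext = current->pNext->pNext;
            return true;
        }
    }
    return false;
}
```

In real code the check for whether the extension is enabled would decide if this helper is called before the chain is passed to device creation.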
PixelyIon
032866c9b1 Allow Injecting External Vulkan Layers
Sets `com.android.graphics.injectLayers.enable` to allow injection of external Vulkan layers, as is done by GPU debuggers such as RenderDoc.
2022-04-14 14:14:52 +05:30