1160 Commits

Author SHA1 Message Date
TheASVigilante
a1ff4e1777 Implement OpenHardwareOpusDecoderEx and GetWorkBufferSizeEx
Implements these 2 functions which were introduced in HOS 12.0.0. Fixes a crash in Xenoblade Chronicles 3.
2022-09-10 22:04:15 +01:00
lynxnb
34bd16426c Fix quads index buffer conversion not accounting for first index
Unindexed quad draws were broken when multiple draw calls were done on the same vertex buffer, with a non-zero `first` index.
Indexed quad draws also suffered from the same issue, but it was never encountered in games.
This commit fixes both cases by accounting for the `first` drawn index when generating conversion index buffers.
2022-09-04 12:42:33 +02:00
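The idea behind the quad fix above (34bd16426c) can be sketched as follows for the unindexed case. This is only an illustration, assuming each quad is split into two triangles; `GenerateQuadIndices` and its signature are hypothetical and not taken from the repository.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a triangle-list index buffer for an unindexed quad draw
// covering vertices [first, first + count). Each quad (v0, v1, v2, v3) becomes the
// triangles (v0, v1, v2) and (v0, v2, v3).
std::vector<uint32_t> GenerateQuadIndices(uint32_t first, uint32_t count) {
    std::vector<uint32_t> indices;
    indices.reserve((count / 4) * 6);
    for (uint32_t quad{}; quad < count / 4; quad++) {
        // Offsetting by `first` is the important part: starting from 0 would address
        // the wrong vertices whenever a draw begins partway through the vertex buffer
        uint32_t base{first + quad * 4};
        indices.insert(indices.end(), {base, base + 1, base + 2, base, base + 2, base + 3});
    }
    return indices;
}
```

With `first = 4` and `count = 8`, the generated indices start at vertex 4 rather than vertex 0, which is what a second draw call on the same vertex buffer needs.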
Billy Laws
9af5df4bae Increase IPC pointer buffer size
Some services report a pointer buffer size > 0x1000, so increase it so as not to cause issues when they're implemented.
2022-09-02 23:14:05 +01:00
Billy Laws
51f4e7662e Add support for the TIPC protocol introduced in HOS 12.0.0
TIPC is a much lighter layer on top of the Horizon IPC system than CMIF and is used by SM in 12.0.0+. This implementation is slightly hacky since it doesn't really keep a separation between the underlying kernel IPC stuff and userspace protocols like CMIF/TIPC. This should be fixed eventually, probably together with an IPC dispatch rewrite to avoid the mess of frozen maps.

Tested with Hentai Uni, which now crashes needing 'ldr:ro'.
2022-09-02 23:13:23 +01:00
Billy Laws
a40d7c78ad Always recreate oboe stream on error
This is what's done in oboe examples to avoid spurious errors breaking audio entirely.
2022-08-31 21:26:14 +01:00
Billy Laws
8917ec9c88 Don't set framesPerCallback for main stream as per oboe guidance
It's best to let oboe figure it out on its own
2022-08-31 21:26:14 +01:00
Billy Laws
b00008daf5 Fix identifier release check in AudioTrack::Stop 2022-08-31 21:26:14 +01:00
Billy Laws
d9c8e62d1c Don't warn on GetConfig IOCTL fails 2022-08-31 21:26:14 +01:00
Billy Laws
4aef24ba32 Implement NVGPU_GPU_IOCTL_GET_GPU_TIME in nvdrv 2022-08-31 21:26:14 +01:00
Billy Laws
5841799420 Fix decoding of IOCTLs with padding at the end 2022-08-31 21:26:14 +01:00
Billy Laws
82444f3b0a Don't set push descriptor flag for desc sets
This is redundant and against the spec since we no longer use push descriptors.
2022-08-31 21:26:14 +01:00
PixelyIon
94fdd6aa43 Fix HID touch points not being removed from screen
Tapping anything in titles that supported touch (such as Puyo Puyo Tetris or Sonic Mania) wouldn't work due to the first touch point never being removed from the screen; it is supposed to be removed after a 3 frame delay from the touch ending.

This commit introduces a mechanism to "time-out" touch points which counts down during the shared memory updates and removes them from the screen after a specified timeout duration.
2022-08-31 23:43:02 +05:30
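A minimal sketch of the touch point "time-out" idea from 94fdd6aa43 above, assuming the countdown is ticked once per shared memory update and that the timeout is 3 updates. `TouchPoint`, `ReleaseTouch`, `UpdateTouchPoints` and `TouchTimeoutUpdates` are hypothetical names used for illustration only.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumption: the timeout is expressed in shared memory updates and equals the 3 frame delay
constexpr uint32_t TouchTimeoutUpdates{3};

struct TouchPoint {
    uint32_t id;
    bool released{};    // Set once the touch has ended
    uint32_t timeout{}; // Remaining updates before the point is removed
};

// Marks a point as released and arms its countdown rather than removing it immediately
void ReleaseTouch(std::vector<TouchPoint> &points, uint32_t id) {
    for (auto &point : points) {
        if (point.id == id && !point.released) {
            point.released = true;
            point.timeout = TouchTimeoutUpdates;
        }
    }
}

// Called on every shared memory update: ticks the countdowns, then drops expired points
void UpdateTouchPoints(std::vector<TouchPoint> &points) {
    for (auto &point : points)
        if (point.released && point.timeout > 0)
            point.timeout--;

    points.erase(std::remove_if(points.begin(), points.end(), [](const TouchPoint &point) {
        return point.released && point.timeout == 0;
    }), points.end());
}
```

A released point stays visible to the guest for two further updates and disappears on the third, while points that are still held down are never touched.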
PixelyIon
70ad4498a2 Write HID LIFO entries at fixed intervals
Certain titles depend on HID LIFO entries being written out at a fixed frequency rather than on actual state change. Not doing this can lead to applications freezing until the LIFO is filled up to its maximum size; this behavior is seen in Super Mario Odyssey. In other cases such as Metroid Dread, the game can run into race conditions that lead to crashes; these were previously worked around by mashing a button during loading.

This commit introduces a thread which sleeps and wakes up occasionally to write LIFO entries into HID shared memory at the desired frequencies. This alleviates any issues as it fills up the LIFO instantly and correctly emulates HID Shared Memory behavior expected by the guest.

Co-authored-by: Narr the Reg <juangerman-13@hotmail.com>
2022-08-31 22:49:36 +05:30
PixelyIon
7966bfa9f6 Fix PI update KThread::waiterMutex deadlock
It was determined that deadlocks inside `KThread::UpdatePriorityInheritance` would not only arise from the first level of locking with `waitingOn->waiterMutex` but also from the second level of locking with `nextThread->waiterMutex`, which has now also been fixed to fall back when facing contention.
2022-08-28 20:15:08 +05:30
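The "fall back when facing contention" pattern mentioned in 7966bfa9f6 above can be sketched generically as below. This is not the `KThread` code; `LockBothWithFallback` is a hypothetical helper, and yielding before the retry is an assumption about how the back-off might be done.

```cpp
#include <mutex>
#include <thread>

// Hypothetical helper: takes `first`, then only *tries* to take `second`. On contention
// it releases `first` and retries, so two threads locking in opposite orders cannot
// end up waiting on each other forever.
template<typename Function>
void LockBothWithFallback(std::mutex &first, std::mutex &second, Function &&critical) {
    while (true) {
        std::unique_lock<std::mutex> firstLock{first};
        std::unique_lock<std::mutex> secondLock{second, std::try_to_lock};
        if (secondLock.owns_lock()) {
            critical(); // Both mutexes are held here
            return;     // Both locks are released by the destructors
        }

        // Fall back: drop the lock we hold and let the contending thread make progress
        firstLock.unlock();
        std::this_thread::yield();
    }
}
```

The same fall back has to be applied at every nesting level where a second mutex is taken while one is already held, which is the point the commit message makes about `nextThread->waiterMutex`.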
Abandoned Cart
04cae942ea Follow typical per-file detail formatting
Format the details in the expected format of individual files (instead of a complete game) and move the code to match the updated placement.
2022-08-27 18:54:27 +05:30
KikiManjaro
8fb4e62c28 Add version information about rom
Review:
Co-authored-by: Niccolò Betto <niccolo.betto@gmail.com>
2022-08-20 13:48:07 +02:00
lynxnb
e9618d9e2c Use pragma pack directions for tightly packing structs containing u128
Using `__attribute__((packed))` doesn't work in new NDKs when a struct contains 128-bit integer members, likely because of an NDK/compiler bug. We now enclose the affected structs in `#pragma pack` directives to tightly pack them.
2022-08-17 12:22:11 +02:00
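For reference, the `#pragma pack` approach from e9618d9e2c above looks roughly like this. `PackedEntry` is a hypothetical struct, and the sketch assumes a GCC or Clang target where `unsigned __int128` is available.

```cpp
#include <cstddef>
#include <cstdint>

using u128 = unsigned __int128; // Compiler extension, assumed to be available (GCC/Clang)

// Everything between push and pop is laid out with 1-byte alignment
#pragma pack(push, 1)
struct PackedEntry { // Hypothetical struct, not from the repository
    uint8_t tag;
    u128 value; // Placed directly after `tag` with no padding in between
};
#pragma pack(pop)

// Without packing, `value` would be aligned to 16 bytes and the struct would be 32 bytes
static_assert(sizeof(PackedEntry) == 17, "PackedEntry should be tightly packed");
static_assert(offsetof(PackedEntry, value) == 1, "value should directly follow tag");
```

The `static_assert`s are a cheap way to catch a toolchain that silently ignores the packing request.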
Billy Laws
bf491f71f9 Simplify blit helper shader vertex order 2022-08-10 15:43:16 +01:00
Billy Laws
c32bec071c Adjust blit src{X,Y} to account for centred sampling before calling into helper shader
Since the blit engine itself samples from pixel corners and the helper shader from pixel centres, the src coordinates need to be adjusted to avoid the helper shader wrapping round on the final column.
2022-08-10 15:39:37 +01:00
Billy Laws
08f36aac33 Enable hades vertex position input workaround for Adreno
Without the workaround, any game using geometry shaders would crash, as by default hades uses the position builtin directly.
2022-08-08 18:09:00 +01:00
Billy Laws
04e7b684d2 Enable vertexPipelineStoresAndAtomics, fragmentStoresAndAtomics and shaderStorageImageWriteWithoutFormat Vulkan features
Used by Xenoblade Chronicles DE
2022-08-08 17:43:18 +01:00
Billy Laws
390558c802 Add partial support for legacy attribute conversion
We previously missed the hades pass for attribute conversion, leading to crashes when games attempted to use such an attribute. The hades pass for this isn't a proper fix, however, as it modifies the IR directly and will break if any of the previous stages in the pipeline change. Enable it so that games using such attributes at least have a chance of working. In the long term the pass will be reworked on the hades side to avoid modifying the IR in a way that can't be undone.
2022-08-08 17:43:18 +01:00
Billy Laws
540437b547 Fixup index buffer view caching
We forgot to set the view size, which would end up forcing a view to be recreated with every call
2022-08-08 17:43:18 +01:00
Billy Laws
c966cd3b26 Prevent runtimeInfo vertex state from leaking into wrong shaders
This vertex state must only be present for the last pipeline stage that touches vertices; if it is present for other stages, it could result in incorrect behaviour like performing TFB in the fragment shader or flipping device coordinates twice.
2022-08-08 17:43:13 +01:00
Billy Laws
c52d3195cf Ensure shader stage enable state matches pipeline stage enable state
As the code was before, if we had a shader that was disabled and then enabled again without being invalidated, the pipeline stage would stay disabled and break rendering.
2022-08-08 17:40:35 +01:00
Billy Laws
b1c669ba14 Always keep the VertexB shader stage enabled
HW doesn't allow disabling the VertexB stage, enforce this in code.
2022-08-08 17:40:35 +01:00
lynxnb
d5174175d1 Implement indexed quads support
We previously only supported non-indexed quads. Support for this is implemented by converting the index buffer at record time and pushing the result into the megabuffer, which is then used as the index buffer in the final draw command.
2022-08-08 17:40:35 +01:00
lynxnb
e6741642ba Split out megabuffer allocation from pushing data
The `Allocate` method allocates the given amount of space in a megabuffer chunk, returning a descriptor of the allocated region. This is useful for situations where you want to write directly to the megabuffer, avoiding the need for an intermediary buffer.
2022-08-08 17:40:35 +01:00
Billy Laws
cdc6a4628a Enable VK uint8 indices feature when supported 2022-08-08 17:40:35 +01:00
Billy Laws
dccc86ea97 Implement transform feedback with VK_EXT_transform_feedback
Tested to work in Xenoblade Chronicles DE; the code handles both hades varying input and buffer setup.
2022-08-08 17:40:35 +01:00
Billy Laws
06053d3caf Rewrite Fermi 2D engine to use the blit helper shader
Entirely rewrites the engine and interconnect code to take advantage of the subpixel and OOB blit support offered by the blit helper shader. The interconnect code is also cleaned up significantly with the 'context' naming being dropped due to potential conflicts with the 'context' from context lock
2022-08-08 17:40:35 +01:00
Billy Laws
395f665a13 Implement a system for helper shaders together with a simple blit shader
It is desirable for us to use a shader for blits to allow easily emulating out of bounds blits and blits between different swizzled colour formats. The helper shader infrastructure is designed to be generic so it can be reused by any other helper shaders that we may need in the future.
2022-08-08 17:40:35 +01:00
Billy Laws
f4e58a9238 Remove redundant synchost creating a new buffer 2022-08-08 14:57:44 +01:00
Billy Laws
11a8feb037 Correct nvdrv DMA copy class ID
Was wrongly copy and pasted.
2022-08-08 14:57:44 +01:00
Billy Laws
13e7b54c61 Ensure failed IOCTLs are logged as a warning log 2022-08-08 14:57:44 +01:00
Billy Laws
eeb86a4f8a Calculate renderArea from min(attachments.dimensions...)
Vulkan doesn't support a renderArea larger than that of the smallest attachment
2022-08-08 14:57:44 +01:00
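The `min(attachments.dimensions...)` calculation from eeb86a4f8a above amounts to a per-axis minimum over all attachments. A minimal sketch, where `Extent2D` and `ComputeRenderArea` are hypothetical stand-ins rather than the repository's types:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Extent2D { // Stand-in for a Vulkan-style extent
    uint32_t width;
    uint32_t height;
};

// Returns the largest area covered by every attachment: the minimum width and the
// minimum height, each taken independently across all attachments
Extent2D ComputeRenderArea(const std::vector<Extent2D> &attachments) {
    if (attachments.empty())
        return {0, 0};

    Extent2D area{attachments.front()};
    for (const auto &dimensions : attachments) {
        area.width = std::min(area.width, dimensions.width);
        area.height = std::min(area.height, dimensions.height);
    }
    return area;
}
```

Note that the result need not match any single attachment: 1280x720 and 640x1080 attachments yield a 640x720 render area.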
Billy Laws
9ea658d0ed Don't throw on unsupported TIC formats
These sometimes spuriously occur in games during transitions. To avoid crashing during them, just use the null texture if they occur and log an error
2022-08-08 14:57:44 +01:00
Billy Laws
856818c8eb Emulate the 'None' mipfilter by adjusting LOD
Borrowed this technique from yuzu since Vulkan has no direct equivalent
2022-08-08 14:57:44 +01:00
Billy Laws
9d50b6d0f7 Avoid locking presentation mutex in GetTransformHint
This caused slowdown in Pokemon as it was being called every frame
2022-08-08 14:57:44 +01:00
Billy Laws
460e6c9c84 Use raw pointers to hold constant buffer views
The constant destruction and creation of `BufferView`s in cbuf-heavy games showed up as a large chunk in the profiler. Fix this by taking advantage of the fact that constant buffer `BufferView`s are never deleted and always kept around in the cache to just return a pointer to them in the cache.
2022-08-08 14:54:57 +01:00
Billy Laws
6b2e84712b Avoid race in nvdrv debug prints
Looking up the device name without locking it could race with map insertions or deletions, so lock it to avoid that
2022-08-08 13:24:23 +01:00
Billy Laws
683cd594ad Use a linear allocator for most per-execution GPU allocations
Currently we heavily thrash the heap on each draw, with malloc/free taking up about 10% of GPFIFO's execution time. Using a linear allocator for the main offenders (buffer usage callbacks and index/vertex state) helps to reduce this to about 4%
2022-08-08 13:24:21 +01:00
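For readers unfamiliar with the term used in 683cd594ad above, a linear (bump) allocator hands out memory by advancing an offset and frees everything at once. The sketch below is a generic illustration under stated assumptions, not the allocator added by the commit; `LinearAllocator` and its methods are hypothetical.

```cpp
#include <cstddef>
#include <vector>

class LinearAllocator {
  public:
    explicit LinearAllocator(size_t capacity) : storage(capacity) {}

    // Bumps the offset forward; `alignment` must be a power of two.
    // Assumption: offsets are aligned relative to the start of the backing storage,
    // which is itself assumed to be suitably aligned by the default allocator.
    void *Allocate(size_t size, size_t alignment = alignof(std::max_align_t)) {
        size_t aligned{(offset + alignment - 1) & ~(alignment - 1)};
        if (aligned > storage.size() || size > storage.size() - aligned)
            return nullptr; // Out of space, a real allocator might grow instead
        offset = aligned + size;
        return storage.data() + aligned;
    }

    // Frees every allocation at once, e.g. at the end of an execution
    void Reset() {
        offset = 0;
    }

  private:
    std::vector<std::byte> storage;
    size_t offset{};
};
```

Allocation is a handful of arithmetic operations and there is no per-object free, which is why this suits short-lived per-execution data far better than malloc/free.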
Billy Laws
70eec5a414 Store delegate attached state within the delegate itself
Avoids a costly map lookup for every AttachBuffer call; this was a serious bottleneck in SMO
2022-08-08 13:23:26 +01:00
Billy Laws
0268e1d5a0 Force a submit before any i2m engine writes
We need traps to be in place so we don't end up overwriting a resource that's being actively used by the current context without setting it to dirty
2022-08-08 13:22:37 +01:00
Billy Laws
cb0b132486 Allow supplying push constants to GetPipeline 2022-08-08 13:22:37 +01:00
Billy Laws
1c8863ec3b Use const references for holding pipeline state in pipeline cache
Allows passing in constexpr structs to state directly
2022-08-08 13:22:37 +01:00
Billy Laws
b6b04fa6c5 Use small_vector for VMM TranslateRange results
This was the source of a lot of heap allocs, moving to small_vector helps to avoid most of them
2022-08-08 13:22:37 +01:00
Billy Laws
1fe6d92970 Wait on Swapchain Image copy to complete
Certain titles can display frames out of order due to not waiting on the copy from the final RT to the swapchain image to complete. Although `PresentFrame` does wait on the syncpoint, that isn't enough to ensure the source texture is up-to-date due to us signalling syncpoints early.

By waiting on the swapchain texture after the copy is submitted, we now implicitly wait on the source texture's cycle to be signalled, thus waiting on the frame to be done, which fixes the issue.
2022-08-07 03:12:27 +05:30
PixelyIon
5b7572a8b3 Introduce chunked MegaBuffer allocation
After the introduction of workahead, a system to hold a single large megabuffer per submission was implemented. This worked fine for most cases; however, when many submissions were in flight at the same time, memory usage would increase dramatically due to the number of megabuffers needed. Since only one megabuffer was allowed per execution, it forced the buffer to be fairly large in order to accommodate the upper bound, further increasing memory usage.

This commit implements a system to fix the memory usage issue described above by allowing multiple megabuffers to be allocated per execution, as well as reuse across executions. Allocations now go through a global allocator object which chooses which chunk to allocate into on a per-allocation basis. If all are in use by the GPU, another chunk will be allocated, which can then be reused for future allocations too. This reduces Hollow Knight megabuffer memory usage by a factor of 4 and SMO by even more.
2022-08-07 03:12:27 +05:30
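The chunk selection described in 5b7572a8b3 above can be sketched as follows. This is a simplified, single-threaded illustration: `ChunkedMegaBuffer`, `Submit` and `Retire` are hypothetical names, and tracking GPU usage with a plain flag is an assumption standing in for whatever fence tracking the real allocator uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

class ChunkedMegaBuffer {
  public:
    struct Allocation {
        size_t chunk;  // Index of the chunk that was allocated into
        size_t offset; // Offset of the allocation within that chunk
        size_t size;
    };

    explicit ChunkedMegaBuffer(size_t chunkSize) : chunkSize{chunkSize} {}

    // Picks a chunk per allocation: the first one that isn't in use by the GPU and
    // has enough room, otherwise a new chunk is created
    Allocation Allocate(size_t size) {
        if (size > chunkSize)
            throw std::invalid_argument("Allocation is larger than a chunk");

        for (size_t index{}; index < chunks.size(); index++)
            if (!chunks[index].inUseByGpu && chunkSize - chunks[index].used >= size)
                return Take(index, size);

        chunks.push_back(Chunk{std::vector<uint8_t>(chunkSize)});
        return Take(chunks.size() - 1, size);
    }

    // Marks a chunk as referenced by GPU work that is in flight
    void Submit(size_t chunk) { chunks[chunk].inUseByGpu = true; }

    // Called once the GPU is done with a chunk, making it reusable from the start
    void Retire(size_t chunk) {
        chunks[chunk].inUseByGpu = false;
        chunks[chunk].used = 0;
    }

    size_t ChunkCount() const { return chunks.size(); }

  private:
    struct Chunk {
        std::vector<uint8_t> backing;
        size_t used{};
        bool inUseByGpu{};
    };

    Allocation Take(size_t index, size_t size) {
        Allocation allocation{index, chunks[index].used, size};
        chunks[index].used += size;
        return allocation;
    }

    size_t chunkSize;
    std::vector<Chunk> chunks;
};
```

Because chunks are small and recycled after retirement, the total footprint follows the number of chunks actually in flight rather than one worst-case buffer per submission.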
Billy Laws
99b5fc35c6 Change SegmentTable semantics to respect unset entries
Access to unset entries is now clearly defined as returning a zeroed-out value. The prior behavior was to optimize sets for border segments to use L2 atomicity when the specific segment had no L1 entries set. This would cause any future lookups of offsets within the same L2 segment but a different L1 entry to return an inaccurate value, as the only prior guarantee was that lookups after setting a segment would return the same value as was set; there was no guarantee that unset segments would also consistently return unset values.

This could lead to issues in practical usage, such as `BufferManager` lookups falsely reporting the existence of a `Buffer` at a location even though the segment was never set to that value. This was problematic as raw pointers were utilized and bounds checks would lead to a segmentation fault.

This commit fixes this issue by introducing this guarantee and refactoring the class accordingly. It also deletes the `Set` method for setting a single entry, as its meaning is ambiguous and its functionality was more akin to the past guarantee and no longer makes sense.

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2022-08-06 22:20:54 +05:30