Constant buffer updates result in a barrage of std::mutex calls that take a lot of time even under no contention (around 5%). Using a custom spinlock in cases like these allows inlining locking code reducing the cost of locks under no contention to almost 0.
This can be inlined by the compiler much easier which helps perf a fair bit due to the number of times buffers are looked up, also avoids the need for small vector construction that was done in the previous fast-path.
This isn't a guarantee provided by actual HW so we don't need to provide it either, the sync can be skipped once the buffer already been synced at least once within the execution.
Constructing the GPU copy callback in `ConstantBuffers::Load()` ended up taking a fair amount of time despite it almost never being used in practice. By making it optional it can be skipped most of the time and only done when it's actually neccessary by calling `Write()` again if the initial call returned true.
Buffer views creation was a significant pain point, requiring several layers of caching to reduce the number of creations that introduced a lot of complexity. By reworking delegates to be per-buffer rather than per-view and then linearly allocating delegates (without ever freeing) views can be reduced to just {delegatePtr, offset, size}, avoiding the need for any allocations or set operations in GetView. The one difficulty with this is the need to support buffer recreation, which is achived by allowing delegates to be chained - during recreation all source buffers have their delegates modified to point to the newly created buffer's delegate. Upon accessing a view with such a chained delegate the view will be modified to point directly to the end delegate with offset being updated accordingly, skipping the need to traverse the chain for future accesses.
In the upcoming GPU code each state member will hold a reference to its corresponding Maxwell 3D regs, this helper is needed to allow easy transformation from the the main 3D register struct into them.
Example:
```c++
struct Regs {
std::array<View, 10> viewRegs;
u32 enable;
} regs;
struct ViewState {
const View &view;
const u32 &enable;
size_t index;
};
std::array<ViewState, 10> viewStates{MergeInto<ViewState, 10>(regs.viewRegs, regs.enable, IncrementingT{})
```
Useful for cases where allocations are guaranteed to be unused by the time `Reset()` is called and calling `Free()` would be difficult or add extra performance cost due to how the allocation is used.
In some games performing the binary search in `TranslateRange()` ended up taking a fairly large (~8%) proportion of GPFIFO time. By using a segment table for O(1) lookups this is reduced to <2% for non-split mappings at the cost of slightly increased memory usage (2GiB in the absolute worse case but more like 50MiB in real world situations).
In addition to adapting `TranslateRange()` to use the segment table, a new function `LookupBlock()` for cases where only a single mapping would ever be looked up so the small_vector handling and fallback paths can be skipped and the entire lookup be inlined.
Forward this function to OpenSaveDataFileSystem for now. A proper implementation should wrap the underlying filesystem with nn::fs::ReadOnlyFileSystem.
We want to know when the `KProcess` is being killed and flushing log during it is important since it can often result in hangs due to joining not working correctly.
We currently don't wait on a slot to be freed if none are free, this worked prior to async presentation as GBP's slots wouldn't change their state until other commands were called but now slots can be held by the presentation engine. As a result, we now have to wait on the presentation engine to free up slots.
This commit also fixes the behavior of the `async` flag in `DequeueBuffer` as it was treated as a non-blocking flag but isn't supposed to do anything on HOS.
Needed for games such as AC:NH.
The `Auto` option automatically selects a region based on the currently selected system language.
Co-Authored-By: Timotej Leginus <35149140+timleg002@users.noreply.github.com>
As part of this commit, a new preference category for debug settings is being introduced. All future settings only relevant for debugging purposes will be put there. The category is hidden on release builds.
Host synchronization of a guest texture with a different guest format represents a valid use case where the host doesn't support the guest format and conversion to a host-compatible format must be performed. The issue is most evident on Mali GPUs, as they don't support BCn texture formats thus needing manual decoding before submission. It was disabled by mistake in a previous commit, this commit re-enables it.
Unindexed quad draws were broken when multiple draw calls were done on the same vertex buffer, with a non-zero `first` index.
Indexed quad draws also suffered from the same issue, but was never encountered in games.
This commit fixes both cases by accounting for the `first` drawn index when generating conversion index buffers.
TIPC is a much lighter layer ontop of the Horizon IPC system than CMIF and is used by SM in 12.0.0+. This implementation is slightly hacky since it doesn't really keep a seperation between the underlying kernel IPC stuff and userspace like CMIF/TIPC, this should be fixed eventually, probably together with an IPC dispatch rewrite to avoid the mess of frozen maps.
Tested with Hentai Uni, which now crashes needing 'ldr:ro'.
Tapping anything in titles that supported touch (such as Puyo Puyo Tetris or Sonic Mania) wouldn't work due to the first touch point never being removed from the screen, it is supposed to be removed after a 3 frame delay from the touch ending.
This commit introduces a mechanism to "time-out" touch points which counts down during the shared memory updates and removes them from the screen after a specified timeout duration.
Certain titles depend on HID LIFO entries being written out at a fixed frequency rather than on actual state change, not doing this can lead to applications freezing till the LIFO is filled up to maximum size, this behavior is seen in Super Mario Odyssey. In other cases such as Metroid Dread, the game can run into race conditions that would lead to crashes, these were worked around by smashing a button during loading prior.
This commit introduces a thread which sleeps and wakes up occasionally to write LIFO entries into HID shared memory at the desired frequencies. This alleviates any issues as it fills up the LIFO instantly and correctly emulates HID Shared Memory behavior expected by the guest.
Co-authored-by: Narr the Reg <juangerman-13@hotmail.com>
It was determined that deadlocks inside `KThread::UpdatePriorityInheritance` would not only arise from the first level of locking with `waitingOn->waiterMutex` but also the second level of locking with `nextThread->waiterMutex` which has now also been fixed to fallback when facing contention.
PR #1758 introduced a bug where the game list would be entirely loaded every time the app was opened. This commit addresses that issue, which was caused by the `version` member of the cached game list being serialized to file (although incorrectly) but never actually read back when deserializing.
* Remove `package` from manifest and from activity prefixes, gradle `namespace` will be used instead
* Removed deprecated `android.support.PARENT_ACTIVITY` metadata
entries
* Make `MainActivity` and `SettingsActivity` launched in `singleTop` mode to avoid unnecessary activity restarts while navigating the app
Using `__attribute__((packed))` doesn't work in new NDKs when a struct contains 128-bit integer members, likely because of a ndk/compiler bug. We now enclose the requiring structs in `#pragma pack` directives to tightly pack them.
Since the blit engine itself samples from pixel corners and the helper shader from pixel centres teh src coordinates need to be adjusted to avoid the helper shader wrapping round on the final column.
We previously missed the hades pass for attribute conversion leading to crashes when games would attempt to use such an attribute. The hades pass for this isn't a proper fix however as it modifies the IR directly and will break if any of the previous stages in the pipeline change. Enable it to allow for games using them to at least have a chance at working. In the long term the pass will be reworked on the hades side to avoid modifying the IR in a way that can't be undone.
This vertex state must only be present for the last pipeline stage that touches vertices, if it is present for other stages it could result in incorrect behaviour like performing TFB in the fragment shader or flipping device coordinates twice.
As the code was before, if we had a shader that was disabled and enabled again after without being invalidated the pipeline stage would stay disabled and break rendering.
We previously only supported non-indexed quads. Support for this is implemented by converting the index buffer at record time and pushing the result into the megabuffer, which is then used as the index buffer in the final draw command.
The `Allocate` method allocates the given amount of space in a megabuffer chunk, returning a descriptor of the allocated region. This is useful for situations where you want to write directly to the megabuffer, avoiding the need for an intermediary buffer.
Entirely rewrites the engine and interconnect code to take advantage of the subpixel and OOB blit support offered by the blit helper shader. The interconnect code is also cleaned up significantly with the 'context' naming being dropped due to potential conflicts with the 'context' from context lock
It is desirable for us to use a shader for blits to allow easily emulating out of bounds blits and blits between different swizzled colour formats. The helper shader infrastructure is designed to be generic so it can be reused by any other helper shaders that we may need in the future.
These sometimes spuriously occur in games during transitions, to avoid crashing during them just use the null texture if they occur and log an error log
The constant destruction and creation of `BufferView`s in cbuf-heavy games showed up as a large chunk of the profiler. Fix this by taking advantage of the fact that constant buffer `BufferView`s are never deleted and always kept around in the cache to just return a pointer to them in the cache.
Currently we heavily thrash the heap each draw, with malloc/free taking up about 10% of GPFIFOs execution time. Using a linear allocator for the main offenders of buffer usage callbacks and index/vertex state helps to reduce this to about 4%