Commit Graph

193 Commits

Author SHA1 Message Date
Billy Laws
8471ab754d Introduce a spin lock for resources locked at a very high frequency
Constant buffer updates result in a barrage of std::mutex calls that take a lot of time even under no contention (around 5%). Using a custom spinlock in cases like these allows inlining locking code reducing the cost of locks under no contention to almost 0.
2022-11-02 17:46:07 +00:00
lynxnb
6b76c61cd1 Introduce a releasedebug build variant 2022-10-17 18:39:32 +02:00
PixelyIon
70ad4498a2 Write HID LIFO entries at fixed intervals
Certain titles depend on HID LIFO entries being written out at a fixed frequency rather than on actual state change, not doing this can lead to applications freezing till the LIFO is filled up to maximum size, this behavior is seen in Super Mario Odyssey. In other cases such as Metroid Dread, the game can run into race conditions that would lead to crashes, these were worked around by smashing a button during loading prior.

This commit introduces a thread which sleeps and wakes up occasionally to write LIFO entries into HID shared memory at the desired frequencies. This alleviates any issues as it fills up the LIFO instantly and correctly emulates HID Shared Memory behavior expected by the guest.

Co-authored-by: Narr the Reg <juangerman-13@hotmail.com>
2022-08-31 22:49:36 +05:30
Billy Laws
395f665a13 Implement a system for helper shaders together with a simple blit shader
It is desirable for us to use a shader for blits to allow easily emulating out of bounds blits and blits between different swizzled colour formats. The helper shader infrastructure is designed to be generic so it can be reused by any other helper shaders that we may  need in the future.
2022-08-08 17:40:35 +01:00
Billy Laws
1da1698f90 Disable unused Vulkan HPP setters and smart handles 2022-08-08 14:57:44 +01:00
PixelyIon
5b7572a8b3
Introduce chunked MegaBuffer allocation
After the introduction of workahead a system to hold a single large megabuffer per submission was implemented, this worked fine for most cases however when many submissions were flight at the same time memory usage would increase dramatically due to the amount of megabuffers needed. Since only one megabuffer was allowed per execution, it forced the buffer to be fairly large in order to accomodate the upper-bound, even further increasing memory usage.

This commit implements a system to fix the memory usage issue described above by allowing multiple megabuffers to be allocated per execution, as well as reuse across executions. Allocations now go through a global allocator object which chooses which chunk to allocate into on a per-allocation scale, if all are in use by the GPU another chunk will be allocated, that can then be reused for future allocations too. This reduces Hollow Knight megabuffer memory usage by a factor 4 and SMO by even more.
2022-08-07 03:12:27 +05:30
lynxnb
1815199d2b Add utilities for reading and installing gpu driver packages 2022-08-06 22:00:19 +05:30
Billy Laws
1df98ba57f Enable fwrapv for defined signed integer overflow behaviour
Nintendo enables this for HOS so we should do the same to avoid any cases where it's relied on.
2022-07-29 20:07:14 +01:00
lynxnb
2840a126dd Introduce AndroidSettings class and use inheritance
The `Settings` class now has a pure virtual `Update` method, and uses inheritance over template specialization for platform-specific behavior override.
2022-07-26 20:16:24 +05:30
lynxnb
2d70be60d1 Remove PugiXML submodule
`PugiXML` was only used for parsing the SharedPreferences settings file, not needed anymore.
2022-07-26 20:16:24 +05:30
MCredstoner2004
942e22f275 Write ApplicationErrorArg ErrorApplets to log
These applets are used by applications to display a custom error message to the user. Both the error message and the detailed error message are printed to the error log.


Co-authored-by: lynxnb <niccolo.betto@gmail.com>
2022-07-02 09:48:59 +05:30
MCredstoner2004
f9a0394577
Implement Software Keyboard applet
This implements the non-inline version of the Software Keyboard (swkbd) applet, which games use to get text input from the user.
2022-07-01 15:19:53 -05:00
Billy Laws
5d6902b3f8 Stub audin:u 2022-06-04 19:11:57 +01:00
Billy Laws
22695c4feb Stub nim services used for eShop communication
We obviously don't need to implement these so add a simple set of stubs to satify games using them (mainly demos such as DQXII)
2022-05-31 22:07:01 +01:00
PixelyIon
80c8fb8791 Implement CPU BCn Texture Decoding
Certain GPU vendors such as ARM's Mali do not have support for BCn textures whatsoever while other vendors such as AMD only have partial support (BC1-BC3). Most titles on the guest utilize BC textures and to address this on host GPUs without support for BCn, we need to decompress the texture on the CPU. This commit implements a CPU BCn texture decoder based off Swiftshader's BC decoder, it also adds the necessary infrastructure to have different formats for the `GuestTexture` and `Texture` objects.
2022-05-28 21:22:24 +05:30
PixelyIon
de300bfdbe Refactor Texture Swizzling
The API for texture swizzling is now more concrete and abstracted out from `GuestTexture`, this allows for neater usage in certain areas such as MaxwellDMA while having a `GuestTexture` wrapper as well allowing for neater usage in those cases. 

The code itself has also been cleaned up slightly with all usage of `u32`s being upgraded to `size_t` as this is simply more efficient due to the compiler not needing to emulate wraparound behavior for integer types smaller than the processor word size.
2022-05-19 17:13:55 +05:30
Robin Kertels
0a3cf25823 Implement the Fermi 2D blitting engine
The Fermi 2D engine implements both image blit and resolve operations, supporting subpixel sampling with both linear and point filtering.

Resolve operations are performed by sampling from the center of each pixel in order to resolve the final image from the MSAA samples
MSAA images are stored in memory like regular images but each pixels dimensions are scaled: e.g for 2x2 MSAA
```
112233
112233
445566
445566
```
These would be sampled with both duDx and duDy as 2 (integer part), resolving to the following:
```
123
456
```
Blit operations are performed by sampling from the corner of each pixel, scaling the image as one would expect.

This implementation isn't fully complete as Vulkan blit doesn't support some combinations which Fermi does, most notably between colour and depth stencil. These will be implemented properly at a later date, likely after the texture manager rework.
Out of Bounds Blit, used by some OpenGL games is also missing since supporting it requires texture aliasing, this will also be supported after the texture manager rework.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-05-13 22:37:37 +01:00
Billy Laws
7d30ac0cd8 Add additional nifm stubs 2022-05-11 23:24:35 +01:00
Billy Laws
a164635f32 Stub LibraryAppletPlayerSelect 2022-05-11 23:24:35 +01:00
shutterbug2000
f078a5d1ec Stub bt and btm:u
Stub BT services which is required by titles such as Pokémon Let's GO Pikachu and Eevee (non-Demo versions).
2022-05-11 20:44:09 +05:30
shutterbug2000
1c8d994161 Basic bcat:u implementation
A basic `bcat:u` implementation to prevent titles such as "Kirby and the Forgotten Land" dependent on BCAT support from crashing due to the lack of an implementation.
2022-05-06 15:41:48 +05:30
PixelyIon
42573170c6 Implement Framebuffer Cache
Implements a cache for storing `VkFramebuffer` objects with a special path on devices with `VK_KHR_imageless_framebuffer` to allow for more cache hits due to an abstract image rather than a specific one. 

Caching framebuffers is a fairly crucial optimization due to the cost of creating framebuffers on TBDRs since it involves calculating tiling memory allocations and in the case of Adreno's proprietary driver involves several kernel calls for mapping and allocating the corresponding framebuffer memory.
2022-05-01 18:27:27 +05:30
PixelyIon
da931cf07b Implement Render Pass Cache
Implements a cache for storing `VkRenderPass` objects which are often reused, they are not extremely expensive to create generally but this is a required step to build up to a framebuffer cache which is an extremely expensive object to create on TBDRs generally since it involves calculating tiling memory allocations and in the case of Adreno's proprietary driver involves several kernel calls for mapping and allocating the corresponding memory.
2022-05-01 18:16:53 +05:30
PixelyIon
7ef4959060 Implement Graphics Pipeline Cache
Implements a cache for storing `VkPipeline` objects which are fairly expensive to create and doing so on a per-frame basis was rather wasteful and consumed a significant part of frametime. It should be noted that this is **not** compliant with the Vulkan specification and **will** break unless the driver supports a relaxed version of the Vulkan specification's Render Pass Compatibility clause.
2022-04-24 14:31:00 +05:30
lynxnb
0d9992cb8e Implement QuadList support for non-indexed draws 2022-04-20 18:17:10 +02:00
Billy Laws
d115ce3c05 Stub the controller applet
Mostly based off of yuzu's implementation, this will need to be extended in the future to open up a UI for configuring controllers according to the applications requirements.
2022-04-16 18:45:56 +05:30
PixelyIon
41b98c7daa Add stack tracing to skyline::exception
Skyline's `exception` class now stores a list of all stack frames during the invocation of the exception. These can later be parsed by the exception handler to generate a human-readable stack trace. To assist with more complete stack traces, `-fno-omit-frame-pointer` is now passed on debug builds which forces the inclusion of frames on function calls.
2022-04-14 14:14:52 +05:30
MCredstoner2004
dec0571eee Infrastructure for applets to be implemented
This removes a stub for an applet and implements several applet related service calls.
2022-04-14 14:14:52 +05:30
Billy Laws
3c26921d54 Implement the Maxwell DMA engine
The DMA engine is used to perform DMA buffer/texture copies directly on the GPU. It can deswizzle arbritary regions of input textures, perform component remapping and swizzle into output textures.
This impl only supports 1D buffer copies, 2D ones will come later.
2022-04-14 14:14:52 +05:30
lynxnb
69ba4f8abb Swap out boostorg/boost for skyline-emu/boost 2022-04-14 14:14:52 +05:30
kaikecarlos
9f51664b1d Stub IRS Service 2022-04-14 14:14:52 +05:30
Billy Laws
ae41ddf4f0 Implement a skeleton compute engine
The Kepler compute engine is used to run compute jobs encapsulated in to QMDs on the GPU, this commit doesn't implement compute itself but adds the register and QMD structs that will be needed for it in the future.
2022-04-14 14:14:52 +05:30
Billy Laws
8c73b62b2c Implement basic inline2memory engine support
Not currently used by anything but will be used by both compute, 3D and its own engine in the future. Block linear copies are currently unsupported.
2022-04-14 14:14:52 +05:30
Billy Laws
b4927d0138 Add support for turnip and driver file redirection via libadrenotools 2022-04-14 14:14:52 +05:30
PixelyIon
ddb2ba8a1b Rename QuirkManager to TraitManager
Quirk terminology was deemed to be inappropriate for describing the features/extensions of a device. It has been replaced with traits which is far more fitting but quirks will be used as a terminology for errata in devices.
2022-04-14 14:14:52 +05:30
Billy Laws
62db21fb78 Rework GPFIFO method distribution and macros to support multiple engines
Fermi2D supports macros in addition to Maxwell3D, these both share code memory. To support this we rework the macro interpreter to support passing in a target engine and abstract the communications out into an interface that can be implemented by applicable engines.

```
GPFIFO <-> MME <-> Maxwell3D
    ^        ^---> Fermi2D
    X------------> I2M
    X------------> MaxwellComputeB
    X--Flush-----> MaxwellDMA
```
2022-04-14 14:14:52 +05:30
Billy Laws
175ba11f07 Integrate BCeNabler support into QuirkManager
Allows using BCn format textures on devices where they are unsupported by the driver.
2022-04-14 14:14:52 +05:30
PixelyIon
a9aa16798f Add -fsigned-bitfields for defined bitfield int behavior
We want consistent behavior between signed `int`s in bitfields and outside of bitfields, the `-fsigned-bitfields` flag enforces this behavior.
2022-04-14 14:14:52 +05:30
PixelyIon
492dd47218 Implement Vulkan Descriptor Set Allocator
A fixed descriptor set allocator which manages the size of the pool with automatic reallocations when any allocations run out of descriptors.
2022-04-14 14:14:52 +05:30
PixelyIon
03314ec7d2 Introduce BufferManager
The Buffer Manager handles mapping of guest buffers to host buffer views with automatic handling of sub-buffers and eventually supporting recreation of overlapping buffers to create a single larger buffer.
2022-04-14 14:14:52 +05:30
PixelyIon
bde61d72cc Introduce Buffer and BufferView
Implements infrastructure for using guest buffers on the host for rendering, a `BufferManager` is still missing which'd handle mapping from guest buffers to host buffers and will be subsequently committed. It should be noted that `BufferView` is also disconnected from `Buffer` and shared for every instance with the same properties like `TextureView` is now.
2022-04-14 14:14:52 +05:30
PixelyIon
ece2785582 Introduce ShaderManager with Proxy Shader Compiler Logger/Settings
This class will be entirely responsible for any interop with the shader compiler, it is also responsible for caching and compilation of shaders in itself.
2022-04-14 14:14:52 +05:30
PixelyIon
def9cedbee Add yuzu Shader Compiler as a submodule
We plan to use our fork of yuzu's shader compiler for GPU shader compilation so it's been added as a submodule.
2022-04-14 14:14:52 +05:30
PixelyIon
746af4cb4c Add Sirit as a submodule
We require Sirit as it is a dependency for yuzu's shader compiler where it uses it to emit SPIR-V in an easy and efficient manner.
2022-04-14 14:14:52 +05:30
PixelyIon
dbc94f36d3 Add Range v3 as a submodule
We want to utilize features from C++ 20 ranges but they haven't been entirely implemented in libc++ so in the meantime we use the reference implementation for it which is Ranges v3.
2022-04-14 14:14:52 +05:30
PixelyIon
49cc0964e2 Initialize Maxwell3D Registers Correctly
Maxwell3D Registers weren't initialized to the correct values prior, this commit fixes that by doing `HandleMethod` calls with all the register values being initialized. This is in contrast to the registers being set without calling the methods in `GraphicsContext` or otherwise resulting in bugs.
2022-04-14 14:14:52 +05:30
PixelyIon
8ef225a37d Introduce QuirkManager for runtime GPU quirk tracking
We require a way to track certain host GPU features that are optional such as Vulkan extensions, this is what the `QuirkManager` class does as it checks for all quirks and holds them allowing other components to branch based off these quirks.
2022-04-14 14:14:52 +05:30
lynxnb
092dcb18c8 Stub ectx:w and ectx:aw Glue services 2022-02-06 21:57:38 +05:30
lynxnb
769e6c933d Make Logger class static and introduce LoggerContext
A thread local LoggerContext is now used to hold the output file stream instead of the `Logger` class. Before doing any logging operations, a LoggerContext must be initialized.
This commit will not build successfully on purpose.
2021-11-11 16:13:24 +01:00
Billy Laws
baefb0fe93 Implement the Host1x command FIFO together with barebones Host1x classes
The Host1x block of the TX1 supports 14 separate channels to which commands can be issued, these all run asynchronously so are emulated the same way as GPU channels with one FIFO emulation thread each. The command FIFO itself is very similar to the GPFIFO found in the GPU however there are some differences, mainly the introduction of classes (similar to engines) and the Mask opcode (which allows writing to a specific set of offsets much more efficiently).

There is an internal Host1x class which functions similar to the GPFIFO class in the GPU, handling general operations such as syncpoint waits, this is accessed via the simple method interface. Other channels such as NVDEC and VIC are behind the 'Tegra Host Interface' (THI) in HW, this abstracts out the classes internal details and provides a uniform method interface ontop of the Host1x method one. We emulate the THI as a templated wrapper for the underlying class.

Syncpoint increments in Host1x are different to GPU, the THI allows submitting increment requests that will be queued up and only be applied after a specific condition in the associated engine is met; however the option to for immediate increments is also available.
2021-11-10 21:35:36 +05:30