The API for texture swizzling is now more concrete and abstracted out from `GuestTexture`, this allows for neater usage in certain areas such as MaxwellDMA while having a `GuestTexture` wrapper as well allowing for neater usage in those cases.
The code itself has also been cleaned up slightly with all usage of `u32`s being upgraded to `size_t` as this is simply more efficient due to the compiler not needing to emulate wraparound behavior for integer types smaller than the processor word size.
The Fermi 2D engine implements both image blit and resolve operations, supporting subpixel sampling with both linear and point filtering.
Resolve operations are performed by sampling from the center of each pixel in order to resolve the final image from the MSAA samples
MSAA images are stored in memory like regular images but each pixels dimensions are scaled: e.g for 2x2 MSAA
```
112233
112233
445566
445566
```
These would be sampled with both duDx and duDy as 2 (integer part), resolving to the following:
```
123
456
```
Blit operations are performed by sampling from the corner of each pixel, scaling the image as one would expect.
This implementation isn't fully complete as Vulkan blit doesn't support some combinations which Fermi does, most notably between colour and depth stencil. These will be implemented properly at a later date, likely after the texture manager rework.
Out of Bounds Blit, used by some OpenGL games is also missing since supporting it requires texture aliasing, this will also be supported after the texture manager rework.
Co-authored-by: Billy Laws <blaws05@gmail.com>
A basic `bcat:u` implementation to prevent titles such as "Kirby and the Forgotten Land" dependent on BCAT support from crashing due to the lack of an implementation.
Implements a cache for storing `VkFramebuffer` objects with a special path on devices with `VK_KHR_imageless_framebuffer` to allow for more cache hits due to an abstract image rather than a specific one.
Caching framebuffers is a fairly crucial optimization due to the cost of creating framebuffers on TBDRs since it involves calculating tiling memory allocations and in the case of Adreno's proprietary driver involves several kernel calls for mapping and allocating the corresponding framebuffer memory.
Implements a cache for storing `VkRenderPass` objects which are often reused, they are not extremely expensive to create generally but this is a required step to build up to a framebuffer cache which is an extremely expensive object to create on TBDRs generally since it involves calculating tiling memory allocations and in the case of Adreno's proprietary driver involves several kernel calls for mapping and allocating the corresponding memory.
Implements a cache for storing `VkPipeline` objects which are fairly expensive to create and doing so on a per-frame basis was rather wasteful and consumed a significant part of frametime. It should be noted that this is **not** compliant with the Vulkan specification and **will** break unless the driver supports a relaxed version of the Vulkan specification's Render Pass Compatibility clause.
Mostly based off of yuzu's implementation, this will need to be extended in the future to open up a UI for configuring controllers according to the applications requirements.
Skyline's `exception` class now stores a list of all stack frames during the invocation of the exception. These can later be parsed by the exception handler to generate a human-readable stack trace. To assist with more complete stack traces, `-fno-omit-frame-pointer` is now passed on debug builds which forces the inclusion of frames on function calls.
The DMA engine is used to perform DMA buffer/texture copies directly on the GPU. It can deswizzle arbritary regions of input textures, perform component remapping and swizzle into output textures.
This impl only supports 1D buffer copies, 2D ones will come later.
The Kepler compute engine is used to run compute jobs encapsulated in to QMDs on the GPU, this commit doesn't implement compute itself but adds the register and QMD structs that will be needed for it in the future.
Quirk terminology was deemed to be inappropriate for describing the features/extensions of a device. It has been replaced with traits which is far more fitting but quirks will be used as a terminology for errata in devices.
Fermi2D supports macros in addition to Maxwell3D, these both share code memory. To support this we rework the macro interpreter to support passing in a target engine and abstract the communications out into an interface that can be implemented by applicable engines.
```
GPFIFO <-> MME <-> Maxwell3D
^ ^---> Fermi2D
X------------> I2M
X------------> MaxwellComputeB
X--Flush-----> MaxwellDMA
```
The Buffer Manager handles mapping of guest buffers to host buffer views with automatic handling of sub-buffers and eventually supporting recreation of overlapping buffers to create a single larger buffer.
Implements infrastructure for using guest buffers on the host for rendering, a `BufferManager` is still missing which'd handle mapping from guest buffers to host buffers and will be subsequently committed. It should be noted that `BufferView` is also disconnected from `Buffer` and shared for every instance with the same properties like `TextureView` is now.
This class will be entirely responsible for any interop with the shader compiler, it is also responsible for caching and compilation of shaders in itself.
We want to utilize features from C++ 20 ranges but they haven't been entirely implemented in libc++ so in the meantime we use the reference implementation for it which is Ranges v3.
Maxwell3D Registers weren't initialized to the correct values prior, this commit fixes that by doing `HandleMethod` calls with all the register values being initialized. This is in contrast to the registers being set without calling the methods in `GraphicsContext` or otherwise resulting in bugs.
We require a way to track certain host GPU features that are optional such as Vulkan extensions, this is what the `QuirkManager` class does as it checks for all quirks and holds them allowing other components to branch based off these quirks.
A thread local LoggerContext is now used to hold the output file stream instead of the `Logger` class. Before doing any logging operations, a LoggerContext must be initialized.
This commit will not build successfully on purpose.
The Host1x block of the TX1 supports 14 separate channels to which commands can be issued, these all run asynchronously so are emulated the same way as GPU channels with one FIFO emulation thread each. The command FIFO itself is very similar to the GPFIFO found in the GPU however there are some differences, mainly the introduction of classes (similar to engines) and the Mask opcode (which allows writing to a specific set of offsets much more efficiently).
There is an internal Host1x class which functions similar to the GPFIFO class in the GPU, handling general operations such as syncpoint waits, this is accessed via the simple method interface. Other channels such as NVDEC and VIC are behind the 'Tegra Host Interface' (THI) in HW, this abstracts out the classes internal details and provides a uniform method interface ontop of the Host1x method one. We emulate the THI as a templated wrapper for the underlying class.
Syncpoint increments in Host1x are different to GPU, the THI allows submitting increment requests that will be queued up and only be applied after a specific condition in the associated engine is met; however the option to for immediate increments is also available.
host1x channels are generally similar to GPU channels however there is only one channel for each specific class (like a GPU engine) and an address space is shared between them all.
This PR implements the simple IOCTLs with the larger ones that will depend on changes outside of nvdrv being left for future commits. This is enough to partly run oss-nvjpeg.
To offset some of the performance overhead of using debug builds, we now optimize all libraries using `-Ofast` while building Skyline itself with `-O0`.
The version of libcxx shipped with Android NDK is fairly outdated and doesn't contain several features we desire such as C++ 20 ranges. This has been fixed by using libcxx directly from the LLVM Project which has been added as a submodule and can be updated independently of NDK.
We've moved to using a beta AGP as `7.0.2` is breaks `clangd` and other C++ features on Beta/Canary Android Studio. NDK was additionally updated with `mbedtls` to fix warnings caused by it alongside some other minor fixes to code for newer versions of libcxx.
The new AGP has a bug where it does not look for executables specified in `android_gradle_build.json` in `PATH` that includes `ninja` which is provided by the `ninja-build` package on the system rather than Android SDK's CMake on GitHub Actions (Ubuntu 20.04). This has been fixed by symlinking `/usr/bin/ninja` to the project root which is searched in for the `ninja` executable.
Library headers would produce errors that are out of our control and as a result of that, we just want to ignore this. This is possible by including the offending headers as system headers, compilers don't emit any warnings arising from them. This was extended to all libraries rather than just those which currently emitted warnings for consistency's sake.
* Fix handling `SA_EXPOSE_TAGBITS` bit being set in Android 12 `sigaction`
* Fix CMake bug using `CMAKE_INTERPROCEDURAL_OPTIMIZATION_RELEASE` when not supported causing `-fuse-ld=gold` to be emitted as a linker flag
* Support using `VIBRATOR_MANAGER_SERVICE` rather than `VIBRATOR_SERVICE` on Android 12
* Optimize Imports for Kotlin code
* Move away from deprecated APIs in Kotlin or explicitly mark where it's not possible
* Update SDK, NDK and libraries
* Enable Gradle Configuration Cache
Allows the execution of multiple channels at the same time, with locking
being performed on the host GPU scheduler layer, address spaces can be
bound to one or more channels.
This commit introduces the `CommandExecutor` which is responsible for creating and orchestrating a Vulkan command graph comprising of `CommandNode`s that construct all the objects required for rendering. As a result of the infrastructure provided by `CommandExecutor`, `ClearBuffers` could be implemented and be appropriately utilized.
A bug regarding scissors was also determined and fixed in the PR, the extent of them were previously inaccurate and this has now been fixed.
Note: We don't synchronize any textures from the guest for now as this would override the contents on the host, this'll be fixed with the appropriate write tracking but it also results in a black screen for anything that writes to FB
This commit implements a filter by type for any validation layer output, this allows filtering out any logs which may be unnecessary and additionally triggering a breakpoint as required.
An issue concerning the `NDEBUG` flag never being set was fixed, it's now supplied as a release compiler flag. The issue can manifest itself by always relying on a validation layer even though it shouldn't on release, this is why the validation layer was mistakenly disabled entirely previously by using `#ifndef` rather than `#ifdef`.
An issue with the initial layout of a texture being supplied as neither `VK_IMAGE_LAYOUT_UNDEFINED` or `VK_IMAGE_LAYOUT_PREINITIALIZED` was fixed, these cases are now handled by transitioning to those layouts after creating the image rather than supplying it within `initialLayout`.
Another issue was fixed regarding not maintaining a transformation after a surface has been destroyed and recreated existed and manifested itself when the user would go out of the app and come back in, they would see the surface having an identity transformation rather than the desired one.
Fixes bugs with the Texture Manager lookup, fixes `RenderTarget` address extraction (`low`/`high` were flipped prior), refactors `Maxwell3D::CallMethod` to utilize a case for the register being modified + preventing redundant method calls when no new value is being written to the register, and fixes the behavior of shadow RAM which was broken previously and would lead to incorrect arguments being utilized for methods.
Implement the groundwork for the texture manager to be able to report basic overlaps and be extended to support more in the future. The Maxwell3D registers `RenderTargetControl`, `RenderTarget` and a stub for `ClearBuffers` were implemented.
A lot of changes were also made to `GuestTexture`/`Texture` for supporting mipmapping and multiple array layers alongside significant architectural changes to `GuestTexture` effectively disconnecting it from `Texture` with it no longer being a parent rather an object that can be used to create a `Texture` object.
Note: Support for fragmented CPU mappings hasn't been added for texture synchronization yet
Utilize Boost Container's `small_vector` for optimizing allocations, fix certain implicit casting issues and make `ILogger` not output an additional newline in the log when the application supplies one at the end of the log
Implements the 'csrng:' service using C++ <random>'s Mersenne Twister, this does make it insecure for cryptographic purposes but it is pointless to attempt to do this regardless as we cannot ensure that the guest will run in a secure environment which cannot be mutated by an attacker. Used by Prison Princess, Pokemon Cafe Mix, Paint your Pet and more.
We've moved from using an AAR for Mbed TLS to a submodule as the AAR was packaged manually and used from a local repository which ended up being very hacky and resulted in Linter errors, it could also not be updated with ease as it would need to be repackaged. All of these issues have been solved by moving to a git submodule tied to the official Mbed TLS GitHub repository.
VI/IHOSBinder suffered from major inaccuracies in their function due to being quickly thrown together initially with little concern for accuracy, this has now been fixed with them being substantially more accurate now.
This commit reworks the `Texture` class to include a Vulkan Image backing that can be optionally owning or non-owning and swapped in with consideration for Vulkan image layout, it also adds CPU-sided synchronization for the texture objects with FenceCycle. It also makes the appropriate changes to `PresentationEngine` and `GraphicBufferProducer` to work with the new `Texture` class while setting the groundwork for supporting swapchain recreation. It also fixes a log in `IpcResponse` and improves the display mode selection algorithm by further weighing refresh rate.
Implements a wrapper over fences to track a single cycle of activation, implement a Vulkan memory manager that wraps the Vulkan-Memory-Allocator library and a command scheduler for scheduling Vulkan command buffers
Vulkan Device initialization is handled now, it supports required extensions but support for optional extensions/features/properties will come in later when we require those. In addition, we now correctly report the version of Skyline to Vulkan which can be accessed from debugging tools.
There's also a minor change regarding the search pattern for `SkylineLibraries` which now only searches in headers of libraries and it also explicitly excludes the redundant `vulkan.hpp` from the `Vulkan-Headers` repository.
The GPU class has been extended in this for Vulkan initialization, this is done to the point of initializing the instance alongside loading in `VK_LAYER_KHRONOS_validation` which is also now packed into all Debug APKs for Skyline. In addition, `VK_EXT_debug_report` is also initialized and it's output is piped directly into the Logger.
A minor change regarding the type of the `Fps` and `Frametime` globals was changed to `skyline::i32`s which is a more suitable type due to those having a smaller chance of overflowing while being signed as Java doesn't have unsigned integral types.
This is used by games before calling into nvdec in order to clock up the
HW module, it can also be used to request a RAM frequency. Since we
obviously don't emulate the hardware down to this level a basic stub
that provides the correct reponses is enough.
Fixes a crash on first level of Super Mario Odyssey.
We used a custom version of Vulkan-Hpp which split the files a lot prior to avoid any developers needing to manually set IDE settings for IntelliJ to work but this wasn't practical due to how it required modifications to Vulkan-Hpp's generator which would make maintenance extremely difficult. It was determined that we should just add the requirement for changing the IDE settings and use Vulkan-Hpp directly.
We decided to restructure Skyline to draw a layer of separation between guest and host GPU. We're reserving the `gpu` namespace and directory for purely host GPU and creating a new `soc` directory and namespace for emulation of parts of the X1 SoC which is currently limited to guest GPU but will be expanded to contain components like the audio DSP down the line.
This just disables all compiler warnings generated while compiling TZCode as those are irrelevant while compiling Skyline and need to be tackled in that repository.
Add Tracing for SVCs, Services, NVDRV, and Synchronization Primitives. In addition, fix `TRACE_EVENT_END("guest")` being emitted when a signal is received while being in the guest rather than host which would cause an exception. This commit also disables warnings for the Perfetto library as we do not control fixing them.
This serves as an extension to the initial time commit and combined
they provide a complete implementation of everything application facing
in time.
psc:ITimeZoneService and glue:ITimeZoneService are used to convert
between POSIX and calendar times according to the device location.
Timezone binaries are used during the conversion, details of them can
be read about in the previous commit.
This is based off my own glue RE and Thog's time RE.
This reimplements our time backend to be significantly more accurate to
the real PSC and provides complete implementations for every time IPC
allowing many newer games to work properly.
Time is unique in its use of glue services, the core sysmodule is fully
isolated and doesn't interface with any other services. Glue is instead
used where that is needed (e.g. for fetching settings), this distinction
is also present in our implementation.
Another unique feature of time is its global state, as time is
calibrated from the start of the service its state cannot be lost as
that would result in the application offsetting time incorrectly
whenever it closed a session.
A large proportion of this is based off of Thog's 9.0.0 PSC reversing.
These are used for timezone conversions between POSIX and calander time.
Tzdata is in exactly the same format as HOS to allow loading sysarchives
in the future if needed. See its README for more info.
Details on tzcode can be found in its own repo, there are several
changes done Vs the base release to allow for HOS compat.
These only implement the subset of VFS needed for time, implementing
more is difficult due to some issues in the AAsset API which make
support quite ugly. The abstract asset filesystem can be accessed by
services through the OS class allowing other implementations to be used
in the future.
An exceptional signal handler allows us to convert an OS signal into a C++ exception, this allows us to alleviate a lot of crashes that would otherwise occur from signals being thrown during execution of games and be able to handle them gracefully.
* Fix alignment handling in NvHostAsGpu::AllocSpace
* Implement Ioctl{2,3} ioctls
These were added in HOS 3.0.0 in order to ease handling ioctl buffers.
* Introduce support for GPU address space remapping
* Fix nvdrv and am service bugs
Syncpoints are supposed to be allocated from ID 1, they were allocated
at 0 before. The ioctl functions were also missing from the service map
* Fix friend:u service name
* Stub NVGPU_IOCTL_CHANNEL_SET_TIMESLICE
* Stub IManagerForApplication::CheckAvailability
* Add OsFileSystem Directory support and add a size field to directory entries
The size field will be needed by the incoming HOS IDirectory support.
* Implement support for IDirectory
This is used by applications to list the contents of a directory.
* Address feedback
This commit refactors the C++ end of Input so it'll be in line with the rest of the codebase and be ready for the extension with multiple players and controller configuration.
This commit contains the C++ side of the initial Input implementation, this is based on the work done in the `hid` branch in `bylaws/skyline`.
Co-authored-by: ◱ PixelyIon <pixelyion@protonmail.com>