The Host1x block of the TX1 supports 14 separate channels to which commands can be issued, these all run asynchronously so are emulated the same way as GPU channels with one FIFO emulation thread each. The command FIFO itself is very similar to the GPFIFO found in the GPU however there are some differences, mainly the introduction of classes (similar to engines) and the Mask opcode (which allows writing to a specific set of offsets much more efficiently).
There is an internal Host1x class which functions similar to the GPFIFO class in the GPU, handling general operations such as syncpoint waits, this is accessed via the simple method interface. Other channels such as NVDEC and VIC are behind the 'Tegra Host Interface' (THI) in HW, this abstracts out the classes internal details and provides a uniform method interface ontop of the Host1x method one. We emulate the THI as a templated wrapper for the underlying class.
Syncpoint increments in Host1x are different to GPU, the THI allows submitting increment requests that will be queued up and only be applied after a specific condition in the associated engine is met; however the option to for immediate increments is also available.
This avoids the excessive repetition needed for the case where array
members have no default constructor.
eg:
```c++
std::array<Type, 10> channels{util::MakeFilledArray<Type, 10>(typeConstructorArg, <...>)};
```
nvmap allows mapping handles into the SMMU address space through 'pins'. These are refcounted mappings that are lazily freed when required due to AS space running out. Nvidia implements this by using several MRU lists of handles in order to choose which ones to free and which ones to keep, however as we are unlikely to even run into the case of not having enough address space a naive queue approach works fine. This pin infrastructure is used by nvdrv's host1x channel implementation in order to handle mapping of both command and data buffers for submit.
host1x channels are generally similar to GPU channels however there is only one channel for each specific class (like a GPU engine) and an address space is shared between them all.
This PR implements the simple IOCTLs with the larger ones that will depend on changes outside of nvdrv being left for future commits. This is enough to partly run oss-nvjpeg.
The element containing the size first needs to be saved to a save slot with Save<T, slotId>, it can then be read back later as the size of a span with SlotSizeSpan<T, slotId>. This is needed to support the Host1XChannel Submit IOCTL.
Maxwell3D registers were primarily written with absolute offsets and ended up being fairly messy due to attempting to emulate this using struct relative positioning resulting in a lot of pointless padding members. This has now been improved by utilizing `OffsetMember` to directly use offsets resulting in much neater code.
It was found to be semantically advantageous to directly pass-through certain operators such as the assignment (`=`) and array index (`[]`) operators. These would require a dereference prior to their usage otherwise but now can be directly used.
The offset of a member in a structure was determined by its relative position and compiler alignment. This is unoptimal for larger structures such as those found in GPU Engines that are primarily named by offset rather than relative positioning and end up requiring a massive amount of padding to function as is. A solution to this problem was simply to supply member offsets directly which can be done by using `OffsetMember` alongside a `union`.
We used to manually call JNI UTF-8 string allocation and deallocation functions when utilizing a `jstring` but this could be erroneous and is just inconvenient. All of this has now been consolidated into an class `JniString` which is a wrapper around `std::string` and creates a copy of the contents of the `jstring` in its constructor and passes them into the `std::string` constructor.
The Vulkan Pipeline Barriers were unoptimal and incorrect to some degree prior as we purely synchronized images and not staging buffers. This has now been fixed and improved in general with more relevant synchronization.
`EmulationActivity.vibrateDevice` would assert when a `null` Vibrator is provided due to one not being set in the controller configuration. This has now been fixed by the code not playing a vibration when a vibration device isn't selected.
The guest -> host vibration conversion code was entirely broken as it didn't set the vibration `start`/`end` timestamps correctly for a cycle nor did it subtract from the `totalAmplitude` (`currentAmplitude` now) when it a cycle ended due to an incorrect `if` statement and contents. It would just end up saturating the amplitude as much as possible by adding more and more to `totalAmplitude` on every cycle while never subtracting which is entirely wrong and led to a very noticeable drop in amplitude when a vibration was repeated.
It's been entirely reworked to fix all the issues listed above and remove a lot of code that had no understandable purpose. More comments have also been added to the code to make it more readable with better variable and function naming alongside it.
The version of libcxx shipped with Android NDK is fairly outdated and doesn't contain several features we desire such as C++ 20 ranges. This has been fixed by using libcxx directly from the LLVM Project which has been added as a submodule and can be updated independently of NDK.
We've moved to using a beta AGP as `7.0.2` is breaks `clangd` and other C++ features on Beta/Canary Android Studio. NDK was additionally updated with `mbedtls` to fix warnings caused by it alongside some other minor fixes to code for newer versions of libcxx.
The new AGP has a bug where it does not look for executables specified in `android_gradle_build.json` in `PATH` that includes `ninja` which is provided by the `ninja-build` package on the system rather than Android SDK's CMake on GitHub Actions (Ubuntu 20.04). This has been fixed by symlinking `/usr/bin/ninja` to the project root which is searched in for the `ninja` executable.
Locking `KProcess::threadMutex` when a process is being killed by another thread with `join` can lead to the non-joining killer effectively joining as it's waiting on the joining killer to relinquish the mutex. This has been fixed by having an atomic boolean tracking if the process has already been killed and if it has, immediately returning prior to locking the mutex for any non-joining killers.
Resampling would sometimes perform an OOB read into `inputBuffer` due it not containing enough data to calculate corresponding the output sample, this has been fixed by introducing bounds checking to ensure that the buffer has enough data.
The method used to finish (`finishAndRemoveTask`) an activity prior to going back to `MainActivity` or restarting the process led to the process prematurely exiting entirely and would result in it not being restarted or another activity not being launched. This has now been fixed by utilizing `finishAffinity` in its place which correctly only ends the activities with the same affinity as the caller.
Members of `GuestTexture` were apparently not being initialized and this led to UB since they would be read as random values. Titles such as Super Mario Odyssey avoided setting `baseArrayLayer` which led to it being left at the default value which was completely random and this would lead to crashes. This commit fixes this by initializing said values correctly.
Some titles don't clear the output buffer prior to submission, as the service is expected to fill all of it in, our audren implementation is incomplete and doesn't end up doing this leaving the contents of the buffer to be undefined leading to UB in the form of SEGFAULTs or the application throwing a fatal error. This has been patched over by 0-filling the buffer which is a sane default value for the fields that aren't filled in albeit not a replacement for a proper audren implementation.
Certain titles can submit logs where the last field is one off by the buffer end, the logger loop now considers this and terminates if there isn't enough data left to read the field type and length.
Access to the `vibrations` field in `vibrations[3].period` could lead to UB, this has been replaced with a proper check which adds up the period over all vibrations instead. A minor cleanup with variable names and explicit types for integer arithmetic has also been done.
If a non-builtin vibrator was attempted to be fetched, it'd insert it in the vibrator cache and return directly as opposed to playing the vibration on it prior to returning. This has now been fixed, the value is both put into the cache and the vibration is played on it.
The decomposition from `texture::Dimension` to `vk::Rect2D` was somehow implicit and completely incorrect resulting in wrong conversion with undefined values. It's now been fixed by explicitly setting `vk::Rect2D::extent` to `scissor` specifically.
The second parameter of `std::string_view::substr` was assumed to be an end position (similar to `std::span`) rather than `count` which it is. As a result of this, it was entirely broken but only held together by a constant factor being subtracted from it which was derived by trial and error. It's now been fixed by returning a count rather than the absolute position.
Fix a bug where attempting to launch Skyline from the launcher while emulation was in progress would result in `MainActivity` launching rather than `EmulationActivity`. We always want `EmulationActivity` to stay on top of the stack and be launched whenever Skyline is resumed, no other activity should be able to run in parallel to `EmulationActivity` in any user-accessible manner.
Guest-driven exiting could cause objects left on the heap due to a `std::longjmp` from high up in the host call stack, this has been fixed by introducing `ExitException` which implicitly unrolls the stack with the exception handling mechanism.
* Resolves dependency cycles in some components
* Allows for easier navigation of certain components like `span` which were especially large
* Some imports have been moved from `common.h` into their own files due to their infrequency
* Update licenses for dependent projects
* Add copyright notices (as provided)
* Revamp styling for `LicenseDialog`
* Fix invisible `PreferenceDialog` buttons in Settings
* Consolidating color variables into `colorPrimary`, `backgroundColor` and `backgroundColorVariant`
* Use `com.google.android.flexbox:flexbox:3.0.0` (Google Maven) rather than `com.google.android:flexbox:2.0.1` (Bintray)
* Clean Exiting was improved by implementing a robust system for when to abandon clean exiting and simply restart the process alongside moving clean exiting to the background when the application is quit by using the back button
* Audio is now automatically paused whenever the application is moved to the background and automatically resumed when it's brought to the foreground
* The system language setting had several errors and inconsistencies which have now been fixed, it's been brought more in line with HOS language (Albeit not entirely due to no region setting in Skyline)
* Fix a bug with `ThreadLocal` where the atomic `list` pointer was uninitialized resulting in a `SEGFAULT` during the destructor
* Fix handling `SA_EXPOSE_TAGBITS` bit being set in Android 12 `sigaction`
* Fix CMake bug using `CMAKE_INTERPROCEDURAL_OPTIMIZATION_RELEASE` when not supported causing `-fuse-ld=gold` to be emitted as a linker flag
* Support using `VIBRATOR_MANAGER_SERVICE` rather than `VIBRATOR_SERVICE` on Android 12
* Optimize Imports for Kotlin code
* Move away from deprecated APIs in Kotlin or explicitly mark where it's not possible
* Update SDK, NDK and libraries
* Enable Gradle Configuration Cache
These are used heavily in OpenGL games, which now, together with the
previous syncpoint changes, work perfectly. The actual implementation is
rather novel as rather than using a per-class state machine for all
methods we only use it for those that are known to be split across
GpEntry boundaries, as a result only a single bounds check is added to
the hot path of contiguous method execution and the performance loss is
negligible.
* Fix `AddClearColorSubpass` bug where it would not generate a `VkCmdNextSubpass` when an attachment clear was utilized
* Fix `AddSubpass` bug where the Depth Stencil texture would not be synced
* Respect `VkCommandPool` external synchronization requirements by making it thread-local with a custom RAII wrapper
* Fix linear RT width calculation as it's provided in terms of bytes rather than format units
* Fix `AllocateStagingBuffer` bug where it would not supply `eTransferDst` as a usage flag
* Fix `AllocateMappedImage` where `VkMemoryPropertyFlags` were not respected resulting in non-`eHostVisible` memory being utilized
* Change feature requirement in `AndroidManifest.xml` to Vulkan 1.1 from OGL 3.1 as this was incorrect
Allows the execution of multiple channels at the same time, with locking
being performed on the host GPU scheduler layer, address spaces can be
bound to one or more channels.
Infrastructure for always syncing textures has been introduced now, they will be synced prior to and after every execution. This does considerably reduce the performance alongside waiting on GPU execution to finish but it will be partially recouped once conditional syncing is performed.
* Move Shared Font TTFs to AAsset storage + Support external shared font loading from `/data/data/skyline.emu/data/fonts`
* Fix bug in `IApplicationFunctions::PopLaunchParameter` caused by ignoring `LaunchParameterKind`
* Fix bug with Choreographer causing it to be awoken and exit prior to the destruction of `PresentationEngine`
* Fix bug with `IDirectory::Read` where it used `inputBuf` for the output buffer rather than `outputBuf`
* Improve `GetFunctionStackTrace` logs when `dli_sname` or `dli_fname` are missing
* Support more RT Formats
Support for subpasses was added by reworking attachment reuse code to account for preserved attachments and subpass dependencies. A lot of RT formats were also added to allow SMO to boot up entirely, it should be noted that it doesn't render anything.
`FenceCycle` had a cyclic dependency which broke clean exit, we now utilize `std::weak_ptr<FenceCycle>` inside the `Texture` object. A minor fix for broken stack traces was also made caused by supplying a `nullptr` C-string to libfmt when a symbol was unresolved which caused an `abort` due to invocation of `strlen` with it.
This commit introduces the `CommandExecutor` which is responsible for creating and orchestrating a Vulkan command graph comprising of `CommandNode`s that construct all the objects required for rendering. As a result of the infrastructure provided by `CommandExecutor`, `ClearBuffers` could be implemented and be appropriately utilized.
A bug regarding scissors was also determined and fixed in the PR, the extent of them were previously inaccurate and this has now been fixed.
Note: We don't synchronize any textures from the guest for now as this would override the contents on the host, this'll be fixed with the appropriate write tracking but it also results in a black screen for anything that writes to FB
This commit fixes a major issue with command buffer allocation which would result in only being able to utilize a command buffer slot on the 2nd attempt to use it after it's freed, this would lead to a significantly larger amount of command buffers being created than necessary. It also fixes an issue with the command buffers not being reset after they were utilized which results in UB eventually.
Another issue was fixed with `FenceCycle` where all dependencies are only destroyed on destruction of the `FenceCycle` itself rather than the function where the `VkFence` was found to be signalled.
This commit implements a filter by type for any validation layer output, this allows filtering out any logs which may be unnecessary and additionally triggering a breakpoint as required.
An issue concerning the `NDEBUG` flag never being set was fixed, it's now supplied as a release compiler flag. The issue can manifest itself by always relying on a validation layer even though it shouldn't on release, this is why the validation layer was mistakenly disabled entirely previously by using `#ifndef` rather than `#ifdef`.
An issue with the initial layout of a texture being supplied as neither `VK_IMAGE_LAYOUT_UNDEFINED` or `VK_IMAGE_LAYOUT_PREINITIALIZED` was fixed, these cases are now handled by transitioning to those layouts after creating the image rather than supplying it within `initialLayout`.
Another issue was fixed regarding not maintaining a transformation after a surface has been destroyed and recreated existed and manifested itself when the user would go out of the app and come back in, they would see the surface having an identity transformation rather than the desired one.
Fixes bugs with the Texture Manager lookup, fixes `RenderTarget` address extraction (`low`/`high` were flipped prior), refactors `Maxwell3D::CallMethod` to utilize a case for the register being modified + preventing redundant method calls when no new value is being written to the register, and fixes the behavior of shadow RAM which was broken previously and would lead to incorrect arguments being utilized for methods.
Implement the groundwork for the texture manager to be able to report basic overlaps and be extended to support more in the future. The Maxwell3D registers `RenderTargetControl`, `RenderTarget` and a stub for `ClearBuffers` were implemented.
A lot of changes were also made to `GuestTexture`/`Texture` for supporting mipmapping and multiple array layers alongside significant architectural changes to `GuestTexture` effectively disconnecting it from `Texture` with it no longer being a parent rather an object that can be used to create a `Texture` object.
Note: Support for fragmented CPU mappings hasn't been added for texture synchronization yet
Utilize Boost Container's `small_vector` for optimizing allocations, fix certain implicit casting issues and make `ILogger` not output an additional newline in the log when the application supplies one at the end of the log
The format provided in `GraphicBuffer` can be misleading and is supplied as `None` by the Deko3D Swapchain, it instead supplies the real format in the `NvGraphicHandle` which we now utilize instead of the one in `GraphicBuffer`.
KTransferMemory needs to be reset to RW on destruction unlike KSharedMemory which is simply freed on destruction, not emulating this behavior accurately leads to `deko_examples` from `switch-examples` to lead to a SEGFAULT on selecting an example as it expects the memory to be R/W while it ends up being freed instead
The unique pointer to a device in the map was simply reset rather than deleting the entry from the map, this resulted in the device not being properly closed and when the device was reopened then the `emplace` was a NOP as the entry already existed. This resulted in a `nullptr` dereference down the line when an application attempted to issue an IOCTL to a device that was previously closed and reopened. This is known to occur in Deko3D as it recreates the context when loading an example which includes closing and reopening devices.
A hotfix for a nvdrv bug where an IOCTL with no input/output buffers would crash and a major `nvmap` bug which broke the `FromId` IOCTL as it didn't write back the handle ID, another minor bug existed in `nvhost` where the `ZCullGetInfo` IOCTL was `INOUT` rather than `OUT`.
A bug with NACPs was also fixed caused by incorrect padding for the `NacpData` structure which resulted in the `saveDataOwnerId` member being read incorrectly and as a result the save data folder being incorrect.
Co-authored-by: artem8086 <artemsvyatoha@gmail.com>
Swapped out the usages of frozen constexpr unordered maps for switch statements, which are very likely to be turned into jump tables given the nature of the enums used, resulting in better performance than a map
Unlike `ListPreference`, this preference class uses integers as its values instead of strings, avoiding unnecessary casting. It also doesn't require an array for values: in that case it will be using the clicked entry position as its value.
Using a u32 for the loop index prevents masking on all increments,
giving a moderate performance increase.
Passing methods as u32 parameters and stopping subChannel being passed
gives quite a significant increase when combined with the inlining
allowed by subchannel based engine selection.
The implementation of GPU channels and Host1X channels will be split up
as it allows a much cleaner implementation and less undefined behaviour
potential.
This will be required later for NVDEC/SMMU support and fixes many
significant issues in the previous implementation.
Based off of my 2.0.0/12.0.0 nvdrv REs.
The syncpoint manager has beeen given convinience functions for fences
which remove the need to access the raw id/threshold most of the time
and various accuracy fixes and cleanups to match HOS 12.0.0 have also
been done.
Implements the 'csrng:' service using C++ <random>'s Mersenne Twister, this does make it insecure for cryptographic purposes but it is pointless to attempt to do this regardless as we cannot ensure that the guest will run in a secure environment which cannot be mutated by an attacker. Used by Prison Princess, Pokemon Cafe Mix, Paint your Pet and more.
Encountered in 不如帰大乱 when `HOS-3` is awoken at the same time as `HOS-0` called `SvcSetThreadCoreMask` resulting in a deadlock where `HOS-0` owns `HOS-3`'s `coreMigrationMutex` while `HOS-3` owns the core mutex with the both of them attempting to lock the other mutex
We've moved from using an AAR for Mbed TLS to a submodule as the AAR was packaged manually and used from a local repository which ended up being very hacky and resulted in Linter errors, it could also not be updated with ease as it would need to be repackaged. All of these issues have been solved by moving to a git submodule tied to the official Mbed TLS GitHub repository.
Sticky transforms have been stubbed, as they are on HOS/Android. Certain titles like Xenoblade Chronicles end up setting the sticky transform even if it doesn't do anything, as a result of this we cannot throw an exception for it and stub it without an exception (Aside from the cases where the value isn't recognized).
The following GraphicBufferProducer transactions were implemented:
* `SetBufferCount`
* `DetachBuffer`
* `DetachNextBuffer`
* `AttachBuffer`
It should be noted that `preallocatedBufferCount` (previously `hasBufferCount`) and `activeSlotCount` were adapted accordingly with how they were effectively the same value as all active buffers were preallocated prior but now there can be a non-preallocated active slot.
Additionally, a bug has been fixed where `SetPreallocatedBuffer` has the graphic buffer as an optional argument whereas it was treated as a mandatory argument prior and could lead to a SEGFAULT if an application were to not pass in a buffer.
VI/IHOSBinder suffered from major inaccuracies in their function due to being quickly thrown together initially with little concern for accuracy, this has now been fixed with them being substantially more accurate now.
`ENUM_STRING` now has a unified implementation in <common/macros.h> with a documented format and can be used throughout the codebase.
A major performance regression was added in the Host1X Syncpoint revamp as it did a syscall if there were any waiters during `Increment` even if they would just be woken up and go back to sleep as the threshold wasn't hit. It has now been optimized to only do a wake if any waiting thread needs to be awoken.
There was also a bug concerning increment where it would perform actions corresponding to the previous increment rather than the current one which has also been fixed.
We used instantaneous values for FPS previously which led to a lot of variation in it and the inability to determine a proper FPS value due to constant fluctuations. All FPS values are now averaged to allow reading out a stable value and a deviation statistic has been added for the frame-time to judge judder and frame-pacing which allows for a significantly better measure of overall performance. The formatting for all the floating-point numbers is now fixed-point to prevent shifting of position due to decimal digits becoming 0.
Support for the following parameters was added to `QueueBuffer`:
* Earliest Present Timestamp
* Swap Interval
* Crop
* Scaling Mode
* Transform
* Frame ID (Not returned to guest yet)
We utilize ANativeWindow APIs directly to achieve all of this in an efficient manner since HWC will be used directly for it, we do plan to introduce Vulkan equivalents for all of these operations later down the line for a port to non-Android platforms.
We had issues when combining host and guest presentation since certain configurations in guest presentation such as double buffering were very unoptimal for the host and would significantly affect the FPS. As a result of this, we've now made host presentation have its own presentation textures which are copied into from the guest at presentation time, allowing us to change parameters of the host presentation independently of the guest.
We've implemented the infrastructure for this which includes being able to create images from host GPU memory using VMA, an optimized linear texture sync and a method to do on-GPU texture-to-texture copies.
We've also moved to driving the V-Sync event using AChoreographer on its on thread in this PR, which more accurately encapsulates HOS behavior and allows games such as ARMS to boot as they depend on the V-Sync event being signalled even when the game isn't presenting.
This commit reworks the `Texture` class to include a Vulkan Image backing that can be optionally owning or non-owning and swapped in with consideration for Vulkan image layout, it also adds CPU-sided synchronization for the texture objects with FenceCycle. It also makes the appropriate changes to `PresentationEngine` and `GraphicBufferProducer` to work with the new `Texture` class while setting the groundwork for supporting swapchain recreation. It also fixes a log in `IpcResponse` and improves the display mode selection algorithm by further weighing refresh rate.
Implements a wrapper over fences to track a single cycle of activation, implement a Vulkan memory manager that wraps the Vulkan-Memory-Allocator library and a command scheduler for scheduling Vulkan command buffers
This commit makes GraphicBufferProducer significantly more accurate by matching the behavior of AOSP alongside mirroring the tweaks made by Nintendo.
It eliminates a lot of the magic structures and enumerations used prior and replaces them with the correct values from AOSP or HOS.
There was a lot of functional inaccuracy as well which was fixed, we emulate the exact subset of HOS behavior that we need to. A lot of the intermediate layers such as GraphicBufferConsumer or Gralloc/Sync are not emulated as they're pointless abstractions here.
This commit adds in VkSurface/VkSwapchain initialization and recreation. It also adapts GraphicsBuffferProducer and Texture to fit in with those changes but it doesn't yet implement presenting those buffers nor uploading guest buffers onto the host.
Vulkan Device initialization is handled now, it supports required extensions but support for optional extensions/features/properties will come in later when we require those. In addition, we now correctly report the version of Skyline to Vulkan which can be accessed from debugging tools.
There's also a minor change regarding the search pattern for `SkylineLibraries` which now only searches in headers of libraries and it also explicitly excludes the redundant `vulkan.hpp` from the `Vulkan-Headers` repository.
The GPU class has been extended in this for Vulkan initialization, this is done to the point of initializing the instance alongside loading in `VK_LAYER_KHRONOS_validation` which is also now packed into all Debug APKs for Skyline. In addition, `VK_EXT_debug_report` is also initialized and it's output is piped directly into the Logger.
A minor change regarding the type of the `Fps` and `Frametime` globals was changed to `skyline::i32`s which is a more suitable type due to those having a smaller chance of overflowing while being signed as Java doesn't have unsigned integral types.
As both of these are in the same memory segment they have no individual
alignment requirements, this created a bug in
にゃんらぶ~私の恋の見つけ方~ where the data segment would be larger
than the game expected and invalid command line arguments would be read.
armed.
It was discovered during testing of 'Hatsune Miku Project DIVA: Mega Mix'
that if a thread was starting while preemption was being enabled a NULL
pointer dereference could occur in the timer_settime call as
timer_create may not have been called yet.
This is used by games before calling into nvdec in order to clock up the
HW module, it can also be used to request a RAM frequency. Since we
obviously don't emulate the hardware down to this level a basic stub
that provides the correct reponses is enough.
Fixes a crash on first level of Super Mario Odyssey.
We used a custom version of Vulkan-Hpp which split the files a lot prior to avoid any developers needing to manually set IDE settings for IntelliJ to work but this wasn't practical due to how it required modifications to Vulkan-Hpp's generator which would make maintenance extremely difficult. It was determined that we should just add the requirement for changing the IDE settings and use Vulkan-Hpp directly.