* Fix `AddClearColorSubpass` bug where it would not generate a `VkCmdNextSubpass` when an attachment clear was utilized
* Fix `AddSubpass` bug where the Depth Stencil texture would not be synced
* Respect `VkCommandPool` external synchronization requirements by making it thread-local with a custom RAII wrapper
* Fix linear RT width calculation as it's provided in terms of bytes rather than format units
* Fix `AllocateStagingBuffer` bug where it would not supply `eTransferDst` as a usage flag
* Fix `AllocateMappedImage` where `VkMemoryPropertyFlags` were not respected resulting in non-`eHostVisible` memory being utilized
* Change feature requirement in `AndroidManifest.xml` to Vulkan 1.1 from OGL 3.1 as this was incorrect
Allows the execution of multiple channels at the same time, with locking
being performed on the host GPU scheduler layer, address spaces can be
bound to one or more channels.
Infrastructure for always syncing textures has been introduced now, they will be synced prior to and after every execution. This does considerably reduce the performance alongside waiting on GPU execution to finish but it will be partially recouped once conditional syncing is performed.
* Move Shared Font TTFs to AAsset storage + Support external shared font loading from `/data/data/skyline.emu/data/fonts`
* Fix bug in `IApplicationFunctions::PopLaunchParameter` caused by ignoring `LaunchParameterKind`
* Fix bug with Choreographer causing it to be awoken and exit prior to the destruction of `PresentationEngine`
* Fix bug with `IDirectory::Read` where it used `inputBuf` for the output buffer rather than `outputBuf`
* Improve `GetFunctionStackTrace` logs when `dli_sname` or `dli_fname` are missing
* Support more RT Formats
Support for subpasses was added by reworking attachment reuse code to account for preserved attachments and subpass dependencies. A lot of RT formats were also added to allow SMO to boot up entirely, it should be noted that it doesn't render anything.
`FenceCycle` had a cyclic dependency which broke clean exit, we now utilize `std::weak_ptr<FenceCycle>` inside the `Texture` object. A minor fix for broken stack traces was also made caused by supplying a `nullptr` C-string to libfmt when a symbol was unresolved which caused an `abort` due to invocation of `strlen` with it.
This commit introduces the `CommandExecutor` which is responsible for creating and orchestrating a Vulkan command graph comprising of `CommandNode`s that construct all the objects required for rendering. As a result of the infrastructure provided by `CommandExecutor`, `ClearBuffers` could be implemented and be appropriately utilized.
A bug regarding scissors was also determined and fixed in the PR, the extent of them were previously inaccurate and this has now been fixed.
Note: We don't synchronize any textures from the guest for now as this would override the contents on the host, this'll be fixed with the appropriate write tracking but it also results in a black screen for anything that writes to FB
This commit fixes a major issue with command buffer allocation which would result in only being able to utilize a command buffer slot on the 2nd attempt to use it after it's freed, this would lead to a significantly larger amount of command buffers being created than necessary. It also fixes an issue with the command buffers not being reset after they were utilized which results in UB eventually.
Another issue was fixed with `FenceCycle` where all dependencies are only destroyed on destruction of the `FenceCycle` itself rather than the function where the `VkFence` was found to be signalled.
This commit implements a filter by type for any validation layer output, this allows filtering out any logs which may be unnecessary and additionally triggering a breakpoint as required.
An issue concerning the `NDEBUG` flag never being set was fixed, it's now supplied as a release compiler flag. The issue can manifest itself by always relying on a validation layer even though it shouldn't on release, this is why the validation layer was mistakenly disabled entirely previously by using `#ifndef` rather than `#ifdef`.
An issue with the initial layout of a texture being supplied as neither `VK_IMAGE_LAYOUT_UNDEFINED` or `VK_IMAGE_LAYOUT_PREINITIALIZED` was fixed, these cases are now handled by transitioning to those layouts after creating the image rather than supplying it within `initialLayout`.
Another issue was fixed regarding not maintaining a transformation after a surface has been destroyed and recreated existed and manifested itself when the user would go out of the app and come back in, they would see the surface having an identity transformation rather than the desired one.
Fixes bugs with the Texture Manager lookup, fixes `RenderTarget` address extraction (`low`/`high` were flipped prior), refactors `Maxwell3D::CallMethod` to utilize a case for the register being modified + preventing redundant method calls when no new value is being written to the register, and fixes the behavior of shadow RAM which was broken previously and would lead to incorrect arguments being utilized for methods.
Implement the groundwork for the texture manager to be able to report basic overlaps and be extended to support more in the future. The Maxwell3D registers `RenderTargetControl`, `RenderTarget` and a stub for `ClearBuffers` were implemented.
A lot of changes were also made to `GuestTexture`/`Texture` for supporting mipmapping and multiple array layers alongside significant architectural changes to `GuestTexture` effectively disconnecting it from `Texture` with it no longer being a parent rather an object that can be used to create a `Texture` object.
Note: Support for fragmented CPU mappings hasn't been added for texture synchronization yet
Utilize Boost Container's `small_vector` for optimizing allocations, fix certain implicit casting issues and make `ILogger` not output an additional newline in the log when the application supplies one at the end of the log
The format provided in `GraphicBuffer` can be misleading and is supplied as `None` by the Deko3D Swapchain, it instead supplies the real format in the `NvGraphicHandle` which we now utilize instead of the one in `GraphicBuffer`.
KTransferMemory needs to be reset to RW on destruction unlike KSharedMemory which is simply freed on destruction, not emulating this behavior accurately leads to `deko_examples` from `switch-examples` to lead to a SEGFAULT on selecting an example as it expects the memory to be R/W while it ends up being freed instead
The unique pointer to a device in the map was simply reset rather than deleting the entry from the map, this resulted in the device not being properly closed and when the device was reopened then the `emplace` was a NOP as the entry already existed. This resulted in a `nullptr` dereference down the line when an application attempted to issue an IOCTL to a device that was previously closed and reopened. This is known to occur in Deko3D as it recreates the context when loading an example which includes closing and reopening devices.
A hotfix for a nvdrv bug where an IOCTL with no input/output buffers would crash and a major `nvmap` bug which broke the `FromId` IOCTL as it didn't write back the handle ID, another minor bug existed in `nvhost` where the `ZCullGetInfo` IOCTL was `INOUT` rather than `OUT`.
A bug with NACPs was also fixed caused by incorrect padding for the `NacpData` structure which resulted in the `saveDataOwnerId` member being read incorrectly and as a result the save data folder being incorrect.
Co-authored-by: artem8086 <artemsvyatoha@gmail.com>
Swapped out the usages of frozen constexpr unordered maps for switch statements, which are very likely to be turned into jump tables given the nature of the enums used, resulting in better performance than a map
Unlike `ListPreference`, this preference class uses integers as its values instead of strings, avoiding unnecessary casting. It also doesn't require an array for values: in that case it will be using the clicked entry position as its value.
Using a u32 for the loop index prevents masking on all increments,
giving a moderate performance increase.
Passing methods as u32 parameters and stopping subChannel being passed
gives quite a significant increase when combined with the inlining
allowed by subchannel based engine selection.
The implementation of GPU channels and Host1X channels will be split up
as it allows a much cleaner implementation and less undefined behaviour
potential.
This will be required later for NVDEC/SMMU support and fixes many
significant issues in the previous implementation.
Based off of my 2.0.0/12.0.0 nvdrv REs.
The syncpoint manager has beeen given convinience functions for fences
which remove the need to access the raw id/threshold most of the time
and various accuracy fixes and cleanups to match HOS 12.0.0 have also
been done.