To allow for caching of pipelines on the host a `VkPipelineCache` has been added, it is entirely in-memory and is not flushed to the disk which'll be done in the future alongside caching guest shaders to further avoid translation where possible.
Uses all Maxwell3D state converted into Vulkan state to do an equivalent draw on the host GPU, it sets up RT/Vertex Buffer/Vertex Attribute/Shader state and creates a stubbed out `VkPipelineLayout` for the draw. Any descriptor state isn't currently handled and is yet to be implemented, currently there's no Vulkan pipeline cache supplied which will be implemented subsequently.
We require a handle to the current renderpass and the index of the subpass in certain cases, this is now tracked by the `CommandExecutor` and passed in as a parameter to `NextSubpassFunctionNode` and the newly-introduced `SubpassFunctionNode`.
Switch from `SubmitWithCycle` to manually allocating the active command buffer to tag dependencies with the `FenceCycle` that prevents them from being mutated prior to execution. This new paradigm could also allow eager recording of commands with only submission being deferred.
`CommandScheduler` API users can now directly allocate an active command buffer that they need to manage alongside its fence, this can allow for more efficient recording as it doesn't need to be immediately submitted after, it can also allow attaching objects to a `FenceCycle` prior to submission that can be useful for locking resources.
Compiles shaders supplied by the guest with caching and automatic invalidation, the size of the shader is also automatically determined by looking for `BRA $` instructions which cause an infloop, it should be noted that we have a maximum shader bytecode size, any shader above this size will not be supported.
Graphics shaders can now be compiled using the shader compiler and emit SPIR-V that can be used on the host. The binding state isn't currently handled alongside constant buffers and textures support in `GraphicsEnvironment` yet.
The operands of the subtraction in the X/Y translation calculation were the wrong way around which led to negative translations that would translate the viewport off the screen.
The default color write mask should mask no channels and write all of them and should be mutated to mask out certain channels as required by the guest.
We cannot statically construct the vertex buffer/attribute arrays for Vulkan due to inactive attributes or buffers which isn't possible on Vulkan, we also cannot just change the count dynamically as there might be disabled buffers or attributes in the middle. We just have a `static_array` which should dynamically be filled in with buffer binding/attribute Vulkan structures before submission.
Buffers generally don't have formats that are fundamentally associated with them unless they're texel buffers, if that is the case it can be manually set in `BufferView`.
The Buffer Manager handles mapping of guest buffers to host buffer views with automatic handling of sub-buffers and eventually supporting recreation of overlapping buffers to create a single larger buffer.
Implements infrastructure for using guest buffers on the host for rendering, a `BufferManager` is still missing which'd handle mapping from guest buffers to host buffers and will be subsequently committed. It should be noted that `BufferView` is also disconnected from `Buffer` and shared for every instance with the same properties like `TextureView` is now.
We want `TextureView`(s) to be disconnected from the backing on the host and instead represent a specific texture on the guest with a backing that can change depending on mapping of new textures which'd invalidate the backing but should now be automatically repointed to an appropriate new backing. This approach also requires locking of the backing to function as it is mutable till it has been locked or the backing has an attached `FenceCycle` that hasn't been signaled which will be added for `CommandExecutor` in a subsequent commit.
Introduces the `supportsShaderViewportIndexLayer` quirk and sets `Shader::Profile::support_int64_atomics` depending on if the `supportsAtomicInt64` quirk is available.
Introduces the `floatControls`, `supportsSubgroupVote` and `subgroupSize` quirks for the shader compiler which are based on Vulkan `PhysicalDevice` properties.
Vulkan has officially deprecated `VK_VERSION_*` macros for versioning as it has introduced the variant into the version. It should however be `0` for the Vulkan APi and doesn't need to be printed.
Introduces several quirks for optional features used by the shader compiler which are now reported in the `Shader::HostTranslateInfo` and `Shader::Profile` structure. There are still property-related quirks for the shader compiler which haven't been implemented in this commit.
A `Buffer` class was created to hold any generic Vulkan buffer object with `span` semantics, `StagingBuffer` was implemented atop it as a wrapper for `Buffer` that inherits from `FenceCycleDependency` and can be used as such.
It was determined that `backing` wasn't a very descriptive name and that it conflicted with the texture's own backing, the name was changed to `texture` to make it more apparent that it was specifically the `Texture` object backing the view.
A memory manager function to read into a vector till it satisfies the supplied function or hits an early stop condition like hitting the end of vector or reaching an unmapped region. This can be used to efficiently scan for values in GPU VA.
When `VK_EXT_vertex_attribute_divisor` is not available, `VkPhysicalDeviceVertexAttributeDivisorFeaturesEXT` is unlinked from the device enabled feature list as it is undefined behavior to link a structure provided by an extension without enabling that extension.
`EXT_SET_V` would enable the extension regardless of if it was actually the correct extension or if the version was high enough as long as the hash matched.
Co-authored-by: Billy Laws <blaws05@gmail.com>
`shaderImageGatherExtended` is required by the shader compiler, to avoid complications associated with making it optional and considering that it's supported by the vast majority of Vulkan mobile devices, it was made a mandatory feature.
This class will be entirely responsible for any interop with the shader compiler, it is also responsible for caching and compilation of shaders in itself.
We want to utilize features from C++ 20 ranges but they haven't been entirely implemented in libc++ so in the meantime we use the reference implementation for it which is Ranges v3.
Any primitive topologies that are directly supported by Vulkan were implemented but the rest were not and will be implemented with conversions as they are used by applications, they are:
* LineLoop
* QuadList
* QuadStrip
* Polygon
Translates all Maxwell3D vertex attributes to Vulkan with the exception of `isConstant` which causes the vertex attribute to return a constant value `(0,0,0,X)` which was trivial in OpenGL with `glDisableVertexAttribArray` and `glVertexAttrib4(..., 0, 0, 0, 1)` but we don't have access to this in Vulkan and might need to depend on undefined behavior or manually emulate it in a shader. This'll be revisited in the future after checking host GPU behavior.
`ENUM_STRING` can be used inside a `class`/`struct`/`union` for `enum`s contained within them. Making the function `static` allows doing this and doesn't require supplying a `this` pointer of the enclosing class for usage.
This being made implicit removes any confusion that all cases would need to be implemented and explicitly define that the CF should continue onto the 2nd switch-case when it cannot find any matches in the first one.
Implements the `isVertexInputRatePerInstance` register array which controls if the vertex input rate is either per-vertex or per-instance. This works in conjunction with the vertex attribute divisor for per-instance attribute repetition of attributes.
We order all registers in ascending order, a few registers namely `colorLogicOp`, `colorWriteMask`, `clearBuffers` and `depthBiasClamp` were erroneously not following this order which has now been fixed.
We inconsistently utilized `typeof` and `decltype` all over the codebase, this has now been fixed by uniformly using `decltype` as `typeof` is a GCC extension and not in the C++ standard alongside having the hidden side effect of removing references from the determined type.
Check for `vertexAttributeInstanceRateZeroDivisor` in `VkPhysicalDeviceVertexAttributeDivisorFeaturesEXT` when the Maxwell3D register corresponding to the vertex attribute divisor is set to 0. If it isn't then it logs a warning and sets the value anyway which could result in UB since the only alternative is an exception that stops emulation which might not be optimal if the game mostly works fine without this, we will add a user-facing warning when we intentionally allow UB like this in the future.
Implement the infrastructure to depend on `VkPhysicalDeviceFeatures2` extended feature structures which can be utilized to retrieve the specifics of features from extensions. It is implemented in the form of `vk::StructureChain` with `vk::PhysicalDeviceFeatures2` that can be extended with any extension feature structures.
This implements everything in Maxwell3D vertex buffer bindings including vertex attribute divisors which require the extension `VK_EXT_vertex_attribute_divisor` to emulate them correctly, this has been implemented in the form of of a quirk. It is dynamically enabled/disabled based on if the host GPU supports it and a warning is provided when it is used by the guest but the host GPU doesn't support it.
The Maxwell3D `Address` class follows the big-endian register ordering for addresses while on the host we consume them in little-endian, the `IOVA` class is the host equivalent to the `Address` class with implicitly flipped 32-bit register ordering. It shares implicit decomposition semantics from `Address` due to similar requirements with a minor difference of being returned by reference rather than value as we want to have value setting semantics with implicit decomposition while we don't for `Address`.
The semantics of implicitly decomposing the `Address` class into a `u64` were determined to be appropriate for the class. As it is an integer type this effectively retains all semantics from using an integer directly for the most part.
Maxwell3D supports both independent and common color write masks like color blending but for common color write masks rather than having register state specifically for it, the state from RT 0 is extended to all RTs. It should be noted that color write masks are included in blending state for Vulkan while being entirely independent from each other for Maxwell, it forces us to use the `independentBlend` feature even when we are doing common blending unless the color write mask is common as well but to simplify all this logic the feature was made required as it supported by effectively all targeted devices.
Maxwell3D supports independent blending which has different blending per-RT and common blending which has the same blending for all RTs. There is a register determining which mode to utilize and we simply have two arrays of `VkPipelineColorBlendAttachmentState` for the RTs that we toggle between to make the transition between them extremely cheap.
Independent blending is supported by effectively every Vulkan 1.1 Android GPU, it gives us the ability to architecture Maxwell3D blending emulation better as we can avoid additional checks for independent blending state and having a fallback path for when the host doesn't support the feature.
A prior commit added the ability to utilize features with quirks but this implements the ability to require a feature be present on the host or an exception will be thrown. It allows us to make useful assumptions that result in a better architecture in certain cases.
Implements the infrastructure required to enable optional extensions set in `QuirkManager` alongside the required extensions in the `GPU` class. All extensions should be correctly resolved now and according to what the device supports.
The offset was incorrectly set to `0x4D` rather than `0x4ED` which is what it should be. This would've led to bugs in line width determination and likely broken any aliased line rendering entirely.
We selectively enable GPU features that we require as enabling all of them might result in extra driver overhead in certain circumstances. Setting them is handled by `QuirkManager` with the new `FEAT_SET` function that ties a quirk with a feature.
We stub alpha testing as it doesn't exist in Vulkan and few titles use it, it can be emulated in the future using a shader patch with manually discarding fragments failing the alpha test function but this'll be added in later as it isn't high priority at the moment and has associated overhead with it so other options might be explored at the time.
It is essential to know what quirks a certain GPU may have to debug an issue, these are now printed at startup into the log alongside all other GPU information. A new `QuirkManager::Summary` function was implemented to provide this functionality.
Implements a basic part of Vulkan blending state which are color logic operations applied on the framebuffer after running the fragment shader. It is an optional feature in Vulkan and not supported on any mobile GPU vendor aside from ImgTec/NVIDIA by default.
Any signals that lead to exception handling being triggered now attempt to flush all logs given that the log mutex is unoccupied, this is to mostly help logs be more complete when exiting isn't graceful.
A lot of calls in Maxwell3D register initialization ended up setting the register to 0 which should be implicit behavior and most calls would be eliminated by the redundancy check which had to be manually disabled. It was determined to be better to move this responsibility to the called function to initialize to state equivalent to the corresponding register being 0. All initialization calls with the argument as 0 have been removed now due to this, it was the vast majority of calls.
Maxwell3D Registers weren't initialized to the correct values prior, this commit fixes that by doing `HandleMethod` calls with all the register values being initialized. This is in contrast to the registers being set without calling the methods in `GraphicsContext` or otherwise resulting in bugs.
The function `GetFormat` was seemingly no longer required due to us never converting from a Vulkan format to a Skyline format, most conversions only went from Skyline to Vulkan and were generally lossy due to certain formats being missing in Vulkan and approximated using channel swizzles. As a result of this, it was pointless to maintain and has now been removed.
Maxwell3D registers relevant to the Vulkan Rasterizer state have been implemented aside from certain features such as per-face polygon modes that cannot be implemented due to Vulkan limitations. A quirk was utilized to dynamically support the provoking vertex being the last vertex as opposed to the first as well.
We require a way to track certain host GPU features that are optional such as Vulkan extensions, this is what the `QuirkManager` class does as it checks for all quirks and holds them allowing other components to branch based off these quirks.
Due to compiler alignment issues, the bitfield member `increment` of `MacroInterpreter::MethodAddress` was mistakenly padded and moved to the next byte. This has now been fixed by making its type `u16` like the member prior to it to prevent natural alignment from kicking in.
This commit added basic shader program registers, they simply track the address a shader is pointed to at the moment. No parsing of the shader program is done within them.
A thread local LoggerContext is now used to hold the output file stream instead of the `Logger` class. Before doing any logging operations, a LoggerContext must be initialized.
This commit will not build successfully on purpose.
Dividers after titles were missing in `ControllerActivity` which made it look inconsistent with `SettingsActivity` which did have them. They have now been added by extending `DividerItemDecoration` to be drawn before any `ControllerHeaderItem`.
The icons in these FABs had the same color as the FAB prior which led them to being invisible. This has been fixed by setting a white tint on them which makes the icons clearly visible.
Additional padding has been added to the text alongside making it be left-aligned rather than center-aligned and justified. A newline has also been added to the copyright notice for Skyline to make it look nicer.
We wanted the color of the modals used by the dialogs to be the same as our regular background color rather than a lighter grey. This has now been enforced with style attributes in the case of `AlertDialog` and `setBackground` in the case of `BottomSheetDialog`.
We inconsistently used `AppCompat`'s `AlertDialog` theme in Settings while using `MaterialComponents`'s theme in Controller Configuration. This has now been fixed by universally using the `MaterialComponents` theme.
The Skyline logo was added to the title area but it ended up being too distracting with a light theme as the logo was designed purely for a white background. Ultimately, even though it looked good with the dark theme we had to remove it.
Aligning the buttons to the bottom of the game image was determined to look odd due to the amount of padding between the title and buttons. They are now back to being below the title but the buttons have been resized with "Play" being a wide button while "Pin" has been replaced with Google Material Icons's "Add To Home Screen" icon and sized down to an icon-only button.
- Logo is now displayed next to the app name
- Remove search bar animation
- New color accent
- Improve visibility of controller binding setting's glyphs
This pushes a set of command buffers into the Host1x command FIFO allocated for the channel, returning fence thresholds that can be waited on for completion,
The Host1x block of the TX1 supports 14 separate channels to which commands can be issued, these all run asynchronously so are emulated the same way as GPU channels with one FIFO emulation thread each. The command FIFO itself is very similar to the GPFIFO found in the GPU however there are some differences, mainly the introduction of classes (similar to engines) and the Mask opcode (which allows writing to a specific set of offsets much more efficiently).
There is an internal Host1x class which functions similar to the GPFIFO class in the GPU, handling general operations such as syncpoint waits, this is accessed via the simple method interface. Other channels such as NVDEC and VIC are behind the 'Tegra Host Interface' (THI) in HW, this abstracts out the classes internal details and provides a uniform method interface ontop of the Host1x method one. We emulate the THI as a templated wrapper for the underlying class.
Syncpoint increments in Host1x are different to GPU, the THI allows submitting increment requests that will be queued up and only be applied after a specific condition in the associated engine is met; however the option to for immediate increments is also available.