This still requires usagetracker to avoid redundantly performing indirect draws when the memory isn't dirty, and to allow for using it with direct memory, but it's a start.
This is necessary for the upcoming direct buffer support, as in order to use guest buffers directly without trapping we need to recreate any guest GPU sync on the host GPU. This avoids the guest thinking work is done that isn't and overwriting in-use buffer contents.
This will be shared with the compute engine implementation, the only thing of note with this is that the binding register is now passed as a param since it is part of the compute QMD which can't be dirty tracked.
gm20b performs instanced draws by repeating draw methods for each instance, the code to detect this together with the cost of interpreting macros took up around 6% of GPFIFO time in Metro Kingdom. By detecting these specific macros and performing an instanced draw directly much of that cost can be avoided.
Removes all usage of graphics_context.h from the codebase, exclusively using the new interconnect and its dirty tracking system. While porting the code a number of bugs were discovered such as not respecting the base instance or primitive type override, which have all been fixed. Currently only clears and constant buffer updates are implemented but due to the dirty state system allowing register handling on the interconnect end there shouldn't end up being many more changes.
All the names are directly translated from Nvidia docs, with minimal conversions to enums/structs when appropriate. Not all registers have been rewritten, only those that are needed to implement clears and dynamic state, the rest will be added as they are used in the GPU rework.
At some point we will call Submit within draws or constant buffer updates, to avoid any infinite recursion mark draw/cbuf pending as false before performing any operation
In the Maxwell 3D engine, instanced draws are implemented by repeating the exact same draw in sequence with special flag set in vertexBeginGl. This flag allows either incrementing the instance counter or resetting it, since we need to supply an instance count to the host API we defer all draws until state changes occur. If there are no state changes between draws we can skip them and count the occurences to get the number of instances to draw.
Maxwell3D has a register for linking the TIC/TSC index in bindless texture handles, this is used by games to implement bindless combined texture-sampler handles.
This implements all Maxwell3D registers and HLE Vulkan state for Tessellation including invalidation of the TCS (Tessellation Control Shader) state during state changes.
This controls the depth range used by the shader, hades already has support for the necessary patching so we only need to pass the current mode over to it and it'll do the necessary work.
Stencil operations are configurable to be the same for both sides or have independent stencil state for both sides. It is controlled via the previously unimplemented `stencilTwoSideEnable`.
Fermi2D supports macros in addition to Maxwell3D, these both share code memory. To support this we rework the macro interpreter to support passing in a target engine and abstract the communications out into an interface that can be implemented by applicable engines.
```
GPFIFO <-> MME <-> Maxwell3D
^ ^---> Fermi2D
X------------> I2M
X------------> MaxwellComputeB
X--Flush-----> MaxwellDMA
```
Implements the entirety of Maxwell3D Depth/Stencil state for both faces including compare/write masks and reference value. Maxwell3D register `stencilTwoSideEnable` is ignored as its behavior is unknown and could mean the same behavior for both stencils or the back facing stencil being disabled as a result of this it is unimplemented.
Implements inline constant buffer updates that are written to the CPU copy of the buffer rather than generating an actual inline buffer write, this works for TIC/TSC index updates but won't work when the buffer is expected to actually be updated inline with regard to sequence rather than just as a buffer upload prior to rendering.
GPU-sided constant buffer updates will be implemented later with optimizations for updating an entire range by handling GPFIFO `Inc`/`NonInc`directly and submitting it as a host inline buffer update.
Adds support for index buffers including U8 index buffers via the `VK_EXT_index_type_uint8` extension which has been added as an optional quirk but an exception will be thrown if the guest utilizes it but the host doesn't support it.
Shader compilation is now broken into shader program parsing and pipeline shader compilation which will allow for supporting dual vertex shaders and more atomic invalidation depending on runtime state limiting the amount of work that is redone.
Bindings are now properly handled allowing for bound UBOs to be converted to the appropriate host UBO as designated by the shader compiler by creating Vulkan Descriptor Sets that match it.
We need this to make the distinction between a shader and pipeline stage in as shader programs are bound at a different rate than that of pipeline stage resources such as UBO.
Support for clearing the depth/stencil RT has been added as its own function via either optimized `VkAttachmentLoadOp`-based clears or `vkCmdClearAttachments`. A bit of cleanup has also been done for color RT clears with the lambda for the slow-path purely calling the command rather than creating the parameter structures.
Support the Maxwell3D Depth RT for Z-buffering, this just creates an equivalent `RenderTarget` object with no support on the API-user side (IE: `Draw` and `ClearBuffers`).