Using a u32 for the loop index prevents masking on all increments,
giving a moderate performance increase.
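
A minimal sketch of the idea, assuming the previous index was a narrower type such as u16 (the message does not say what it was). `SumNarrowIndex` and `SumWideIndex` are hypothetical names used only for illustration. Whether a mask instruction is actually emitted depends on the compiler and target.

```cpp
#include <cstdint>

using u16 = std::uint16_t;
using u32 = std::uint32_t;

// Assumed prior form: `i++` is computed as an int and then truncated back to
// 16 bits, which the compiler may have to implement with an explicit mask
u32 SumNarrowIndex(const u32 *data, u16 count) {
    u32 sum{};
    for (u16 i{}; i < count; i++)
        sum += data[i];
    return sum;
}

// u32 index: the increment is a plain 32-bit add with no truncation step
u32 SumWideIndex(const u32 *data, u32 count) {
    u32 sum{};
    for (u32 i{}; i < count; i++)
        sum += data[i];
    return sum;
}
```

Both functions return the same result for the same input; only the index type differs.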
Passing methods as u32 parameters and no longer passing subChannel
gives quite a significant performance increase when combined with the
inlining allowed by subchannel-based engine selection.
This commit makes GraphicBufferProducer significantly more accurate by matching the behavior of AOSP and mirroring the tweaks made by Nintendo.
It eliminates a lot of the magic structures and enumerations used previously and replaces them with the correct values from AOSP or HOS.
There was also a lot of functional inaccuracy, which has been fixed; we now emulate the exact subset of HOS behavior that we need. A lot of the intermediate layers such as GraphicBufferConsumer or Gralloc/Sync are not emulated as they're pointless abstractions here.
An RAII scoped trace was used for SvcWaitSynchronization but it was placed within a conditional scope, so the trace was destroyed at the end of that scope and had an incorrect lifetime. Aside from that, minor changes addressing the CR that do not affect functionality were made.
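
The lifetime problem can be illustrated with a small sketch. `ScopedTrace`, `events`, `WaitMisplaced` and `WaitCorrect` are hypothetical stand-ins, not Skyline's actual tracing code, and the `std::optional` approach is only one possible way to keep the trace conditional while extending its lifetime.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tracing backend: records events in order
static std::vector<std::string> events;

// Hypothetical RAII trace: begins in the constructor, ends in the destructor
struct ScopedTrace {
    std::string name;

    explicit ScopedTrace(std::string traceName) : name(std::move(traceName)) {
        events.push_back("begin " + name);
    }

    ~ScopedTrace() {
        events.push_back("end " + name);
    }

    ScopedTrace(const ScopedTrace &) = delete;
    ScopedTrace &operator=(const ScopedTrace &) = delete;
};

// Misplaced: the trace object lives only inside the `if` block, so it ends
// at the closing brace, before the work it was meant to cover
void WaitMisplaced(bool tracing) {
    if (tracing) {
        ScopedTrace trace("wait");
    }
    events.push_back("work");
}

// One possible fix: declare the holder in the enclosing scope and construct
// the trace conditionally, so it is destroyed when the function returns
void WaitCorrect(bool tracing) {
    std::optional<ScopedTrace> trace;
    if (tracing)
        trace.emplace("wait");
    events.push_back("work");
}
```

With tracing enabled, `WaitMisplaced` records the end of the trace before the work, while `WaitCorrect` records it after.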
We decided to restructure Skyline to draw a layer of separation between guest and host GPU. We're reserving the `gpu` namespace and directory purely for host GPU code and creating a new `soc` directory and namespace for emulation of parts of the X1 SoC, which is currently limited to the guest GPU but will be expanded to contain components like the audio DSP down the line.