skyline

mirror of https://github.com/skyline-emu/skyline.git synced 2024-06-18 13:58:48 +02:00

Author	SHA1	Message	Date
Billy Laws	3901ecbf49	Hook up indirect draws into usagetracker Now usagetracker is properly in place, indirect draw HLE can be used without requiring any hacks. Dirtiness is now ignored when fetching macro arguments, and it's now the duty of the HLE impls themselves to perform flushing if they require it.	2023-03-19 13:52:15 +00:00
Billy Laws	b313dcbdca	Avoid dereferencing macro argument pointers in memory where possible Indirect draws are implemented by having the macro arguments overflow into a separate GP Entry that points directly to the indirect argument buffer. To HLE indirect draws, a buffer needs to be created from this pointer, and it cannot be dereferenced on the CPU at any point to avoid hitting traps.	2023-03-19 13:52:15 +00:00
Billy Laws	4bb2a41594	Use usagetracker to determine if pushbuffers need to flush the GPU	2023-03-19 13:52:15 +00:00
Billy Laws	45bbf3bb2a	Fix indirect draws with direct buffers We need to wait on the GPFIFO manually as we won't hit the traps when accessing the indirect params with direct as we usually would.	2023-01-08 19:30:52 +00:00
Billy Laws	f4f658e3b7	Fix typo	2022-12-03 22:50:56 +00:00
Billy Laws	d849875656	Only unlock GPU channel state on queue wait if it was previously locked	2022-12-03 22:50:56 +00:00
Billy Laws	2ce146e28f	Don't crash on the Grp0SetSubDevMask TertOp Used by Vulkan games to set the SLI mask, not applicable to the Switch.	2022-11-02 17:46:07 +00:00
Billy Laws	3e8bd26978	Add a global gm20b channel lock Allowing for parallel execution of channels never really benefitted many games and prevented optimisations such as keeping frequently used resources always locked to avoid the constant overhead of locking on the hot path.	2022-11-02 17:46:07 +00:00
Billy Laws	bd7eee8e2b	Optimise GPFIFO command processing GPFIFO code is very high throughput due to the sheer number of commands used for rendering. Adjust some types and switch to an if statement with hints to slightly increase processing speed.	2022-11-02 17:46:07 +00:00
Billy Laws	d810619203	Drop 3D engine method calling fast path in GPFIFO This ended up actually turning out to be a slow path when Maxwell 3D method handling code was inlined.	2022-11-02 17:46:07 +00:00
Billy Laws	30ec844a1b	Use GPFIFO pushbuffer contents in-place if possible The memcpy within `Read()` was taking up a fair amount of time, avoid this by using the mapped range in-place when the mapping isn't split.	2022-11-02 17:46:07 +00:00
Billy Laws	06053d3caf	Rewrite Fermi 2D engine to use the blit helper shader Entirely rewrites the engine and interconnect code to take advantage of the subpixel and OOB blit support offered by the blit helper shader. The interconnect code is also cleaned up significantly with the 'context' naming being dropped due to potential conflicts with the 'context' from context lock	2022-08-08 17:40:35 +01:00
Billy Laws	561103d3da	Submit GPFIFO work prior to `CircularQueue` waiting The position at which we call submit is a significant factor in performance and we did so at the end of PBs (PushBuffers), this isn't optimal as there could be multiple PBs queued up that would benefit from being in the same submission. We now delay the submission of the workload till we run out of PBs.	2022-08-06 22:20:54 +05:30
PixelyIon	ffaefc82d3	Call all flush callbacks prior to `CommandExecutor` submission The flush callbacks inside `CommandExecutor` weren't being called prior to submission as they should've been, this fixes that by calling them. It additionally removes the requirement to manually flush Maxwell3D at the end of `ChannelGpfifo` pushbuffers as it's a flush callback and will automatically be called by `Submit`. Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-08-06 22:20:54 +05:30
PixelyIon	e65707cd9d	Handle `CommandExecutor` submission at end of `ChannelGpfifo` PB Any work that was done in a `ChannelGpfifo` pushbuffer needs to be submitted at the end of it, if it isn't done then the work might incorrectly be not done till the next submission. This commit fixes it by calling `CommandExecutor::Submit` at the end of a pushbuffer, submitting any buffers that would've been left over. Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-08-06 22:20:54 +05:30
PixelyIon	6e09dc5204	Fix thread name setting We utilize `pthread_setname_np` to set the thread names but didn't check for any errors which resulted in the `Skyline-Choreographer` and `ChannelCmdFifo` not having proper names as they exceeded the 16 character limit on thread names for the pthread function. This has now been fixed by changing the names and introducing error checking to invocations of this function.	2022-08-06 22:18:42 +05:30
Robin Kertels	0a3cf25823	Implement the Fermi 2D blitting engine The Fermi 2D engine implements both image blit and resolve operations, supporting subpixel sampling with both linear and point filtering. Resolve operations are performed by sampling from the center of each pixel in order to resolve the final image from the MSAA samples. MSAA images are stored in memory like regular images but each pixel's dimensions are scaled: e.g. for 2x2 MSAA ``` 112233 112233 445566 445566 ``` These would be sampled with both duDx and duDy as 2 (integer part), resolving to the following: ``` 123 456 ``` Blit operations are performed by sampling from the corner of each pixel, scaling the image as one would expect. This implementation isn't fully complete as Vulkan blit doesn't support some combinations which Fermi does, most notably between colour and depth stencil. These will be implemented properly at a later date, likely after the texture manager rework. Out of Bounds Blit, used by some OpenGL games, is also missing since supporting it requires texture aliasing; this will also be supported after the texture manager rework. Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-05-13 22:37:37 +01:00
Billy Laws	03594a081c	Ensure correct flushing for batched constant buffer updates Cbufs could be read by non-maxwell3D engines so force a flush when switching to them or before Execute.	2022-05-07 13:56:09 +01:00
PixelyIon	41b98c7daa	Add stack tracing to `skyline::exception` Skyline's `exception` class now stores a list of all stack frames during the invocation of the exception. These can later be parsed by the exception handler to generate a human-readable stack trace. To assist with more complete stack traces, `-fno-omit-frame-pointer` is now passed on debug builds which forces the inclusion of frames on function calls.	2022-04-14 14:14:52 +05:30
PixelyIon	ea00f1bb82	Flush emulation logs after exceptions A lot of logs are incomplete due to being unable to flush inside the signal handler, now we flush after any exceptions so that there is a guarantee of any exceptions being logged as this is crucial for proper debugging.	2022-04-14 14:14:52 +05:30
Billy Laws	3c26921d54	Implement the Maxwell DMA engine The DMA engine is used to perform DMA buffer/texture copies directly on the GPU. It can deswizzle arbitrary regions of input textures, perform component remapping and swizzle into output textures. This impl only supports 1D buffer copies, 2D ones will come later.	2022-04-14 14:14:52 +05:30
PixelyIon	cb2614f80e	Handle host accesses for NCE Memory Trapping API We cannot ignore accesses from the host to a region protected by the NCE Memory Trapping API, there's often access to regions which have overlap with a protected region unintentionally and those accesses need to be handled correctly rather than leading to a crash. This is done by implementing an additional signal handler `NCE::HostSignalHandler` to look up any potential traps on a `SIGSEGV` and handle them correctly or, when there isn't a corresponding trap, raise a `SIGTRAP` when a debugger is connected or delegate to `signal::ExceptionalSignalHandler` when it isn't.	2022-04-14 14:14:52 +05:30
Billy Laws	ae41ddf4f0	Implement a skeleton compute engine The Kepler compute engine is used to run compute jobs encapsulated into QMDs on the GPU, this commit doesn't implement compute itself but adds the register and QMD structs that will be needed for it in the future.	2022-04-14 14:14:52 +05:30
Billy Laws	0298a7b1f6	Implement the actual inline to memory engine on subch 2 Used mostly by OGL games for copying stuff around.	2022-04-14 14:14:52 +05:30
Billy Laws	a5dd961f01	Add support for batched method sending Important for constbuf updates which would be very slow if done one at a time.	2022-04-14 14:14:52 +05:30
Billy Laws	7e16c1f989	Heavily optimise GPFIFO command dispatch to reduce redundant checks Previously for methods with count > 1 the subchannel and engine would be looked up for each part of the method rather than only doing so at the start. Each call also needed to be looked up to see if it touched a macro or GPFIFO method. Fix this by doing checks outside of the main dispatch loop with templated helper lambdas to avoid needing to repeat lots of code. Maxwell3D is the only subchannel with a fast path for now but more can be added later if needed.	2022-04-14 14:14:52 +05:30
Billy Laws	62db21fb78	Rework GPFIFO method distribution and macros to support multiple engines Fermi2D supports macros in addition to Maxwell3D, these both share code memory. To support this we rework the macro interpreter to support passing in a target engine and abstract the communications out into an interface that can be implemented by applicable engines. ``` GPFIFO <-> MME <-> Maxwell3D ^ ^---> Fermi2D X------------> I2M X------------> MaxwellComputeB X--Flush-----> MaxwellDMA ```	2022-04-14 14:14:52 +05:30
Billy Laws	8d5463ef28	Drop engine base class usage from GPFIFO This class does nothing since we stopped GPFIFO submits from using virtual functions so it can be dropped.	2022-04-14 14:14:52 +05:30
lynxnb	5cd1f01690	Refactor all logger calls	2021-11-11 16:13:24 +01:00
Billy Laws	b7d0f2fafa	Implement support for pushbuffer methods split across multiple GpEntries These are used heavily in OpenGL games, which now, together with the previous syncpoint changes, work perfectly. The actual implementation is rather novel as rather than using a per-class state machine for all methods we only use it for those that are known to be split across GpEntry boundaries, as a result only a single bounds check is added to the hot path of contiguous method execution and the performance loss is negligible.	2021-10-16 12:13:30 +01:00
Billy Laws	fc017e1e95	Implement pre-wait and post-increment syncpoint operations in submit These are used by both OpenGL and Vulkan games as opposed to including the operations inside the main commandbuffer.	2021-10-16 12:13:30 +01:00
Billy Laws	eb25f60033	Implement multichannel support for GPU Allows the execution of multiple channels at the same time, with locking being performed on the host GPU scheduler layer, address spaces can be bound to one or more channels.	2021-10-16 12:13:30 +01:00
PixelyIon	d45193874e	Fix NvDrv `CloseDevice` Bug The unique pointer to a device in the map was simply reset rather than deleting the entry from the map, this resulted in the device not being properly closed and when the device was reopened then the `emplace` was a NOP as the entry already existed. This resulted in a `nullptr` dereference down the line when an application attempted to issue an IOCTL to a device that was previously closed and reopened. This is known to occur in Deko3D as it recreates the context when loading an example which includes closing and reopening devices.	2021-10-03 18:02:32 +05:30
Billy Laws	39faa739b9	Optimise GPFIFO command processing for higher throughput Using a u32 for the loop index prevents masking on all increments, giving a moderate performance increase. Passing methods as u32 parameters and stopping subChannel being passed gives quite a significant increase when combined with the inlining allowed by subchannel based engine selection.	2021-09-30 02:11:19 +05:30
Billy Laws	d03b288db6	NEEDS CLEANUP: Reimplement GPU VMM and rewrite nvdrv VM impl	2021-09-30 02:11:19 +05:30
PixelyIon	3f7373209a	Move Guest GPU into SoC Directory We decided to restructure Skyline to draw a layer of separation between guest and host GPU. We're reserving the `gpu` namespace and directory for purely host GPU and creating a new `soc` directory and namespace for emulation of parts of the X1 SoC which is currently limited to guest GPU but will be expanded to contain components like the audio DSP down the line.	2021-03-25 22:57:26 +05:30

36 Commits
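
The 2x2 MSAA example in commit 0a3cf25823 can be reproduced with a small sketch. This is not Skyline code: `ResolveNearest` and its parameters are hypothetical names chosen for illustration, and it takes a single nearest sample per output pixel rather than using the blit path or filtering described in the commit. It only shows how stepping through the sample grid by 2 in each direction turns the 6x4 layout into the 3x2 result.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Skyline): resolves a multisampled image,
// stored as rows of characters, by taking one sample per output pixel.
// scaleX/scaleY play the role of the integer part of duDx/duDy.
std::vector<std::string> ResolveNearest(const std::vector<std::string> &samples,
                                        std::size_t scaleX, std::size_t scaleY) {
    assert(scaleX > 0 && scaleY > 0);

    std::vector<std::string> resolved;
    // Start half a pixel in so the sample comes from inside each scaled pixel,
    // then advance one whole scaled pixel per step.
    for (std::size_t y{scaleY / 2}; y < samples.size(); y += scaleY) {
        std::string row;
        for (std::size_t x{scaleX / 2}; x < samples[y].size(); x += scaleX)
            row.push_back(samples[y][x]);
        resolved.push_back(row);
    }
    return resolved;
}
```

Calling `ResolveNearest({"112233", "112233", "445566", "445566"}, 2, 2)` yields the rows `123` and `456`, matching the resolved image shown in the commit message.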