mirror of
https://github.com/dolphin-emu/dolphin.git
synced 2025-06-11 16:49:28 +02:00
TextureDecoder.cpp: new SSE2 optimized GX_TF_I8 decoder. Probably not ultimately optimal SSE2 code, but provably better (on my machine) than the memset version. Tested with __rdtsc counts in an independent project. I get about 6-7 FPS more on average during the intro movie playback in Mario Kart Wii. Hope this compiles for GCC okay.
TextureDecoder.cpp: merged two functionally identical decode5A3RGBA and decode5A3rgba methods. OpcodeDecoding.cpp and DLCache.cpp: optimization for GX_LOAD_XF_REG. The PSUHFB solution sounds better for SSSE3, but this is a small win for the default case. git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6692 8ced0084-cf51-0410-be5f-012b33b47a6e
This commit is contained in:
@ -307,8 +307,7 @@ u8 AnalyzeAndRunDisplayList(u32 address, int size, CachedDisplayList *dl)
|
||||
u32 xf_address = Cmd2 & 0xFFFF;
|
||||
// TODO - speed this up. pshufb?
|
||||
u32 data_buffer[16];
|
||||
for (int i = 0; i < transfer_size; i++)
|
||||
data_buffer[i] = DataReadU32();
|
||||
DataReadU32xFuncs[transfer_size-1](data_buffer);
|
||||
LoadXFReg(transfer_size, xf_address, data_buffer);
|
||||
INCSTAT(stats.thisFrame.numXFLoads);
|
||||
num_xf_reg++;
|
||||
@ -462,8 +461,7 @@ bool CompileAndRunDisplayList(u32 address, int size, CachedDisplayList *dl)
|
||||
NewRegion->hash = 0;
|
||||
dl->InsertRegion(NewRegion);
|
||||
u32 *data_buffer = (u32*)NewRegion->start_address;
|
||||
for (int i = 0; i < transfer_size; i++)
|
||||
data_buffer[i] = DataReadU32();
|
||||
DataReadU32xFuncs[transfer_size-1](data_buffer);
|
||||
LoadXFReg(transfer_size, xf_address, data_buffer);
|
||||
INCSTAT(stats.thisFrame.numXFLoads);
|
||||
// Compile
|
||||
|
Reference in New Issue
Block a user