TextureDecoder.cpp: new SSE2 optimized GX_TF_I8 decoder. Probably not ultimately optimal SSE2 code, but provably better (on my machine) than the memset version. Tested with __rdtsc counts in an independent project. I get about 6-7 FPS more on average during the intro movie playback in Mario Kart Wii. Hope this compiles for GCC okay.

TextureDecoder.cpp: merged two functionally identical decode5A3RGBA and decode5A3rgba methods.
OpcodeDecoding.cpp and DLCache.cpp: optimization for GX_LOAD_XF_REG. The PSUHFB solution sounds better for SSSE3, but this is a small win for the default case.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6692 8ced0084-cf51-0410-be5f-012b33b47a6e
This commit is contained in:
james.jdunne
2010-12-30 19:17:08 +00:00
parent 6cf9b3688d
commit b038df64bf
5 changed files with 164 additions and 66 deletions

View File

@ -307,8 +307,7 @@ u8 AnalyzeAndRunDisplayList(u32 address, int size, CachedDisplayList *dl)
u32 xf_address = Cmd2 & 0xFFFF;
// TODO - speed this up. pshufb?
u32 data_buffer[16];
for (int i = 0; i < transfer_size; i++)
data_buffer[i] = DataReadU32();
DataReadU32xFuncs[transfer_size-1](data_buffer);
LoadXFReg(transfer_size, xf_address, data_buffer);
INCSTAT(stats.thisFrame.numXFLoads);
num_xf_reg++;
@ -462,8 +461,7 @@ bool CompileAndRunDisplayList(u32 address, int size, CachedDisplayList *dl)
NewRegion->hash = 0;
dl->InsertRegion(NewRegion);
u32 *data_buffer = (u32*)NewRegion->start_address;
for (int i = 0; i < transfer_size; i++)
data_buffer[i] = DataReadU32();
DataReadU32xFuncs[transfer_size-1](data_buffer);
LoadXFReg(transfer_size, xf_address, data_buffer);
INCSTAT(stats.thisFrame.numXFLoads);
// Compile