117 Commits

Author SHA1 Message Date
xsacha
9efa62b0ed Add SSSE3 implementation for RGBA8 texture decode. It is 25% faster (3/4 of the cycles) than the SSE2 version.
Remove a bit of redundancy in CMPR.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6773 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 17:17:26 +00:00
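Several of these commits optimize the RGBA8 decoder, so a scalar reference for the format may help. This is a minimal sketch that assumes the commonly documented GX layout. Texels sit in 4x4 tiles of 64 bytes. The first 32 bytes of a tile hold (A,R) pairs and the last 32 hold (G,B) pairs, both in row-major texel order. The output is linear R,G,B,A bytes. `DecodeRGBA8` is a hypothetical name, and this is not Dolphin's SSE2 or SSSE3 code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode a GX RGBA8 texture (4x4 tiles, 64 bytes each) into linear R,G,B,A bytes.
// Assumed tile layout: bytes 0..31 are 16 (A,R) pairs, bytes 32..63 are 16 (G,B) pairs.
void DecodeRGBA8(uint8_t* dst, const uint8_t* src, int width, int height)
{
    assert(width % 4 == 0 && height % 4 == 0);
    for (int ty = 0; ty < height; ty += 4)
        for (int tx = 0; tx < width; tx += 4, src += 64)
        {
            const uint8_t* ar = src;      // first half of the tile
            const uint8_t* gb = src + 32; // second half of the tile
            for (int i = 0; i < 16; ++i)
            {
                const int x = tx + (i & 3);
                const int y = ty + (i >> 2);
                uint8_t* px = dst + (static_cast<size_t>(y) * width + x) * 4;
                px[0] = ar[2 * i + 1]; // R
                px[1] = gb[2 * i];     // G
                px[2] = gb[2 * i + 1]; // B
                px[3] = ar[2 * i];     // A
            }
        }
}
```

The two halves of each tile have to be interleaved, which is why byte-shuffling instructions are a natural fit for this format.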
xsacha
9cb3340754 An extra 5-10% speedup for I4 texture decoding with SSE4.1 intrinsics.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6772 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 16:00:39 +00:00
xsacha
dcbfd4ea4c SSSE3 implementation of IA8 texture decode. Roughly 50% faster than SSE2 version on my computer (SSSE3: 77%, SSE2: 57% vs reference C on Core2 Duo). About half as many cycles.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6770 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 14:55:05 +00:00
xsacha
a6acc99a89 Last commit only requires SSSE3, not SSE4.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6769 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 13:54:18 +00:00
xsacha
53474403e2 An SSE4 implementation for I8 texture decode. Slightly faster than SSE2 version on my computer (SSE4: 60%, SSE2: 55% vs reference C on Core2 Duo).
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6768 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 13:40:32 +00:00
james.jdunne
1671917472 Missed one MSVC-ism. Should fix build for Linux. Last revision should still work for Windows. No functionality changes this time.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6762 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-06 16:50:09 +00:00
james.jdunne
2841d67ce3 Faster SSE2 optimized GX_TF_CMPR texture decoder which gets ~40% speed improvement on x64 and ~50% improvement on x86 as compared to reference C code.
The code now uses direct pointer access from C code to write the colors to the destination texture instead of trying to force them back up into an __m128i and a single write call. This is what produces the major speed-up.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6761 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-06 16:41:20 +00:00
Soren Jorvang
95b6d3f445 Kill HAVE_OPENCL.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6756 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-06 01:11:32 +00:00
james.jdunne
5ca3adde3c Enabled SSE2 optimization of GX_TF_CMPR decoder only for x86 builds. It can't compete with the x64 optimized reference C code.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6755 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-06 00:32:52 +00:00
james.jdunne
8f9f1b64ff Fixed Linux build.
Fixed small undiscovered bug in WII_IPC_HLE_Device_FileIO.cpp when looking at 0-length strings.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6745 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-05 03:19:53 +00:00
james.jdunne
6eff71b893 Fixed S3TC DXT1 decoder implementation. I see ~40% speed improvements running on x86 Intel hardware and 0% improvements running on x64 AMD hardware. Strange. More investigation to follow!
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6744 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-05 02:48:32 +00:00
james.jdunne
dd7d325453 Temporarily reverting to unoptimized until I can figure this out. Apologies for the SNAFU :)
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6741 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-04 22:30:58 +00:00
james.jdunne
99b4f4703d ~40% speed improvement for decoding GX_TF_CMPR (S3TC) textures using SSE2 intrinsics.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6740 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-04 19:54:30 +00:00
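The GX_TF_CMPR commits only describe speed-ups, so here is a scalar sketch of what a single 4x4 CMPR (S3TC/DXT1-style) sub-block decodes to. It rests on three assumptions. The two 5:6:5 endpoint colors are stored big-endian. Each row's four 2-bit indices occupy one byte, with the leftmost texel in the top bits. The palette uses the standard S3TC 1/3 and 2/3 interpolation, although GameCube hardware may weight or round differently. Per the 444854601c commit, a full CMPR tile is 8x8 texels, which is four such sub-blocks. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct Texel { uint8_t r, g, b, a; };

// Expand a 5:6:5 color to 8 bits per channel by bit replication.
static Texel Expand565(unsigned c)
{
    const unsigned r = c >> 11, g = (c >> 5) & 0x3F, b = c & 0x1F;
    return { uint8_t((r << 3) | (r >> 2)), uint8_t((g << 2) | (g >> 4)),
             uint8_t((b << 3) | (b >> 2)), 255 };
}

// Weighted average of two opaque colors: (wa*p + wb*q) / (wa + wb) per channel.
static Texel Blend(Texel p, Texel q, unsigned wa, unsigned wb)
{
    const unsigned d = wa + wb;
    return { uint8_t((wa * p.r + wb * q.r) / d), uint8_t((wa * p.g + wb * q.g) / d),
             uint8_t((wa * p.b + wb * q.b) / d), 255 };
}

// Decode one 8-byte DXT1-style sub-block into 16 texels in row-major order.
void DecodeCmprSubBlock(Texel out[16], const uint8_t block[8])
{
    assert(out != nullptr && block != nullptr);
    const unsigned c0 = (block[0] << 8) | block[1]; // big-endian endpoints
    const unsigned c1 = (block[2] << 8) | block[3];
    Texel pal[4] = { Expand565(c0), Expand565(c1), {}, {} };
    if (c0 > c1) // four-color mode
    {
        pal[2] = Blend(pal[0], pal[1], 2, 1);
        pal[3] = Blend(pal[0], pal[1], 1, 2);
    }
    else // three-color mode plus transparent black
    {
        pal[2] = Blend(pal[0], pal[1], 1, 1);
        pal[3] = { 0, 0, 0, 0 };
    }
    for (int y = 0; y < 4; ++y)
    {
        unsigned row = block[4 + y];
        for (int x = 0; x < 4; ++x, row <<= 2)
            out[4 * y + x] = pal[(row >> 6) & 3]; // leftmost texel in the top bits
    }
}
```

Writing each texel straight into `out` matches the idea described in 2841d67ce3 of storing colors directly rather than repacking them, though the real decoder's details are not shown here.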
james.jdunne
b7f7a248c5 Fix for bit reduction regression in GX_TF_RGB565 textures from previous commit.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6726 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-03 01:24:35 +00:00
james.jdunne
4b15325acd GX_TF_RGB565 texture decoder optimized with SSE2 producing a ~78% speed increase over reference C implementation.
Fixed crash in debugger when attempting to enable profiler before having run any game.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6724 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-02 22:53:25 +00:00
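For context on the RGB565 commits, here is a minimal scalar sketch of the conversion. It assumes 4x4 tiles of 32 bytes with big-endian 16-bit texels, and an R,G,B,A byte output. Each 5- or 6-bit channel is widened by bit replication, so full intensity maps to 255 rather than 248. This is one common way to avoid losing precision when widening channels. The 'bit reduction regression' fixed in b7f7a248c5 is not described, so this should not be read as that fix. `DecodeRGB565` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode GX RGB565 (4x4 tiles, 32 bytes each, big-endian texels) into R,G,B,A bytes.
void DecodeRGB565(uint8_t* dst, const uint8_t* src, int width, int height)
{
    assert(width % 4 == 0 && height % 4 == 0);
    for (int ty = 0; ty < height; ty += 4)
        for (int tx = 0; tx < width; tx += 4)
            for (int i = 0; i < 16; ++i, src += 2)
            {
                const unsigned c = (src[0] << 8) | src[1];
                const unsigned r = c >> 11, g = (c >> 5) & 0x3F, b = c & 0x1F;
                uint8_t* px = dst + (size_t(ty + (i >> 2)) * width + tx + (i & 3)) * 4;
                // Replicate the high bits into the low bits so 0x1F maps to 255, not 248.
                px[0] = uint8_t((r << 3) | (r >> 2));
                px[1] = uint8_t((g << 2) | (g >> 4));
                px[2] = uint8_t((b << 3) | (b >> 2));
                px[3] = 255;
            }
}
```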
james.jdunne
7703018632 Possibly fixed game crash issues by switching to unaligned SSE2 loads/stores.
Removed unnecessary work being done in the file system when logging is disabled.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6714 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-01 20:54:15 +00:00
james.jdunne
60082853ec GX_TF_I4 texture decoder optimized with SSE2 producing a ~76% speed increase over reference C implementation.
GX_TF_RGBA8 texture decoder optimized with SSE2 producing a ~68% speed increase over reference C implementation.
TABified the entire document per NeoBrainX. :)

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6706 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-01 03:52:32 +00:00
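As a reference point for the I4 decoder, here is a minimal scalar sketch. It assumes the commonly documented layout: 8x8 texel tiles of 32 bytes, four bytes per row, and two texels per byte with the high nibble first. Each 4-bit intensity is widened to 8 bits by multiplying by 17, so 0xF becomes 0xFF. The output is one intensity byte per texel. `DecodeI4` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode a GX I4 texture (8x8 tiles, 32 bytes each) into 8-bit intensity per texel.
void DecodeI4(uint8_t* dst, const uint8_t* src, int width, int height)
{
    assert(width % 8 == 0 && height % 8 == 0);
    for (int ty = 0; ty < height; ty += 8)
        for (int tx = 0; tx < width; tx += 8)
            for (int y = 0; y < 8; ++y, src += 4)
            {
                uint8_t* row = dst + size_t(ty + y) * width + tx;
                for (int x = 0; x < 4; ++x)
                {
                    const unsigned hi = src[x] >> 4, lo = src[x] & 0xF;
                    row[2 * x] = uint8_t(hi * 17);     // left texel: high nibble
                    row[2 * x + 1] = uint8_t(lo * 17); // right texel: low nibble
                }
            }
}
```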
james.jdunne
134fca9b82 ~68% increase in GX_TF_IA8 decoding speed. Not an oft-used texture format. An example use is the Wii cursor in MKWii in the menus.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6699 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-12-31 10:20:43 +00:00
james.jdunne
343e3f7c75 ~80% speed improvement in decoding GX_TF_I8 textures. Yes, EIGHTY PERCENT. However, for MKWii movie playback I still can't break the fluffin' 48 FPS boundary on my machine! There's something else at play here because this decoder is ridonkulously fast.
~25% speed improvement in decoding GX_TF_RGB5A3 textures which aren't used very much. I thought it would help for movie playback but I misled myself. Video playback has nothing to do with this texture format.
Next I'll see if I can knock out some of these other texture decoders. Byte swizzling I'm sure can somehow be accomplished using _mm_unpacklo_epi8 trickery, so that'd be another big win I hope.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6698 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-12-31 07:23:17 +00:00
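The RGB5A3 format mentioned above packs two encodings into one 16-bit texel. If the top bit is set, the texel is opaque RGB555. Otherwise it holds a 3-bit alpha followed by RGB444. Here is a minimal scalar sketch, assuming 4x4 tiles of 32 bytes with big-endian texels, bit-replication widening, and an R,G,B,A byte output. The helper names are hypothetical, and the exact rounding Dolphin uses is not confirmed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Widen an n-bit channel to 8 bits by bit replication.
static uint8_t Expand3(unsigned c) { return uint8_t((c << 5) | (c << 2) | (c >> 1)); }
static uint8_t Expand4(unsigned c) { return uint8_t(c * 17); }
static uint8_t Expand5(unsigned c) { return uint8_t((c << 3) | (c >> 2)); }

// Decode a GX RGB5A3 texture (4x4 tiles, 32 bytes each, big-endian texels) to R,G,B,A bytes.
void DecodeRGB5A3(uint8_t* dst, const uint8_t* src, int width, int height)
{
    assert(width % 4 == 0 && height % 4 == 0);
    for (int ty = 0; ty < height; ty += 4)
        for (int tx = 0; tx < width; tx += 4)
            for (int i = 0; i < 16; ++i, src += 2)
            {
                const unsigned v = (src[0] << 8) | src[1];
                uint8_t* px = dst + (size_t(ty + (i >> 2)) * width + tx + (i & 3)) * 4;
                if (v & 0x8000) // top bit set: opaque 1|R5|G5|B5
                {
                    px[0] = Expand5((v >> 10) & 0x1F);
                    px[1] = Expand5((v >> 5) & 0x1F);
                    px[2] = Expand5(v & 0x1F);
                    px[3] = 255;
                }
                else // top bit clear: translucent 0|A3|R4|G4|B4
                {
                    px[0] = Expand4((v >> 8) & 0xF);
                    px[1] = Expand4((v >> 4) & 0xF);
                    px[2] = Expand4(v & 0xF);
                    px[3] = Expand3((v >> 12) & 0x7);
                }
            }
}
```

The per-texel branch is what makes this format less friendly to SIMD than I8, which may partly explain the smaller ~25% gain reported.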
james.jdunne
b038df64bf TextureDecoder.cpp: new SSE2 optimized GX_TF_I8 decoder. Probably not ultimately optimal SSE2 code, but provably better (on my machine) than the memset version. Tested with __rdtsc counts in an independent project. I get about 6-7 FPS more on average during the intro movie playback in Mario Kart Wii. Hope this compiles for GCC okay.
TextureDecoder.cpp: merged two functionally identical decode5A3RGBA and decode5A3rgba methods.
OpcodeDecoding.cpp and DLCache.cpp: optimization for GX_LOAD_XF_REG. The PSHUFB solution sounds better for SSSE3, but this is a small win for the default case.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6692 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-12-30 19:17:08 +00:00
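The commit above compares its SSE2 I8 decoder against a "memset version". Here is a scalar sketch of what such a baseline might look like. It assumes 8x4 texel tiles of 32 bytes, eight bytes per row, and an RGBA output in which each intensity byte is replicated into all four channels. Filling the four channels with one `memset` per texel is an assumption about that baseline, not a copy of it. `DecodeI8ToRGBA` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Decode a GX I8 texture (8x4 tiles, 32 bytes each) to R,G,B,A bytes,
// replicating the intensity into all four channels.
void DecodeI8ToRGBA(uint8_t* dst, const uint8_t* src, int width, int height)
{
    assert(width % 8 == 0 && height % 4 == 0);
    for (int ty = 0; ty < height; ty += 4)
        for (int tx = 0; tx < width; tx += 8)
            for (int y = 0; y < 4; ++y, src += 8)
            {
                uint8_t* row = dst + (size_t(ty + y) * width + tx) * 4;
                for (int x = 0; x < 8; ++x)
                    std::memset(row + 4 * x, src[x], 4); // I,I,I,I
            }
}
```

A SIMD version can do the same replication for 16 texels at once with byte-unpack instructions, which fits the `_mm_unpacklo_epi8` idea mentioned in 343e3f7c75.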
Shawn Hoffman
444854601c CMPR texel blocks are 8x8 in hardware, even though the source format says 4x4.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6655 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-12-24 11:15:51 +00:00
nodchip
d888f13bb9 VideoCommon: An experimental fix for Issue 3493. Changed _mm_load_si128 to _mm_loadu_si128. I could not test the bug because I don't have Sonic Colors.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6402 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-11-14 04:29:20 +00:00
Rodolfo Osvaldo Bogado
9b0357b5e2 Sometimes to advance you have to take a step back.
Use plain vertex arrays instead of VBOs to render in the OpenGL plugin, as the nature of the data makes VBOs slower. Depending on the implementation, this should bring a good speedup in OpenGL.
On my system, OpenGL and D3D9 now differ by 1 to 5 FPS depending on the game.
Some cleanup and a little groundwork for future improvements to rendering.
Please test and check for any errors.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6139 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-08-28 15:09:42 +00:00
Soren Jorvang
453f7c67cd Newer versions of GCC's <tmmintrin.h> check for __SSSE3__ (-mssse3).
No matter. We don't actually need it for our purposes.


git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6016 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-07-31 15:26:46 +00:00
Soren Jorvang
824b509d2e Make the SSE3.1 VideoCommon code available in GCC builds.
The GCC model for extended instructions like these is that you compile
with -msse3 etc. These affect code generation for whole compilation units,
so the idea is that you have a separate .c file for each instruction set
class and then indirect to the desired one at runtime.

Without e.g. -msse4.1, the GCC built-ins used by <foointrin.h> are not
available. However, in our specific case of compiling with -msse2 and
wanting to use SSE3.1 code, enough built-ins are available that we only
need to provide a little hack for pshufb.

Upgrading this to also use SSE4.1 instructions doesn't appear feasible
without a lot of undesirable duplication of GCC built-in functions and
headers, so we'd probably have to move to the GCC model of separate
source files for that.


git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6014 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-07-31 14:40:01 +00:00
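Two of these commits mention PSHUFB: a GCC hack to make it available under -msse2, and a PSHUFB approach for GX_LOAD_XF_REG byte swapping. For reference, here is a portable scalar model of the documented 128-bit PSHUFB (`_mm_shuffle_epi8`) behavior. Each output byte takes `src[mask & 0x0F]`, or becomes 0 if the mask byte's top bit is set. This is neither the GCC hack nor Dolphin's code. `PshufbReference`, `kSwap32Mask` and `ByteSwap32Words` are hypothetical names used for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Bytes16 = std::array<uint8_t, 16>;

// Scalar model of 128-bit PSHUFB: out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F].
Bytes16 PshufbReference(const Bytes16& src, const Bytes16& mask)
{
    Bytes16 out{};
    for (int i = 0; i < 16; ++i)
        out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    return out;
}

// Mask that reverses the bytes of each 32-bit word (big-endian <-> little-endian).
constexpr Bytes16 kSwap32Mask = {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12};

// Byte-swap a buffer of 32-bit words, 16 bytes (one shuffle) at a time.
void ByteSwap32Words(uint8_t* data, size_t size)
{
    assert(size % 16 == 0);
    for (size_t off = 0; off < size; off += 16)
    {
        Bytes16 chunk;
        std::memcpy(chunk.data(), data + off, 16);
        chunk = PshufbReference(chunk, kSwap32Mask);
        std::memcpy(data + off, chunk.data(), 16);
    }
}
```

With real SSSE3 code, one shuffle with `kSwap32Mask` swaps four big-endian words at once, which is why PSHUFB is attractive for data such as XF register loads.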
xsacha
21fb4cb96c Add a toggle option for OpenCL in Config (in Advanced Settings). Default is off.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5768 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-06-22 13:17:01 +00:00
Orphis
c2e32371f6 Refactor and prepare the OpenCL texture decoder for decoding textures to RGBA format required by DX11.
Fix the decoder codepath when OpenCL is enabled and the DX11 plugin is used.
Added the DX11 plugin to the Dolphin project dependencies.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5764 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-06-22 00:52:17 +00:00
Rodolfo Osvaldo Bogado
4ab0e4b8a0 Fix for RGBA8 decoding that caused problems in NSMBW.
Fix for screen clearing in OpenGL and D3D.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5749 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-06-19 21:12:09 +00:00
Rodolfo Osvaldo Bogado
8c6ae1f6f4 Add a path to the texture decoder that produces only RGBA textures; this will make texture loading in DX11 a lot easier and give a little performance boost too.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5746 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-06-19 13:31:40 +00:00
Soren Jorvang
f2609d1af3 #if 0 work-in-progress code.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5488 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-05-26 20:52:44 +00:00
nitsuja-
22551a0a8a a few minor code fixes.
also added a user file that should simplify running from VS for newcomers

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5338 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-04-12 02:00:15 +00:00
nodchip
32794fc028 VideoCommon: Fixed a bug where some textures became black in the SSSE3.1 code.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5309 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-04-10 01:37:51 +00:00
nodchip
956b8eb54d VideoCommon: Added automatic selection routines for the SSSE3/SSE4.1 code. They select the SSSE3/SSE4.1 code only if the proper preprocessor definition is defined and the target CPU supports SSSE3/SSE4.1. The selection routines in VertexLoader_* use function pointers. TextureDecoder uses a combination of "#if" and "if" statements.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5302 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-04-09 15:13:42 +00:00
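The selection scheme described above combines a compile-time opt-in, a runtime CPU check, and a function pointer chosen once. Here is a minimal sketch of that pattern. `CpuFeatures`, `ENABLE_SSSE3`, `ENABLE_SSE41` and the `Decode*` variants are hypothetical stand-ins, not Dolphin's `cpu_info` or its `_M_SSE` definitions. The variants only return a label so the chosen path is visible.

```cpp
#include <cassert>

// Hypothetical CPU feature flags; a real build would fill these from CPUID once at startup.
struct CpuFeatures
{
    bool ssse3;
    bool sse41;
};

// Opt-in switches standing in for the commit's preprocessor definitions (hypothetical names).
#ifndef ENABLE_SSSE3
#define ENABLE_SSSE3 1
#endif
#ifndef ENABLE_SSE41
#define ENABLE_SSE41 1
#endif

// Hypothetical decoder variants. Real ones would share a decoding signature and produce
// identical output; these only return a label so the selected path is easy to see.
static const char* DecodeGeneric() { return "generic"; }
static const char* DecodeSSSE3() { return "ssse3"; }
static const char* DecodeSSE41() { return "sse4.1"; }

using DecodeFn = const char* (*)();

// Choose the best variant once; callers then call through the pointer with no per-call checks.
DecodeFn SelectDecoder(const CpuFeatures& cpu)
{
    assert(!cpu.sse41 || cpu.ssse3); // every SSE4.1-capable CPU also supports SSSE3
#if ENABLE_SSE41
    if (cpu.sse41)
        return DecodeSSE41;
#endif
#if ENABLE_SSSE3
    if (cpu.ssse3)
        return DecodeSSSE3;
#endif
    return DecodeGeneric;
}
```

A caller would typically store the result in a global pointer at startup, for example `static DecodeFn g_decode = SelectDecoder(detected);`. This keeps the feature test off the hot path, as the VertexLoader_* routines described above do.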
nodchip
6136c94de5 VideoCommon: Merged the SSSE3/SSE4.1 code. Added some additional SSSE3/SSE4.1 code which will be used in "The Legend of Zelda: Twilight Princess".
This code doesn't work unless "_M_SSE=0x301" (for SSSE3) or "_M_SSE=0x401" (for SSE4.1) is defined as a preprocessor definition.


git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5300 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-04-09 03:02:12 +00:00
Rodolfo Osvaldo Bogado
a4736f7f6b Back to limiting VPS instead of FPS: this fixes the FPS limit, and it now works correctly because the sync between the plugin and the core is almost correct.
Fixed the FPS display in the top bar; it now shows the real FPS of the game.
Some code cleanup and some corrections to make everything work right in the reference renderer.
Multiple XFBs are now broken even in single core, so the error was not caused by dual core. I really don't know where the error is; everything looks correct, but if you test a game with multiple XFBs or the IPL you will see the error.
Ector, if you can take a look at the code and throw me some ideas, I'll thank you.
Please test.


git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5272 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-04-03 22:22:55 +00:00
nodchip
609151c6e8 Reverted because of some processor and performance issues. I will develop the SSSE3/SSE4.1 code in a branch.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5123 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-02-24 23:58:48 +00:00
nodchip
47fb73b71a Added SSSE3/SSE4.1 code for speed up. The code does not work in this revision because cpu_info is not initialized properly. I will fix the issue in another commit.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5119 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-02-24 09:56:41 +00:00
Rodolfo Osvaldo Bogado
3cc5d8ce6f First, a bugfix:
fixed a misbehavior in the clear code that caused depth clear problems on reference hardware (Intel, for example).
Added 6 parameters to optimize the Safe Texture Cache:
SafeTextureCacheColorSamples, SafeTextureCacheIndexedSamples, SafeTextureCacheTlutSamples:

These 3 parameters give the number of samples taken to calculate the final hash value: fewer samples = more speed, more samples = more accuracy.
If 0 is specified, the hash is calculated using all the data in the texture.
SafeTextureCacheColorMaxSize, SafeTextureCacheIndexedMaxSize, SafeTextureCacheTlutMaxSize:
These parameters limit the amount of data used for the hash calculation. They could appear redundant, but in some games it is better to make a full hash of the first bytes instead of taking some samples of the whole texture.

color, indexed, tlut: define the texture type: full color data, indexed, and the TLUT memory.

The parameters are available in the config, with no GUI at this time; if the tests are OK, I will add them to the GUI.
If someone needs it, I will give more examples of how to configure the values for specific games.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5116 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-02-23 21:52:12 +00:00
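Here is a minimal sketch of how the sample-count and max-size parameters described above could interact. A sample count of 0 hashes every byte. Otherwise, evenly spaced bytes are hashed. A max size caps how much of the texture is considered. Several details are assumptions: FNV-1a is a stand-in hash (the commits do not name the algorithm), a max size of 0 is taken to mean "no limit", and `SampledTextureHash` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sampled texture hash; FNV-1a stands in for whatever hash Dolphin used.
// samples == 0 hashes every byte; maxSize == 0 is assumed to mean "no size limit".
uint64_t SampledTextureHash(const uint8_t* data, size_t size, unsigned samples, size_t maxSize)
{
    assert(data != nullptr || size == 0);
    if (maxSize != 0 && size > maxSize)
        size = maxSize; // only consider the first maxSize bytes
    uint64_t h = 14695981039346656037ull; // FNV-1a 64-bit offset basis
    auto mix = [&h](uint8_t b) { h ^= b; h *= 1099511628211ull; }; // FNV-1a step
    if (samples == 0 || samples >= size)
    {
        for (size_t i = 0; i < size; ++i)
            mix(data[i]); // full hash
    }
    else
    {
        const size_t step = size / samples; // evenly spaced sample positions
        for (size_t i = 0; i < samples; ++i)
            mix(data[i * step]);
    }
    return h;
}
```

The trade-off from the commit message shows up directly here. With few samples, a change between sample positions goes unnoticed, which is fast but can miss texture updates. A full hash over a capped prefix catches every change within that prefix.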
j4ck.fr0st
e4dcdf796f OSX build fix.
Fixes Issue 2260

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5039 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-02-11 17:54:16 +00:00
Rodolfo Osvaldo Bogado
6200c99dd4 test commit: please test if this improve the performance with the safe texture cache
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5038 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-02-10 15:55:24 +00:00
Rodolfo Osvaldo Bogado
e93e777ffb Second try at implementing a more correct safe texture cache: implemented full hashing of TLUT textures, as they are the most problematic. This should solve virtually all the problems with characters in all the games that have them.
Sorry to say, but this will bring a speed drop, so I'll let you decide whether this change stays or not (I used the fastest open-source hash algorithm I know).
Full hashing is not applied to other formats because it kills performance.
By popular request, added 9x SSAA. Believe me, it will kill your graphics card even if it's the best, but the image quality is exceptional.
As always, please test and let me know the results.


git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5034 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-02-08 23:23:04 +00:00
Rodolfo Osvaldo Bogado
e98a273b54 Fixes for my last commit:
Changed the global format of the screenshots to BMP; it is correctly supported by both plugins and is faster.
Reverted the changes in the safe texture cache; I will try to make them more stable and then commit them. This should fix compilation on Linux and Mac and an error introduced in MP games.
Corrected all the issues commented on by ector. Thanks for the comments; it's always good to have the code reviewed by others to find missed spots.
Please report any remaining issues so I can solve them.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5001 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-02-03 14:13:03 +00:00
Rodolfo Osvaldo Bogado
9e2bbec47f A lot of modifications here :)
First, fixed scaling when updating the backbuffer to make it friendly to encoders; frame dumping should now work without errors with any codec.
Cleaned up the screenshot and frame dumping code; it is now more correct, faster and more stable.
Improved the safe texture cache: improved the distribution of the hash algorithm, included the TLUT hash in the final hash of the texture, and switched to a 64-bit hash to make it more accurate.
Cleaned up a lot of code and corrected some misused vertex formats when drawing full-screen quads.
And the biggest change last:
Implemented pseudo-antialiasing: an image post-processing algorithm that mimics antialiasing and is far easier to implement in this scenario.
You can change the intensity of the effect by changing the value of the antialiasing combo box; the right value depends on the game.
For example, MKWii looks awesome with 8x.
Please try all the changes and let me know the results.
If something is broken, please let me know and I will fix it ASAP.


git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5000 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-02-03 03:52:50 +00:00
hrydgard
576990c5a3 JitIL is no longer a separate .exe/binary - it's now a simple option, Dolphin.exe now contains both cores.
Advantages:
* Less confusion for users
* No need to build twice to make sure you didn't break something
* Easier to switch between the cores for testing

Disadvantages:
* None, as far as I can tell :) Maybe some extra code complexity, but not much.

Also break some include chains that caused <windows.h> to get included into everything, slowing down the build on Windows. There's more to do here though, there's still a lot of files that get it included that don't need it at all.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@4891 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-01-19 19:28:27 +00:00
Rodolfo Osvaldo Bogado
8de76f8fe8 OK, big changes here:
In VideoCommon, a little fix for the alpha test values: returned to the original values, as they are more accurate.
In D3D:
Huge change in state management: all state management is now centralized and redundant state changes are eliminated.
Fixed the overlapped viewport error on non-ATI cards:
The error was caused by this: when a viewport is defined larger than the current render target, an error is thrown and the last valid viewport is used; this is the reference behavior. On ATI cards, if a larger viewport is defined, no error is returned and rendering is valid: it uses the projection defined by the viewport but is limited to the render target area, exactly like OpenGL or the GC hardware.
To solve this with reference drivers, I defined a large render target (2x the size of the original) and render into a centered quad inside it; this way, larger viewports always fall inside a valid render target size. The drawback is the waste of resources. It could be made dynamic, depending on the game, or changed at runtime when an oversized viewport is detected, but I leave that to future commits.
Please test this and let me know the results.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@4841 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-01-15 15:52:08 +00:00
Rodolfo Osvaldo Bogado
0bc7fa7bf5 Small fix for the last commit, and a little fix for disable fog to really disable it :)
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@4662 8ced0084-cf51-0410-be5f-012b33b47a6e
2009-12-09 13:51:28 +00:00
Rodolfo Osvaldo Bogado
d02426a8e9 Mixed commit:
In D3D and OpenGL:
Fixed one nasty bug in texture loading: if a dynamic texture keeps its format but the TLUT format changes, trying to reload it into the same texture could cause a hang if the size of the resulting texture (in bytes) differs from the original.
Applied an ugly temporary hack to the texture converter to solve EFB-to-RAM misalignments and effect distortions.
In D3D:
Pseudo-implementation of logic ops using basic blending: the first 8 operations are "good approximations", the remaining 8 are bullshit :) If someone has a better approximation to emulate this logic, please let me know.
Please test whether I broke anything in the process, and test Mario Kart Wii; you will get a nice surprise. :)

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@4656 8ced0084-cf51-0410-be5f-012b33b47a6e
2009-12-07 18:48:31 +00:00
donkopunchstania
e6d51e8ff4 Fixed 24 bit depth copies to RAM. 24 and 16 bit depth copies are now more accurate. Added an offset to DX copies to RAM and made half sized copies to a texture linearly filtered.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@4635 8ced0084-cf51-0410-be5f-012b33b47a6e
2009-12-02 04:17:18 +00:00
donkopunchstania
f992fc6949 EFB to RAM in OGL and software plugin now work correctly when texture in RAM is a different size than the source. Corrected some block heights in texture decoder to fix copying certain EFB formats.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@4580 8ced0084-cf51-0410-be5f-012b33b47a6e
2009-11-15 20:14:03 +00:00
Shawn Hoffman
4622bd0c8b reapply changes from 4550-4551,4556-4559 correctly...sigh...some files were ignored by tortoisesvn, and myself :|
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@4570 8ced0084-cf51-0410-be5f-012b33b47a6e
2009-11-14 17:50:51 +00:00