Real xfb didn't provide any read_stride, so there is a division by zero.
This commit calculates the correct read_stride for real_xfb, so there is also no hack for texture vs xfb needed.
This is done with a pixel buffer object. We still have to stall the GPU, but
we only do it once per efb2ram call.
As the cpu can't access the vram, it has to queue a memcpy for the gpu and
wait for the gpu to finish this copy. We did this for every cache line which
is just stupid. Now we copy the complete texture into a pbo and readback this
at once. So we don't have to wait for lots of round-trip-times.
We use attributeless rendering, so officially we have to bind _any_ VAO.
As the state of this VAO doesn't matter, we don't have to switch it.
Also fix an AMD issue as they don't like to render from an empty VAO.
We neither scale nor render from subimages, so we by using gl_Position, we don't have to generate _any_ vertices for this converting.
Also remove the glTexSubImage optimization as every driver does it when needed. But there are some workflows (eg on APU) where it's better to realloc this texture instead of a second memcpy or stall.
This fixes Real XFB Jaggies in OpenGL on games which use yscaling, such
as most PAL games.
This fixes the last of the "Real XFB Macroblocking" issues for opengl,
see issue #6503
Seems OpenGL ES 3 Requires you must have an lod argument, while Desktop
versions require you must not have a lod argument if you are using a
Sampler2DRect (which doesn't do Mipmapping).