2014-11-13 23:26:49 +01:00
|
|
|
// Copyright 2014 Dolphin Emulator Project
|
|
|
|
// Licensed under GPLv2+
|
|
|
|
// Refer to the license.txt file included.
|
|
|
|
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
#include <algorithm>
|
|
|
|
#include <array>
|
2016-05-26 12:42:07 -05:00
|
|
|
#include <cstring>
|
|
|
|
|
2015-09-19 04:40:00 +12:00
|
|
|
#include "Common/GL/GLUtil.h"
|
|
|
|
|
2014-11-13 23:26:49 +01:00
|
|
|
#include "VideoBackends/OGL/BoundingBox.h"
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
#include "VideoBackends/OGL/FramebufferManager.h"
|
2014-11-13 23:26:49 +01:00
|
|
|
|
2016-05-11 22:19:59 +10:00
|
|
|
#include "VideoCommon/DriverDetails.h"
|
2014-11-13 23:26:49 +01:00
|
|
|
#include "VideoCommon/VideoConfig.h"
|
|
|
|
|
|
|
|
static GLuint s_bbox_buffer_id;
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
static GLuint s_pbo;
|
|
|
|
|
|
|
|
static std::array<int, 4> s_stencil_bounds;
|
|
|
|
static bool s_stencil_updated;
|
|
|
|
static bool s_stencil_cleared;
|
|
|
|
|
|
|
|
static int s_target_width;
|
|
|
|
static int s_target_height;
|
2014-11-13 23:26:49 +01:00
|
|
|
|
|
|
|
namespace OGL
|
|
|
|
{
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
void BoundingBox::SetTargetSizeChanged(int target_width, int target_height)
|
|
|
|
{
|
|
|
|
if (g_ActiveConfig.backend_info.bSupportsFragmentStoresAndAtomics)
|
|
|
|
return;
|
|
|
|
|
|
|
|
s_target_width = target_width;
|
|
|
|
s_target_height = target_height;
|
|
|
|
s_stencil_updated = false;
|
|
|
|
|
|
|
|
glBindBuffer(GL_PIXEL_PACK_BUFFER, s_pbo);
|
|
|
|
glBufferData(GL_PIXEL_PACK_BUFFER, s_target_width * s_target_height, nullptr, GL_STREAM_READ);
|
|
|
|
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
void BoundingBox::Init(int target_width, int target_height)
|
2014-11-13 23:26:49 +01:00
|
|
|
{
|
2017-03-05 15:17:54 -08:00
|
|
|
if (g_ActiveConfig.backend_info.bSupportsFragmentStoresAndAtomics)
|
2016-06-24 10:43:46 +02:00
|
|
|
{
|
|
|
|
int initial_values[4] = {0, 0, 0, 0};
|
|
|
|
glGenBuffers(1, &s_bbox_buffer_id);
|
|
|
|
glBindBuffer(GL_SHADER_STORAGE_BUFFER, s_bbox_buffer_id);
|
|
|
|
glBufferData(GL_SHADER_STORAGE_BUFFER, 4 * sizeof(s32), initial_values, GL_DYNAMIC_DRAW);
|
2016-08-13 00:40:18 +10:00
|
|
|
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, s_bbox_buffer_id);
|
2016-06-24 10:43:46 +02:00
|
|
|
}
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
else
|
|
|
|
{
|
|
|
|
s_stencil_bounds = {{0, 0, 0, 0}};
|
|
|
|
glGenBuffers(1, &s_pbo);
|
|
|
|
SetTargetSizeChanged(target_width, target_height);
|
|
|
|
}
|
2014-11-13 23:26:49 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
void BoundingBox::Shutdown()
|
|
|
|
{
|
2017-03-05 15:17:54 -08:00
|
|
|
if (g_ActiveConfig.backend_info.bSupportsFragmentStoresAndAtomics)
|
|
|
|
{
|
2016-06-24 10:43:46 +02:00
|
|
|
glDeleteBuffers(1, &s_bbox_buffer_id);
|
2017-03-05 15:17:54 -08:00
|
|
|
}
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
else
|
|
|
|
{
|
|
|
|
glDeleteBuffers(1, &s_pbo);
|
|
|
|
}
|
2014-11-13 23:26:49 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
void BoundingBox::Set(int index, int value)
|
|
|
|
{
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
if (g_ActiveConfig.backend_info.bSupportsFragmentStoresAndAtomics)
|
|
|
|
{
|
|
|
|
glBindBuffer(GL_SHADER_STORAGE_BUFFER, s_bbox_buffer_id);
|
|
|
|
glBufferSubData(GL_SHADER_STORAGE_BUFFER, index * sizeof(int), sizeof(int), &value);
|
|
|
|
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
s_stencil_bounds[index] = value;
|
|
|
|
|
|
|
|
if (!s_stencil_cleared)
|
|
|
|
{
|
|
|
|
// Assumes that the EFB framebuffer is currently bound
|
|
|
|
glClearStencil(0);
|
|
|
|
glClear(GL_STENCIL_BUFFER_BIT);
|
|
|
|
s_stencil_updated = false;
|
|
|
|
s_stencil_cleared = true;
|
|
|
|
}
|
|
|
|
}
|
2014-11-13 23:26:49 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
int BoundingBox::Get(int index)
|
|
|
|
{
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
if (g_ActiveConfig.backend_info.bSupportsFragmentStoresAndAtomics)
|
2016-06-24 10:43:46 +02:00
|
|
|
{
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
int data = 0;
|
|
|
|
glBindBuffer(GL_SHADER_STORAGE_BUFFER, s_bbox_buffer_id);
|
|
|
|
if (!DriverDetails::HasBug(DriverDetails::BUG_SLOW_GETBUFFERSUBDATA))
|
|
|
|
{
|
|
|
|
// Using glMapBufferRange to read back the contents of the SSBO is extremely slow
|
|
|
|
// on nVidia drivers. This is more noticeable at higher internal resolutions.
|
|
|
|
// Using glGetBufferSubData instead does not seem to exhibit this slowdown.
|
|
|
|
glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, index * sizeof(int), sizeof(int), &data);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
// Using glMapBufferRange is faster on AMD cards by a measurable margin.
|
|
|
|
void* ptr = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, index * sizeof(int), sizeof(int),
|
|
|
|
GL_MAP_READ_BIT);
|
|
|
|
if (ptr)
|
|
|
|
{
|
|
|
|
memcpy(&data, ptr, sizeof(int));
|
|
|
|
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
|
|
|
|
return data;
|
2016-06-24 10:43:46 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
if (s_stencil_updated)
|
2016-06-24 10:43:46 +02:00
|
|
|
{
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
s_stencil_updated = false;
|
|
|
|
|
|
|
|
FramebufferManager::ResolveEFBStencilTexture();
|
|
|
|
glBindFramebuffer(GL_READ_FRAMEBUFFER, FramebufferManager::GetResolvedFramebuffer());
|
|
|
|
glBindBuffer(GL_PIXEL_PACK_BUFFER, s_pbo);
|
|
|
|
glPixelStorei(GL_PACK_ALIGNMENT, 1);
|
|
|
|
glReadPixels(0, 0, s_target_width, s_target_height, GL_STENCIL_INDEX, GL_UNSIGNED_BYTE, 0);
|
|
|
|
glBindFramebuffer(GL_READ_FRAMEBUFFER, FramebufferManager::GetEFBFramebuffer());
|
|
|
|
|
|
|
|
// Eke every bit of performance out of the compiler that we can
|
|
|
|
std::array<int, 4> bounds = s_stencil_bounds;
|
|
|
|
|
|
|
|
u8* data = static_cast<u8*>(glMapBufferRange(
|
|
|
|
GL_PIXEL_PACK_BUFFER, 0, s_target_height * s_target_width, GL_MAP_READ_BIT));
|
|
|
|
|
|
|
|
for (int row = 0; row < s_target_height; row++)
|
|
|
|
{
|
|
|
|
for (int col = 0; col < s_target_width; col++)
|
|
|
|
{
|
|
|
|
if (data[row * s_target_width + col] == 0)
|
|
|
|
continue;
|
|
|
|
bounds[0] = std::min(bounds[0], col);
|
|
|
|
bounds[1] = std::max(bounds[1], col);
|
|
|
|
bounds[2] = std::min(bounds[2], row);
|
|
|
|
bounds[3] = std::max(bounds[3], row);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
s_stencil_bounds = bounds;
|
|
|
|
|
|
|
|
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
|
|
|
|
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
|
2016-06-24 10:43:46 +02:00
|
|
|
}
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
|
|
|
|
return s_stencil_bounds[index];
|
2016-06-24 10:43:46 +02:00
|
|
|
}
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
}
|
2016-05-11 22:19:59 +10:00
|
|
|
|
OGL: implement Bounding Box on systems w/o SSBO
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
2017-03-05 15:34:30 -08:00
|
|
|
void BoundingBox::StencilWasUpdated()
|
|
|
|
{
|
|
|
|
s_stencil_updated = true;
|
|
|
|
s_stencil_cleared = false;
|
2014-11-13 23:26:49 +01:00
|
|
|
}
|
|
|
|
};
|