Early CPU-side client mapblock backface culling

by Desour

TL;DR: It seems useless.

Code:
https://github.com/Desour/minetest/tree/backface_cull

The idea:

Most node faces are axis-aligned (by that I mean they point in the -x, +x, ..., or +z direction) and have backface culling enabled. Also, vertices are never shared between different tiles (= sides of nodes, basically).

This means we can group the faces in a mapblock into one mesh per direction. And we don't have to render the +x mesh if the mapblock that the camera is in lies strictly in -x direction (that is, its x coord is smaller and not equal), for example.
(This also works with shadow mapping. (We could even cull away the sides that are perpendicular to the sun direction.))
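The per-direction test described above can be sketched roughly as follows. This is only an illustration, not the code from the linked branch: `BlockPos`, the `DIR_*` constants and `visibleFaceDirs` are hypothetical names, and positions are assumed to be in mapblock coordinates.

```cpp
// Hypothetical sketch; names are not taken from the linked branch.
struct BlockPos { int x, y, z; };

// One bit per face direction. A set bit means "the mesh with faces
// pointing in this direction has to be drawn".
constexpr unsigned DIR_NEG_X = 1u << 0;
constexpr unsigned DIR_POS_X = 1u << 1;
constexpr unsigned DIR_NEG_Y = 1u << 2;
constexpr unsigned DIR_POS_Y = 1u << 3;
constexpr unsigned DIR_NEG_Z = 1u << 4;
constexpr unsigned DIR_POS_Z = 1u << 5;
constexpr unsigned DIR_ALL   = 0x3Fu;

// Returns the directions whose faces can possibly face the camera.
// camera: mapblock position the camera is in; block: mapblock to draw.
inline unsigned visibleFaceDirs(BlockPos camera, BlockPos block)
{
    unsigned mask = DIR_ALL;
    // Camera strictly on the -x side: +x faces point away from it.
    if (camera.x < block.x) mask &= ~DIR_POS_X;
    // Camera strictly on the +x side: -x faces point away from it.
    if (camera.x > block.x) mask &= ~DIR_NEG_X;
    if (camera.y < block.y) mask &= ~DIR_POS_Y;
    if (camera.y > block.y) mask &= ~DIR_NEG_Y;
    if (camera.z < block.z) mask &= ~DIR_POS_Z;
    if (camera.z > block.z) mask &= ~DIR_NEG_Z;
    return mask;
}
```

If the camera is in the same mapblock along an axis, both directions of that axis stay enabled, because faces of either sign can be visible there.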

Results:

I've tried testing the performance by placing myself in the air with very high view range (such that I render all blocks possibly sent by the server). Then I started the game and didn't move the camera direction. I did this both in master and with the patch applied.

The fps were much lower with the patch than in master.
I thought this might be because there are now about (= in the same order of magnitude as) 4 times as many mesh buffers to draw. So I tried making meshes for all possible side combinations, indexed by a bitmask, and choosing the one correct mesh when rendering. (Yes, there are many combinations, and this causes massive memory overhead, but I wanted to try it.)
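To give an idea of what "one mesh per side combination" means and where the memory overhead comes from, here is a minimal sketch. It is not the code from the branch: `IndexList`, `buildVariants` and the plain index lists standing in for real mesh buffers are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t NUM_DIRS = 6;                  // -x, +x, -y, +y, -z, +z
constexpr std::size_t NUM_VARIANTS = 1u << NUM_DIRS; // 64 bitmask values

// Stand-in for a mesh buffer: just a list of vertex indices.
using IndexList = std::vector<unsigned>;

// per_dir[d] holds the faces pointing in direction d.
// variants[mask] holds the faces of all directions whose bit is set in mask,
// so rendering only needs to pick variants[visible_mask] and draw it once.
inline std::array<IndexList, NUM_VARIANTS> buildVariants(
        const std::array<IndexList, NUM_DIRS> &per_dir)
{
    std::array<IndexList, NUM_VARIANTS> variants;
    for (std::size_t mask = 0; mask < NUM_VARIANTS; ++mask) {
        for (std::size_t dir = 0; dir < NUM_DIRS; ++dir) {
            if (mask & (std::size_t{1} << dir)) {
                variants[mask].insert(variants[mask].end(),
                        per_dir[dir].begin(), per_dir[dir].end());
            }
        }
    }
    return variants;
}
```

In this naive form every face is stored in 32 of the 64 variants, which is where the overhead comes from.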

The result was even slower. But I noticed that the water looked way too opaque: I had been drawing all semi-transparent faces twice.
After fixing this bug (just a quick ugly fix), I reached exactly the same fps as in master. So it was completely useless.

Conclusions:

If I'm not wrong, the results show that, on my machine at least, the bottleneck of mapblock rendering is not the number of vertices (and hence vertex shader invocations). It is rather the number of draw calls, buffer binds, and the like.

(offtopic begins here)
This can also be seen in the following scenario:
[Screenshots: screenshot_20220820_193316.png, screenshot_20220820_193313.png]
One quarter is wood and stone arranged such that fast faces don't work. Then there's one quarter with wood and stone with fully sized fast faces. And there's a quarter with a mix of nodes, where runs of 8 identical nodes are placed in a row, creating fast faces of half the maximum size.
The non-fast-face quarter is a bit slower, but the mixed quarter is drastically slower.
(ROllerozxa also once made a short video about this, but interpreted the results incorrectly, praising fast faces: https://www.youtube.com/watch?v=nQVMtMybmbw)

What can we do? (aka. More offtopic)

CPU-side backface culling might be useful at some point in the future, maybe, but currently it is not.

For better performance, I'd propose storing the texture to use as an index in the vertex attributes. That gives one mesh buffer per shader type and mapblock. (I'd say "VAO", but Irrlicht doesn't use them yet, so that would be imprecise.)
AFAIK, this doesn't work with arrays of samplers, see https://www.khronos.org/opengl/wiki/Dat ... que_arrays. And using a long chain of if-elses (or a switch-case) might end up sampling each texture in each fragment (depending on whether the driver and GPU can optimize when a condition is true/false in the whole shader invocation group / wavefront / whatever, and on the number of textures used per screen area).
It could, however, be done with array textures (basically texture atlases without the difficulties and bugs). This requires at least OpenGL 3.0.
(Bindless textures would also work, but they are not available on many GPUs.)
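A rough CPU-side sketch of the array-texture idea: each vertex carries a layer index, so faces with different textures can share one buffer. Everything here (`Vertex`, `Face`, `LayerTable`, `buildSingleBuffer`) is hypothetical and not Irrlicht or Minetest API. UVs, normals and colors are left out, and it assumes all textures fit the same layer size, which array textures require.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical vertex layout: position plus the array-texture layer.
struct Vertex {
    std::array<float, 3> pos;
    std::uint16_t layer; // would be passed on to the fragment shader
};

// Hypothetical input: one quad and the name of its texture.
struct Face {
    std::string texture;
    std::array<std::array<float, 3>, 4> corners;
};

// Assigns each distinct texture name a layer index, in first-seen order.
class LayerTable {
public:
    std::uint16_t layerFor(const std::string &name)
    {
        auto it = m_layers.find(name);
        if (it != m_layers.end())
            return it->second;
        auto layer = static_cast<std::uint16_t>(m_layers.size());
        m_layers.emplace(name, layer);
        return layer;
    }
    std::size_t size() const { return m_layers.size(); }
private:
    std::unordered_map<std::string, std::uint16_t> m_layers;
};

// Puts all faces into ONE vertex list, regardless of their texture.
inline std::vector<Vertex> buildSingleBuffer(const std::vector<Face> &faces,
        LayerTable &table)
{
    std::vector<Vertex> out;
    out.reserve(faces.size() * 4);
    for (const Face &f : faces) {
        const std::uint16_t layer = table.layerFor(f.texture);
        for (const auto &corner : f.corners)
            out.push_back(Vertex{corner, layer});
    }
    return out;
}
```

On the shader side, the fragment shader would then sample a `sampler2DArray` with the layer as the third texture coordinate, instead of having one texture bound per draw call.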