I am trying to get Minetest to run on a Raspberry Pi 3 at an acceptable framerate. A gprof run suggests that a considerable amount of time is spent in updateAllFastFaceRows, which is in mapblock_mesh.cpp and is called as part of the MapBlockMesh constructor.
I tried a somewhat naive approach of multithreading the 3 loops in that method, each of which makes 16x16 sequential calls to updateFastFaceRow. That is, I ran each of the loops, which look like this, on its own thread:
for(s16 y = 0; y < MAP_BLOCKSIZE; y++) {
	for(s16 z = 0; z < MAP_BLOCKSIZE; z++) {
		updateFastFaceRow(data,
				v3s16(0,y,z),
				v3s16(1,0,0), //dir
				v3f  (1,0,0),
				v3s16(0,1,0), //face dir
				v3f  (0,1,0),
				dest);
	}
}
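For reference, here is a minimal self-contained sketch of that per-loop threading idea. It is not the actual Minetest code: Face, processRow and buildFacesParallel are hypothetical stand-ins (processRow plays the role of updateFastFaceRow), and it assumes the row function only reads shared data. Each thread appends to its own vector so that nothing writes to a shared dest concurrently, and the vectors are merged after the threads are joined.

```cpp
#include <array>
#include <thread>
#include <vector>

constexpr int MAP_BLOCKSIZE = 16;

// Hypothetical stand-in for a generated face (not the real FastFace).
struct Face { int axis, a, b; };

// Hypothetical stand-in for updateFastFaceRow: appends one face per row.
static void processRow(int axis, int a, int b, std::vector<Face> &out)
{
	out.push_back(Face{axis, a, b});
}

// Runs the three axis passes in parallel, one thread per pass. Each thread
// writes only to its own vector, so no locking is needed; the partial
// results are merged in axis order once each thread has been joined.
std::vector<Face> buildFacesParallel()
{
	std::array<std::vector<Face>, 3> partial;
	std::array<std::thread, 3> workers;
	for (int axis = 0; axis < 3; axis++) {
		workers[axis] = std::thread([axis, &partial] {
			for (int a = 0; a < MAP_BLOCKSIZE; a++)
				for (int b = 0; b < MAP_BLOCKSIZE; b++)
					processRow(axis, a, b, partial[axis]);
		});
	}
	std::vector<Face> dest;
	for (int axis = 0; axis < 3; axis++) {
		workers[axis].join();
		dest.insert(dest.end(), partial[axis].begin(), partial[axis].end());
	}
	return dest;
}
```

Creating and joining three threads per mesh has its own cost, so with only 16x16 short rows per pass the overhead may well cancel out any gain.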
I also tried another approach, where I make each MapBlockMesh instantiation async / detached. That is, for the mesh update loop:
void MeshUpdateThread::doUpdate()
{
	QueuedMeshUpdate *q;
	while ((q = m_queue_in.pop())) {
		if (m_generation_interval)
			sleep_ms(m_generation_interval);
		ScopeProfiler sp(g_profiler, "Client: Mesh making");
		MapBlockMesh *mesh_new = new MapBlockMesh(q->data, m_camera_offset);
		MeshUpdateResult r;
		r.p = q->p;
		r.mesh = mesh_new;
		r.ack_block_to_server = q->ack_block_to_server;
		m_queue_out.push_back(r);
		delete q;
	}
}
I changed it to:
void MeshUpdateThread::doUpdate()
{
	QueuedMeshUpdate *q;
	while ((q = m_queue_in.pop())) {
		if (m_generation_interval)
			sleep_ms(m_generation_interval);
		ScopeProfiler sp(g_profiler, "Client: Mesh making");
		MapBlockMesh::createMesh(q->data, m_camera_offset, [=](MapBlockMesh* mesh_new) {
			MeshUpdateResult r;
			r.p = q->p;
			r.mesh = mesh_new;
			r.ack_block_to_server = q->ack_block_to_server;
			m_queue_out.push_back(r);
			delete q;
		});
	}
}
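To make the callback idea concrete, here is a minimal self-contained sketch of how such an asynchronous createMesh could be structured. Everything in it is an assumption for illustration: Mesh, Result, ResultQueue, this createMesh signature and runBatch are hypothetical stand-ins, not the Minetest classes. It uses std::async and returns a future instead of detaching the thread, so the caller can still wait for outstanding work.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <vector>

struct Mesh { int id; };                 // hypothetical stand-in for MapBlockMesh
struct Result { int id; Mesh *mesh; };   // hypothetical stand-in for MeshUpdateResult

// Mutex-protected output queue, standing in for m_queue_out.
class ResultQueue {
public:
	void push_back(const Result &r)
	{
		std::lock_guard<std::mutex> lock(m_mutex);
		m_items.push_back(r);
	}
	// Moves all queued results out under the lock.
	std::vector<Result> take()
	{
		std::lock_guard<std::mutex> lock(m_mutex);
		std::vector<Result> items;
		items.swap(m_items);
		return items;
	}
private:
	std::mutex m_mutex;
	std::vector<Result> m_items;
};

// Hypothetical createMesh: builds the mesh on another thread and passes it
// to the callback on that thread. The returned future lets the caller wait.
std::future<void> createMesh(int id, std::function<void(Mesh *)> done)
{
	return std::async(std::launch::async, [id, done] {
		done(new Mesh{id});
	});
}

// Dispatches n mesh builds, waits for all of them, and returns how many
// results reached the queue.
std::size_t runBatch(int n)
{
	ResultQueue out;
	std::vector<std::future<void>> pending;
	for (int id = 0; id < n; id++) {
		pending.push_back(createMesh(id, [id, &out](Mesh *mesh) {
			out.push_back(Result{id, mesh});
		}));
	}
	for (auto &f : pending)
		f.wait();
	std::vector<Result> results = out.take();
	for (const Result &r : results)
		delete r.mesh;
	return results.size();
}
```

Note that with a callback like this, a scoped timer in the calling loop only measures the dispatch, not the mesh build itself, and results can arrive in a different order than they were queued.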
I am wondering if anyone:
1. Knows whether the mesh creation is actually CPU bound or really GPU bound; gprof only tells me CPU time, and I can confirm that commenting out updateAllFastFaceRows does indeed bump up the framerate by approx. 40%.
2. Has any ideas or hints on what I might be able to optimize.