Quest for max performance on 5.4.0 server with Pentium 4

Minix
Member
Posts: 146
Joined: Thu Nov 12, 2020 13:51
In-game: Minix

by Minix » Post

OK, I am really tired: I have been compiling and benchmarking Minetest for the last 13 hours. It was fun, though, and I got some very interesting results, so I guess it would be nice to share them. This is going to be a long post, so get ready.

A few days ago I was reading about compiler optimization flags and architecture-specific optimizations in gcc, and I came up with the idea of squeezing the maximum performance out of my Pentium 4 server by compiling Minetest with custom settings. Since I have been running a dev version on my server and Debian has not yet updated Minetest to 5.4.0 in its repositories, I thought it was a good idea.

To begin with, these are my specs:

Hardware:
CPU: Intel Pentium 4 HT 524, 3066 MHz, 1 MiB L2 cache, x86_64, hyperthreaded, 84 W TDP (I am not sure it is a 524, but it is very likely)
RAM: 1220 MiB

Software: GNU/Linux Debian 10 amd64 using gcc 8.3.0 and g++ 8.3.0

First, I benchmarked my dev build of commit d1ec5117d9095c75aca26a98690e4fcc5385e98c, the one used in the announcement of 5.4.0 reaching feature freeze. This build was compiled on the same OS running in a VirtualBox virtual machine on a Core 2 Duo machine, with the client build disabled and spatialindex support enabled. I am pretty sure I forgot to install the gettext libraries, so gettext is not enabled. My goal was to compare hyperthreading enabled vs. disabled. These are the results of running

Code: Select all

$ time minetestserver --run-unittests
5 times and averaging the user value:

Code: Select all

3.5246 s average (HT on)
3.4804 s average (HT off)
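For anyone who wants to repeat this, the averaging step can be scripted. A minimal sketch, assuming a POSIX shell with awk and `time -p` style output (one `real`, `user` and `sys` line per run); the function name `avg_user_time` is made up for illustration:

```shell
# avg_user_time: read `time -p` style output on stdin and print the mean
# of all "user" values. The function name is hypothetical.
avg_user_time() {
    awk '$1 == "user" { sum += $2; n++ }
         END { if (n > 0) printf "%.4f\n", sum / n }'
}

# Possible usage in bash (assumes minetestserver is on PATH):
#   for i in 1 2 3 4 5; do
#       ( time -p minetestserver --run-unittests >/dev/null 2>&1 ) 2>&1
#   done | avg_user_time
```

The subshell with `2>&1` is there because `time` writes its report to standard error, so it has to be redirected before it can be piped into the function.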
After that I performed a map generation test. The conditions for all subsequent map generation tests are the following unless otherwise noted:

Code: Select all

static_spawnpoint= 0,0,0
mapgen v7 default flags
seed: testserver
Procedure: wait for the server to initialize and idle, join, wait for the server to idle again, execute /emergeblocks here 50, log the result, delete the world folder and repeat
mods:
airtanks
ambience
backpacks
basic_materials
bonemeal
border
clean
commons
compassgps
craftguide
dfcaverns
i3
3darmor
dmobs
doc
doors
ethereal
farming
footprints
hangglider
hazmat_suit
hbhunger_1.0.1
hot_air_balloons
hudbars_2.3.2
lucky_block
magma_conduits
mapgen_helper
mesecons
mob_horse
mobs_animal
mobs_monster
mobs_redo
mobs_water
moreblocks
moreores
multi_ip
orienteering_1.6
painting
pipeworks
radiant_damage
rangedweapons
redef
regional_weather_modpack
ropes
subterrane
technic
travelnet
vote
wardrobe
wielded_light
xpanes
Out of all those mods, the ones that matter the most are technic, dfcaverns and ethereal, since they are related to mapgen. I am using all those mods because I am targeting the specific configuration of my server.

The following results are the average of three runs:

Code: Select all

55619 ms average (HT on)
50261 ms average (HT off)
Both of these benchmarks show hyperthreading hurting Minetest's performance (slightly in the unit tests, by roughly 10% in the mapgen test), which makes sense considering that most of the game is single-threaded and the mapgen I am using relies heavily on Lua, which runs in a single thread as well.

For this reason, all benchmarks from now on are run without hyperthreading.
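To put numbers on the hyperthreading penalty, the two pairs of averages above can be compared directly. A small sketch, assuming a POSIX shell with awk; the helper name `pct_slower` is hypothetical and the inputs are the averages quoted in this post:

```shell
# pct_slower SLOW FAST: print how much longer SLOW took than FAST, in percent.
# Hypothetical helper; both arguments must use the same unit.
pct_slower() {
    awk -v slow="$1" -v fast="$2" \
        'BEGIN { printf "%.1f\n", (slow - fast) / fast * 100 }'
}

pct_slower 55619 50261     # mapgen test, HT on vs. HT off (ms)
pct_slower 3.5246 3.4804   # unit tests, HT on vs. HT off (s)
```

On these figures the mapgen test is about 10.7% slower with HT on and the unit tests about 1.3% slower. The post does not report run-to-run variance, so the smaller of the two differences may well be within noise.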

Now here comes the gcc/g++ stuff.

In total I compiled the Minetest 5.4.0 release version (commit f3e51dca155ce1d1062a339cf925f41d7c751df8) 5 times. The average compilation time on this CPU is 23 minutes, and every time I compiled it I swear I felt the room getting hotter.

The minetest_game version used was commit 0a90bd8a0ec530f48e1bd9a438e24bd85cc9cd66.

All builds I made have spatialindex support and the only database backend is sqlite3.

These are the settings for all my setups and their respective benchmarks:

Normal minetest 5.4.0

Code: Select all

$ cmake . -DRUN_IN_PLACE=TRUE -DBUILD_SERVER=TRUE -DBUILD_CLIENT=FALSE -DIRRLICHT_INCLUDE_DIR=path

Code: Select all

3.468 s average unittests
52157 ms average mapgen test
Normal minetest 5.4.0 without gettext

Code: Select all

$ cmake . -DRUN_IN_PLACE=TRUE -DBUILD_SERVER=TRUE -DBUILD_CLIENT=FALSE -DIRRLICHT_INCLUDE_DIR=path
The only difference in compilation is that I uninstalled gettext before compiling.

Code: Select all

3.4652 s average unittests
48619 ms average mapgen test
Size optimized minetest 5.4.0

Code: Select all

$ cmake . -DRUN_IN_PLACE=TRUE -DBUILD_SERVER=TRUE -DBUILD_CLIENT=FALSE -DIRRLICHT_INCLUDE_DIR=path -DCMAKE_BUILD_TYPE=MinSizeRel
The binary size was 3.9 MiB instead of the typical 6.3 MiB. This build has gettext support.

Code: Select all

19.1106 s average unittests
59755 ms executed only once mapgen test
Pentium 4 optimized minetest 5.4.0

To optimize for this CPU I used the same cmake flags:

Code: Select all

$ cmake . -DRUN_IN_PLACE=TRUE -DBUILD_SERVER=TRUE -DBUILD_CLIENT=FALSE -DIRRLICHT_INCLUDE_DIR=path
Then, after the CMake files had been generated, I opened src/CMakeFiles/minetestserver.dir/flags.make and added -march=native to C_FLAGS and CXX_FLAGS. I noticed that -O3, the highest numbered optimization level, was already there, so I left it as it was.
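Editing the generated flags.make works, but the change is lost whenever cmake regenerates the build files. A possible alternative, sketched here as an assumption rather than something tested in this post, is to pass the flag at configure time through the standard CMake variables CMAKE_C_FLAGS and CMAKE_CXX_FLAGS. The function name `configure_native` and its argument are made up for illustration, and whether Minetest's own CMakeLists combines these with its release flags exactly as expected should be checked on the actual build:

```shell
# configure_native IRRLICHT_DIR: configure a server-only build with
# -march=native passed at configure time. Hypothetical helper; the cmake
# options are the ones from the post plus the two standard CMAKE_*_FLAGS
# variables.
configure_native() {
    irrlicht_dir=$1
    cmake . -DRUN_IN_PLACE=TRUE -DBUILD_SERVER=TRUE -DBUILD_CLIENT=FALSE \
        -DIRRLICHT_INCLUDE_DIR="$irrlicht_dir" \
        -DCMAKE_C_FLAGS=-march=native \
        -DCMAKE_CXX_FLAGS=-march=native
}

# To see which -march value "native" resolves to on the build machine:
#   gcc -march=native -Q --help=target | grep -- '-march='
# To confirm the flag reaches the compiler, inspect the full command lines:
#   make VERBOSE=1
```

Note that -march=native targets the CPU of the machine doing the compiling, so the build has to happen on the Pentium 4 itself (or with an explicit -march value) for the result to match.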

The size of the minetestserver binary was 6.2 MiB for this one and the next build.

Code: Select all

3.7758 s average unittests
48673 ms average mapgen test
So far this was one of the best results I got with 5.4.0 (essentially tied with the no-gettext build and clearly ahead of the default build), so it seems the architecture optimization worked. Even though the unittest time is worse, the real-world performance is better. Weird.

Pentium 4 optimized minetest 5.4.0 without gettext

Same flags

Code: Select all

$ cmake . -DRUN_IN_PLACE=TRUE -DBUILD_SERVER=TRUE -DBUILD_CLIENT=FALSE -DIRRLICHT_INCLUDE_DIR=path
Same procedure for the architecture optimization, except that I had uninstalled gettext first.

Code: Select all

3.7832 s average unittests
50554 ms average mapgen test
Minetest 5.3.0 from Debian 10 backports

This is the version which comes as the minetest-server package.

Code: Select all

30710 ms one run mapgen test
I did not bother to run the unittests, but it is obvious that this version performs way better, at map generation at least. Keep in mind that I removed i3 and 3darmor because they are not compatible with this version.

UPDATE: I tried a new configuration, which is the best so far.

Pentium 4 optimized minetest 5.4.0 with gettext and system luajit and jsoncpp libraries

This time I forced the build to use jsoncpp from the system and installed the libraries for luajit. The size of the minetestserver binary is 5.4 MiB, perfect.

Code: Select all

$ cmake . -DRUN_IN_PLACE=TRUE -DBUILD_SERVER=TRUE -DBUILD_CLIENT=FALSE -DIRRLICHT_INCLUDE_DIR=path -DENABLE_SYSTEM_JSONCPP=ON
And the benchmarks are awesome:

Code: Select all

29336 ms average mapgen test
This even beats Minetest 5.3.0 from the Debian repositories.
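One way to check that a build really picked up the system libraries is to look at what the binary links against. A minimal sketch, assuming the libraries are linked dynamically so that they show up in `ldd` output; the helper name `links_against` and the binary path in the usage comment are assumptions:

```shell
# links_against NAME: read `ldd` output on stdin and succeed if a shared
# library whose file name contains "libNAME" is listed. Hypothetical helper.
links_against() {
    grep -qi -- "lib$1"
}

# Possible usage from the build directory (binary path is an assumption):
#   ldd bin/minetestserver | links_against luajit  && echo "system LuaJIT"
#   ldd bin/minetestserver | links_against jsoncpp && echo "system jsoncpp"
```

A library that is compiled into the binary from bundled sources would not appear in the `ldd` listing, so a missing entry suggests the bundled copy is in use.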

So this is all. This is my best attempt without hyperthreading, and I will be using it on my server for now: it gives me about 90% more performance than my dev build running with hyperthreading, so it is really worth it. The reason I was fiddling around with gettext is that I was trying to figure out why my dev build performed better than my normal 5.4.0 release build. It seems that disabling gettext improves performance, but I would not be so sure, since the Pentium 4 optimized build actually got slower without it. If someone, maybe a core dev, knows whether gettext internationalisation support is really needed on a server build, please let me know.
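The comparison behind the 90% figure can be reproduced from the mapgen averages reported in this post. A rough sketch, assuming a POSIX shell with sort and awk; the variable `results`, the labels and the function name `speedup_table` are made up for illustration, and the gain is computed as baseline time divided by build time, minus one:

```shell
# Mapgen results from the post as "label|milliseconds". Lower is better.
results='dev build, HT on|55619
dev build, HT off|50261
5.4.0 default|52157
5.4.0 no gettext|48619
5.4.0 MinSizeRel|59755
5.4.0 march=native|48673
5.4.0 march=native, no gettext|50554
5.3.0 Debian backports|30710
5.4.0 march=native, LuaJIT, system jsoncpp|29336'

# speedup_table BASELINE_MS: read "label|ms" lines on stdin and print each
# build with its throughput gain over the baseline, fastest first.
speedup_table() {
    sort -t '|' -k 2,2n |
        awk -F '|' -v base="$1" \
            '{ printf "%-45s %6d ms  %+6.1f%%\n", $1, $2, (base / $2 - 1) * 100 }'
}

# Baseline: the dev build with hyperthreading on.
printf '%s\n' "$results" | speedup_table 55619
```

The MinSizeRel and 5.3.0 figures come from single runs, and the 5.3.0 run used two fewer mods, so those rows are only indicative.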

I tried to be as specific as possible so that you can run your own tests and compare them with my results. However, this is not an appropriate benchmark because it involves many specific mods; it would be best to test with only devtest or minetest_game if we wanted to build an official benchmark database, which is not the goal of this post.

Festus1965
Member
Posts: 4181
Joined: Sun Jan 03, 2016 11:58
GitHub: Festus1965
In-game: Festus1965 Thomas Thailand Explorer
Location: Thailand ChiangMai
Contact:

Re: Quest for max performance on 5.4.0 server with Pentium 4

by Festus1965 » Post

These tests might be good as a starting point, a basic first look,
but they do not tell much about real performance with a few real players online... and there are three things that count (as far as I know):
* CPU speed (good to have 2, maybe better 4, of them)
* a lot of memory to keep the currently needed data quickly accessible
* HDD, or better an SSD, to fetch newly needed data quickly
And in the end, mods change everything, and the world needs to be attractive so that players join and like it; otherwise an empty server might not need any performance...
