Compressing PostgreSQL dumps with zstd

Posted: Sun Jun 09, 2019 11:57
by uwu
If you're running a Minetest server, you may be using the Postgres backend for its reliability and speed. Hopefully you're also a good admin who makes regular dumps of the database. But what is the best way to do so?

Postgres gives us four output formats with its built-in pg_dump command:
p (plain)
c (custom)
d (directory)
t (tar)

The second one, custom, sounds like the best choice:
Output a custom-format archive suitable for input into pg_restore. Together with the directory output format, this is the most flexible output format in that it allows manual selection and reordering of archived items during restore. This format is also compressed by default.


It's "the most flexible" and who doesn't like flexibility? It's also compressed and we like saving space as well. To do so, it uses zlib, a rock-solid library using the Deflate algorithm also used by gzip and zip. Unfortunately, this algorithm isn't state-of-the-art anymore compared to modern algorithms like zstd. Compared to zstd, zlib is single-threaded which means if you have a nice multicore server, most of your cores will be picking their nose while one is doing all the work. Furthermore, even if running the new zstd in single-threaded mode, it's still way faster than zlib. So how can we take advantage of it?

First of all, create an uncompressed dump using the custom format with pg_dump:
Code: Select all
sudo -u minetest pg_dump -F c --compress=0 coolserver > /home/user/coolserver_current-date.pgsql
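
The current-date part of the filename is just a placeholder; one way to fill it in automatically is to let the shell do it with date (same example path and database name as above):
Code: Select all
sudo -u minetest pg_dump -F c --compress=0 coolserver > "/home/user/coolserver_$(date +%F).pgsql"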


Then, compress it with zstd in multithreaded mode and compression level 1:
Code: Select all
nice zstdmt --rsyncable -1 coolserver_current-date.pgsql


"nice" decreases the priority of the zstd process, so it doesn't interfere with normal server operation.

"--rsyncable" makes the compressed file behave nicer with rsync, quote from the github page:
a difference in the input data will only impact a small localized amount of compressed data

It costs almost nothing in compression ratio so I always enable it.
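
By the way, if you'd rather not write the uncompressed dump to disk at all, the two steps can be combined into one pipeline. A sketch, using the same example names as above:
Code: Select all
sudo -u minetest pg_dump -F c --compress=0 coolserver | nice zstdmt --rsyncable -1 -o /home/user/coolserver_current-date.pgsql.zst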

On my 4-core VPS, compression of a small 858 MiB map takes about 3.5 seconds.

We have now compressed the Postgres dump with multithreaded zstd at level 1, which is blazing fast. Why just level 1? I don't know the exact reason myself, but I found that the first level already yields excellent compression ratios compared to the higher levels. Only at level 16, which takes huge amounts of time and RAM, does it start doing better than level 1 again. I haven't tested custom zstd options yet (there are lots), but I suspect that level 1 somehow works really well with the kind of data a Minetest map in PostgreSQL produces.

[chart: compressed size of the test dump at different zstd levels]
Yes, level 10 did indeed compress slightly worse than 9 in my test.
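
If you want to see how the levels behave on your own dump, zstd ships a built-in benchmark mode (-b sets the first level to test, -e the last) that compresses the file in memory at each level and prints speed and ratio:
Code: Select all
zstd -b1 -e19 coolserver_current-date.pgsql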

Here is a GitHub issue related to this phenomenon:
https://github.com/facebook/zstd/issues/466
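
For completeness, restoring: pg_restore can read a custom-format archive from standard input, so you can decompress and restore in one go. A sketch, assuming the target database coolserver already exists and is empty:
Code: Select all
zstdcat /home/user/coolserver_current-date.pgsql.zst | sudo -u minetest pg_restore -d coolserver

And since it's a custom-format archive, the "manual selection and reordering" flexibility from the docs quote is available too: decompress to a file first, then use pg_restore -l to list the archive's items and pg_restore -L with an edited list to restore only the items you want.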

Re: Compressing PostgreSQL dumps with zstd

Posted: Sun Jun 16, 2019 23:00
by fwhcat
Thanks for that post! Very useful.