ExeterDad wrote:<-- Snip -->
Let me try to give as much information as I can about the process that I am considering/proposing.
Overview of the process
The basic idea (as I have already explained in brief) is to use the logs in order to reconstruct the positions of enough nodes for it to be possible to match the reconstructed map blocks against the jumbled-up ones in the backup. Much data cannot be reconstructed from the logs as not everything is logged with enough detail (tree growth, crop growth, and flowers spreading to name a few). But hopefully it will be possible to reconstruct enough data in order to find the correct map block from the backup in the areas that matter.
That last part, "in the areas that matter", is important. The more data that's available for a particular area of the map, the more successful the results will be. Areas that have large buildings will be more successful than areas with a few random blocks here and there (which will get lost among all the "noise" created by stuff that isn't logged such as plant growth). "Success" is defined as the likelihood of the correct block being chosen.
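To make the matching idea concrete, here is a minimal Python sketch. It is not the actual tool: the block representation (a dict from in-block position to node name), the function names `match_score` and `best_match`, and the `min_nodes` and `min_score` thresholds are all assumptions made for illustration. It scores each candidate block from the backup by the fraction of log-reconstructed nodes it agrees with, and refuses to choose when there is too little logged data to trust the result.

```python
from typing import Dict, Optional, Tuple

Pos = Tuple[int, int, int]   # position of a node inside a map block (assumed layout)
Block = Dict[Pos, str]       # position -> node name


def match_score(reconstructed: Block, candidate: Block) -> float:
    """Fraction of log-reconstructed nodes that the candidate block agrees with.

    Nodes that exist only in the candidate (e.g. unlogged plant growth) are
    ignored, so that "noise" does not count against a candidate.
    """
    if not reconstructed:
        return 0.0
    hits = sum(1 for pos, name in reconstructed.items()
               if candidate.get(pos) == name)
    return hits / len(reconstructed)


def best_match(reconstructed: Block,
               candidates: Dict[str, Block],
               min_nodes: int = 8,
               min_score: float = 0.6) -> Optional[Tuple[str, float]]:
    """Return (candidate id, score) for the best candidate, or None.

    None is returned when too few nodes were reconstructed or when no
    candidate reaches min_score; both thresholds are hypothetical.
    """
    if len(reconstructed) < min_nodes:
        return None
    best_id, best = None, 0.0
    for cand_id, cand in candidates.items():
        score = match_score(reconstructed, cand)
        if score > best:
            best_id, best = cand_id, score
    if best_id is None or best < min_score:
        return None
    return best_id, best
```

The `min_nodes` cut-off reflects the point above: an area with only a few logged nodes gives the matcher very little to distinguish one candidate from another.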
Current status
I'd estimate that development work on the tools to carry out the process is at least 50% complete. The tools should be ready to use early next week (and that's allowing for a much-needed break...). So far the results are looking promising - surprising, even. I was able to recover a building that I had significantly "griefed". That is a harder test than anything the HOMETOWN dataset will throw at the process, because with the entire logs available the reconstructed copies of most builds should be more-or-less complete.
However, I am not expecting the results to be perfect. Things never work like that. But they should be more than good enough to recover specific builds (though probably not reliable enough to recover the entire map without errors). In the event that an incorrect block is chosen, the results will be noticeably wrong, so no more than a quick inspection should be required to confirm success. And even if an error does creep in, it's still better than nothing, right? (Errors that are found can be replaced with air, while errors that are missed can be left for the owner of the build to fix up.)
Impact and system requirements
I cannot yet give an estimate for how long the process will take, as my current test datasets have so far been in no way comparable to the scale of HOMETOWN's data, and I am still working on heavily optimising the code.
My current aim is to recover builds based on their co-ordinates instead of attempting to recover the entire map (most of which would go to waste). This will dramatically reduce the workload. If the images from the overhead map are still available, they could be used to determine the locations of builds; otherwise, the logs could be used to produce a heat-map showing which areas have had the most building activity. It will also be possible to run the recovery process against the saved data that players have contributed, which will allow saved builds to be "updated" (at least to some extent) even without reading through the logs.
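As a rough illustration of the heat-map idea, the sketch below counts logged events per horizontal cell. It assumes the logs have already been parsed into (x, y, z) co-ordinates of placement/dig events; the function names and the 16-node cell size are hypothetical, and the real log format is not shown here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def activity_heatmap(events: Iterable[Tuple[int, int, int]],
                     cell_size: int = 16) -> Counter:
    """Count logged events per horizontal cell of cell_size x cell_size nodes.

    Height (y) is ignored so that a tall build shows up as one hot cell.
    Floor division keeps negative co-ordinates in the correct cell.
    """
    heat: Counter = Counter()
    for x, _y, z in events:
        heat[(x // cell_size, z // cell_size)] += 1
    return heat


def busiest_cells(heat: Counter, top: int = 10) -> List[Tuple[Tuple[int, int], int]]:
    """Return the cells with the most activity, busiest first."""
    return heat.most_common(top)
```

The busiest cells would then be the first candidates for recovery, with quieter areas skipped or left for a later pass.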
32 GB of RAM will be more than enough. 8 cores/threads is on the low side but probably manageable. In terms of disk space, the process will require:
- 85 GB for the jumbled-up backup
- 20 GB for the logs
- As much as required to reconstruct (from the logs) whatever builds should be recovered (this could range from a few hundred megabytes to a few gigabytes depending on the size and number of the builds, and probably won't exceed 50 GB even if every ground-level block was included)
- As much as required to store the recovered builds (about the same size as the previous item)
With careful consideration of what areas of the map are recovered, this could fit in 200 GB. If offline storage is also available, more areas could be recovered in parts (although this will require running the process multiple times against the 85 GB file, which will take a long time).
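The 200 GB figure can be checked with some quick arithmetic. The sketch below uses only the sizes listed above and assumes the recovered output is exactly the same size as the reconstruction (the list says "about the same size"); the function name is made up for illustration.

```python
def max_reconstruction_gb(total_gb: float = 200,
                          backup_gb: float = 85,
                          logs_gb: float = 20) -> float:
    """Largest reconstruction (in GB) that fits in the disk budget.

    Assumes the recovered builds take the same space as the reconstruction,
    so the space left after the backup and logs is split in two.
    """
    return (total_gb - backup_gb - logs_gb) / 2


print(max_reconstruction_gb())  # 47.5
```

So roughly 47 GB is available for the reconstruction in a single pass. That is slightly under the 50 GB upper estimate for every ground-level block, which is why the areas to recover need to be chosen with care.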
It will certainly not be possible to run a HOMETOWN instance during the process, nor would that be desirable (how would the recovered data be integrated with whatever has changed in the meantime?).
Conclusion
I am confident that my recovery process will be able to recover usable (if not 100% perfect) data for builds, particularly larger ones, that have not been saved elsewhere. At the very least, this will give players something to work from when repairing their builds (I honestly expect the results to be much better than that, and I will continue testing to get a better idea of the success rate). The human interaction required will be minimal. The process will probably take a few days, but hopefully not more than a week (put it this way: if it takes more than a week, I'll call it quits). The server's hardware is not outstanding but should be adequate to do the job (including the disk space, if no other database files are present at the time).