Find the set of largest contiguous rectangles to cover multiple areas
- by joelpt
I'm working on a tool called Quickfort for the game Dwarf Fortress. Quickfort turns spreadsheets in csv/xls format into a series of commands for Dwarf Fortress to carry out in order to plot a "blueprint" within the game.
I am currently trying to optimally solve an area-plotting problem for the 2.0 release of this tool.
Consider the following "blueprint" which defines plotting commands for a 2-dimensional grid. Each cell in the grid should either be dug out ("d"), channeled ("c"), or left unplotted ("."). Any number of distinct plotting commands might be present in actual usage.
. d . d c c
d d d d c c
. d d d . c
d d d d d c
. d . d d c
To minimize the number of instructions that need to be sent to Dwarf Fortress, I would like to find the set of largest contiguous rectangles that can be formed to completely cover, or "plot", all of the plottable cells. To be valid, all of a given rectangle's cells must contain the same command. 
This is a faster approach than Quickfort 1.0 took: plotting every cell individually as a 1x1 rectangle.  This video shows the performance difference between the two versions.
For the above blueprint, the solution looks like this:
. 9 . 0 3 2
8 1 1 1 3 2
. 1 1 1 . 2
7 1 1 1 4 2
. 6 . 5 4 2
Each same-numbered rectangle above denotes a contiguous rectangle. The largest rectangles take precedence over smaller rectangles that could also be formed in their areas. The order of the numbering/rectangles is unimportant.
My current approach is iterative. In each iteration, I build a list of the largest rectangles that could be formed from each of the grid's plottable cells by extending in all 4 directions from the cell. After sorting the list largest first, I begin with the largest rectangle found, mark its underlying cells as "plotted", and record the rectangle in a list. Before plotting each rectangle, its underlying cells are checked to ensure they are not yet plotted (overlapping a previous plot). We then start again, finding the largest remaining rectangles that can be formed and plotting them until all cells have been plotted as part of some rectangle.
I consider this approach slightly more optimized than a dumb brute-force search, but I am wasting a lot of cycles (re)calculating cells' largest rectangles and checking underlying cells' states.
Currently, this rectangle-discovery routine takes the lion's share of the total runtime of the tool, especially for large blueprints. I have sacrificed some accuracy for the sake of speed by only considering rectangles from cells which appear to form a rectangle's corner (determined using some neighboring-cell heuristics which aren't always correct). As a result of this 'optimization', my current code doesn't actually generate the above solution correctly, but it's close enough.
More broadly, I consider the goal of largest-rectangles-first to be a "good enough" approach for this application. However I observe that if the goal is instead to find the minimum set (fewest number) of rectangles to completely cover multiple areas, the solution would look like this instead:
. 3 . 5 6 8
1 3 4 5 6 8
. 3 4 5 . 8
2 3 4 5 7 8
. 3 . 5 7 8
This second goal actually represents a more optimal solution to the problem, as fewer rectangles usually means fewer commands sent to Dwarf Fortress. However, this approach strikes me as closer to NP-Hard, based on my limited math knowledge.
Watch the video if you'd like to better understand the overall strategy; I have not addressed other aspects of Quickfort's process, such as finding the shortest cursor-path that plots all rectangles. Possibly there is a solution to this problem that coherently combines these multiple strategies.
Help of any form would be appreciated.