How can I keep directories in sync

Posted by Guillaume Boudreau on Programmers
Published on 2011-01-15T12:44:00Z Indexed on 2011/01/15 12:59 UTC

I have a directory, dirA, that users can work in: they can create, modify, rename and delete files and sub-directories in dirA.
I want to keep another directory, dirB, in sync with dirA.

What I'd like is a discussion on finding a working algorithm that achieves the above, within the limitations listed below.

Requirements:
1. Something asynchronous - I don't want to stop file operations in dirA while I work in dirB.
2. I can't assume that I can just blindly rsync dirA to dirB at regular intervals - dirA could contain millions of files & directories, and terabytes of data. Completely walking the dirA tree could take hours.

Those two requirements make this really difficult.
Being asynchronous means that by the time I start working on a specific file from dirA, it might have been moved several times since it first appeared.
And the second limitation means that I really need to watch dirA, and act on the individual (atomic) file operations that I notice.

Current (broken) implementation:
1. Log all file & directory operations in dirA.
2. Using a separate process, read that log, and 'repeat' all the logged operations in dirB.
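
As a rough sketch of the two steps above, the log reader might look something like the following. The log format (one operation per line: `write PATH`, `rename OLD NEW` or `delete PATH`, with paths relative to dirA and containing no spaces) and the function name `replay_log` are assumptions made for illustration; they are not taken from the actual implementation.

```shell
#!/bin/sh
# Hypothetical log reader: replays operations logged for dirA onto dirB.
# Assumed log format, one operation per line (paths relative to dirA, no spaces):
#   write PATH | rename OLD NEW | delete PATH
replay_log() {
    log=$1 src=$2 dst=$3
    while read -r op from to; do
        # stderr is discarded; the outcome is reported on stdout instead.
        case $op in
            write)  cp "$src/$from" "$dst/$from" ;;
            rename) mv "$dst/$from" "$dst/$to" ;;
            delete) rm "$dst/$from" ;;
            *)      false ;;   # unknown or empty operation
        esac 2>/dev/null && result=OK || result=failed
        echo "log = \"$op $from${to:+ $to}\"; result = $result"
    done < "$log"
}

# Usage in a scratch directory: the reader keeps up, so the replay succeeds.
work=$(mktemp -d)
mkdir "$work/dirA" "$work/dirB"
echo 1 > "$work/dirA/file1"
echo "write file1" > "$work/ops.log"
replay_log "$work/ops.log" "$work/dirA" "$work/dirB"
```

Note that a `write` is replayed by copying whatever currently sits at that path in dirA, not the content at the time the operation was logged. This only works as long as the reader never falls behind.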

Why it is broken:

echo 1 > dirA/file1
# Allow the 'log reader' process to create dirB/file1:
    log = "write file1"; action = cp dirA/file1 dirB/file1; result = OK
echo 1 > dirA/file2
mv dirA/file1 dirA/file3
mv dirA/file2 dirA/file1
rm dirA/file3
# End result in dirA: file1 contains '1'
# 'log reader' process starts working on the 4 above file operations:
    log = "write file2"; action = cp dirA/file2 dirB/file2; result = failed: there is no dirA/file2
    log = "rename file1 file3"; action = mv dirB/file1 dirB/file3; result = OK
    log = "rename file2 file1"; action = mv dirB/file2 dirB/file1; result = failed: there is no dirB/file2
    log = "delete file3"; action = rm dirB/file3; result = OK
# End result in dirB: no more files!
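
The sequence above can be reproduced in a scratch directory with a small script. This is only a sketch: `apply_op` is a hypothetical stand-in for the 'log reader' process, and the operations are passed to it directly instead of being read from a real log.

```shell
#!/bin/sh
# Reproduces the failure above in a temporary directory.
work=$(mktemp -d)
A=$work/dirA B=$work/dirB
mkdir "$A" "$B"

# apply_op OP PATH [NEWPATH] - hypothetical stand-in for the log reader.
apply_op() {
    case $1 in
        write)  cp "$A/$2" "$B/$2" ;;
        rename) mv "$B/$2" "$B/$3" ;;
        delete) rm "$B/$2" ;;
    esac 2>/dev/null && echo "$*: OK" || echo "$*: failed"
}

echo 1 > "$A/file1"
apply_op write file1            # the reader keeps up with the first write

# The reader now falls behind while dirA keeps changing.
echo 1 > "$A/file2"
mv "$A/file1" "$A/file3"
mv "$A/file2" "$A/file1"
rm "$A/file3"

# The reader catches up and replays the four logged operations in order.
apply_op write file2            # fails: dirA/file2 no longer exists
apply_op rename file1 file3     # succeeds
apply_op rename file2 file1     # fails: dirB/file2 was never created
apply_op delete file3           # succeeds

echo "dirA: $(ls "$A")"         # dirA still has file1
echo "dirB: $(ls "$B")"         # dirB is empty
```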

Another broken example:

echo 1 > dirA/dir1/file1
mv dirA/dir1 dirA/dir2
# 'log reader' process starts working on the 2 above file operations:
    log = "write file1"; action = cp dirA/dir1/file1 dirB/dir1/file1; result = failed: there is no dirA/dir1/file1
    log = "rename dir1 dir2"; action = mv dirB/dir1 dirB/dir2; result = failed: there is no dirB/dir1
# End result in dirB: nothing!
