Diff Algorithm

Posted by Daniel Magliola on Stack Overflow See other posts from Stack Overflow or by Daniel Magliola
Published on 2009-04-30T06:37:33Z Indexed on 2010/04/28 21:57 UTC
Read the original article Hit count: 445

Filed under:
|
|
|

I've been looking like crazy for an explanation of a diff algorithm that works and is efficient.

The closest I got is this link to RFC 3284 (from several Eric Sink blog posts), which describes in perfectly understandable terms the data format in which the diff results are stored. However, it has no mention whatsoever as to how a program would reach these results while doing a diff.

I'm trying to research this out of personal curiosity, because I'm sure there must be tradeoffs when implementing a diff algorithm, which are pretty clear sometimes when you look at diffs and wonder "why did the diff program chose this as a change instead of that?"...

Does anyone know where I can find a description of an efficient algorithm that'd end up outputting VCDIFF?
By the way, if you happen to find a description of the actual algorithm used by SourceGear's DiffMerge, that'd be even better.

NOTE: longest common subsequence doesn't seem to be the algorithm used by VCDIFF, it looks like they're doing something smarter, given the data format they use.

Thanks!

© Stack Overflow or respective owner

Related posts about diff

Related posts about algorithm