Shortest Common Superstring: find shortest string that contains all given string fragments

Posted by occulus on Programmers See other posts from Programmers or by occulus
Published on 2012-09-25T10:52:46Z Indexed on 2012/09/25 15:49 UTC
Read the original article Hit count: 281

Filed under:
|

Given some string fragments, I would like to find the shortest possible single string ("output string") that contains all the fragments. Fragments can overlap each other in the output string.

Example:

For the string fragments:

BCDA
AGF
ABC

The following output string contains all fragments, and was made by naive appending:

BCDAAGFABC

However this output string is better (shorter), as it employs overlaps:

ABCDAGF
^
ABC
 ^
 BCDA
    ^ 
    AGF

I'm looking for algorithms for this problem. It's not absolutely important to find the strictly shortest output string, but the shorter the better. I'm looking for an algorithm better than the obvious naive one that would try appending all permutations of the input fragments and removing overlaps (which would appear to be NP-Complete).

I've started work on a solution and it's proving quite interesting; I'd like to see what other people might come up with. I'll add my work-in-progress to this question in a while.

© Programmers or respective owner

Related posts about algorithms

Related posts about strings