Search Results

Search found 1 results on 1 pages for 'user2964780'.

Page 1/1 | 1 

  • String manipulation appears to be inefficient

    - by user2964780
    I think my code is too inefficient. I'm guessing it has something to do with using strings, though I'm unsure. Here is the code: genome = FASTAdata[1] genomeLength = len(genome); # Hash table holding all the k-mers we will come across kmers = dict() # We go through all the possible k-mers by index for outer in range (0, genomeLength-1): for inner in range (outer+2, outer+22): substring = genome[outer:inner] if substring in kmers: # if we already have this substring on record, increase its value (count of num of appearances) by 1 kmers[substring] += 1 else: kmers[substring] = 1 # otherwise record that it's here once This is to search through all substrings of length at most 20. Now this code seems to take pretty forever and never terminate, so something has to be wrong here. Is using [:] on strings causing the huge overhead? And if so, what can I replace it with? And for clarity the file in question is nearly 200mb, so pretty big.

    Read the article

1