How can I apply a PSSM efficiently?

Posted by flies on Stack Overflow
Published on 2012-05-29T15:02:33Z Indexed on 2012/06/28 21:16 UTC


I am fitting position-specific scoring matrices (PSSMs, aka position-specific weight matrices). The fit I'm using is like simulated annealing: I perturb the PSSM, compare the prediction to experiment, and accept the change if it improves agreement. This means I apply the PSSM millions of times per fit, so performance is critical. In my particular problem, I'm applying a PSSM for an object of length L (~8 bp) at every position of a DNA sequence of length M (~30 bp), so there are M-L+1 valid positions.
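A minimal sketch of the fit loop described above, to show where the scan sits. Everything concrete here is an assumption for illustration, not from the post: the names (`Pssm`, `maxScore`, `fitError`, `fit`), using the best window score as the prediction, squared error as the agreement measure, and Gaussian perturbations of one entry at a time. Like the text, it accepts only improvements (a greedy rule); true simulated annealing would also accept some worse moves.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <random>
#include <string>
#include <vector>

// Hypothetical layout: kL offsets x 4 bases (A, C, G, T).
constexpr std::size_t kL = 8;
using Pssm = std::array<std::array<double, 4>, kL>;

// Map a base to its column; assumes uppercase ACGT-only input.
int baseIndex(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Assumed prediction: the best score over the M-L+1 valid windows.
double maxScore(const Pssm& w, const std::string& seq) {
    double best = std::numeric_limits<double>::lowest();
    for (std::size_t m = 0; m + kL <= seq.size(); ++m) {
        double s = 0.0;
        for (std::size_t l = 0; l < kL; ++l) s += w[l][baseIndex(seq[m + l])];
        if (s > best) best = s;
    }
    return best;
}

// Assumed agreement measure: squared error against measured values.
double fitError(const Pssm& w, const std::vector<std::string>& seqs,
                const std::vector<double>& measured) {
    double e = 0.0;
    for (std::size_t i = 0; i < seqs.size(); ++i) {
        const double d = maxScore(w, seqs[i]) - measured[i];
        e += d * d;
    }
    return e;
}

// Perturb one random entry; keep the change only if the error decreases.
Pssm fit(Pssm w, const std::vector<std::string>& seqs,
         const std::vector<double>& measured, int steps, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pickOffset(0, kL - 1);
    std::uniform_int_distribution<std::size_t> pickBase(0, 3);
    std::normal_distribution<double> jitter(0.0, 0.1);
    double current = fitError(w, seqs, measured);
    for (int i = 0; i < steps; ++i) {
        Pssm trial = w;
        const std::size_t l = pickOffset(rng);
        const std::size_t b = pickBase(rng);
        trial[l][b] += jitter(rng);
        const double e = fitError(trial, seqs, measured);  // rescans every sequence
        if (e < current) {
            w = trial;
            current = e;
        }
    }
    return w;
}
```

Every trial step rescans all M-L+1 windows of every sequence, roughly (M-L+1)·L lookups per sequence per step, which is why the inner scan dominates the run time.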

I need an efficient algorithm to apply a PSSM. Can anyone help improve performance?

My best idea is to convert the DNA into some kind of matrix so that applying the PSSM becomes a matrix multiplication. There are efficient linear algebra libraries out there (e.g. BLAS), but I'm not sure how best to turn an M-length DNA sequence into an M x 4 matrix and then apply the PSSM at each position. The solution also needs to work for higher-order (dinucleotide) terms in the PSSM; presumably this means building one sequence matrix for mononucleotides and a separate one for dinucleotides.
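One way to get the matrix formulation, sketched under assumptions of my own (the column layout and the names `designMatrix`, `scoreAll`, `featureCount` are hypothetical): instead of multiplying an M x 4 matrix directly, build one row per window. Each row one-hot encodes the window: 4 columns per offset for mononucleotides and 16 columns per adjacent pair for dinucleotides, so 4L + 16(L-1) columns in total. With the PSSM flattened into a vector w in the same column order, all M-L+1 scores are a single matrix-vector product A·w. Because the sequences do not change during the fit, A can be built once per sequence and reused for every perturbation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Map A/C/G/T to 0..3; assumes uppercase ACGT-only input.
int baseCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Columns per window: 4 mononucleotide slots per offset + 16 dinucleotide slots per adjacent pair.
std::size_t featureCount(std::size_t L) { return 4 * L + 16 * (L - 1); }

// Row-major (M-L+1) x featureCount(L) 0/1 matrix; row m one-hot encodes seq[m .. m+L-1].
// Assumes L >= 1 and seq.size() >= L.
std::vector<double> designMatrix(const std::string& seq, std::size_t L) {
    const std::size_t rows = seq.size() - L + 1;
    const std::size_t cols = featureCount(L);
    std::vector<double> a(rows * cols, 0.0);
    for (std::size_t m = 0; m < rows; ++m) {
        double* row = a.data() + m * cols;
        for (std::size_t l = 0; l < L; ++l) {
            const int b = baseCode(seq[m + l]);
            row[4 * l + b] = 1.0;  // mononucleotide term at offset l
            if (l + 1 < L) {
                const int pair = 4 * b + baseCode(seq[m + l + 1]);
                row[4 * L + 16 * l + pair] = 1.0;  // dinucleotide term at offsets l, l+1
            }
        }
    }
    return a;
}

// scores = A * w, where w is the PSSM flattened in the same column order.
// This dense double loop is the matrix-vector product that a BLAS gemv routine computes.
std::vector<double> scoreAll(const std::vector<double>& a, const std::vector<double>& w) {
    const std::size_t cols = w.size();
    const std::size_t rows = a.size() / cols;
    std::vector<double> scores(rows, 0.0);
    for (std::size_t m = 0; m < rows; ++m)
        for (std::size_t j = 0; j < cols; ++j)
            scores[m] += a[m * cols + j] * w[j];
    return scores;
}
```

With a CBLAS library linked in (GSL ships one), the double loop could be replaced by a call such as `cblas_dgemv(CblasRowMajor, CblasNoTrans, rows, cols, 1.0, a.data(), cols, w.data(), 1, 0.0, scores.data(), 1)`; the plain loop keeps the sketch free of that dependency. Note that each row has only 2L-1 nonzeros out of 20L-16 columns (15 of 144 for L = 8), so a dense product mostly multiplies zeros, and summing the selected weights directly may well be faster; both are worth profiling.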

My current solution iterates over each position m, then over each letter in the word from m to m+L-1, adding the corresponding term from the matrix. I'm storing the matrix as a multi-dimensional STL vector, and profiling has revealed that a lot of the computation time is spent just accessing the elements of the PSSM (with similar bottlenecks accessing the DNA sequence).
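A sketch of the scan described above, next to one common way to reduce the access cost. The second variant is my assumption, not something the post reports trying: store the PSSM in one contiguous `std::vector<double>` indexed `4*l + b`, and encode the sequence to 0..3 bytes once, since the DNA does not change during the fit. All function names are hypothetical, and both versions assume uppercase ACGT input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Map A/C/G/T to 0..3; assumes uppercase ACGT-only input.
std::uint8_t encodeBase(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Baseline in the style described: nested vectors indexed [offset][base], letters decoded in the loop.
std::vector<double> scanNested(const std::vector<std::vector<double>>& pssm, const std::string& seq) {
    const std::size_t L = pssm.size();
    std::vector<double> scores;
    for (std::size_t m = 0; m + L <= seq.size(); ++m) {
        double s = 0.0;
        for (std::size_t l = 0; l < L; ++l) s += pssm[l][encodeBase(seq[m + l])];
        scores.push_back(s);
    }
    return scores;
}

// Encode the sequence once, outside the fitting loop.
std::vector<std::uint8_t> encodeSequence(const std::string& seq) {
    std::vector<std::uint8_t> codes(seq.size());
    for (std::size_t i = 0; i < seq.size(); ++i) codes[i] = encodeBase(seq[i]);
    return codes;
}

// Copy [offset][base] rows into one contiguous array; assumes 4 entries per row.
std::vector<double> flatten(const std::vector<std::vector<double>>& pssm) {
    std::vector<double> flat;
    for (const auto& row : pssm) flat.insert(flat.end(), row.begin(), row.end());
    return flat;
}

// Same computation on the flat PSSM (entry for offset l, base b at 4*l + b) and pre-encoded codes.
std::vector<double> scanFlat(const std::vector<double>& flatPssm, const std::vector<std::uint8_t>& codes) {
    const std::size_t L = flatPssm.size() / 4;
    std::vector<double> scores(codes.size() >= L ? codes.size() - L + 1 : 0, 0.0);
    for (std::size_t m = 0; m < scores.size(); ++m) {
        const std::uint8_t* window = codes.data() + m;
        double s = 0.0;
        for (std::size_t l = 0; l < L; ++l) s += flatPssm[4 * l + window[l]];
        scores[m] = s;
    }
    return scores;
}
```

Both scans return identical scores; whether the flat layout actually helps in a given build is something the same profiler should confirm.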

If someone has an idea besides matrix multiplication, I'm all ears.


Tags: c++, Performance, bioinformatics, blas, gsl
