seafoid - Developer IT

Python - Nested List to Tab Delimited File?

- by Seafoid

Hi, I have a nested list comprising ~30,000 sub-lists, each with three entries, e.g., nested_list = [['x', 'y', 'z'], ['a', 'b', 'c']]. I wish to create a function in order to output this data construct into a tab delimited format, e.g., x y z a b c Any help greatly appreciated! Thanks in advance, Seafoid.

Read the article

Python - compare nested lists and append matches to new list?

- by Seafoid

Hi, I wish to compare to nested lists of unequal length. I am interested only in a match between the first element of each sub list. Should a match exist, I wish to add the match to another list for subsequent transformation into a tab delimited file. Here is an example of what I am working with: x = [['1', 'a', 'b'], ['2', 'c', 'd']] y = [['1', 'z', 'x'], ['4', 'z', 'x']] match = [] def find_match(): for i in x: for j in y: if i[1] == j[1]: match.append(j) return match This results in a series of empty lists. Is it better to use tuples and/or tuples of tuples for the purposes of comparison? Any help is greatly appreciated. Regards, Seafoid.

Read the article

Python - Calling a non python program from python?

- by Seafoid

Hi, I am currently struggling to call a non python program from a python script. I have a ~1000 files that when passed through this C++ program will generate ~1000 outputs. Each output file must have a distinct name. The command I wish to run is of the form: program_name -input -output -o1 -o2 -o3 To date I have tried: import os cwd = os.getcwd() files = os.listdir(cwd) required_files = [] for i in file: if i.endswith('.ttp'): required_files.append(i) So, I have an array of the neccesary files. My problem - how do I iterate over the array and for each entry, pass it to the above command (program_name) as an argument and specify a unique output id for each file? Much appreciated, S :-)

Read the article

Python - calculate multinomial probability density functions on large dataset?

- by Seafoid

Hi, I originally intended to use MATLAB to tackle this problem but the inbuilt functions has limitations that do not suit my goal. The same limitation occurs in NumPy. I have two tab-delimited files. The first is a file showing amino acid residue, frequency and count for an in-house database of protein structures, i.e. A 0.25 1 S 0.25 1 T 0.25 1 P 0.25 1 The second file consists of quadruplets of amino acids and the number of times they occur, i.e. ASTP 1 Note, there are 8,000 such quadruplets. Based on the background frequency of occurence of each amino acid and the count of quadruplets, I aim to calculate the multinomial probability density function for each quadruplet and subsequently use it as the expected value in a maximum likelihood calculation. The multinomial distribution is as follows: f(x|n, p) = n!/(x1!*x2!*...*xk!)*((p1^x1)*(p2^x2)*...*(pk^xk)) where x is the number of each of k outcomes in n trials with fixed probabilities p. n is 4 four in all cases in my calculation. I have created three functions to calculate this distribution. # functions for multinomial distribution def expected_quadruplets(x, y): expected = x*y return expected # calculates the probabilities of occurence raised to the number of occurrences def prod_prob(p1, a, p2, b, p3, c, p4, d): prob_prod = (pow(p1, a))*(pow(p2, b))*(pow(p3, c))*(pow(p4, d)) return prob_prod # factorial() and multinomial_coefficient() work in tandem to calculate C, the multinomial coefficient def factorial(n): if n <= 1: return 1 return n*factorial(n-1) def multinomial_coefficient(a, b, c, d): n = 24.0 multi_coeff = (n/(factorial(a) * factorial(b) * factorial(c) * factorial(d))) return multi_coeff The problem is how best to structure the data in order to tackle the calculation most efficiently, in a manner that I can read (you guys write some cryptic code :-)) and that will not create an overflow or runtime error. To data my data is represented as nested lists. amino_acids = [['A', '0.25', '1'], ['S', '0.25', '1'], ['T', '0.25', '1'], ['P', '0.25', '1']] quadruplets = [['ASTP', '1']] I initially intended calling these functions within a nested for loop but this resulted in runtime errors or overfloe errors. I know that I can reset the recursion limit but I would rather do this more elegantly. I had the following: for i in quadruplets: quad = i[0].split(' ') for j in amino_acids: for k in quadruplets: for v in k: if j[0] == v: multinomial_coefficient(int(j[2]), int(j[2]), int(j[2]), int(j[2])) I haven'te really gotten to how to incorporate the other functions yet. I think that my current nested list arrangement is sub optimal. I wish to compare the each letter within the string 'ASTP' with the first component of each sub list in amino_acids. Where a match exists, I wish to pass the appropriate numeric values to the functions using indices. Is their a better way? Can I append the appropriate numbers for each amino acid and quadruplet to a temporary data structure within a loop, pass this to the functions and clear it for the next iteration? Thanks, S :-)

Read the article

Python - from file to data structure?

- by Seafoid

Hi, I have large file comprising ~100,000 lines. Each line corresponds to a cluster and each entry within each line is a reference i.d. for another file (protein structure in this case), e.g. 1hgn 1dju 3nmj 8kfn 9opu 7gfb 4bui I need to read in the file as a list of lists where each line is a sublist, thus preserving the integrity of the cluster, e.g. nested_list = [['1hgn', '1dju', '3nmj', '8kfn'], ['9opu', '7gfb'], ['4bui']] My current code creates a nested list but the entries within each list are a single string and not comma separated. Therefore, I cannot splice the list with indices so easily. Any help greatly appreciated. Thanks, S :-)

Read the article

Python - output from functions?

- by Seafoid

Hi I have a very rudimentary question. Assume I call a function, e.g., def foo(): x = 'hello world' How do I get the function to return x in such a way that I can use it as the input for another function or use the variable within the body of a program? When I use return and call the variable within another functions I get a NameError. Thanks, S :-)

Read the article

Python - a clean approach to this problem?

- by Seafoid

Hi, I am having trouble picking the best data structure for solving a problem. The problem is as below: I have a nested list of identity codes where the sublists are of varying length. li = [['abc', 'ghi', 'lmn'], ['kop'], ['hgi', 'ghy']] I have a file with two entries on each line; an identity code and a number. abc 2.93 ghi 3.87 lmn 5.96 Each sublist represents a cluster. I wish to select the i.d. from each sublist with the highest number associated with it, append that i.d. to a new list and ultimately write it to a new file. What data structure should the file with numbers be read in as? Also, how would you iterate over said data structure to return the i.d. with the highest number that matches the i.d. within a sublist? Thanks, S :-)

Read the article

Python - alternative to list.remove(x)?

- by Seafoid

Hi, I wish to compare two lists. Generally this is not a problem as I usually use a nested for loop and append the intersection to a new list. In this case, I need to delete the intersection of A and B from A. A = [['ab', 'cd', 'ef', '0', '567'], ['ghy5'], ['pop', 'eye']] B = [['ab'], ['hi'], ['op'], ['ej']] My objective is to compare A and B and delete A intersection B from A, i.e., delete A[0][0] in this case. I tried: def match(): for i in A: for j in i: for k in B: for v in k: if j == v: A.remove(j) list.remove(x) throws a ValueError.

Developer IT