Search Results

Search found 58 results on 3 pages for 'user248237'.

Page 1/3 | 1 2 3  | Next Page >

  • vectorizing a for loop in numpy/scipy?

    - by user248237
    I'm trying to vectorize a for loop that I have inside of a class method. The for loop has the following form: it iterates through a bunch of points and depending on whether a certain variable (called "self.condition_met" below) is true, calls a pair of functions on the point, and adds the result to a list. Each point here is an element in a vector of lists, i.e. a data structure that looks like array([[1,2,3], [4,5,6], ...]). Here is the problematic function: def myClass: def my_inefficient_method(self): final_vector = [] # Assume 'my_vector' and 'my_other_vector' are defined numpy arrays for point in all_points: if not self.condition_met: a = self.my_func1(point, my_vector) b = self.my_func2(point, my_other_vector) else: a = self.my_func3(point, my_vector) b = self.my_func4(point, my_other_vector) c = a + b final_vector.append(c) # Choose random element from resulting vector 'final_vector' self.condition_met is set before my_inefficient_method is called, so it seems unnecessary to check it each time, but I am not sure how to better write this. Since there are no destructive operations here it is seems like I could rewrite this entire thing as a vectorized operation -- is that possible? any ideas how to do this?

    Read the article

  • Specifying formatting for csv.writer in Python

    - by user248237
    I am using csv.DictWriter to output csv files from a set of dictionaries. I use the following function: def dictlist2file(dictrows, filename, fieldnames, delimiter='\t', lineterminator='\n'): out_f = open(filename, 'w') # Write out header header = delimiter.join(fieldnames) + lineterminator out_f.write(header) # Write out dictionary data = csv.DictWriter(out_f, fieldnames, delimiter=delimiter, lineterminator=lineterminator) data.writerows(dictrows) out_f.close() where dictrows is a list of dictionaries, and fieldnames provides the headers that should be serialized to file. Some of the values in my dictionary list (dictrows) are numeric -- e.g. floats, and I'd like to specify the formatting of these. For example, I might want floats to be serialized with "%.2f" rather than full precision. Ideally, I'd like to specify some kind of mapping that says how to format each type, e.g. {float: "%.2f"} that says that if you see a float, format it with %.2f. Is there an easy way to do this? I don't want to subclass DictWriter or anything complicated like that -- this seems like very generic functionality. How can this be done? The only other solution I can think of is: instead of messing with the formatting of DictWriter, just use the decimal package to specify the decimal precision of floats to be %.2 which will cause to be serialized as such. Don't know if this is a better solution? thanks very much for your help.

    Read the article

  • error when plotting log'd array in matplotlib/scipy/numpy

    - by user248237
    I have two arrays and I take their logs. When I do that and try to plot their scatter plot, I get this error: File "/Library/Python/2.6/site-packages/matplotlib-1.0.svn_r7892-py2.6-macosx-10.6-universal.egg/matplotlib/pyplot.py", line 2192, in scatter ret = ax.scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, faceted, verts, **kwargs) File "/Library/Python/2.6/site-packages/matplotlib-1.0.svn_r7892-py2.6-macosx-10.6-universal.egg/matplotlib/axes.py", line 5384, in scatter self.add_collection(collection) File "/Library/Python/2.6/site-packages/matplotlib-1.0.svn_r7892-py2.6-macosx-10.6-universal.egg/matplotlib/axes.py", line 1391, in add_collection self.update_datalim(collection.get_datalim(self.transData)) File "/Library/Python/2.6/site-packages/matplotlib-1.0.svn_r7892-py2.6-macosx-10.6-universal.egg/matplotlib/collections.py", line 153, in get_datalim offsets = transOffset.transform_non_affine(offsets) File "/Library/Python/2.6/site-packages/matplotlib-1.0.svn_r7892-py2.6-macosx-10.6-universal.egg/matplotlib/transforms.py", line 1924, in transform_non_affine self._a.transform(points)) File "/Library/Python/2.6/site-packages/matplotlib-1.0.svn_r7892-py2.6-macosx-10.6-universal.egg/matplotlib/transforms.py", line 1420, in transform return affine_transform(points, mtx) ValueError: Invalid vertices array. the code is simply: myarray_x = log(my_array[:, 0]) myarray_y = log(my_array[:, 1]) plt.scatter(myarray_x, myarray_y) any idea what could be causing this? thanks.

    Read the article

  • Fitting Gaussian KDE in numpy/scipy in Python

    - by user248237
    I am fitting a Gaussian kernel density estimator to a variable that is the difference of two vectors, called "diff", as follows: gaussian_kde_covfact(diff, smoothing_param) -- where gaussian_kde_covfact is defined as: class gaussian_kde_covfact(stats.gaussian_kde): def __init__(self, dataset, covfact = 'scotts'): self.covfact = covfact scipy.stats.gaussian_kde.__init__(self, dataset) def _compute_covariance_(self): '''not used''' self.inv_cov = np.linalg.inv(self.covariance) self._norm_factor = sqrt(np.linalg.det(2*np.pi*self.covariance)) * self.n def covariance_factor(self): if self.covfact in ['sc', 'scotts']: return self.scotts_factor() if self.covfact in ['si', 'silverman']: return self.silverman_factor() elif self.covfact: return float(self.covfact) else: raise ValueError, \ 'covariance factor has to be scotts, silverman or a number' def reset_covfact(self, covfact): self.covfact = covfact self.covariance_factor() self._compute_covariance() This works, but there is an edge case where the diff is a vector of all 0s. In that case, I get the error: File "/srv/pkg/python/python-packages/python26/scipy/scipy-0.7.1/lib/python2.6/site-packages/scipy/stats/kde.py", line 334, in _compute_covariance self.inv_cov = linalg.inv(self.covariance) File "/srv/pkg/python/python-packages/python26/scipy/scipy-0.7.1/lib/python2.6/site-packages/scipy/linalg/basic.py", line 382, in inv if info>0: raise LinAlgError, "singular matrix" numpy.linalg.linalg.LinAlgError: singular matrix What's a way to get around this? In this case, I'd like it to return a density that's essentially peaked completely at a difference of 0, with no mass everywhere else. thanks.

    Read the article

  • script unable to find directories/files when running from qsub cluster script

    - by user248237
    I'm calling several unix commands and python on a python script from a qsub shell script, meant to run on a cluster. The trouble is that when the script executes, something seems to go awry in the shell, so that directories and files that exist are not found. For example, in the .out output files of qsub I see the following errors: cd: /valid/dir/name: No such file or directory python valid/script/name.py python: can't open file 'valid/script/name.py': [Errno 2] No such file or directory so the script cannot cd into a dir that definitely exist. Similarly, calling python on a python script that definitely exists yields an error. any idea what might be going wrong here, or how I could try to debug this? thanks very much.

    Read the article

  • removing pairs of elements from numpy arrays that are NaN (or another value) in Python

    - by user248237
    I have an array with two columns in numpy. For example: a = array([[1, 5, nan, 6], [10, 6, 6, nan]]) a = transpose(a) I want to efficiently iterate through the two columns, a[:, 0] and a[:, 1] and remove any pairs that meet a certain condition, in this case if they are NaN. The obvious way I can think of is: new_a = [] for val1, val2 in a: if val2 == nan or val2 == nan: new_a.append([val1, val2]) But that seems clunky. What's the pythonic numpy way of doing this? thanks.

    Read the article

  • elegant way to make directory recursively in Python?

    - by user248237
    I would like to make a directory in a "recursive" way, i.e. have a function make_directory() that behaves as follows: make_directory("a/b/c/d/") should create directory a, then child b, then c, then d. If any of the parent directories exist it should not make them. E.g. if "a" exists, then b should be a subdir of that, and then c should be made inside b, etc. how can I do this in python? os.mkdir does not have this behavior.

    Read the article

  • computing z-scores for 2D matrices in scipy/numpy in Python

    - by user248237
    How can I compute the z-score for matrices in Python? Suppose I have the array: a = array([[ 1, 2, 3], [ 30, 35, 36], [2000, 6000, 8000]]) and I want to compute the z-score for each row. The solution I came up with is: array([zs(item) for item in a]) where zs is in scipy.stats.stats. Is there a better built-in vectorized way to do this? Also, is it always good to z-score numbers before using hierarchical clustering with euclidean or seuclidean distance? Can anyone discuss the relative advantages/disadvantages? thanks.

    Read the article

  • plotting results of hierarchical clustering ontop of a matrix of data in python

    - by user248237
    How can I plot a dendrogram right on top of a matrix of values, reordered appropriately to reflect the clustering, in Python? An example is in the bottom of the following figure: http://www.coriell.org/images/microarray.gif I use scipy.cluster.dendrogram to make my dendrogram and perform hierarchical clustering on a matrix of data. How can I then plot the data as a matrix where the rows have been reordered to reflect a clustering induced by the cutting the dendrogram at a particular threshold, and have the dendrogram plotted alongside the matrix? I know how to plot the dendrogram in scipy, but not how to plot the intensity matrix of data with the right scale bar next to it. Any help on this would be greatly appreciated.

    Read the article

  • Making TeX tables smaller?

    - by user248237
    I have a LaTeX table that looks like this: \begin{table}[!ht] \centering \small \caption{ \bf{Caption}} \begin{tabular}{l|c|c|l|c|c|c|c|c} field1 & field 2 & ... \\ \hline ... the problem is that even with "\small" the table is too big, since I use: \usepackage{setspace} \doublespacing in the header. How can I: 1) Make the table single spaced? and 2) Make the table smaller? I'd like it to fit on an entire page. thanks.

    Read the article

  • Fast JSON serialization (and comparison with Pickle) for cluster computing in Python?

    - by user248237
    I have a set of data points, each described by a dictionary. The processing of each data point is independent and I submit each one as a separate job to a cluster. Each data point has a unique name, and my cluster submission wrapper simply calls a script that takes a data point's name and a file describing all the data points. That script then accesses the data point from the file and performs the computation. Since each job has to load the set of all points only to retrieve the point to be run, I wanted to optimize this step by serializing the file describing the set of points into an easily retrievable format. I tried using JSONpickle, using the following method, to serialize a dictionary describing all the data points to file: def json_serialize(obj, filename, use_jsonpickle=True): f = open(filename, 'w') if use_jsonpickle: import jsonpickle json_obj = jsonpickle.encode(obj) f.write(json_obj) else: simplejson.dump(obj, f, indent=1) f.close() The dictionary contains very simple objects (lists, strings, floats, etc.) and has a total of 54,000 keys. The json file is ~20 Megabytes in size. It takes ~20 seconds to load this file into memory, which seems very slow to me. I switched to using pickle with the same exact object, and found that it generates a file that's about 7.8 megabytes in size, and can be loaded in ~1-2 seconds. This is a significant improvement, but it still seems like loading of a small object (less than 100,000 entries) should be faster. Aside from that, pickle is not human readable, which was the big advantage of JSON for me. Is there a way to use JSON to get similar or better speed ups? If not, do you have other ideas on structuring this? (Is the right solution to simply "slice" the file describing each event into a separate file and pass that on to the script that runs a data point in a cluster job? It seems like that could lead to a proliferation of files). thanks.

    Read the article

  • Counting longest occurence of repeated sequence in Python

    - by user248237
    What's the easiest way to count the longest consecutive repeat of a certain character in a string? For example, the longest consecutive repeat of "b" in the following string: my_str = "abcdefgfaabbbffbbbbbbfgbb" would be 6, since other consecutive repeats are shorter (3 and 2, respectively.) How can I do this in Python? thanks.

    Read the article

  • serializing JSON files with newlines in Python

    - by user248237
    I am using json and jsonpickle sometimes to serialize objects to files, using the following function: def json_serialize(obj, filename, use_jsonpickle=True): f = open(filename, 'w') if use_jsonpickle: import jsonpickle json_obj = jsonpickle.encode(obj) f.write(json_obj) else: simplejson.dump(obj, f) f.close() The problem is that if I serialize a dictionary for example, using "json_serialize(mydict, myfilename)" then the entire serialization gets put on one line. This means that I can't grep the file for entries to be inspected by hand, like I would a CSV file. Is there a way to make it so each element of an object (e.g. each entry in a dict, or each element in a list) is placed on a separate line in the JSON output file? thanks.

    Read the article

  • problem plotting on logscale in matplotlib in python

    - by user248237
    I am trying to plot the following numbers on a log scale as a scatter plot in matplotlib. Both the quantities on the x and y axes have very different scales, and one of the variables has a huge dynamic range (nearly 0 to 12 million roughly) while the other is between nearly 0 and 2. I think it might be good to plot both on a log scale. I tried the following, for a subset of the values of the two variables: fig = plt.figure(figsize(8, 8)) ax = fig.add_subplot(1, 1, 1) ax.set_yscale('log') ax.set_xscale('log') plt.scatter([1.341, 0.1034, 0.6076, 1.4278, 0.0374], [0.37, 0.12, 0.22, 0.4, 0.08]) The x-axes appear log scaled but the points do not appear -- only two points appear. Any idea how to fix this? Also, how can I make this log scale appear on a square axes, so that the correlation between the two variables can be interpreted from the scatter plot? thanks.

    Read the article

  • Elegant way to take basename of directory in Python?

    - by user248237
    I have several scripts that take as input a directory name, and my program creates files in those directories. Sometimes I want to take the basename of a directory given to the program and use it to make various files in the directory. For example, # directory name given by user via command-line output_dir = "..." # obtained by OptParser, for example my_filename = output_dir + '/' + os.path.basename(output_dir) + '.my_program_output' # write stuff to my_filename The problem is that if the user gives a directory name with a trailing slash, then os.path.basename will return the empty string, which is not what I want. What is the most elegant way to deal with these slash/trailing slash issues in python? I know I can manually check for the slash at the end of output_dir and remove it if it's there, but there seems like there should be a better way. Is there? Also, is it OK to manually add '/' characters? E.g. output_dir + '/' os.path.basename() or is there a more generic way to build up paths? Thanks.

    Read the article

  • hierarchical clustering with gene expression matrix in python

    - by user248237
    how can I do a hierarchical clustering (in this case for gene expression data) in Python in a way that shows the matrix of gene expression values along with the dendrogram? What I mean is like the example here: http://www.mathworks.cn/access/helpdesk/help/toolbox/bioinfo/ug/a1060813239b1.html shown after bullet point 6 (Figure 1), where the dendrogram is plotted to the left of the gene expression matrix, where the rows have been reordered to reflect the clustering. How can I do this in Python using numpy/scipy or other tools? Also, is it computationally practical to do this with a matrix of about 11,000 genes, using euclidean distance as a metric? thanks.

    Read the article

  • deciding between subprocess, multiprocesser and thread in Python?

    - by user248237
    I'd like to parallelize my Python program so that it can make use of multiple processors on the machine that it runs on. My parallelization is very simple, in that all the parallel "threads" of the program are independent and write their output to separate files. I don't need the threads to exchange information but it is imperative that I know when the threads finish since some steps of my pipeline depend on their output. Portability is important, in that I'd like this to run on any Python version on Mac, Linux and Windows. Given these constraints, which is the most appropriate Python module for implementing this? I am tryign to decide between thread, subprocess and multiprocessing, which all seem to provide related functionality. Any thoughts on this? I'd like the simplest solution that's portable. Thanks.

    Read the article

  • unevenly centered subplots in matplotlib in Python?

    - by user248237
    I am plotting a simple pair of subplots in matplotlib that are for some reason unevenly centered. I plot them as follows: plt.figure() # first subplot s1 = plt.subplot(2, 1, 1) plt.bar([1, 2, 3], [4, 5, 6]) # second subplot s2 = plt.subplot(2, 1, 2) plt.pcolor(rand(5,5)) # add colorbar plt.colorbar() # square axes axes_square(s1) axes_square(s2) where axes_square is simply: def axes_square(plot_handle): plot_handle.axes.set_aspect(1/plot_handle.axes.get_data_ratio()) The plot I get is attached. The top and bottom plots are unevenly centered. I'd like their yaxis to be aligned and their boxes to be aligned. If I remove the plt.colorbar() call, the plots become centered. How can I have the plots centered while the colorbar of pcolor is still shown? I want the axes to be centered and have the colorbar be outside of that alignment, either to the left or to the right of the pcolor matrix. image of plots link thanks.

    Read the article

  • getting sound on .asf video on Mac OS X

    - by user248237
    When I try to open the following video: http://streaming1.hss-win.rpi.edu/ondemand/cogsci/issues_in_cs/issues_in_cogsci_02_04_2009.asf I can get it but without any sound... I tried it both inside the browser (Firefox) and in QuickTime Media player but still it does not work. Any idea how to get this to work on Mac OS X? Using version 10.6. thank you.

    Read the article

  • Easiest way to automatically download required modules in Python?

    - by user248237
    I would like to release a python module I wrote which depends on several packages. What's the easiest way to make it so these packages are programmatically downloaded just in case they are not available on the system that's being run? Most of these modules should be available by easy_install or pip or something like that. I simply want to avoid having the user install each module separately. thanks.

    Read the article

1 2 3  | Next Page >