Search Results

Search found 14007 results on 561 pages for 'python embedding'.

Page 159/561 | < Previous Page | 155 156 157 158 159 160 161 162 163 164 165 166  | Next Page >

  • Efficient file buffering & scanning methods for large files in python

    - by eblume
    The description of the problem I am having is a bit complicated, and I will err on the side of providing more complete information. For the impatient, here is the briefest way I can summarize it: What is the fastest (least execution time) way to split a text file in to ALL (overlapping) substrings of size N (bound N, eg 36) while throwing out newline characters. I am writing a module which parses files in the FASTA ascii-based genome format. These files comprise what is known as the 'hg18' human reference genome, which you can download from the UCSC genome browser (go slugs!) if you like. As you will notice, the genome files are composed of chr[1..22].fa and chr[XY].fa, as well as a set of other small files which are not used in this module. Several modules already exist for parsing FASTA files, such as BioPython's SeqIO. (Sorry, I'd post a link, but I don't have the points to do so yet.) Unfortunately, every module I've been able to find doesn't do the specific operation I am trying to do. My module needs to split the genome data ('CAGTACGTCAGACTATACGGAGCTA' could be a line, for instance) in to every single overlapping N-length substring. Let me give an example using a very small file (the actual chromosome files are between 355 and 20 million characters long) and N=8 import cStringIO example_file = cStringIO.StringIO("""\ header CAGTcag TFgcACF """) for read in parse(example_file): ... print read ... CAGTCAGTF AGTCAGTFG GTCAGTFGC TCAGTFGCA CAGTFGCAC AGTFGCACF The function that I found had the absolute best performance from the methods I could think of is this: def parse(file): size = 8 # of course in my code this is a function argument file.readline() # skip past the header buffer = '' for line in file: buffer += line.rstrip().upper() while len(buffer) = size: yield buffer[:size] buffer = buffer[1:] This works, but unfortunately it still takes about 1.5 hours (see note below) to parse the human genome this way. Perhaps this is the very best I am going to see with this method (a complete code refactor might be in order, but I'd like to avoid it as this approach has some very specific advantages in other areas of the code), but I thought I would turn this over to the community. Thanks! Note, this time includes a lot of extra calculation, such as computing the opposing strand read and doing hashtable lookups on a hash of approximately 5G in size. Post-answer conclusion: It turns out that using fileobj.read() and then manipulating the resulting string (string.replace(), etc.) took relatively little time and memory compared to the remainder of the program, and so I used that approach. Thanks everyone!

    Read the article

  • Python metaprogramming help

    - by Timmy
    im looking into mongoengine, and i wanted to make a class an "EmbeddedDocument" dynamically, so i do this def custom(cls): cls = type( cls.__name__, (EmbeddedDocument,), cls.__dict__.copy() ) cls.a = FloatField(required=True) cls.b = FloatField(required=True) return cls A = custom( A ) and tried it on some classes, but its not doing some of the base class's init or sumthing in BaseDocument def __init__(self, **values): self._data = {} # Assign initial values to instance for attr_name, attr_value in self._fields.items(): if attr_name in values: setattr(self, attr_name, values.pop(attr_name)) else: # Use default value if present value = getattr(self, attr_name, None) setattr(self, attr_name, value) but this never gets used, thus never setting ._data, and giving me errors. how do i do this?

    Read the article

  • Python/YACC Lexer: Token priority?

    - by Rosarch
    I'm trying to use reserved words in my grammar: reserved = { 'if' : 'IF', 'then' : 'THEN', 'else' : 'ELSE', 'while' : 'WHILE', } tokens = [ 'DEPT_CODE', 'COURSE_NUMBER', 'OR_CONJ', 'ID', ] + list(reserved.values()) t_DEPT_CODE = r'[A-Z]{2,}' t_COURSE_NUMBER = r'[0-9]{4}' t_OR_CONJ = r'or' t_ignore = ' \t' def t_ID(t): r'[a-zA-Z_][a-zA-Z_0-9]*' if t.value in reserved.values(): t.type = reserved[t.value] return t return None However, the t_ID rule somehow swallows up DEPT_CODE and OR_CONJ. How can I get around this? I'd like those two to take higher precedence than the reserved words.

    Read the article

  • How to print a dictionary in python c api function

    - by dizgam
    PyObject* dict = PyDict_New(); PyDict_SetItem(dict, key, value); PyDict_GetItem(dict, key); Bus error if i use getitem function otherwise not. So Want to confirm that the dictionary has the same values which i have set. Other than using PyDict_GetItem function, Is there any other method to print the values of the dictionary?

    Read the article

  • filtering elements from list of lists in Python?

    - by user248237
    I want to filter elements from a list of lists, and iterate over the elements of each element using a lambda. For example, given the list: a = [[1,2,3],[4,5,6]] suppose that I want to keep only elements where the sum of the list is greater than N. I tried writing: filter(lambda x, y, z: x + y + z >= N, a) but I get the error: <lambda>() takes exactly 3 arguments (1 given) How can I iterate while assigning values of each element to x, y, and z? Something like zip, but for arbitrarily long lists. thanks, p.s. I know I can write this using: filter(lambda x: sum(x)..., a) but that's not the point, imagine that these were not numbers but arbitrary elements and I wanted to assign their values to variable names.

    Read the article

  • Exporting dates properly formatted on Google Appengine in Python

    - by Chris M
    I think this is right but google appengine seems to get to a certain point and cop-out; Firstly is this code actually right; and secondly is there away to skip the record if it cant output (like an ignore errors and continue)? class TrackerExporter(bulkloader.Exporter): def __init__(self): bulkloader.Exporter.__init__(self, 'SearchRec', [('__key__', lambda key:key.name(), None), ('WebSite', str, None), ('DateStamp', lambda x: datetime.datetime.strptime(x, '%d-%m-%Y').date(), None), ('IP', str, None), ('UserAgent', str, None)]) Thanks

    Read the article

  • Python: Unpack arbitary length bits for database storage

    - by sberry2A
    I have a binary data format consisting of 18,000+ packed int64s, ints, shorts, bytes and chars. The data is packed to minimize it's size, so they don't always use byte sized chunks. For example, a number whose min and max value are 31, 32 respectively might be stored with a single bit where the actual value is bitvalue + min, so 0 is 31 and 1 is 32. I am looking for the most efficient way to unpack all of these for subsequent processing and database storage. Right now I am able to read any value by using either struct.unpack, or BitBuffer. I use struct.unpack for any data that starts on a bit where (bit-offset % 8 == 0 and data-length % 8 == 0) and I use BitBuffer for anything else. I know the offset and size of every packed piece of data, so what is going to be the fasted way to completely unpack them? Many thanks.

    Read the article

  • Optimizing BeautifulSoup (Python) code

    - by user283405
    I have code that uses the BeautifulSoup library for parsing, but it is very slow. The code is written in such a way that threads cannot be used. Can anyone help me with this? I am using BeautifulSoup for parsing and than save into a DB. If I comment out the save statement, it still takes a long time, so there is no problem with the database. def parse(self,text): soup = BeautifulSoup(text) arr = soup.findAll('tbody') for i in range(0,len(arr)-1): data=Data() soup2 = BeautifulSoup(str(arr[i])) arr2 = soup2.findAll('td') c=0 for j in arr2: if str(j).find("<a href=") > 0: data.sourceURL = self.getAttributeValue(str(j),'<a href="') else: if c == 2: data.Hits=j.renderContents() #and few others... c = c+1 data.save() Any suggestions? Note: I already ask this question here but that was closed due to incomplete information.

    Read the article

  • Restart logging to a new file (Python)

    - by compie
    I'm using the following code to initialize logging in my application. logger = logging.getLogger() logger.setLevel(logging.DEBUG) # log to a file directory = '/reserved/DYPE/logfiles' now = datetime.now().strftime("%Y%m%d_%H%M%S") filename = os.path.join(directory, 'dype_%s.log' % now) file_handler = logging.FileHandler(filename) file_handler.setLevel(logging.DEBUG) formatter = logging.Formatter("%(asctime)s %(filename)s, %(lineno)d, %(funcName)s: %(message)s") file_handler.setFormatter(formatter) logger.addHandler(file_handler) # log to the console console_handler = logging.StreamHandler() level = logging.INFO console_handler.setLevel(level) logger.addHandler(console_handler) logging.debug('logging initialized') How can I close the current logging file and restart logging to a new file? Note: I don't want to use RotatingFileHandler, because I want full control over all the filenames and the moment of rotation.

    Read the article

  • python: problem with dictionary get method default value

    - by goutham
    I'm having a new problem here .. CODE 1: try: urlParams += "%s=%s&"%(val['name'], data.get(val['name'], serverInfo_D.get(val['name']))) except KeyError: print "expected parameter not provided - "+val["name"]+" is missing" exit(0) CODE 2: try: urlParams += "%s=%s&"%(val['name'], data.get(val['name'], serverInfo_D[val['name']])) except KeyError: print "expected parameter not provided - "+val["name"]+" is missing" exit(0) see the diffrence in serverInfo_D[val['name']] & serverInfo_D.get(val['name']) code 2 fails but code 1 works the data serverInfo_D:{'user': 'usr', 'pass': 'pass'} data: {'par1': 9995, 'extraparam1': 22} val: {'par1','user','pass','extraparam1'} exception are raised for for data dict .. and all code in for loop which iterates over val

    Read the article

  • Python RegExp exception

    - by Jasie
    How do I split on all nonalphanumeric characters, EXCEPT the apostrophe? re.split('\W+',text) works, but will also split on apostrophes. How do I add an exception to this rule? Thanks!

    Read the article

  • Custom keys for Google App Engine models (Python)

    - by Cameron
    First off, I'm relatively new to Google App Engine, so I'm probably doing something silly. Say I've got a model Foo: class Foo(db.Model): name = db.StringProperty() I want to use name as a unique key for every Foo object. How is this done? When I want to get a specific Foo object, I currently query the datastore for all Foo objects with the target unique name, but queries are slow (plus it's a pain to ensure that name is unique when each new Foo is created). There's got to be a better way to do this! Thanks.

    Read the article

  • Python module being reloaded for each request with django and mod_wsgi

    - by Vishal
    I have a variable in init of a module which get loaded from the database and takes about 15 seconds. For django development server everything is working fine but looks like with apache2 and mod_wsgi the module is loaded with every request (taking 15 seconds). Any idea about this behavior? Update: I have enabled daemon mode in mod wsgi, looks like its not reloading the modules now! needs more testing and I will update.

    Read the article

  • I want to design a html form in python

    - by VaIbHaV-JaIn
    when user will enter details in the text box on the html from <h1>Please enter new password</h1> <form method="POST" enctype="application/json action="uid"> Password<input name="passwd"type="password" /><br> Retype Password<input name="repasswd" type="password" /><br> <input type="Submit" /> </form> </body> i want to post the data in json format through http post request and also i want to set content-type = application/json

    Read the article

  • Python 4 steps setup with progressBars

    - by Samuel Taylor
    I'm having a problem with the code below. When I run it the progress bar will pulse for around 10 secs as meant to and then move on to downloading and will show the progress but when finished it will not move on to the next step it just locks up. import sys import time import pygtk import gtk import gobject import threading import urllib import urlparse class WorkerThread(threading.Thread): def __init__ (self, function, parent, arg = None): threading.Thread.__init__(self) self.function = function self.parent = parent self.arg = arg self.parent.still_working = True def run(self): # when does "run" get executed? self.parent.still_working = True if self.arg == None: self.function() else: self.function(self.arg) self.parent.still_working = False def stop(self): self = None class MainWindow: def __init__(self): gtk.gdk.threads_init() self.wTree = gtk.Builder() self.wTree.add_from_file("gui.glade") self.mainWindows() def mainWindows(self): self.mainWindow = self.wTree.get_object("frmMain") dic = { "on_btnNext_clicked" : self.mainWindowNext, } self.wTree.connect_signals(dic) self.mainWindow.show() self.installerStep = 0 # 0 = none, 1 = preinstall, 2 = download, 3 = install info, 4 = install #gtk.main() self.mainWindowNext() def pulse(self): self.wTree.get_object("progress").pulse() if self.still_working == False: self.mainWindowNext() return self.still_working def preinstallStep(self): self.wTree.get_object("progress").set_fraction(0) self.wTree.get_object("btnNext").set_sensitive(0) self.wTree.get_object("notebook1").set_current_page(0) self.installerStep = 1 WT = WorkerThread(self.heavyWork, self) #Would do a heavy function here like setup some thing WT.start() gobject.timeout_add(75, self.pulse) def downloadStep(self): self.wTree.get_object("progress").set_fraction(0) self.wTree.get_object("btnNext").set_sensitive(0) self.wTree.get_object("notebook1").set_current_page(0) self.installerStep = 2 urllib.urlretrieve('http://mozilla.mirrors.evolva.ro//firefox/releases/3.6.3/win32/en-US/Firefox%20Setup%203.6.3.exe', '/tmp/firefox.exe', self.updateHook) self.mainWindowNext() def updateHook(self, blocks, blockSize, totalSize): percentage = float ( blocks * blockSize ) / totalSize if percentage > 1: percentage = 1 self.wTree.get_object("progress").set_fraction(percentage) while gtk.events_pending(): gtk.main_iteration() def installInfoStep(self): self.wTree.get_object("btnNext").set_sensitive(1) self.wTree.get_object("notebook1").set_current_page(1) self.installerStep = 3 def installStep(self): self.wTree.get_object("progress").set_fraction(0) self.wTree.get_object("btnNext").set_sensitive(0) self.wTree.get_object("notebook1").set_current_page(0) self.installerStep = 4 WT = WorkerThread(self.heavyWork, self) #Would do a heavy function here like setup some thing WT.start() gobject.timeout_add(75, self.pulse) def mainWindowNext(self, widget = None): if self.installerStep == 0: self.preinstallStep() elif self.installerStep == 1: self.downloadStep() elif self.installerStep == 2: self.installInfoStep() elif self.installerStep == 3: self.installStep() elif self.installerStep == 4: sys.exit(0) def heavyWork(self): time.sleep(10) if __name__ == '__main__': MainWindow() gtk.main() I have a feeling that its something to do with: while gtk.events_pending(): gtk.main_iteration() Is there a better way of doing this?

    Read the article

  • Rectangle Rotation in Python/Pygame

    - by mramazingguy
    Hey I'm trying to rotate a rectangle around its center and when I try to rotate the rectangle, it moves up and to the left at the same time. Does anyone have any ideas on how to fix this? def rotatePoint(self, angle, point, origin): sinT = sin(radians(angle)) cosT = cos(radians(angle)) return (origin[0] + (cosT * (point[0] - origin[0]) - sinT * (point[1] - origin[1])), origin[1] + (sinT * (point[0] - origin[0]) + cosT * (point[1] - origin[1]))) def rotateRect(self, degrees): center = (self.collideRect.centerx, self.collideRect.centery) self.collideRect.topleft = self.rotatePoint(degrees, self.collideRect.topleft, center) self.collideRect.topright = self.rotatePoint(degrees, self.collideRect.topright, center) self.collideRect.bottomleft = self.rotatePoint(degrees, self.collideRect.bottomleft, center) self.collideRect.bottomright = self.rotatePoint(degrees, self.collideRect.bottomright, center)

    Read the article

  • Python 3.1 - Memory Error during sampling of a large list

    - by jimy
    The input list can be more than 1 million numbers. When I run the following code with smaller 'repeats', its fine; def sample(x): length = 1000000 new_array = random.sample((list(x)),length) return (new_array) def repeat_sample(x): i = 0 repeats = 100 list_of_samples = [] for i in range(repeats): list_of_samples.append(sample(x)) return(list_of_samples) repeat_sample(large_array) However, using high repeats such as the 100 above, results in MemoryError. Traceback is as follows; Traceback (most recent call last): File "C:\Python31\rnd.py", line 221, in <module> STORED_REPEAT_SAMPLE = repeat_sample(STORED_ARRAY) File "C:\Python31\rnd.py", line 129, in repeat_sample list_of_samples.append(sample(x)) File "C:\Python31\rnd.py", line 121, in sample new_array = random.sample((list(x)),length) File "C:\Python31\lib\random.py", line 309, in sample result = [None] * k MemoryError I am assuming I'm running out of memory. I do not know how to get around this problem. Thank you for your time!

    Read the article

  • Python hash() can't handle long integer?

    - by Xie
    I defined a class: class A: ''' hash test class a = A(9, 1196833379, 1, 1773396906) hash(a) -340004569 This is weird, 12544897317L expected. ''' def __init__(self, a, b, c, d): self.a = a self.b = b self.c = c self.d = d def __hash__(self): return self.a * self.b + self.c * self.d Why, in the doctest, hash() function gives a negative integer?

    Read the article

  • how to send some data to the Thread module on python and google-map-engine

    - by zjm1126
    from google.appengine.ext import db class Log(db.Model): content = db.StringProperty(multiline=True) class MyThread(threading.Thread): def run(self,request): #logs_query = Log.all().order('-date') #logs = logs_query.fetch(3) log=Log() log.content=request.POST.get('content',None) log.put() def Log(request): thr = MyThread() thr.start(request) return HttpResponse('') error is : Exception in thread Thread-1: Traceback (most recent call last): File "D:\Python25\lib\threading.py", line 486, in __bootstrap_inner self.run() File "D:\zjm_code\helloworld\views.py", line 33, in run log.content=request.POST.get('content',None) NameError: global name 'request' is not defined

    Read the article

  • Python and classes

    - by Artyom
    Hello, i have 2 classes. How i call first.TQ in Second ? Without creating object First in Second. class First: def __init__(self): self.str = "" def TQ(self): pass def main(self): T = Second(self.str) # Called here class Second(): def __init__(self): list = {u"RANDINT":first.TQ} # List of funcs maybe called in first ..... ..... return data

    Read the article

  • Python: How best to parse a simple grammar?

    - by Rosarch
    Ok, so I've asked a bunch of smaller questions about this project, but I still don't have much confidence in the designs I'm coming up with, so I'm going to ask a question on a broader scale. I am parsing pre-requisite descriptions for a course catalog. The descriptions almost always follow a certain form, which makes me think I can parse most of them. From the text, I would like to generate a graph of course pre-requisite relationships. (That part will be easy, after I have parsed the data.) Some sample inputs and outputs: "CS 2110" => ("CS", 2110) # 0 "CS 2110 and INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, 3300, 3140" => [("CS", 2110), ("CS", 3300), ("CS", 3140)] # 1 "CS 2110 or INFO 3300" => [[("CS", 2110)], [("INFO", 3300)]] # 2 "MATH 2210, 2230, 2310, or 2940" => [[("MATH", 2210), ("MATH", 2230), ("MATH", 2310)], [("MATH", 2940)]] # 3 If the entire description is just a course, it is output directly. If the courses are conjoined ("and"), they are all output in the same list If the course are disjoined ("or"), they are in separate lists Here, we have both "and" and "or". One caveat that makes it easier: it appears that the nesting of "and"/"or" phrases is never greater than as shown in example 3. What is the best way to do this? I started with PLY, but I couldn't figure out how to resolve the reduce/reduce conflicts. The advantage of PLY is that it's easy to manipulate what each parse rule generates: def p_course(p): 'course : DEPT_CODE COURSE_NUMBER' p[0] = (p[1], int(p[2])) With PyParse, it's less clear how to modify the output of parseString(). I was considering building upon @Alex Martelli's idea of keeping state in an object and building up the output from that, but I'm not sure exactly how that is best done. def addCourse(self, str, location, tokens): self.result.append((tokens[0][0], tokens[0][1])) def makeCourseList(self, str, location, tokens): dept = tokens[0][0] new_tokens = [(dept, tokens[0][1])] new_tokens.extend((dept, tok) for tok in tokens[1:]) self.result.append(new_tokens) For instance, to handle "or" cases: def __init__(self): self.result = [] # ... self.statement = (course_data + Optional(OR_CONJ + course_data)).setParseAction(self.disjunctionCourses) def disjunctionCourses(self, str, location, tokens): if len(tokens) == 1: return tokens print "disjunction tokens: %s" % tokens How does disjunctionCourses() know which smaller phrases to disjoin? All it gets is tokens, but what's been parsed so far is stored in result, so how can the function tell which data in result corresponds to which elements of token? I guess I could search through the tokens, then find an element of result with the same data, but that feel convoluted... What's a better way to approach this problem?

    Read the article

< Previous Page | 155 156 157 158 159 160 161 162 163 164 165 166  | Next Page >