Search Results

Search found 14486 results on 580 pages for 'python idle'.

Page 364/580 | < Previous Page | 360 361 362 363 364 365 366 367 368 369 370 371  | Next Page >

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of html documents (numbers in the hundreds of thousands). The documents are really forms but they are prepared by a very large group of different organizations so there is significant variation in how they create the document. For example, the documents are divided into chapters. I might want to extract the contents of Chapter 5 from every document so I can analyze the content of the chapter. Initially I thought this would be easy but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content so that Chapter n could be displayed using td tags inside a table. Or they might use other elements such as p tags H tags, div tags or any other block level element. After trying repeatedly to use lxml to help me identify the beginning and end of each chapter I have determined that it is a lot cleaner to use a regular expression because in every case, no matter what the enclosing html element is the chapter label is always in the form of >Chapter # It is a little more complicated in that there might be some white space or non-breaking space represented in different ways (  or   or just spaces). Nonetheless it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I have really no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section. That is here is one example where the element holding the Chapter name is a div <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for chapter 1 and set up a regular expressions to find the next </div|</td|</p|</h1 . . . So at this point I have identified the type of element holding my chapter heading I can use the same logic to find all of the text that is within that element that is set up a regular expression to help me mark from >Chapter 1.&#160;&#160;&#160;Our Beginnings.< So I have identified where my Chapter 1 begins I can do the same for chapter 2 (which is where Chapter 1 ends) Now I am imagining that I am going to snip the document beginning at the opening of the element that I identified as the element the indicates where chapter 1 begins and ending just before the opening of the element that I identified as the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over - never use a regular expression to extract content from html documents and I have not hit on a way to be as accurate with lxml to identify the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements and I could only be accurate <60% of the time but a very short regular expression has given me better than 95% success. I have a tendency to make things more complicated than necessary so I am wondering if anyone has seen or solved a similar problems and if they had an approach (not the details mind you) that they would like to offer.

    Read the article

  • Iterating over key and value of defaultdict dictionaries

    - by gf
    The following works as expected: d = [(1,2), (3,4)] for k,v in d: print "%s - %s" % (str(k), str(v)) But this fails: d = collections.defaultdict(int) d[1] = 2 d[3] = 4 for k,v in d: print "%s - %s" % (str(k), str(v)) With: Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'int' object is not iterable Why? How can i fix it?

    Read the article

  • How to get progress bar to time Class exectution

    - by chrissygormley
    Hello, I am trying to use progress bar to show the progress of a script. I want it increase progress after every function in a class is executed. The code I have tried is below: import progressbar from time import sleep class hello(): def no(self): print 'hello!' def yes(self): print 'No!!!!!!' def pro(): bar = progressbar.ProgressBar(widgets=[progressbar.Bar('=', '[', ']'), ' ', progressbar.Percentage()]) for i in Yep(): bar.update(Yep.i()) sleep(0.1) bar.finish() if __name__ == "__main__": Yep = hello() pro() Does anyone know how to get this working. Thanks

    Read the article

  • Selective emboldeing of text in a webpage

    - by Eknath Iyer
    while printing out utf-8 characters onto a webpage, if encapsulate them with they get emboldened, but anything else, the page turns blank. Why? def main(): print "Content-type: text/html\r\n\r\n"; print '<html>' print '<head>' print '<style type="text/css">' print '.highlight { background-color: yellow }' print '.color1 { color: green; }' print '.color2 { color: blue; }' print '.color3 { color: purple; }' print '.color4 { color: red; }' print '.color5 { color: teal; }' print '.color6 { color: yellow; }' print '.color7 { color: orange; }' print '.color8 { color: violet; }' print '</style></head>' print '<body>' form = cgi.FieldStorage() ch = form.getvalue('choice') if ch == 'English': in_sent = form.getvalue('f1') in_sent = in_sent.lower() cho=0 elif ch == 'Hindi': in_sent = trans_he(form.getvalue('transl1').decode("utf-8")).strip() cho=1 #cho = 0 for english #cho = 1 for hindi adict=[] print '<center><u> User Input Sentence ==> <b>', in_sent,'</b></u></center><br>' in_sent=in_sent.strip().split(' ') colordict={} counter=1 for word in in_sent: colordict[word]=counter counter = counter + 1 f = open('bidirectional.alignment.txt','rb').read() records=f.strip().split('\n\n\n') for record in records: el=[] el2 = [] #basic file processing is done here. record = record.strip().split('\n') source = record[cho] target = record[(cho+1)%2] source_sent = source.split(' # ')[1] target_sent = target.split(' # ')[1] source_words = source_sent.strip().split(' ') target_words = target_sent.strip().split(' ') trans_index = source.split(' # ')[2].strip().split(' ') for word in in_sent: if word in source_words: if int(trans_index[source_words.index(word)]) > 0: tword=target_words[(int(trans_index[source_words.index(word)])-1)] target_sent = target_sent.replace(tword+' ','<b>'+tword+' </b>') # When the <b> tag is used here(for the 'target_sent = ...' statement). it is fine. But when <b> is replaced by something like in the next line or even <i> or <u>, it doesn't show an output at all source_sent = source_sent.replace(word+' ','<span class="color1">'+word+' </span>') el2.append(source_sent) el2.append(target_sent) el.append(target_sent.count('<b>')) el.append(el2) if target_sent.count('<b>') > 0: adict.append(el) print '<table><tr><td><center><h1>SOURCE LANGUAGE</h1></center></td><td><center> <h1>TARGET LANGUAGE</h1></center></td></tr>' for entry in adict: print '<tr><td>',entry[1][0],'</td><td>',trans_eh(entry[1][1]).encode("utf-8"),'</td> </tr>' print '</table></body>' print '</html>' main()

    Read the article

  • inheritance from str or int

    - by wiso
    Why I have problem creating a class the inherite from str (or also int) class C(str): def __init__(self, a, b): str.__init__(self,a) self.b = b C("a", "B") TypeError: str() takes at most 1 argument (2 given) tha same appened if I try to use int instead of str, but it works with custom classes. I need to use __new__ instead of __init__? why?

    Read the article

  • App Engine HTTP 500s

    - by pocoa
    This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application. I've handled all the situations, also DeadlineExceededError too. But sometimes I see these error messages in error logs. That request took about 10k ms, so it's not exceeded the limit too. But there is no other specific message about this error. All I know is that it returned HTTP 500. Is there anyone know the reason of these error messages? Thank you.

    Read the article

  • Storing a list of objects in GAE

    - by Dominic Bou-Samra
    I need to store some data that looks a little like this: xyz 123 abc 456 hij 678 rer 838 Now I would just store it as a traditional string and integer model, and put in the datastore. But the data changes regularly, and is ONLY relevant when looked at as a COLLECTION. So it needs to be store as either a list of lists, or a list of objects, both of which can't really be done without pickling as far as I know. Can anyone help? Even storing it as a text file may work :S

    Read the article

  • SQLAlchemy - select for update example

    - by Mark
    I'm looking for a complete example of using select for update in SQLAlchemy, but haven't found one googling. I need to lock a single row and update a column, the following code doesn't work (blocks forever): s = table.select(table.c.user=="test",for_update=True) u = table.update().where(table.c.user=="test") u.execute(email="foo") Do I need a commit? How do I do that? As far as I know you need to: begin transaction select ... for update update commit

    Read the article

  • BeautifulSoup, but for CSS?

    - by MTsoul
    BeautifulSoup parses HTML and offers various ways to manipulate and search within HTML. Is there something similar for CSS? Specifically, I'd like to know if a given HTML text is rendered as bold. Either it has an ancestor that is the <strong> or the <bold> tag (which can be done with BeautifulSoup), or it has an ancestor (or itself) that has CSS attributes with font-weight: bold. Is this possible without resulting to writing my own library?

    Read the article

  • SQLAlchemy: select over multiple tables

    - by ahojnnes
    Hi, I wanted to optimize my database query: link_list = select( columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in], whereclause=and_( not_(link_table.c.id.in_( select( columns=[request_table.c.recipient], whereclause=request_table.c.donator==donator.id ).as_scalar() )), link_table.c.id!=donator.id, ), limit=20, ).execute().fetchall() and tried to merge those two selects in one query: link_list = select( columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in], whereclause=and_( link_table.c.active==True, link_table.c.id!=donator.id, request_table.c.donator==donator.id, link_table.c.id!=request_table.c.recipient, ), limit=20, order_by=[link_table.c.rating.desc()] ).execute().fetchall() the database-schema looks like: link_table = Table('links', metadata, Column('id', Integer, primary_key=True, autoincrement=True), Column('url', Unicode(250), index=True, unique=True), Column('registration_date', DateTime), Column('donations_in', Integer), Column('active', Boolean), ) request_table = Table('requests', metadata, Column('id', Integer, primary_key=True, autoincrement=True), Column('recipient', Integer, ForeignKey('links.id')), Column('donator', Integer, ForeignKey('links.id')), Column('date', DateTime), ) There are several links (donator) in request_table pointing to one link in the link_table. I want to have links from link_table, which are not yet "requested". But this does not work. Is it actually possible, what I'm trying to do? If so, how would you do that? Thank you very much in advance!

    Read the article

  • Django database caching

    - by hekevintran
    I have a Django form that uses an integer field to lookup a model object by its primary key. The form has a save() method that uses the model object referred to by the integer field. The model's manager's get() method is called twice, once in the clean method and once in the save() method: class MyForm(forms.Form): id_a = fields.IntegerField() def clean_id_a(user_id): id_a = self.cleaned_data['id_a'] try: # here is the first call to get MyModel.objects.get(id=id_a) except User.DoesNotExist: raise ValidationError('Object does not exist') def save(self): id_a = self.cleaned_data['id_a'] # here is the second call to get my_model_object = MyModel.objects.get(id=id_a) # do other stuff I wasn't sure whether this hits the database two times or one time so I returned the object itself in the clean method so that I could avoid a second get() call. Does calling get() hit the database two times? Or is the object cached in the thread? class MyForm(forms.Form): id_a = fields.IntegerField() def clean_id_a(user_id): id_a = self.cleaned_data['id_a'] try: # here is my workaround return MyModel.objects.get(id=id_a) except User.DoesNotExist: raise ValidationError('Object does not exist') def save(self): # looking up the cleaned value returns the model object my_model_object = self.cleaned_data['id_a'] # do other stuff

    Read the article

  • web2py error while using distinct in the queries

    - by Steve
    Hi, I am using web2py with GAE. While using some of the queries which has a distinct clause, GAE throws out an error.I have pasted the Traceback. Can someone please help me out with this. In FILE: /base/data/home/apps/panneersoda/1.341206242889687944/applications/init/controllers/default.py Traceback (most recent call last): File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/restricted.py", line 173, in restricted exec ccode in environment File "/base/data/home/apps/panneersoda/1.341206242889687944/applications/init/controllers/default.py:profileview", line 263, in <module> File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/globals.py", line 96, in <lambda> self._caller = lambda f: f() File "/base/data/home/apps/panneersoda/1.341206242889687944/applications/init/controllers/default.py:profileview", line 97, in profileview File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/contrib/gql.py", line 675, in select (items, tablename, fields) = self._select(*fields, **attributes) File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/contrib/gql.py", line 624, in _select raise SyntaxError, 'invalid select attribute: %s' % key SyntaxError: invalid select attribute: distinct Thanks

    Read the article

  • Real-time data on webpage with Django and jQuery

    - by Steven Hepting
    I would like a webpage that constantly updates a graph with new data as it arrives. Regularly, all the data you have is passed to a Django view at the beginning of the request. However, I need the page to be able to update itself with fresh information every few seconds to redraw the graph. Background The webpage will be similar to this http://www.panic.com/blog/2010/03/the-panic-status-board/. The data coming in will temperature values to be graphed measured by an Arduino and saved to the Django database (I've already done this part).

    Read the article

  • Non-global middleware in Django

    - by hekevintran
    In Django there is a settings file that defines the middleware to be run on each request. This middleware setting is global. Is there a way to specify a set of middleware on a per-view basis? I want to have specific urls use a set of middleware different from the global set.

    Read the article

  • Django: Summing values

    - by Anry
    I have a two Model - Project and Cost. class Project(models.Model): title = models.CharField(max_length=150) url = models.URLField() manager = models.ForeignKey(User) class Cost(models.Model): project = models.ForeignKey(Project) cost = models.FloatField() date = models.DateField() I must return the sum of costs for each project. view.py: from mypm.costs.models import Project, Cost from django.shortcuts import render_to_response from django.db.models import Avg, Sum def index(request): #... return render_to_response('index.html',... How?

    Read the article

  • Copy **kwargs to self?

    - by Mark
    Given class ValidationRule: def __init__(self, **kwargs): # code here Is there a way that I can define __init__ such that if I were to initialize the class with something like ValidationRule(other='email') then self.other would be "added" to class without having to explicitly name every possible kwarg?

    Read the article

  • Why is the destructor called when the CPython garbage collector is disabled?

    - by Frederik
    I'm trying to understand the internals of the CPython garbage collector, specifically when the destructor is called. So far, the behavior is intuitive, but the following case trips me up: Disable the GC. Create an object, then remove a reference to it. The object is destroyed and the __del__ method is called. I thought this would only happen if the garbage collector was enabled. Can someone explain why this happens? Is there a way to defer calling the destructor? import gc import unittest _destroyed = False class MyClass(object): def __del__(self): global _destroyed _destroyed = True class GarbageCollectionTest(unittest.TestCase): def testExplicitGarbageCollection(self): gc.disable() ref = MyClass() ref = None # The next test fails. # The object is automatically destroyed even with the collector turned off. self.assertFalse(_destroyed) gc.collect() self.assertTrue(_destroyed) if __name__=='__main__': unittest.main() Disclaimer: this code is not meant for production -- I've already noted that this is very implementation-specific and does not work on Jython.

    Read the article

  • 'NoneType' object has no attribute 'data'

    - by Bill Jordan
    Hello guys, I am sending a SOAP request to my server and getting the response back. sample of the response string is shown below: <?xml version = '1.0' ?> <env:Envelope xmlns:env=http:////www.w3.org/2003/05/soap-envelop . .. .. <env:Body> <epas:get-all-config-resp xmlns:epas="urn:organization:epas:soap"> ^M ... ... <epas:property name="Tom">12</epas:property> > > <epas:property name="Alice">34</epas:property> > > <epas:property name="John">56</epas:property> > > <epas:property name="Danial">78</epas:property> > > <epas:property name="George">90</epas:property> > > <epas:property name="Luise">11</epas:property> ... ^M </env:Body? </env:Envelop> What I noticed in the response is that there is an extra character shown in the body which is "^M". Not sure if this could be the issue. Note the ^M shown! when I tried parsing the string returned from the server to get the names and values using the code sample: elements = minidom.parseString(xmldoc).getElementsByTagName("property") myDict = {} for element in elements: myDict[element.getAttribute('name')] = element.firstChild.data But, I am getting this error: 'NoneType' object has no attribute 'data'. May be its something to do with the "^M" shown on the xml response back! Any ideas/comments would be appreciated, Cheers

    Read the article

  • In Django, how to create tables from an SQL file when syncdb is run

    - by Sidney
    Hi, How do I make syncdb execute SQL queries (for table creation) defined by me, rather then generating tables automatically. I'm looking for this solution as some particular models in my app represent SQL-table-views for a legacy-database table. So, I've created their SQL-views in my django-DB like this: CREATE VIEW legacy_series AS SELECT * FROM legacy.series; I have a reverse engineered model that represents the above view/legacytable. But whenever I run syncdb, I have to create all the views first by running sql scripts, otherwise syncdb simply creates tables for them (if a view is not found). How do I make syncdb run the above mentioned SQL?

    Read the article

  • How do I use a ListProperty(users.user) in a djangoforms.ModelForm on Google AppEngine?

    - by Gabriel
    I have been looking around a bit for info on how to do this. Essentially I have a Model: class SharableUserAsset(db.Model): name = StringProperty() users = ListProperty(users.User) My questions are: What is the best way to associate users to this value where they are not authenticated, visa vi invite from contacts list etc.? Is there a reasonable way to present a list control easily in a djangoforms.ModelForm? Once a user logs in I want to be able to check if that user is in the list for any number of SharableUserAsset class "records", how do I do that? Does user evaluate as a match to an email address or is there a way to look up a valid user against an email address?

    Read the article

  • Speeding up templates in GAE-Py by aggregating RPC calls

    - by Sudhir Jonathan
    Here's my problem: class City(Model): name = StringProperty() class Author(Model): name = StringProperty() city = ReferenceProperty(City) class Post(Model): author = ReferenceProperty(Author) content = StringProperty() The code isn't important... its this django template: {% for post in posts %} <div>{{post.content}}</div> <div>by {{post.author.name}} from {{post.author.city.name}}</div> {% endfor %} Now lets say I get the first 100 posts using Post.all().fetch(limit=100), and pass this list to the template - what happens? It makes 200 more datastore gets - 100 to get each author, 100 to get each author's city. This is perfectly understandable, actually, since the post only has a reference to the author, and the author only has a reference to the city. The __get__ accessor on the post.author and author.city objects transparently do a get and pull the data back (See this question). Some ways around this are Use Post.author.get_value_for_datastore(post) to collect the author keys (see the link above), and then do a batch get to get them all - the trouble here is that we need to re-construct a template data object... something which needs extra code and maintenance for each model and handler. Write an accessor, say cached_author, that checks memcache for the author first and returns that - the problem here is that post.cached_author is going to be called 100 times, which could probably mean 100 memcache calls. Hold a static key to object map (and refresh it maybe once in five minutes) if the data doesn't have to be very up to date. The cached_author accessor can then just refer to this map. All these ideas need extra code and maintenance, and they're not very transparent. What if we could do @prefetch def render_template(path, data) template.render(path, data) Turns out we can... hooks and Guido's instrumentation module both prove it. If the @prefetch method wraps a template render by capturing which keys are requested we can (atleast to one level of depth) capture which keys are being requested, return mock objects, and do a batch get on them. This could be repeated for all depth levels, till no new keys are being requested. The final render could intercept the gets and return the objects from a map. This would change a total of 200 gets into 3, transparently and without any extra code. Not to mention greatly cut down the need for memcache and help in situations where memcache can't be used. Trouble is I don't know how to do it (yet). Before I start trying, has anyone else done this? Or does anyone want to help? Or do you see a massive flaw in the plan?

    Read the article

< Previous Page | 360 361 362 363 364 365 366 367 368 369 370 371  | Next Page >