Search Results

Search found 16680 results on 668 pages for 'python datetime'.

Page 395/668 | < Previous Page | 391 392 393 394 395 396 397 398 399 400 401 402  | Next Page >

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of html documents (numbers in the hundreds of thousands). The documents are really forms but they are prepared by a very large group of different organizations so there is significant variation in how they create the document. For example, the documents are divided into chapters. I might want to extract the contents of Chapter 5 from every document so I can analyze the content of the chapter. Initially I thought this would be easy but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content so that Chapter n could be displayed using td tags inside a table. Or they might use other elements such as p tags H tags, div tags or any other block level element. After trying repeatedly to use lxml to help me identify the beginning and end of each chapter I have determined that it is a lot cleaner to use a regular expression because in every case, no matter what the enclosing html element is the chapter label is always in the form of >Chapter # It is a little more complicated in that there might be some white space or non-breaking space represented in different ways (  or   or just spaces). Nonetheless it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I have really no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section. That is here is one example where the element holding the Chapter name is a div <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for chapter 1 and set up a regular expressions to find the next </div|</td|</p|</h1 . . . So at this point I have identified the type of element holding my chapter heading I can use the same logic to find all of the text that is within that element that is set up a regular expression to help me mark from >Chapter 1.&#160;&#160;&#160;Our Beginnings.< So I have identified where my Chapter 1 begins I can do the same for chapter 2 (which is where Chapter 1 ends) Now I am imagining that I am going to snip the document beginning at the opening of the element that I identified as the element the indicates where chapter 1 begins and ending just before the opening of the element that I identified as the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over - never use a regular expression to extract content from html documents and I have not hit on a way to be as accurate with lxml to identify the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements and I could only be accurate <60% of the time but a very short regular expression has given me better than 95% success. I have a tendency to make things more complicated than necessary so I am wondering if anyone has seen or solved a similar problems and if they had an approach (not the details mind you) that they would like to offer.

    Read the article

  • Designing a Tag table that tells how many times it's used

    - by Satoru.Logic
    Hi, all. I am trying to design a tagging system with a model like this: Tag: content = CharField creator = ForeignKey used = IntergerField It is a many-to-many relationship between tags and what's been tagged. Everytime I insert a record into the assotication table, Tag.used is incremented by one, and decremented by one in case of deletion. Tag.used is maintained because I want to speed up answering the question 'How many times this tag is used?'. However, this seems to slow insertion down obviously. Please tell me how to improve this design. Thanks in advance.

    Read the article

  • creating message box as sheets for mac in PyQt

    - by user971306
    I used message box as seperate dialog instead of sheets for mac OS, now i m working on it to spawn a sheet as message box instead of seperate one. I have tried setting the message box as a modal one: (messagebox.setWindowModality(QtCore.Qt.WindoModal)) and setting message box, parent dialog window flags as sheet (parentDialog.setWindowFlags(QtCore.Qt.Sheet) messagebox.setWindowFlags(QtCore.Qt.Sheet)) But the above commands are not working to create a sheet instead of seperate dialog. Does anyone have an idea of how to solve?

    Read the article

  • How to preprocess a Django model field value before return?

    - by Satoru.Logic
    Hi, all. I have a Note model class like this: class Note(models.Model): author = models.ForeignKey(User, related_name='notes') content = NoteContentField(max_length=256) NoteContentField is a custom sub-class of CharField that override the to_python method in purpose of doing some twitter-text-conversion processing. class NoteContentField(models.CharField): __metaclass__ = models.SubfieldBase def to_python(self, value): value = super(NoteContentField, self).to_python(value) from ..utils import linkify return mark_safe(linkify(value)) However, this doesn't work. When I save a Note object like this: note = Note(author=request.use, content=form.cleaned_data['content']) The conversed value is saved into the database, which is not what I wanna see. Would you please tell me what's wrong with this? Thanks in advance.

    Read the article

  • pyplot: really slow creating heatmaps

    - by cvondrick
    I have a loop that executes the body about 200 times. In each loop iteration, it does a sophisticated calculation, and then as debugging, I wish to produce a heatmap of a NxM matrix. But, generating this heatmap is unbearably slow and significantly slow downs an already slow algorithm. My code is along the lines: import numpy import matplotlib.pyplot as plt for i in range(200): matrix = complex_calculation() plt.set_cmap("gray") plt.imshow(matrix) plt.savefig("frame{0}.png".format(i)) The matrix, from numpy, is not huge --- 300 x 600 of doubles. Even if I do not save the figure and instead update an on-screen plot, it's even slower. Surely I must be abusing pyplot. (Matlab can do this, no problem.) How do I speed this up?

    Read the article

  • Chipmunk warning still present with --release

    - by Kaliber64
    I'm using Python27 on Windows 7 64-bit. I downloaded the source for Chipmunk 6.2.x and compiled Pymunk with --release and -c ming32. Almost zero problems. Lots of path not found cause I'm bad. All prints seem to have disappeared but I get spammed with EPA iteration warnings. I've seen a couple discussions but no solutions. Possible chipmunk betas to fix the float errors causing the double truths causing the warning. I picked the latest stable version I think. My program is seriously bogged down with all the prints. class NullDevice(): def write(self, s): pass sys.stdout=NullDevice() Does not disable the C prints .< Any help?

    Read the article

  • Trying to set up nested while loops using a boolean switch

    - by thorn100
    First time posting. I'm trying to set up a while loop that will ask the user for the employee name, hours worked and hourly wage until the user enters 'DONE'. Eventually I'll modify the code to calculate the weekly pay and write it to a list, but one thing at a time. The problem is once the main while loop executes once, it just stops. Doesn't error out but just stops. I have to kill the program to get it to stop. I want it to ask the three questions again and again until the user is finished. Thoughts? Please note that this is just an exercise and not meant for any real world application. def getName(): """Asks for the employee's full name""" firstName=raw_input("\nWhat is your first name? ") lastName=raw_input("\nWhat is your last name? ") fullName=firstName.title() + " " + lastName.title() return fullName def getHours(): """Asks for the number of hours the employee worked""" hoursWorked=0 while int(hoursWorked)<1 or int(hoursWorked) > 60: hoursWorked=raw_input("\nHow many hours did the employee work: ") if int(hoursWorked)<1 or int(hoursWorked) > 60: print "Please enter an integer between 1 and 60." else: return hoursWorked def getWage(): """Asks for the employee's hourly wage""" wage=0 while float(wage)<6.00 or float(wage)>20.00: wage=raw_input("\nWhat is the employee's hourly wage: ") if float(wage)<6.00 or float(wage)>20.00: print ("Please enter an hourly wage between $6.00 and $20.00") else: return wage ##sentry variables employeeName="" employeeHours=0 employeeWage=0 booleanDone=False #Enter employee info print "Please enter payroll information for an employee or enter 'DONE' to quit." while booleanDone==False: while employeeName=="": employeeName=getName() if employeeName.lower()=="done": booleanDone=True break print "The employee's name is", employeeName while employeeHours==0: employeeHours=getHours() if employeeHours.lower()=="done": booleanDone=True break print employeeName, "worked", employeeHours, "this week." while employeeWage==0: employeeWage=getWage() if employeeWage.lower()=="done": booleanDone=True break print employeeName + "'s hourly wage is $" + employeeWage

    Read the article

  • Sort a list of tuples without case sensitivity

    - by dound
    How can I efficiently and easily sort a list of tuples without being sensitive to case? For example this: [('a', 'c'), ('A', 'b'), ('a', 'a'), ('a', 5)] Should look like this once sorted: [('a', 5), ('a', 'a'), ('A', 'b'), ('a', 'c')] The regular lexicographic sort will put 'A' before 'a' and yield this: [('A', 'b'), ('a', 5), ('a', 'a'), ('a', 'c')]

    Read the article

  • django newbie question : cant start a new project

    - by Moayyad Yaghi
    hello . I'm totally new to django . and I'm using its documentation to get help on how to use it but seems like something is missing. i installed django using setup.py install command and i added the ( django/bin ) to system path variable but. i still cant start a new project i use the following syntax to start a project : django-admin.py startproject myNewProject but it says Type 'django-admin.py help' for usage. 1 do i miss anything ? thank u

    Read the article

  • Which webserver to use with bottle?

    - by luc
    Bottle can use several webservers: Build-in HTTP development server and support for paste, fapws3, flup, cherrypy or any other WSGI capable server. I am using bottle for a desktop-app and I guess that the development server is enough in this case. I would like to know if some of you have experience with one of the alternative server. Which server for which purpose? Thanks in advance

    Read the article

  • How to manually create a DBRef using pymongo?

    - by Soviut
    I want to create a DBRef manually so that I can add an additional field to it. However, when I try to pass the following: {'$ref': 'projects', '$id': '1029412409721', 'project_name': 'My Project'} Pymongo raises an error: pymongo.errors.InvalidName: key '$id' must not start with '$' It would seem that pymongo reserve the $ for the special key, leading me to wonder if it is even possible to do what I'm trying to do?

    Read the article

  • Euclidian Distances between points

    - by R S
    I have an array of points in numpy: points = rand(dim, n_points) And I want to: Calculate all the l2 norm (euclidian distance) between a certain point and all other points Calculate all pairwise distances. and preferably all numpy and no for's.

    Read the article

  • Selective emboldeing of text in a webpage

    - by Eknath Iyer
    while printing out utf-8 characters onto a webpage, if encapsulate them with they get emboldened, but anything else, the page turns blank. Why? def main(): print "Content-type: text/html\r\n\r\n"; print '<html>' print '<head>' print '<style type="text/css">' print '.highlight { background-color: yellow }' print '.color1 { color: green; }' print '.color2 { color: blue; }' print '.color3 { color: purple; }' print '.color4 { color: red; }' print '.color5 { color: teal; }' print '.color6 { color: yellow; }' print '.color7 { color: orange; }' print '.color8 { color: violet; }' print '</style></head>' print '<body>' form = cgi.FieldStorage() ch = form.getvalue('choice') if ch == 'English': in_sent = form.getvalue('f1') in_sent = in_sent.lower() cho=0 elif ch == 'Hindi': in_sent = trans_he(form.getvalue('transl1').decode("utf-8")).strip() cho=1 #cho = 0 for english #cho = 1 for hindi adict=[] print '<center><u> User Input Sentence ==> <b>', in_sent,'</b></u></center><br>' in_sent=in_sent.strip().split(' ') colordict={} counter=1 for word in in_sent: colordict[word]=counter counter = counter + 1 f = open('bidirectional.alignment.txt','rb').read() records=f.strip().split('\n\n\n') for record in records: el=[] el2 = [] #basic file processing is done here. record = record.strip().split('\n') source = record[cho] target = record[(cho+1)%2] source_sent = source.split(' # ')[1] target_sent = target.split(' # ')[1] source_words = source_sent.strip().split(' ') target_words = target_sent.strip().split(' ') trans_index = source.split(' # ')[2].strip().split(' ') for word in in_sent: if word in source_words: if int(trans_index[source_words.index(word)]) > 0: tword=target_words[(int(trans_index[source_words.index(word)])-1)] target_sent = target_sent.replace(tword+' ','<b>'+tword+' </b>') # When the <b> tag is used here(for the 'target_sent = ...' statement). it is fine. But when <b> is replaced by something like in the next line or even <i> or <u>, it doesn't show an output at all source_sent = source_sent.replace(word+' ','<span class="color1">'+word+' </span>') el2.append(source_sent) el2.append(target_sent) el.append(target_sent.count('<b>')) el.append(el2) if target_sent.count('<b>') > 0: adict.append(el) print '<table><tr><td><center><h1>SOURCE LANGUAGE</h1></center></td><td><center> <h1>TARGET LANGUAGE</h1></center></td></tr>' for entry in adict: print '<tr><td>',entry[1][0],'</td><td>',trans_eh(entry[1][1]).encode("utf-8"),'</td> </tr>' print '</table></body>' print '</html>' main()

    Read the article

  • Matplotlib: plotting discrete values

    - by Arkapravo
    I am trying to plot the following ! from numpy import * from pylab import * import random for x in range(1,500): y = random.randint(1,25000) print(x,y) plot(x,y) show() However, I keep getting a blank graph (?). Just to make sure that the program logic is correct I added the code print(x,y), just the confirm that (x,y) pairs are being generated. (x,y) pairs are being generated, but there is no plot, I keep getting a blank graph. Any help ?

    Read the article

  • How to set up Atana Studio 3 Themes in Pydev

    - by willy1234x1
    I've installed the Aptana Studio 3 preview and noticed it has support for themes (such as a bespin style or Ruby envy) and I'd love to use the Bespin one in Pydev but so far I've had no luck getting it to work, anyone have a clue as to how to get it to work? Video showing the themes in action.

    Read the article

< Previous Page | 391 392 393 394 395 396 397 398 399 400 401 402  | Next Page >