Search Results

Search found 13676 results on 548 pages for 'python dude'.

Page 358/548 | < Previous Page | 354 355 356 357 358 359 360 361 362 363 364 365  | Next Page >

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of html documents (numbers in the hundreds of thousands). The documents are really forms but they are prepared by a very large group of different organizations so there is significant variation in how they create the document. For example, the documents are divided into chapters. I might want to extract the contents of Chapter 5 from every document so I can analyze the content of the chapter. Initially I thought this would be easy but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content so that Chapter n could be displayed using td tags inside a table. Or they might use other elements such as p tags H tags, div tags or any other block level element. After trying repeatedly to use lxml to help me identify the beginning and end of each chapter I have determined that it is a lot cleaner to use a regular expression because in every case, no matter what the enclosing html element is the chapter label is always in the form of >Chapter # It is a little more complicated in that there might be some white space or non-breaking space represented in different ways (  or   or just spaces). Nonetheless it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I have really no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section. That is here is one example where the element holding the Chapter name is a div <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for chapter 1 and set up a regular expressions to find the next </div|</td|</p|</h1 . . . So at this point I have identified the type of element holding my chapter heading I can use the same logic to find all of the text that is within that element that is set up a regular expression to help me mark from >Chapter 1.&#160;&#160;&#160;Our Beginnings.< So I have identified where my Chapter 1 begins I can do the same for chapter 2 (which is where Chapter 1 ends) Now I am imagining that I am going to snip the document beginning at the opening of the element that I identified as the element the indicates where chapter 1 begins and ending just before the opening of the element that I identified as the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over - never use a regular expression to extract content from html documents and I have not hit on a way to be as accurate with lxml to identify the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements and I could only be accurate <60% of the time but a very short regular expression has given me better than 95% success. I have a tendency to make things more complicated than necessary so I am wondering if anyone has seen or solved a similar problems and if they had an approach (not the details mind you) that they would like to offer.

    Read the article

  • Getting an entry before and after a given entry in a Django Queryset

    - by Vernon
    I am creating a simple blog as part of a website and I am getting stuck on something that I am assuming is simple. If I call any blog post, say by it's title, from a queryset, how can I get the entry before and after the post in it's published order. I can iterate over the whole thing, get the position of the entry I have and use that to call the one before and the one after. But that is a long bit of code for something that I am sure I can do more simply. What I want would be something like this: next_post = Posts.object.filter(title=current_title).order_by("-published")[-1] Of course because of the filter, it is not going to work, but just to give you the idea of what I am looking for.

    Read the article

  • Which webserver to use with bottle?

    - by luc
    Bottle can use several webservers: Build-in HTTP development server and support for paste, fapws3, flup, cherrypy or any other WSGI capable server. I am using bottle for a desktop-app and I guess that the development server is enough in this case. I would like to know if some of you have experience with one of the alternative server. Which server for which purpose? Thanks in advance

    Read the article

  • Euclidian Distances between points

    - by R S
    I have an array of points in numpy: points = rand(dim, n_points) And I want to: Calculate all the l2 norm (euclidian distance) between a certain point and all other points Calculate all pairwise distances. and preferably all numpy and no for's.

    Read the article

  • Help converting code using httlib2 to use urllib2

    - by ThinkCode
    What am I trying to do? Visit a site, retrieve cookie, visit the next page by sending in the cookie info. It all works but httplib2 is giving me one too many problems with socks proxy on one site. http = httplib2.Http() main_url = 'http://mywebsite.com/get.aspx?id='+ id +'&rows=25' response, content = http.request(main_url, 'GET', headers=headers) main_cookie = response['set-cookie'] referer = 'http://google.com' headers = {'Content-type': 'application/x-www-form-urlencoded', 'Cookie': main_cookie, 'User-Agent' : USER_AGENT, 'Referer' : referer} How to do the same exact thing using urllib2 (cookie retrieving, passing to the next page on the same site)? Thank you.

    Read the article

  • Sending data from one Protocol to another Protocol in Twisted?

    - by veb
    Hi! One of my protocols is connected to a server, and with the output of that I'd like to send it to the other protocol. I need to access the 'msg' method in ClassA from ClassB but I keep getting: exceptions.AttributeError: 'NoneType' object has no attribute 'write' Actual code: http://pastebin.com/MQPhduSY Any ideas please? :-)

    Read the article

  • How to set up Atana Studio 3 Themes in Pydev

    - by willy1234x1
    I've installed the Aptana Studio 3 preview and noticed it has support for themes (such as a bespin style or Ruby envy) and I'd love to use the Bespin one in Pydev but so far I've had no luck getting it to work, anyone have a clue as to how to get it to work? Video showing the themes in action.

    Read the article

  • Finding the intersection of two vector equations.

    - by Matthew Mitchell
    I've been trying to solve this and I found an equation that gives the possibility of zero division errors. Not the best thing: v1 = (a,b) v2 = (c,d) d1 = (e,f) d2 = (h,i) l1: v1 + ?d1 l2: v2 + µd2 Equation to find vector intersection of l1 and l2 programatically by re-arranging for lambda. (a,b) + ?(e,f) = (c,d) + µ(h,i) a + ?e = c + µh b +?f = d + µi µh = a + ?e - c µi = b +?f - d µ = (a + ?e - c)/h µ = (b +?f - d)/i (a + ?e - c)/h = (b +?f - d)/i a/h + ?e/h - c/h = b/i +?f/i - d/i ?e/h - ?f/i = (b/i - d/i) - (a/h - c/h) ?(e/h - f/i) = (b - d)/i - (a - c)/h ? = ((b - d)/i - (a - c)/h)/(e/h - f/i) Intersection vector = (a + ?e,b + ?f) Not sure if it would even work in some cases. I haven't tested it. I need to know how to do this for values as in that example a-i. Thank you.

    Read the article

  • Trying to set up nested while loops using a boolean switch

    - by thorn100
    First time posting. I'm trying to set up a while loop that will ask the user for the employee name, hours worked and hourly wage until the user enters 'DONE'. Eventually I'll modify the code to calculate the weekly pay and write it to a list, but one thing at a time. The problem is once the main while loop executes once, it just stops. Doesn't error out but just stops. I have to kill the program to get it to stop. I want it to ask the three questions again and again until the user is finished. Thoughts? Please note that this is just an exercise and not meant for any real world application. def getName(): """Asks for the employee's full name""" firstName=raw_input("\nWhat is your first name? ") lastName=raw_input("\nWhat is your last name? ") fullName=firstName.title() + " " + lastName.title() return fullName def getHours(): """Asks for the number of hours the employee worked""" hoursWorked=0 while int(hoursWorked)<1 or int(hoursWorked) > 60: hoursWorked=raw_input("\nHow many hours did the employee work: ") if int(hoursWorked)<1 or int(hoursWorked) > 60: print "Please enter an integer between 1 and 60." else: return hoursWorked def getWage(): """Asks for the employee's hourly wage""" wage=0 while float(wage)<6.00 or float(wage)>20.00: wage=raw_input("\nWhat is the employee's hourly wage: ") if float(wage)<6.00 or float(wage)>20.00: print ("Please enter an hourly wage between $6.00 and $20.00") else: return wage ##sentry variables employeeName="" employeeHours=0 employeeWage=0 booleanDone=False #Enter employee info print "Please enter payroll information for an employee or enter 'DONE' to quit." while booleanDone==False: while employeeName=="": employeeName=getName() if employeeName.lower()=="done": booleanDone=True break print "The employee's name is", employeeName while employeeHours==0: employeeHours=getHours() if employeeHours.lower()=="done": booleanDone=True break print employeeName, "worked", employeeHours, "this week." while employeeWage==0: employeeWage=getWage() if employeeWage.lower()=="done": booleanDone=True break print employeeName + "'s hourly wage is $" + employeeWage

    Read the article

  • Disable autocomplete on textfield in Django?

    - by tau-neutrino
    Does anyone know how you can turn off autocompletion on a textfield in Django? For example, a form that I generate from my model has an input field for a credit card number. It is bad practice to leave autocompletion on. When making the form by hand, I'd add a autocomplete="off" statement, but how do you do it in Django and still retain the form validation?

    Read the article

  • SQLAlchemy: select over multiple tables

    - by ahojnnes
    Hi, I wanted to optimize my database query: link_list = select( columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in], whereclause=and_( not_(link_table.c.id.in_( select( columns=[request_table.c.recipient], whereclause=request_table.c.donator==donator.id ).as_scalar() )), link_table.c.id!=donator.id, ), limit=20, ).execute().fetchall() and tried to merge those two selects in one query: link_list = select( columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in], whereclause=and_( link_table.c.active==True, link_table.c.id!=donator.id, request_table.c.donator==donator.id, link_table.c.id!=request_table.c.recipient, ), limit=20, order_by=[link_table.c.rating.desc()] ).execute().fetchall() the database-schema looks like: link_table = Table('links', metadata, Column('id', Integer, primary_key=True, autoincrement=True), Column('url', Unicode(250), index=True, unique=True), Column('registration_date', DateTime), Column('donations_in', Integer), Column('active', Boolean), ) request_table = Table('requests', metadata, Column('id', Integer, primary_key=True, autoincrement=True), Column('recipient', Integer, ForeignKey('links.id')), Column('donator', Integer, ForeignKey('links.id')), Column('date', DateTime), ) There are several links (donator) in request_table pointing to one link in the link_table. I want to have links from link_table, which are not yet "requested". But this does not work. Is it actually possible, what I'm trying to do? If so, how would you do that? Thank you very much in advance!

    Read the article

  • How to get the original variable name of variable passed to a function

    - by Acorn
    Is it possible to get the original variable name of a variable passed to a function? E.g. foobar = "foo" def func(var): print var.origname So that: func(foobar) Returns: >>foobar EDIT: All I was trying to do was make a function like: def log(soup): f = open(varname+'.html', 'w') print >>f, soup.prettify() f.close() .. and have the function generate the filename from the name of the variable passed to it. I suppose if it's not possible I'll just have to pass the variable and the variable's name as a string each time.

    Read the article

  • How do I find the "concrete class" of a django model baseclass

    - by Mr Shark
    I'm trying to find the actual class of a django-model object, when using model-inheritance. Some code to describe the problem: class Base(models.model): def basemethod(self): ... class Child_1(Base): pass class Child_2(Base): pass If I create various objects of the two Child classes and the create a queryset containing them all: Child_1().save() Child_2().save() (o1, o2) = Base.objects.all() I want to determine if the object is of type Child_1 or Child_2 in basemethod, I can get to the child object via o1.child_1 and o2.child_2 but that reconquers knowledge about the childclasses in the baseclass. I have come up with the following code: def concrete_instance(self): instance = None for subclass in self._meta.get_all_related_objects(): acc_name = subclass.get_accessor_name() try: instance = self.__getattribute__(acc_name) return instance except Exception, e: pass But it feels brittle and I'm not sure of what happens when if I inherit in more levels.

    Read the article

  • How to turn a list of tuples into a string?

    - by matt
    I have a list of tuples that I'm trying to incorporate into a SQL query but I can't figure out how to join them together without adding slashes. My like this: list = [('val', 'val'), ('val', 'val'), ('val', 'val')] If I turn each tuple into a string and try to join them with a a comma I'll get something like ' (\'val\, \'val\'), ... ' What's the right way to do this, so I can get the list (without brackets) as a string?

    Read the article

  • Remove padding in wxPython's wxWizard

    - by mridang
    Hi Guys, I'm using wxPython to create a wizard using the wxWizard control. I'm trying to a draw a colored rectangle but when I run the app, there seems to be a about a 10px padding on each side of the rectangle. This goes for all other controls too. I have to offset them a bit so that they appear exactly where I want them to. Is there any way I could remove this padding? Here's the source of my base Wizard page. class SimplePage(wx.wizard.PyWizardPage): """ Simple wizard page with unlimited rows of text. """ def __init__(self, parent, title): wx.wizard.PyWizardPage.__init__(self, parent) self.next = self.prev = None #self.sizer = wx.BoxSizer(wx.VERTICAL) title = wx.StaticText(self, -1, title) title.SetFont(wx.Font(18, wx.SWISS, wx.NORMAL, wx.BOLD)) #self.sizer.AddWindow(title, 0, wx.ALIGN_LEFT|wx.ALL, padding) #self.sizer.AddWindow(wx.StaticLine(self, -1), 0, wx.EXPAND|wx.ALL, padding) # self.SetSizer(self.sizer) self.Bind(wx.EVT_PAINT, self.OnPaint) def OnPaint(self, evt): """set up the device context (DC) for painting""" self.dc = wx.PaintDC(self) self.dc.BeginDrawing() self.dc.SetPen(wx.Pen("grey",style=wx.TRANSPARENT)) self.dc.SetBrush(wx.Brush("grey", wx.SOLID)) # set x, y, w, h for rectangle self.dc.DrawRectangle(0,0,500, 500) self.dc.EndDrawing() del self.dc def SetNext(self, next): self.next = next def SetPrev(self, prev): self.prev = prev def GetNext(self): return self.next def GetPrev(self): return self.prev def Activated(self, evt): """ Executed when page is being activated. """ return def Blocked(self, evt): """ Executed when page is about to be switched. Switching can be blocked by returning True. """ return False def Cancel(self, evt): """ Executed when wizard is about to be canceled. Canceling can be blocked by returning False. """ return True Thanks guys.

    Read the article

  • pylab.savefig() and pylab.show() image difference

    - by Jack1990
    I'm making an script to automatically create plots from .xvg files, but there's a problem when I'm trying to use pylab's savefig() method. Using pylab.show() and saving from there, everything's fine. Using pylab.show() Using pylab.savefig() def producePlot(timestep, energy_values,type_line = 'r', jump = 1,finish = 100): fc = sp.interp1d(timestep[::jump], energy_values[::jump],kind='cubic') xnew = numpy.linspace(0, finish, finish*2) pylab.plot(xnew, fc(xnew),type_line) pylab.xlabel('Time in ps ') pylab.ylabel('kJ/mol') pylab.xlim(xmin=0, xmax=finish) def produceSimplePlot(timestep, energy_values,type_line = 'r', jump = 1,finish = 100): pylab.plot(timestep, energy_values,type_line) pylab.xlabel('Time in ps ') pylab.ylabel('kJ/mol') pylab.xlim(xmin=0, xmax=finish) def linearRegression(timestep, energy_values, type_line = 'g'): #, jump = 1,finish = 100): from scipy import stats import numpy #print 'fuck' timestep = numpy.asarray(timestep) slope, intercept, r_value, p_value, std_err = stats.linregress(timestep,energy_values) line = slope*timestep+intercept pylab.plot(timestep, line, type_line) def plottingTime(Title,file_name, timestep, energy_values ,loc, jump , finish): pylab.title(Title) producePlot(timestep,energy_values, 'b',jump, finish) linearRegression(timestep,energy_values) import numpy Average = numpy.average(energy_values) #print Average pylab.legend(("Average = %.2f" %(Average),'Linear Reg'),loc) #pylab.show() pylab.savefig('%s.jpg' %file_name[:-4], bbox_inches= None, pad_inches=0) #if __name__ == '__main__': #plottingTime(Title,timestep1, energy_values, jump =10, finish = 4800) def specialCase(Title,file_name, timestep, energy_values,loc, jump, finish): #print 'Working here ...?' pylab.title(Title) producePlot(timestep,energy_values, 'b',jump, finish) import numpy from pylab import * Average = numpy.average(energy_values) #print Average pylab.legend(("Average = %.2g" %(Average), Title),loc) locs,labels = yticks() yticks(locs, map(lambda x: "%.3g" % x, locs)) #pylab.show() pylab.savefig('%s.jpg' %file_name[:-4] , bbox_inches= None, pad_inches=0) Thanks in advance, John

    Read the article

  • How to draw the "trail" in a maze solving application

    - by snow-spur
    Hello i have designed a maze and i want to draw a path between the cells as the 'person' moves from one cell to the next. So each time i move the cell a line is drawn Also i am using the graphics module The graphics module is an object oriented library Im importing from graphics import* from maze import* my circle which is my cell center = Point(15, 15) c = Circle(center, 12) c.setFill('blue') c.setOutline('yellow') c.draw(win) p1 = Point(c.getCenter().getX(), c.getCenter().getY()) this is my loop if mazez.blockedCount(cloc)> 2: mazez.addDecoration(cloc, "grey") mazez[cloc].deadend = True c.move(-25, 0) p2 = Point(getX(), getY()) line = graphics.Line(p1, p2) cloc.col = cloc.col - 1 Now it says getX not defined every time i press a key is this because of p2???

    Read the article

  • Chipmunk warning still present with --release

    - by Kaliber64
    I'm using Python27 on Windows 7 64-bit. I downloaded the source for Chipmunk 6.2.x and compiled Pymunk with --release and -c ming32. Almost zero problems. Lots of path not found cause I'm bad. All prints seem to have disappeared but I get spammed with EPA iteration warnings. I've seen a couple discussions but no solutions. Possible chipmunk betas to fix the float errors causing the double truths causing the warning. I picked the latest stable version I think. My program is seriously bogged down with all the prints. class NullDevice(): def write(self, s): pass sys.stdout=NullDevice() Does not disable the C prints .< Any help?

    Read the article

  • how to format date when i load data from google-app-engine..

    - by zjm1126
    i use remote_api to load data from google-app-engine. appcfg.py download_data --config_file=helloworld/GreetingLoad.py --filename=a.csv --kind=Greeting helloworld the setting is: class AlbumExporter(bulkloader.Exporter): def __init__(self): bulkloader.Exporter.__init__(self, 'Greeting', [('author', str, None), ('content', str, None), ('date', str, None), ]) exporters = [AlbumExporter] and i download a.csv is : the date is not readable , and the date in appspot.com admin is : so how to get the full date ?? thanks i change this : class AlbumExporter(bulkloader.Exporter): def __init__(self): bulkloader.Exporter.__init__(self, 'Greeting', [('author', str, None), ('content', str, None), ('date', lambda x: datetime.datetime.strptime(x, '%m/%d/%Y').date(), None), ]) exporters = [AlbumExporter] but the error is :

    Read the article

  • Sort a list of tuples without case sensitivity

    - by dound
    How can I efficiently and easily sort a list of tuples without being sensitive to case? For example this: [('a', 'c'), ('A', 'b'), ('a', 'a'), ('a', 5)] Should look like this once sorted: [('a', 5), ('a', 'a'), ('A', 'b'), ('a', 'c')] The regular lexicographic sort will put 'A' before 'a' and yield this: [('A', 'b'), ('a', 5), ('a', 'a'), ('a', 'c')]

    Read the article

< Previous Page | 354 355 356 357 358 359 360 361 362 363 364 365  | Next Page >