Search Results

Search found 14486 results on 580 pages for 'python idle'.

Page 364/580 | < Previous Page | 360 361 362 363 364 365 366 367 368 369 370 371  | Next Page >

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of html documents (numbers in the hundreds of thousands). The documents are really forms but they are prepared by a very large group of different organizations so there is significant variation in how they create the document. For example, the documents are divided into chapters. I might want to extract the contents of Chapter 5 from every document so I can analyze the content of the chapter. Initially I thought this would be easy but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content so that Chapter n could be displayed using td tags inside a table. Or they might use other elements such as p tags H tags, div tags or any other block level element. After trying repeatedly to use lxml to help me identify the beginning and end of each chapter I have determined that it is a lot cleaner to use a regular expression because in every case, no matter what the enclosing html element is the chapter label is always in the form of >Chapter # It is a little more complicated in that there might be some white space or non-breaking space represented in different ways (  or   or just spaces). Nonetheless it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I have really no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section. That is here is one example where the element holding the Chapter name is a div <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for chapter 1 and set up a regular expressions to find the next </div|</td|</p|</h1 . . . So at this point I have identified the type of element holding my chapter heading I can use the same logic to find all of the text that is within that element that is set up a regular expression to help me mark from >Chapter 1.&#160;&#160;&#160;Our Beginnings.< So I have identified where my Chapter 1 begins I can do the same for chapter 2 (which is where Chapter 1 ends) Now I am imagining that I am going to snip the document beginning at the opening of the element that I identified as the element the indicates where chapter 1 begins and ending just before the opening of the element that I identified as the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over - never use a regular expression to extract content from html documents and I have not hit on a way to be as accurate with lxml to identify the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements and I could only be accurate <60% of the time but a very short regular expression has given me better than 95% success. I have a tendency to make things more complicated than necessary so I am wondering if anyone has seen or solved a similar problems and if they had an approach (not the details mind you) that they would like to offer.

    Read the article

  • Help converting code using httlib2 to use urllib2

    - by ThinkCode
    What am I trying to do? Visit a site, retrieve cookie, visit the next page by sending in the cookie info. It all works but httplib2 is giving me one too many problems with socks proxy on one site. http = httplib2.Http() main_url = 'http://mywebsite.com/get.aspx?id='+ id +'&rows=25' response, content = http.request(main_url, 'GET', headers=headers) main_cookie = response['set-cookie'] referer = 'http://google.com' headers = {'Content-type': 'application/x-www-form-urlencoded', 'Cookie': main_cookie, 'User-Agent' : USER_AGENT, 'Referer' : referer} How to do the same exact thing using urllib2 (cookie retrieving, passing to the next page on the same site)? Thank you.

    Read the article

  • Regex for Matching First Alphanumeric Character skipping (The |An? )

    - by TheLizardKing
    I have a list of artists, albums and tracks that I want to sort using the first letter of their respective name. The issue arrives when I want to ignore "The ", "A ", "An " and other various non-alphanumeric characters (Talking to you "Weird Al" Yankovic and [dialog]). Django has a nice start '^(An?|The) +' but I want to ignore those and a few others of my choice. I am doing this in Django, using a MySQL db with utf8_bin collation.

    Read the article

  • Inexpensive ways to add seek to a filetype object

    - by becomingGuru
    PdfFileReader reads the content from a pdf file to create an object. I am querying the pdf from a cdn via urllib.urlopen(), this provides me a file like object, which has no seek. PdfFileReader, however uses seek. What is the simple way to create a PdfFileReader object from a pdf downloaded via url. Now, what can I do to avoid writing to disk and reading it again via file(). Thanks in advance.

    Read the article

  • how to format date when i load data from google-app-engine..

    - by zjm1126
    i use remote_api to load data from google-app-engine. appcfg.py download_data --config_file=helloworld/GreetingLoad.py --filename=a.csv --kind=Greeting helloworld the setting is: class AlbumExporter(bulkloader.Exporter): def __init__(self): bulkloader.Exporter.__init__(self, 'Greeting', [('author', str, None), ('content', str, None), ('date', str, None), ]) exporters = [AlbumExporter] and i download a.csv is : the date is not readable , and the date in appspot.com admin is : so how to get the full date ?? thanks i change this : class AlbumExporter(bulkloader.Exporter): def __init__(self): bulkloader.Exporter.__init__(self, 'Greeting', [('author', str, None), ('content', str, None), ('date', lambda x: datetime.datetime.strptime(x, '%m/%d/%Y').date(), None), ]) exporters = [AlbumExporter] but the error is :

    Read the article

  • How to draw the "trail" in a maze solving application

    - by snow-spur
    Hello i have designed a maze and i want to draw a path between the cells as the 'person' moves from one cell to the next. So each time i move the cell a line is drawn Also i am using the graphics module The graphics module is an object oriented library Im importing from graphics import* from maze import* my circle which is my cell center = Point(15, 15) c = Circle(center, 12) c.setFill('blue') c.setOutline('yellow') c.draw(win) p1 = Point(c.getCenter().getX(), c.getCenter().getY()) this is my loop if mazez.blockedCount(cloc)> 2: mazez.addDecoration(cloc, "grey") mazez[cloc].deadend = True c.move(-25, 0) p2 = Point(getX(), getY()) line = graphics.Line(p1, p2) cloc.col = cloc.col - 1 Now it says getX not defined every time i press a key is this because of p2???

    Read the article

  • Geocoding non-addresses: Geopy

    - by Phil Donovan
    Using geopy to geocode alcohol outlets in NZ. The problem I have is that some places do not have street addresses but are places in Google Maps. For example, plugging: Furneaux Lodge, Endeavour Inlet, Queen Charlotte Sound, Marlborough 7250 into Google Maps via the browser GUI gives me However, using that in Geopy I get a GQueryError saying this geographic location does not exist. Here is the code for geocoding: def GeoCode(address): g=geocoders.Google(domain="maps.google.co.nz") geoloc = g.geocode(address, exactly_one=False) place, (lat, lng) = geoloc[0] GeoOut = [] GeoOut.extend([place, lat, lng]) return GeoOut GeoCode("Furneaux Lodge, Endeavour Inlet, Queen Charlotte Sound, Marlboroguh 7250") Meanwhile, I notice that "Eiffel Tower" works fine. Is there away to solve this and can someone explain the difference between The Eiffel Tower and Furneaux Lodge within Google 'locations'?

    Read the article

  • Disable autocomplete on textfield in Django?

    - by tau-neutrino
    Does anyone know how you can turn off autocompletion on a textfield in Django? For example, a form that I generate from my model has an input field for a credit card number. It is bad practice to leave autocompletion on. When making the form by hand, I'd add a autocomplete="off" statement, but how do you do it in Django and still retain the form validation?

    Read the article

  • Matplotlib: plotting discrete values

    - by Arkapravo
    I am trying to plot the following ! from numpy import * from pylab import * import random for x in range(1,500): y = random.randint(1,25000) print(x,y) plot(x,y) show() However, I keep getting a blank graph (?). Just to make sure that the program logic is correct I added the code print(x,y), just the confirm that (x,y) pairs are being generated. (x,y) pairs are being generated, but there is no plot, I keep getting a blank graph. Any help ?

    Read the article

  • How to turn a list of tuples into a string?

    - by matt
    I have a list of tuples that I'm trying to incorporate into a SQL query but I can't figure out how to join them together without adding slashes. My like this: list = [('val', 'val'), ('val', 'val'), ('val', 'val')] If I turn each tuple into a string and try to join them with a a comma I'll get something like ' (\'val\, \'val\'), ... ' What's the right way to do this, so I can get the list (without brackets) as a string?

    Read the article

  • wxPython: Change a buttons text in a wx.FileDialog

    - by Sascha
    Hello I have a wx.FileDialog (with the wx.FD_OPEN flag) & I would like to know if I can (& how) I could change the button in the bottom right of the FileDialog from "Open" to "Create" or "Delete", etc. What I am doing is I have a button with the text "Delete Portfolio", when pressed it opens a FileDialog & allows the user to select a portfolio file(.db) to delete. So instead of the File Dialog's bottom right confirm button displaying "Open" I would like to be able to change the text to "Confirm" or "Delete" or whatever. Is this possible, its a rather superficial thing to do, but if the button says open when the user wants to select a file to delete, it can be a little confusing even if the title of the dialog says "please select a file to delete"

    Read the article

  • Chipmunk warning still present with --release

    - by Kaliber64
    I'm using Python27 on Windows 7 64-bit. I downloaded the source for Chipmunk 6.2.x and compiled Pymunk with --release and -c ming32. Almost zero problems. Lots of path not found cause I'm bad. All prints seem to have disappeared but I get spammed with EPA iteration warnings. I've seen a couple discussions but no solutions. Possible chipmunk betas to fix the float errors causing the double truths causing the warning. I picked the latest stable version I think. My program is seriously bogged down with all the prints. class NullDevice(): def write(self, s): pass sys.stdout=NullDevice() Does not disable the C prints .< Any help?

    Read the article

  • Sort a list of tuples without case sensitivity

    - by dound
    How can I efficiently and easily sort a list of tuples without being sensitive to case? For example this: [('a', 'c'), ('A', 'b'), ('a', 'a'), ('a', 5)] Should look like this once sorted: [('a', 5), ('a', 'a'), ('A', 'b'), ('a', 'c')] The regular lexicographic sort will put 'A' before 'a' and yield this: [('A', 'b'), ('a', 5), ('a', 'a'), ('a', 'c')]

    Read the article

  • Why do I get rows of zeros in my 2D fft?

    - by Nicholas Pringle
    I am trying to replicate the results from a paper. "Two-dimensional Fourier Transform (2D-FT) in space and time along sections of constant latitude (east-west) and longitude (north-south) were used to characterize the spectrum of the simulated flux variability south of 40degS." - Lenton et al(2006) The figures published show "the log of the variance of the 2D-FT". I have tried to create an array consisting of the seasonal cycle of similar data as well as the noise. I have defined the noise as the original array minus the signal array. Here is the code that I used to plot the 2D-FT of the signal array averaged in latitude: import numpy as np from numpy import ma from matplotlib import pyplot as plt from Scientific.IO.NetCDF import NetCDFFile ### input directory indir = '/home/nicholas/data/' ### get the flux data which is in ### [time(5day ave for 10 years),latitude,longitude] nc = NetCDFFile(indir + 'CFLX_2000_2009.nc','r') cflux_southern_ocean = nc.variables['Cflx'][:,10:50,:] cflux_southern_ocean = ma.masked_values(cflux_southern_ocean,1e+20) # mask land nc.close() cflux = cflux_southern_ocean*1e08 # change units of data from mmol/m^2/s ### create an array that consists of the seasonal signal fro each pixel year_stack = np.split(cflux, 10, axis=0) year_stack = np.array(year_stack) signal_array = np.tile(np.mean(year_stack, axis=0), (10, 1, 1)) signal_array = ma.masked_where(signal_array > 1e20, signal_array) # need to mask ### average the array over latitude(or longitude) signal_time_lon = ma.mean(signal_array, axis=1) ### do a 2D Fourier Transform of the time/space image ft = np.fft.fft2(signal_time_lon) mgft = np.abs(ft) ps = mgft**2 log_ps = np.log(mgft) log_mgft= np.log(mgft) Every second row of the ft consists completely of zeros. Why is this? Would it be acceptable to add a randomly small number to the signal to avoid this. signal_time_lon = signal_time_lon + np.random.randint(0,9,size=(730, 182))*1e-05 EDIT: Adding images and clarify meaning The output of rfft2 still appears to be a complex array. Using fftshift shifts the edges of the image to the centre; I still have a power spectrum regardless. I expect that the reason that I get rows of zeros is that I have re-created the timeseries for each pixel. The ft[0, 0] pixel contains the mean of the signal. So the ft[1, 0] corresponds to a sinusoid with one cycle over the entire signal in the rows of the starting image. Here are is the starting image using following code: plt.pcolormesh(signal_time_lon); plt.colorbar(); plt.axis('tight') Here is result using following code: ft = np.fft.rfft2(signal_time_lon) mgft = np.abs(ft) ps = mgft**2 log_ps = np.log1p(mgft) plt.pcolormesh(log_ps); plt.colorbar(); plt.axis('tight') It may not be clear in the image but it is only every second row that contains completely zeros. Every tenth pixel (log_ps[10, 0]) is a high value. The other pixels (log_ps[2, 0], log_ps[4, 0] etc) have very low values.

    Read the article

  • Simple XML over http web service

    - by Mark
    I have a simple html service, developed in django. You enter your name - it posts this, and returns a value (male/female). I need to ofer this as a web service. I have no idea where to start. I want to accept a xml request, and provide an xml response - thats it. Can anyone give ma any pointers - Googling it is difficult when you dont know what your searching for.

    Read the article

  • How to manually create a DBRef using pymongo?

    - by Soviut
    I want to create a DBRef manually so that I can add an additional field to it. However, when I try to pass the following: {'$ref': 'projects', '$id': '1029412409721', 'project_name': 'My Project'} Pymongo raises an error: pymongo.errors.InvalidName: key '$id' must not start with '$' It would seem that pymongo reserve the $ for the special key, leading me to wonder if it is even possible to do what I'm trying to do?

    Read the article

  • 'NoneType' object has no attribute 'data'

    - by Bill Jordan
    Hello guys, I am sending a SOAP request to my server and getting the response back. sample of the response string is shown below: <?xml version = '1.0' ?> <env:Envelope xmlns:env=http:////www.w3.org/2003/05/soap-envelop . .. .. <env:Body> <epas:get-all-config-resp xmlns:epas="urn:organization:epas:soap"> ^M ... ... <epas:property name="Tom">12</epas:property> > > <epas:property name="Alice">34</epas:property> > > <epas:property name="John">56</epas:property> > > <epas:property name="Danial">78</epas:property> > > <epas:property name="George">90</epas:property> > > <epas:property name="Luise">11</epas:property> ... ^M </env:Body? </env:Envelop> What I noticed in the response is that there is an extra character shown in the body which is "^M". Not sure if this could be the issue. Note the ^M shown! when I tried parsing the string returned from the server to get the names and values using the code sample: elements = minidom.parseString(xmldoc).getElementsByTagName("property") myDict = {} for element in elements: myDict[element.getAttribute('name')] = element.firstChild.data But, I am getting this error: 'NoneType' object has no attribute 'data'. May be its something to do with the "^M" shown on the xml response back! Any ideas/comments would be appreciated, Cheers

    Read the article

  • How do I find the "concrete class" of a django model baseclass

    - by Mr Shark
    I'm trying to find the actual class of a django-model object, when using model-inheritance. Some code to describe the problem: class Base(models.model): def basemethod(self): ... class Child_1(Base): pass class Child_2(Base): pass If I create various objects of the two Child classes and the create a queryset containing them all: Child_1().save() Child_2().save() (o1, o2) = Base.objects.all() I want to determine if the object is of type Child_1 or Child_2 in basemethod, I can get to the child object via o1.child_1 and o2.child_2 but that reconquers knowledge about the childclasses in the baseclass. I have come up with the following code: def concrete_instance(self): instance = None for subclass in self._meta.get_all_related_objects(): acc_name = subclass.get_accessor_name() try: instance = self.__getattribute__(acc_name) return instance except Exception, e: pass But it feels brittle and I'm not sure of what happens when if I inherit in more levels.

    Read the article

  • Dynamic Spacer in ReportLab

    - by ptikobj
    I'm automatically generating a PDF-file with Platypus that has dynamic content. This means that it might happen that the length of the text content (which is directly at the bottom of the pdf-file) may vary. However, it might happen that a page break is done in cases where the content is too long. This is because i use a "static" spacer: s = Spacer(width=0, height=23.5*cm) as i always want to have only one page, I somehow need to dynamically set the height of the Spacer, so that it takes the "rest" of the space that is on the page as its height. Now, how do i get the "rest" of height that is left on my page?

    Read the article

  • How do I call setattr() on the current module?

    - by Matt Joiner
    What do I pass as the first parameter "object" to the function setattr(object, name, value), to set variables on the current module? For example: setattr(object, "SOME_CONSTANT", 42); giving the same effect as: SOME_CONSTANT = 42 within the module containing these lines (with the correct object). I'm generate several values at the module level dynamically, and as I can't define __getattr__ at the module level, this is my fallback.

    Read the article

  • how do simple SQLAlchemy relationships work?

    - by Carson Myers
    I'm no database expert -- I just know the basics, really. I've picked up SQLAlchemy for a small project, and I'm using the declarative base configuration rather than the "normal" way. This way seems a lot simpler. However, while setting up my database schema, I realized I don't understand some database relationship concepts. If I had a many-to-one relationship before, for example, articles by authors (where each article could be written by only a single author), I would put an author_id field in my articles column. But SQLAlchemy has this ForeignKey object, and a relationship function with a backref kwarg, and I have no idea what any of it MEANS. I'm scared to find out what a many-to-many relationship with an intermediate table looks like (when I need additional data about each relationship). Can someone demystify this for me? Right now I'm setting up to allow openID auth for my application. So I've got this: from __init__ import Base from sqlalchemy.schema import Column from sqlalchemy.types import Integer, String class Users(Base): __tablename__ = 'users' id = Column(Integer, primary_key=True) username = Column(String, unique=True) email = Column(String) password = Column(String) salt = Column(String) class OpenID(Base): __tablename__ = 'openid' url = Column(String, primary_key=True) user_id = #? I think the ? should be replaced by Column(Integer, ForeignKey('users.id')), but I'm not sure -- and do I need to put openids = relationship("OpenID", backref="users") in the Users class? Why? What does it do? What is a backref?

    Read the article

< Previous Page | 360 361 362 363 364 365 366 367 368 369 370 371  | Next Page >