Search Results

Search found 13534 results on 542 pages for 'python 3 3'.

Page 360/542 | < Previous Page | 356 357 358 359 360 361 362 363 364 365 366 367  | Next Page >

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of html documents (numbers in the hundreds of thousands). The documents are really forms but they are prepared by a very large group of different organizations so there is significant variation in how they create the document. For example, the documents are divided into chapters. I might want to extract the contents of Chapter 5 from every document so I can analyze the content of the chapter. Initially I thought this would be easy but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content so that Chapter n could be displayed using td tags inside a table. Or they might use other elements such as p tags H tags, div tags or any other block level element. After trying repeatedly to use lxml to help me identify the beginning and end of each chapter I have determined that it is a lot cleaner to use a regular expression because in every case, no matter what the enclosing html element is the chapter label is always in the form of >Chapter # It is a little more complicated in that there might be some white space or non-breaking space represented in different ways (  or   or just spaces). Nonetheless it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I have really no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section. That is here is one example where the element holding the Chapter name is a div <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for chapter 1 and set up a regular expressions to find the next </div|</td|</p|</h1 . . . So at this point I have identified the type of element holding my chapter heading I can use the same logic to find all of the text that is within that element that is set up a regular expression to help me mark from >Chapter 1.&#160;&#160;&#160;Our Beginnings.< So I have identified where my Chapter 1 begins I can do the same for chapter 2 (which is where Chapter 1 ends) Now I am imagining that I am going to snip the document beginning at the opening of the element that I identified as the element the indicates where chapter 1 begins and ending just before the opening of the element that I identified as the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over - never use a regular expression to extract content from html documents and I have not hit on a way to be as accurate with lxml to identify the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements and I could only be accurate <60% of the time but a very short regular expression has given me better than 95% success. I have a tendency to make things more complicated than necessary so I am wondering if anyone has seen or solved a similar problems and if they had an approach (not the details mind you) that they would like to offer.

    Read the article

  • inheritance from str or int

    - by wiso
    Why I have problem creating a class the inherite from str (or also int) class C(str): def __init__(self, a, b): str.__init__(self,a) self.b = b C("a", "B") TypeError: str() takes at most 1 argument (2 given) tha same appened if I try to use int instead of str, but it works with custom classes. I need to use __new__ instead of __init__? why?

    Read the article

  • web2py error while using distinct in the queries

    - by Steve
    Hi, I am using web2py with GAE. While using some of the queries which has a distinct clause, GAE throws out an error.I have pasted the Traceback. Can someone please help me out with this. In FILE: /base/data/home/apps/panneersoda/1.341206242889687944/applications/init/controllers/default.py Traceback (most recent call last): File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/restricted.py", line 173, in restricted exec ccode in environment File "/base/data/home/apps/panneersoda/1.341206242889687944/applications/init/controllers/default.py:profileview", line 263, in <module> File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/globals.py", line 96, in <lambda> self._caller = lambda f: f() File "/base/data/home/apps/panneersoda/1.341206242889687944/applications/init/controllers/default.py:profileview", line 97, in profileview File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/contrib/gql.py", line 675, in select (items, tablename, fields) = self._select(*fields, **attributes) File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/contrib/gql.py", line 624, in _select raise SyntaxError, 'invalid select attribute: %s' % key SyntaxError: invalid select attribute: distinct Thanks

    Read the article

  • Qt/PyQt dialog with toggable fullscreen mode - problem on Windows

    - by Guard
    I have a dialog created in PyQt. It's purpose and functionality don't matter. The init is: class MyDialog(QWidget, ui_module.Ui_Dialog): def __init__(self, parent=None): super(MyDialog, self).__init__(parent) self.setupUi(self) self.installEventFilter(self) self.setWindowFlags(Qt.Dialog | Qt.WindowTitleHint) self.showMaximized() Then I have event filtering method: def eventFilter(self, obj, event): if event.type() == QEvent.KeyPress: key = event.key() if key == Qt.Key_F11: if self.isFullScreen(): self.setWindowFlags(self._flags) if self._state == 'm': self.showMaximized() else: self.showNormal() self.setGeometry(self._geometry) else: self._state = 'm' if self.isMaximized() else 'n' self._flags = self.windowFlags() self._geometry = self.geometry() self.setWindowFlags(Qt.Tool | Qt.FramelessWindowHint) self.showFullScreen() return True elif key == Qt.Key_Escape: self.close() return QWidget.eventFilter(self, obj, event) As can be seen, Esc is used for dialog hiding, and F11 is used for toggling full-screen. In addition, if the user changed the dialog mode from the initial maximized to normal and possibly moved the dialog, it's state and position are restored after exiting the full-screen. Finally, the dialog is created on the MainWindow action triggered: d = MyDialog(self) d.show() It works fine on Linux (Ubuntu Lucid), but quite strange on Windows 7: if I go to the full-screen from the maximized mode, I can't exit full-screen (on F11 dialog disappears and appears in full-screen mode again. If I change the dialog's mode to Normal (by double-clicking its title), then go to full-screen and then return back, the dialog is shown in the normal mode, in the correct position, but without the title line. Most probably the reason for both cases is the same - the setWindowFlags doesn't work. But why? Is it also possible that it is the bug in the recent PyQt version? On Ubuntu I have 4.6.x from apt, and on Windows - the latest installer from the riverbank site.

    Read the article

  • Huge amount of time sending data with suds and proxy

    - by Roman
    Hi everyone, I have the following code to send data through a proxy using suds: import suds t = suds.transport.http.HttpTransport() proxy = urllib2.ProxyHandler({'http': 'http://192.168.3.217:3128'}) opener = urllib2.build_opener(proxy) t.urlopener = opener ws = suds.client.Client('http://xxxxxxx/web.asmx?WSDL', transport=t) req = ws.factory.create('ActionRequest.request') req.SerialNumber = 'asdf' req.HostName = 'hola' res = ws.service.ActionRequest(req) I don't know why, but it can be sending data above 2 or 3 minutes, or even more and it raises a "Gateway timeout" exception sometimes. If I don't use the proxy, the amount of time used is above 2 seconds or less. Here is the SOAP reply: (ActionResponse){ Id = None Action = "Action.None" Objects = "" } The proxy is running right with other requests through urllib2, or using normal web browsers like firefox. Does anyone have any idea what's happening here with suds? Thanks a lot in advance!!!

    Read the article

  • How come my South migrations doesn't work for Django?

    - by TIMEX
    First, I create my database. create database mydb; I add "south" to installed Apps. Then, I go to this tutorial: http://south.aeracode.org/docs/tutorial/part1.html The tutorial tells me to do this: $ py manage.py schemamigration wall --initial >>> Created 0001_initial.py. You can now apply this migration with: ./manage.py migrate wall Great, now I migrate. $ py manage.py migrate wall But it gives me this error... django.db.utils.DatabaseError: (1146, "Table 'fable.south_migrationhistory' doesn't exist") So I use Google (which never works. hence my 870 questions asked on Stackoverflow), and I get this page: http://groups.google.com/group/south-users/browse_thread/thread/d4c83f821dd2ca1c Alright, so I follow that instructions >> Drop database mydb; >> Create database mydb; $ rm -rf ./wall/migrations $ py manage.py syncdb But when I run syncdb, Django creates a bunch of tables. Yes, it creates the south_migrationhistory table, but it also creates my app's tables. Synced: > django.contrib.admin > django.contrib.auth > django.contrib.contenttypes > django.contrib.sessions > django.contrib.sites > django.contrib.messages > south > fable.notification > pagination > timezones > fable.wall > mediasync > staticfiles > debug_toolbar Not synced (use migrations): - (use ./manage.py migrate to migrate these) Cool....now it tells me to migrate these. So, I do this: $ py manage.py migrate wall The app 'wall' does not appear to use migrations. Alright, so fine. I'll add wall to initial migrations. $ py manage.py schemamigration wall --initial Then I migrate: $ py manage.py migrate wall You know what? It gives me this BS: _mysql_exceptions.OperationalError: (1050, "Table 'wall_content' already exists") Sorry, this is really pissing me off. Can someone help ? thanks. How do I get South to work and sync correctly with everything?

    Read the article

  • How do I launch background jobs w/ paramiko?

    - by sophacles
    Here is my scenario: I am trying to automate some tasks using Paramiko. The tasks need to be started in this order (using the notation (host, task)): (A, 1), (B, 2), (C, 2), (A,3), (B,3) -- essentially starting servers and clients for some testing in the correct order. Further, because in the tests networking may get mucked up, and because I need some of the output from the tests, I would like to just redirect output to a file. In similar scenarios the common response is to use 'screen -m -d' or to use 'nohup'. However with paramiko's exec_cmd, nohup doesn't actually exit. Using: bash -c -l nohup test_cmd & doesnt work either, exec_cmd still blocks to process end. In the screen case, output redirection doesn't work very well, (actually, doesnt work at all the best I can figure out). So, after all that explanation, my question is: is there an easy elegant way to detach processes and capture output in such a way as to end paramiko's exec_cmd blocking? Update The dtach command works nicely for this!

    Read the article

  • Sort a list of tuples without case sensitivity

    - by dound
    How can I efficiently and easily sort a list of tuples without being sensitive to case? For example this: [('a', 'c'), ('A', 'b'), ('a', 'a'), ('a', 5)] Should look like this once sorted: [('a', 5), ('a', 'a'), ('A', 'b'), ('a', 'c')] The regular lexicographic sort will put 'A' before 'a' and yield this: [('A', 'b'), ('a', 5), ('a', 'a'), ('a', 'c')]

    Read the article

  • Import boto from local library

    - by ensnare
    I'm trying to use boto as a downloaded library, rather than installing it globally on my machine. I'm able to import boto, but when I run boto.connect_dynamodb() I get an error: ImportError: No module named dynamodb.layer2 Here's my file structure: project/ project/ __init__.py libraries/ __init__.py flask/ boto/ views/ .... modules/ __init__.py db.py .... templates/ .... static/ .... runserver.py And the contents of the relevant files as follows: project/project/modules/db.py from project.libraries import boto conn = boto.connect_dynamodb( aws_access_key_id='<YOUR_AWS_KEY_ID>', aws_secret_access_key='<YOUR_AWS_SECRET_KEY>') What am I doing wrong? Thanks in advance.

    Read the article

  • Non-global middleware in Django

    - by hekevintran
    In Django there is a settings file that defines the middleware to be run on each request. This middleware setting is global. Is there a way to specify a set of middleware on a per-view basis? I want to have specific urls use a set of middleware different from the global set.

    Read the article

  • Matplotlib: plotting discrete values

    - by Arkapravo
    I am trying to plot the following ! from numpy import * from pylab import * import random for x in range(1,500): y = random.randint(1,25000) print(x,y) plot(x,y) show() However, I keep getting a blank graph (?). Just to make sure that the program logic is correct I added the code print(x,y), just the confirm that (x,y) pairs are being generated. (x,y) pairs are being generated, but there is no plot, I keep getting a blank graph. Any help ?

    Read the article

  • Execute function without sending 'self' to it

    - by Sergey
    Is that possible to define a function without referencing to self this way? def myfunc(var_a,var_b) But so that it could also get sender data, like if I defined it like this: def myfunc(self, var_a,var_b) That self is always the same so it looks a little redundant here always to run a function this way: myfunc(self,'data_a','data_b'). Then I would like to get its data in the function like this sender.fields. UPDATE: Here is some code to understand better what I mean. The class below is used to show a page based on Jinja2 templates engine for users to sign up. class SignupHandler(webapp.RequestHandler): def get(self, *args, **kwargs): utils.render_template(self, 'signup.html') And this code below is a render_template that I created as wrapper to Jinja2 functions to use it more conveniently in my project: def render_template(response, template_name, vars=dict(), is_string=False): template_dirs = [os.path.join(root(), 'templates')] logging.info(template_dirs[0]) env = Environment(loader=FileSystemLoader(template_dirs)) try: template = env.get_template(template_name) except TemplateNotFound: raise TemplateNotFound(template_name) content = template.render(vars) if is_string: return content else: response.response.out.write(content) As I use this function render_template very often in my project and usually the same way, just with different template files, I wondered if there was a way to get rid of having to call it like I do it now, with self as the first argument but still having access to that object.

    Read the article

  • the error "invalid literal for int() with base 10:" keeps coming up

    - by ratce003
    I'm trying to write a very simple program, I want to print out the sum of all the multiples of 3 and 5 below 100, but, an error keeps accuring, saying "invalid literal for int() with base 10:" my program is as follows: sum = "" sum_int = int(sum) for i in range(1, 101): if i % 5 == 0: sum += i elif i % 3 == 0: sum += i else: sum += "" print sum Any help would be much appreciated.

    Read the article

  • SQLAlchemy unsupported type error - and table design issues?

    - by Az
    Hi there, back again with some more SQLAlchemy shenanigans. Let me step through this. My table is now set up as so: engine = create_engine('sqlite:///:memory:', echo=False) metadata = MetaData() students_table = Table('studs', metadata, Column('sid', Integer, primary_key=True), Column('name', String), Column('preferences', Integer), Column('allocated_rank', Integer), Column('allocated_project', Integer) ) metadata.create_all(engine) mapper(Student, students_table) Fairly simple, and for the most part I've been enjoying the ability to query almost any bit of information I want provided I avoid the error cases below. The class it is mapped from is: class Student(object): def __init__(self, sid, name): self.sid = sid self.name = name self.preferences = collections.defaultdict(set) self.allocated_project = None self.allocated_rank = 0 def __repr__(self): return str(self) def __str__(self): return "%s %s" %(self.sid, self.name) Explanation: preferences is basically a set of all the projects the student would prefer to be assigned. When the allocation algorithm kicks in, a student's allocated_project emerges from this preference set. Now if I try to do this: for student in students.itervalues(): session.add(student) session.commit() It throws two errors, one for the allocated_project column (seen below) and a similar error for the preferences column: sqlalchemy.exc.InterfaceError: (InterfaceError) Error binding parameter 4 - probably unsupported type. u'INSERT INTO studs (sid, name, allocated_rank, allocated_project) VALUES (?, ?, ?, ?, ?, ?, ?)' [1101, 'Muffett,M.', 1, 888 Human-spider relationships (Supervisor id: 123)] If I go back into my code I find that, when I'm copying the preferences from the given text files, it actually refers to the Project class which is mapped to a dictionary, using the unique project id's (pid) as keys. Thus, as I iterate through each student via their rank and to the preferences set, it adds not a project id, but the reference to the project id from the projects dictionary. students[sid].preferences[int(rank)].add(projects[int(pid)]) Now this is very useful to me since I can find out all I want to about a student's preferred projects without having to run another check to pull up information about the project id. The form you see in the error has the object print information passed as: return "%s %s (Supervisor id: %s)" %(self.proj_id, self.proj_name, self.proj_sup) My questions are: I'm trying to store an object in a database field aren't I? Would the correct way then, be copying the project information (project id, name, etc) into its own table, referenced by the unique project id? That way I can just have the project id field for one of the student tables just be an integer id and when I need more information, just join the tables? So and so forth for other tables? If the above makes sense, then how does one maintain the relationship with a column of information in one table which is a key index on another table? Does this boil down into a database design problem? Are there any other elegant ways of accomplishing this? Apologies if this is a very long-winded question. It's rather crucial for me to solve this, so I've tried to explain as much as I can, whilst attempting to show that I'm trying (key word here sadly) to understand what could be going wrong.

    Read the article

  • Google App Engine appcfg.py data_upload Authentication fail

    - by Pradeep Upadhyay
    Hi, I am using appcfg.py to upload data to datastore from a csv file. But every time I try, I am getting error: [info ] Authentication failed even if i am using Admin id and password. In my app.yaml file I am having: handlers: - url: /remote_api script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py login: admin - url: .* script: MainHandler.py Can anybody please help me? Thanks in advance.

    Read the article

  • Send a "304 Not Modified" for images stored in the datastore

    - by Emilien
    I store user-uploaded images in the Google App Engine datastore as db.Blob, as proposed in the docs. I then serve those images on /images/<id>.jpg. The server always sends a 200 OK response, which means that the browser has to download the same image multiple time (== slower) and that the server has to send the same image multiple times (== more expensive). As most of those images will likely never change, I'd like to be able to send a 304 Not Modified response. I am thinking about calculating some kind of hash of the picture when the user uploads it, and then use this to know if the user already has this image (maybe send the hash as an Etag?) I have found this answer and this answer that explain the logic pretty well, but I have 2 questions: Is it possible to send an Etag in Google App Engine? Has anyone implemented such logic, and/or is there any code snippet available?

    Read the article

  • How can I display multiple django modelformset forms together?

    - by JT
    I have a problem with needing to provide multiple model backed forms on the same page. I understand how to do this with single forms, i.e. just create both the forms call them something different then use the appropriate names in the template. Now how exactly do you expand that solution to work with modelformsets? The wrinkle, of course, is that each 'form' must be rendered together in the appropriate fieldset. For example I want my template to produce something like this: <fieldset> <label for="id_base-0-desc">Home Base Description:</label> <input id="id_base-0-desc" type="text" name="base-0-desc" maxlength="100" /> <label for="id_likes-0-icecream">Want ice cream?</label> <input type="checkbox" name="likes-0-icecream" id="id_likes-0-icecream" /> </fieldset> <fieldset> <label for="id_base-1-desc">Home Base Description:</label> <input id="id_base-1-desc" type="text" name="base-1-desc" maxlength="100" /> <label for="id_likes-1-icecream">Want ice cream?</label> <input type="checkbox" name="likes-1-icecream" id="id_likes-1-icecream" /> </fieldset> I am using a loop like this to process the results for base_form, likes_form in map(None, base_forms, likes_forms): which works as I'd expect (I'm using map because the # of forms can be different). The problem is that I can't figure out a way to do the same thing with the templating engine. The system does work if I layout all the base models together then all the likes models after wards, but it doesn't meet the layout requirements.

    Read the article

  • How to clone a mercurial repository over an ssh connection initiated by fabric when http authorizati

    - by Monika Sulik
    I'm attempting to use fabric for the first time and I really like it so far, but at a certain point in my deployment script I want to clone a mercurial repository. When I get to that point I get an error: err: abort: http authorization required My repository requires http authorization and fabric doesn't prompt me for the user and password. I can get around this by changing my repository address from: https://hostname/repository to: https://user:password@hostname/repository But for various reasons I would prefer not to go this route. Are there any other ways in which I could bypass this problem?

    Read the article

  • Content-Length header not returned from Pylons response

    - by Evgeny
    I'm still struggling to Stream a file to the HTTP response in Pylons. In addition to the original problem, I'm finding that I cannot return the Content-Length header, so that for large files the client cannot estimate how long the download will take. I've tried response.content_length = 12345 and I've tried response.headers['Content-Length'] = 12345 In both cases the HTTP response (viewed in Fiddler) simply does not contain the Content-Length header. How do I get Pylons to return this header? (Oh, and if you have any ideas on making it stream the file please reply to the original question - I'm all out of ideas there.)

    Read the article

  • Get particular row as series from pandas dataframe

    - by Pratyush
    How do we get a particular filtered row as series? Example dataframe: >>> df = pd.DataFrame({'date': [20130101, 20130101, 20130102], 'location': ['a', 'a', 'c']}) >>> df date location 0 20130101 a 1 20130101 a 2 20130102 c I need to select the row where location is c as a series. I tried: row = df[df["location"] == "c"].head(1) # gives a dataframe row = df.ix[df["location"] == "c"] # also gives a dataframe with single row In either cases I can't the row as series.

    Read the article

< Previous Page | 356 357 358 359 360 361 362 363 364 365 366 367  | Next Page >