Search Results

Search found 13776 results on 552 pages for 'python appengine'.

Page 363/552 | < Previous Page | 359 360 361 362 363 364 365 366 367 368 369 370  | Next Page >

  • Best way to get back to using the power of lxml after having to use a regex to find something in an

    - by PyNEwbie
    I am trying to rip some text out of a large number of html documents (numbers in the hundreds of thousands). The documents are really forms but they are prepared by a very large group of different organizations so there is significant variation in how they create the document. For example, the documents are divided into chapters. I might want to extract the contents of Chapter 5 from every document so I can analyze the content of the chapter. Initially I thought this would be easy but it turns out that the authors might use a set of non-nested tables throughout the document to hold the content so that Chapter n could be displayed using td tags inside a table. Or they might use other elements such as p tags H tags, div tags or any other block level element. After trying repeatedly to use lxml to help me identify the beginning and end of each chapter I have determined that it is a lot cleaner to use a regular expression because in every case, no matter what the enclosing html element is the chapter label is always in the form of >Chapter # It is a little more complicated in that there might be some white space or non-breaking space represented in different ways (  or   or just spaces). Nonetheless it was trivial to write a regular expression to identify the beginning of each section. (The beginning of one section is the end of the previous section.) But now I want to use lxml to get the text out. My thought is that I have really no choice but to walk along my string to find the close tag for the element that encloses the text I am using to find the relevant section. That is here is one example where the element holding the Chapter name is a div <div style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt" align="left"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Times New Roman">Chapter 1.&#160;&#160;&#160;Our Beginnings.</font></div> So I am imagining that I would begin at the location where I found the match for chapter 1 and set up a regular expressions to find the next </div|</td|</p|</h1 . . . So at this point I have identified the type of element holding my chapter heading I can use the same logic to find all of the text that is within that element that is set up a regular expression to help me mark from >Chapter 1.&#160;&#160;&#160;Our Beginnings.< So I have identified where my Chapter 1 begins I can do the same for chapter 2 (which is where Chapter 1 ends) Now I am imagining that I am going to snip the document beginning at the opening of the element that I identified as the element the indicates where chapter 1 begins and ending just before the opening of the element that I identified as the element that indicates where Chapter 2 begins. The string that I have identified will then be fed to lxml to use its power to get the content. I am going to all of this trouble because I have read over and over - never use a regular expression to extract content from html documents and I have not hit on a way to be as accurate with lxml to identify the starting and ending locations for the text I want to extract. For example, I can never be certain that the subtitle of Chapter 1 is Our Beginnings it could be Our Red Canary. Let me say that I spent two solid days trying with lxml to be confident that I had the beginning and ending elements and I could only be accurate <60% of the time but a very short regular expression has given me better than 95% success. I have a tendency to make things more complicated than necessary so I am wondering if anyone has seen or solved a similar problems and if they had an approach (not the details mind you) that they would like to offer.

    Read the article

  • How to get progress bar to time Class exectution

    - by chrissygormley
    Hello, I am trying to use progress bar to show the progress of a script. I want it increase progress after every function in a class is executed. The code I have tried is below: import progressbar from time import sleep class hello(): def no(self): print 'hello!' def yes(self): print 'No!!!!!!' def pro(): bar = progressbar.ProgressBar(widgets=[progressbar.Bar('=', '[', ']'), ' ', progressbar.Percentage()]) for i in Yep(): bar.update(Yep.i()) sleep(0.1) bar.finish() if __name__ == "__main__": Yep = hello() pro() Does anyone know how to get this working. Thanks

    Read the article

  • Facebook Connect via Javascript doesn't close and doesn't pass session id

    - by ensnare
    I'm trying to authenticate users via Facebook Connect using a custom Javascript button: <form> <input type="button" value="Connect with Facebook" onclick="window.open('http://www.facebook.com/login.php?api_key=XXXXX&extern=1&fbconnect=1&req_perms=publish_stream,email&return_session=0&v=1.0&next=http%3A%2F%2Fwww.example.com%2Fxd_receiver.htm&fb_connect=1&cancel_url=http%3A%2F%2Fwww.example.com%2Fregister%2Fcancel', '_blank', 'top=442,width=480,height=460,resizable=yes', true)" onlogin='window.location="/register/step2"' /> </form> I am able to authenticate users. However after authentication, the popup window just stays open and the main window is not directed anywhere. In fact, it is the popup window that goes to "/register/step2" How can I get the login window to close as expected, and to pass the facebook session id to /register/step2? Thanks!

    Read the article

  • how to detect an escape sequence in a string

    - by mix
    Given a string named line whose raw version has this value: \rRAWSTRING how can I detect if it has the escape character \r? What I've tried is: if repr(line).startswith('\r'): blah... but it doesn't catch it. I also tried find, such as: if repr(line).find('\r') != -1: blah doesn't work either. What am I missing? thx! EDIT: thanks for all the replies and the corrections re terminolgy and sorry for the confusion. OK, if i do this print repr(line) then what it prints is: '\rSET ENABLE ACK\n' (including the single quotes). i have tried all the suggestions, including: line.startswith(r'\r') line.startswith('\\r') each of which returns False. also tried: line.find(r'\r') line.find('\\r') each of which returns -1

    Read the article

  • App Engine HTTP 500s

    - by pocoa
    This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application. I've handled all the situations, also DeadlineExceededError too. But sometimes I see these error messages in error logs. That request took about 10k ms, so it's not exceeded the limit too. But there is no other specific message about this error. All I know is that it returned HTTP 500. Is there anyone know the reason of these error messages? Thank you.

    Read the article

  • Storing a list of objects in GAE

    - by Dominic Bou-Samra
    I need to store some data that looks a little like this: xyz 123 abc 456 hij 678 rer 838 Now I would just store it as a traditional string and integer model, and put in the datastore. But the data changes regularly, and is ONLY relevant when looked at as a COLLECTION. So it needs to be store as either a list of lists, or a list of objects, both of which can't really be done without pickling as far as I know. Can anyone help? Even storing it as a text file may work :S

    Read the article

  • Dynamic Spacer in ReportLab

    - by ptikobj
    I'm automatically generating a PDF-file with Platypus that has dynamic content. This means that it might happen that the length of the text content (which is directly at the bottom of the pdf-file) may vary. However, it might happen that a page break is done in cases where the content is too long. This is because i use a "static" spacer: s = Spacer(width=0, height=23.5*cm) as i always want to have only one page, I somehow need to dynamically set the height of the Spacer, so that it takes the "rest" of the space that is on the page as its height. Now, how do i get the "rest" of height that is left on my page?

    Read the article

  • Reducing size of a character array in Numpy

    - by Morgoth
    Given a character array: In [21]: x = np.array(['a ','bb ','cccc ']) One can remove the whitespace using: In [22]: np.char.strip(x) Out[22]: array(['a', 'bb', 'cccc'], dtype='|S8') but is there a way to also shrink the width of the column to the minimum required size, in the above case |S4?

    Read the article

  • In Django, how to create tables from an SQL file when syncdb is run

    - by Sidney
    Hi, How do I make syncdb execute SQL queries (for table creation) defined by me, rather then generating tables automatically. I'm looking for this solution as some particular models in my app represent SQL-table-views for a legacy-database table. So, I've created their SQL-views in my django-DB like this: CREATE VIEW legacy_series AS SELECT * FROM legacy.series; I have a reverse engineered model that represents the above view/legacytable. But whenever I run syncdb, I have to create all the views first by running sql scripts, otherwise syncdb simply creates tables for them (if a view is not found). How do I make syncdb run the above mentioned SQL?

    Read the article

  • Speeding up templates in GAE-Py by aggregating RPC calls

    - by Sudhir Jonathan
    Here's my problem: class City(Model): name = StringProperty() class Author(Model): name = StringProperty() city = ReferenceProperty(City) class Post(Model): author = ReferenceProperty(Author) content = StringProperty() The code isn't important... its this django template: {% for post in posts %} <div>{{post.content}}</div> <div>by {{post.author.name}} from {{post.author.city.name}}</div> {% endfor %} Now lets say I get the first 100 posts using Post.all().fetch(limit=100), and pass this list to the template - what happens? It makes 200 more datastore gets - 100 to get each author, 100 to get each author's city. This is perfectly understandable, actually, since the post only has a reference to the author, and the author only has a reference to the city. The __get__ accessor on the post.author and author.city objects transparently do a get and pull the data back (See this question). Some ways around this are Use Post.author.get_value_for_datastore(post) to collect the author keys (see the link above), and then do a batch get to get them all - the trouble here is that we need to re-construct a template data object... something which needs extra code and maintenance for each model and handler. Write an accessor, say cached_author, that checks memcache for the author first and returns that - the problem here is that post.cached_author is going to be called 100 times, which could probably mean 100 memcache calls. Hold a static key to object map (and refresh it maybe once in five minutes) if the data doesn't have to be very up to date. The cached_author accessor can then just refer to this map. All these ideas need extra code and maintenance, and they're not very transparent. What if we could do @prefetch def render_template(path, data) template.render(path, data) Turns out we can... hooks and Guido's instrumentation module both prove it. If the @prefetch method wraps a template render by capturing which keys are requested we can (atleast to one level of depth) capture which keys are being requested, return mock objects, and do a batch get on them. This could be repeated for all depth levels, till no new keys are being requested. The final render could intercept the gets and return the objects from a map. This would change a total of 200 gets into 3, transparently and without any extra code. Not to mention greatly cut down the need for memcache and help in situations where memcache can't be used. Trouble is I don't know how to do it (yet). Before I start trying, has anyone else done this? Or does anyone want to help? Or do you see a massive flaw in the plan?

    Read the article

  • Iterating over key and value of defaultdict dictionaries

    - by gf
    The following works as expected: d = [(1,2), (3,4)] for k,v in d: print "%s - %s" % (str(k), str(v)) But this fails: d = collections.defaultdict(int) d[1] = 2 d[3] = 4 for k,v in d: print "%s - %s" % (str(k), str(v)) With: Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'int' object is not iterable Why? How can i fix it?

    Read the article

  • Graphics glitch when drawing to a Cairo context obtained from a gtk.DrawingArea inside a gtk.Viewport.

    - by user410023
    I am trying to redraw the part of the DrawingArea that is visible in the Viewport in the expose-event handler. However, it seems that I am doing something wrong with the coordinates that are passed to the event handler because there is garbage at the edge of the Viewport when scrolling. Can anyone tell what I am doing wrong? Here is a small example: import pygtk pygtk.require("2.0") import gtk from numpy import array from math import pi class Circle(object): def init(self, position = [0., 0.], radius = 0., edge = (0., 0., 0.), fill = None): self.position = position self.radius = radius self.edge = edge self.fill = fill def draw(self, ctx): rect = array(ctx.clip_extents()) rect[2] -= rect[0] rect[3] -= rect[1] center = rect[2:4] / 2 ctx.arc(center[0], center[1], self.radius, 0., 2. * pi) if self.fill != None: ctx.set_source_rgb(*self.fill) ctx.fill_preserve() ctx.set_source_rgb(*self.edge) ctx.stroke() class Scene(object): class Proxy(object): directory = {} def init(self, target, layers = set()): self.target = target self.layers = layers Scene.Proxy.directory[target] = self def __init__(self, viewport): self.objects = {} self.layers = [set()] self.viewport = viewport self.signals = {} def draw(self, ctx): x = self.viewport.get_hadjustment().value y = self.viewport.get_vadjustment().value ctx.set_source_rgb(1., 1., 1.) ctx.paint() ctx.translate(x, y) for obj in self: obj.draw(ctx) def add(self, item, layer = 0): item = Scene.Proxy(item, layers = set((layer,))) assert(hasattr(item.target, "draw")) assert(isinstance(layer, int)) item.layers.add(layer) while not layer < len(self.layers): self.layers.append(set()) self.layers[layer].add(item) if not item in self.objects: self.objects[item] = set() self.objects[item].add(layer) def remove(self, item, layers = None): item = Scene.Proxy.directory[item] if layers == None: layers = self.objects[item] for layer in layers: layer.remove(item) item.layers.remove(layer) if len(item.layers) == 0: self.objects.remove(item) def __iter__(self): for layer in self.layers: for item in layer: yield item.target class App(object): def init(self): signals = { "canvas_exposed": self.update_canvas, "gtk_main_quit": gtk.main_quit } self.builder = gtk.Builder() self.builder.add_from_file("graphics_glitch.glade") self.window = self.builder.get_object("window") self.viewport = self.builder.get_object("viewport") self.canvas = self.builder.get_object("canvas") self.scene = Scene(self.viewport) signals.update(self.scene.signals) self.builder.connect_signals(signals) self.window.show() def update_canvas(self, widget, event): ctx = self.canvas.window.cairo_create() self.scene.draw(ctx) ctx.clip() if name == "main": app = App() scene = app.scene scene.add(Circle((0., 0.), 10.)) gtk.main() And the Glade file "graphics_glitch.glade": <?xml version="1.0"?> <interface> <requires lib="gtk+" version="2.16"/> <!-- interface-naming-policy project-wide --> <object class="GtkWindow" id="window"> <property name="width_request">200</property> <property name="height_request">200</property> <property name="visible">True</property> <signal name="destroy" handler="gtk_main_quit"/> <child> <object class="GtkScrolledWindow" id="scrolledwindow1"> <property name="visible">True</property> <property name="can_focus">True</property> <property name="hadjustment">h_adjust</property> <property name="vadjustment">v_adjust</property> <property name="hscrollbar_policy">automatic</property> <property name="vscrollbar_policy">automatic</property> <child> <object class="GtkViewport" id="viewport"> <property name="visible">True</property> <property name="resize_mode">queue</property> <child> <object class="GtkDrawingArea" id="canvas"> <property name="width_request">640</property> <property name="height_request">480</property> <property name="visible">True</property> <signal name="expose_event" handler="canvas_exposed"/> </object> </child> </object> </child> </object> </child> </object> <object class="GtkAdjustment" id="h_adjust"> <property name="lower">-1000</property> <property name="upper">1000</property> <property name="step_increment">1</property> <property name="page_increment">25</property> <property name="page_size">25</property> </object> <object class="GtkAdjustment" id="v_adjust"> <property name="lower">-1000</property> <property name="upper">1000</property> <property name="step_increment">1</property> <property name="page_increment">25</property> <property name="page_size">25</property> </object> </interface> Thanks! --Dan

    Read the article

  • method __getattr__ is not inherited from parent class

    - by ??????
    Trying to subclass mechanize.Browser class: from mechanize import Browser class LLManager(Browser, object): IS_AUTHORIZED = False def __init__(self, login = "", passw = "", *args, **kwargs): super(LLManager, self).__init__(*args, **kwargs) self.set_handle_robots(False) But when I make something like this: lm["Widget[LinksList]_link_1_title"] = anc then I get an error: Traceback (most recent call last): File "<pyshell#8>", line 1, in <module> lm["Widget[LinksList]_link_1_title"] = anc TypeError: 'LLManager' object does not support item assignment Browser class have overridden method __getattr__ as shown: def __getattr__(self, name): # pass through _form.HTMLForm methods and attributes form = self.__dict__.get("form") if form is None: raise AttributeError( "%s instance has no attribute %s (perhaps you forgot to " ".select_form()?)" % (self.__class__, name)) return getattr(form, name) Why my class or instance don't get this method as in parent class?

    Read the article

  • SQLAlchemy: select over multiple tables

    - by ahojnnes
    Hi, I wanted to optimize my database query: link_list = select( columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in], whereclause=and_( not_(link_table.c.id.in_( select( columns=[request_table.c.recipient], whereclause=request_table.c.donator==donator.id ).as_scalar() )), link_table.c.id!=donator.id, ), limit=20, ).execute().fetchall() and tried to merge those two selects in one query: link_list = select( columns=[link_table.c.rating, link_table.c.url, link_table.c.donations_in], whereclause=and_( link_table.c.active==True, link_table.c.id!=donator.id, request_table.c.donator==donator.id, link_table.c.id!=request_table.c.recipient, ), limit=20, order_by=[link_table.c.rating.desc()] ).execute().fetchall() the database-schema looks like: link_table = Table('links', metadata, Column('id', Integer, primary_key=True, autoincrement=True), Column('url', Unicode(250), index=True, unique=True), Column('registration_date', DateTime), Column('donations_in', Integer), Column('active', Boolean), ) request_table = Table('requests', metadata, Column('id', Integer, primary_key=True, autoincrement=True), Column('recipient', Integer, ForeignKey('links.id')), Column('donator', Integer, ForeignKey('links.id')), Column('date', DateTime), ) There are several links (donator) in request_table pointing to one link in the link_table. I want to have links from link_table, which are not yet "requested". But this does not work. Is it actually possible, what I'm trying to do? If so, how would you do that? Thank you very much in advance!

    Read the article

  • How to manually create a DBRef using pymongo?

    - by Soviut
    I want to create a DBRef manually so that I can add an additional field to it. However, when I try to pass the following: {'$ref': 'projects', '$id': '1029412409721', 'project_name': 'My Project'} Pymongo raises an error: pymongo.errors.InvalidName: key '$id' must not start with '$' It would seem that pymongo reserve the $ for the special key, leading me to wonder if it is even possible to do what I'm trying to do?

    Read the article

  • Allowing users to delete their own comments in Django

    - by RaDeuX
    I am using the delete() function from django.contrib.comments.views.moderation module. The staff-member is allowed to delete ANY comment posts, which is completely fine. However, I would also like to give registered non-staff members the privilege to delete their OWN comment posts, and their OWN only. How can I accomplish this?

    Read the article

  • web2py error while using distinct in the queries

    - by Steve
    Hi, I am using web2py with GAE. While using some of the queries which has a distinct clause, GAE throws out an error.I have pasted the Traceback. Can someone please help me out with this. In FILE: /base/data/home/apps/panneersoda/1.341206242889687944/applications/init/controllers/default.py Traceback (most recent call last): File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/restricted.py", line 173, in restricted exec ccode in environment File "/base/data/home/apps/panneersoda/1.341206242889687944/applications/init/controllers/default.py:profileview", line 263, in <module> File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/globals.py", line 96, in <lambda> self._caller = lambda f: f() File "/base/data/home/apps/panneersoda/1.341206242889687944/applications/init/controllers/default.py:profileview", line 97, in profileview File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/contrib/gql.py", line 675, in select (items, tablename, fields) = self._select(*fields, **attributes) File "/base/data/home/apps/panneersoda/1.341206242889687944/gluon/contrib/gql.py", line 624, in _select raise SyntaxError, 'invalid select attribute: %s' % key SyntaxError: invalid select attribute: distinct Thanks

    Read the article

  • downloading archives response corrupts files

    - by panchicore
    wrapper = FileWrapper(file("C:/pics.zip")) content_type = mimetypes.guess_type(result.files)[0] response = HttpResponse(wrapper, content_type=content_type) response['Content-Length'] = os.path.getsize("C:/pics.zip") response['Content-Disposition'] = "attachment; filename=pics.zip" return response pics.zip is a valid file with 3 pictures inside. server response the download, but when I am going to open the zip, winrar says This archive is either in unknown format or damaged! If I change the file path and the file name to a valid image C:/pic.jpg is downloaded damaged too. What Im missing in this download view?

    Read the article

< Previous Page | 359 360 361 362 363 364 365 366 367 368 369 370  | Next Page >