Search Results

Search found 13816 results on 553 pages for 'python larry'.


Scraping paginated items from a website using scrapy

- by Mridang Agarwalla

I'm using scrapy to scrape items from a site, but I haven't been able to implement the following scraping pattern. The site I'm trying to scrape is a forum and I scrape it once a day. Each page has a table containing posts. New posts are added to the top of the table, and as more posts are added to the site, the older posts move further back through the pages due to pagination. This is a very simple scenario and we will assume that the order of the posts never changes. I would like to scrape all the "new" records until the last scraped post from yesterday is encountered. I have configured my spider to paginate endlessly, and when it encounters yesterday's last scraped post, it should stop. How can I implement this? (My Scrapy installation works with my Django installation using django-dynamic-scraper.)
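The stopping rule described above can be sketched without any Scrapy-specific code. This is only an illustration under stated assumptions: `fetch_page` is a hypothetical callable standing in for whatever downloads and parses one forum page, and each post is assumed to carry a stable `id`. In a spider, the same check would decide whether to request the next page at all.

```python
from itertools import count


def collect_new_posts(fetch_page, last_seen_id):
    """Return posts newer than last_seen_id, newest first.

    fetch_page(n) is a hypothetical callable that returns the list of post
    dicts on page n (newest first), or an empty list past the last page.
    """
    new_posts = []
    for page_number in count(1):
        posts = fetch_page(page_number)
        if not posts:
            return new_posts  # ran out of pages before finding the marker
        for post in posts:
            if post["id"] == last_seen_id:
                return new_posts  # reached yesterday's newest post: stop here
            new_posts.append(post)


# In-memory stand-in for the forum: post ids 10..1, three posts per page.
PAGES = {1: [10, 9, 8], 2: [7, 6, 5], 3: [4, 3, 2], 4: [1]}
requested = []


def fake_fetch(page_number):
    requested.append(page_number)  # record which pages were actually fetched
    return [{"id": i} for i in PAGES.get(page_number, [])]


new_ids = [post["id"] for post in collect_new_posts(fake_fetch, last_seen_id=6)]
print(new_ids, requested)
```

The marker has to be saved after each run (for example the id of the first post collected) so that the next day's run knows where to stop.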

Graphics glitch when drawing to a Cairo context obtained from a gtk.DrawingArea inside a gtk.Viewport.

- by user410023

I am trying to redraw the part of the DrawingArea that is visible in the Viewport in the expose-event handler. However, it seems that I am doing something wrong with the coordinates that are passed to the event handler because there is garbage at the edge of the Viewport when scrolling. Can anyone tell what I am doing wrong? Here is a small example:

    import pygtk
    pygtk.require("2.0")
    import gtk

    from numpy import array
    from math import pi

    class Circle(object):
        def __init__(self, position = [0., 0.], radius = 0., edge = (0., 0., 0.), fill = None):
            self.position = position
            self.radius = radius
            self.edge = edge
            self.fill = fill

        def draw(self, ctx):
            rect = array(ctx.clip_extents())
            rect[2] -= rect[0]
            rect[3] -= rect[1]
            center = rect[2:4] / 2
            ctx.arc(center[0], center[1], self.radius, 0., 2. * pi)
            if self.fill != None:
                ctx.set_source_rgb(*self.fill)
                ctx.fill_preserve()
            ctx.set_source_rgb(*self.edge)
            ctx.stroke()

    class Scene(object):
        class Proxy(object):
            directory = {}
            def __init__(self, target, layers = set()):
                self.target = target
                self.layers = layers
                Scene.Proxy.directory[target] = self

        def __init__(self, viewport):
            self.objects = {}
            self.layers = [set()]
            self.viewport = viewport
            self.signals = {}

        def draw(self, ctx):
            x = self.viewport.get_hadjustment().value
            y = self.viewport.get_vadjustment().value
            ctx.set_source_rgb(1., 1., 1.)
            ctx.paint()
            ctx.translate(x, y)
            for obj in self:
                obj.draw(ctx)

        def add(self, item, layer = 0):
            item = Scene.Proxy(item, layers = set((layer,)))
            assert(hasattr(item.target, "draw"))
            assert(isinstance(layer, int))
            item.layers.add(layer)
            while not layer < len(self.layers):
                self.layers.append(set())
            self.layers[layer].add(item)
            if not item in self.objects:
                self.objects[item] = set()
            self.objects[item].add(layer)

        def remove(self, item, layers = None):
            item = Scene.Proxy.directory[item]
            if layers == None:
                layers = self.objects[item]
            for layer in layers:
                layer.remove(item)
                item.layers.remove(layer)
            if len(item.layers) == 0:
                self.objects.remove(item)

        def __iter__(self):
            for layer in self.layers:
                for item in layer:
                    yield item.target

    class App(object):
        def __init__(self):
            signals = {
                "canvas_exposed": self.update_canvas,
                "gtk_main_quit": gtk.main_quit
            }
            self.builder = gtk.Builder()
            self.builder.add_from_file("graphics_glitch.glade")
            self.window = self.builder.get_object("window")
            self.viewport = self.builder.get_object("viewport")
            self.canvas = self.builder.get_object("canvas")
            self.scene = Scene(self.viewport)
            signals.update(self.scene.signals)
            self.builder.connect_signals(signals)
            self.window.show()

        def update_canvas(self, widget, event):
            ctx = self.canvas.window.cairo_create()
            self.scene.draw(ctx)
            ctx.clip()

    if __name__ == "__main__":
        app = App()
        scene = app.scene
        scene.add(Circle((0., 0.), 10.))
        gtk.main()

And the Glade file "graphics_glitch.glade":

    <?xml version="1.0"?>
    <interface>
      <requires lib="gtk+" version="2.16"/>
      <object class="GtkWindow" id="window">
        <property name="width_request">200</property>
        <property name="height_request">200</property>
        <property name="visible">True</property>
        <signal name="destroy" handler="gtk_main_quit"/>
        <child>
          <object class="GtkScrolledWindow" id="scrolledwindow1">
            <property name="visible">True</property>
            <property name="can_focus">True</property>
            <property name="hadjustment">h_adjust</property>
            <property name="vadjustment">v_adjust</property>
            <property name="hscrollbar_policy">automatic</property>
            <property name="vscrollbar_policy">automatic</property>
            <child>
              <object class="GtkViewport" id="viewport">
                <property name="visible">True</property>
                <property name="resize_mode">queue</property>
                <child>
                  <object class="GtkDrawingArea" id="canvas">
                    <property name="width_request">640</property>
                    <property name="height_request">480</property>
                    <property name="visible">True</property>
                    <signal name="expose_event" handler="canvas_exposed"/>
                  </object>
                </child>
              </object>
            </child>
          </object>
        </child>
      </object>
      <object class="GtkAdjustment" id="h_adjust">
        <property name="lower">-1000</property>
        <property name="upper">1000</property>
        <property name="step_increment">1</property>
        <property name="page_increment">25</property>
        <property name="page_size">25</property>
      </object>
      <object class="GtkAdjustment" id="v_adjust">
        <property name="lower">-1000</property>
        <property name="upper">1000</property>
        <property name="step_increment">1</property>
        <property name="page_increment">25</property>
        <property name="page_size">25</property>
      </object>
    </interface>

Thanks! --Dan

Programmatically Determining Bin Path

- by Andy

I'm working on a web app called pj and there is a bin file and a src folder. The relative paths before I deploy the app will look something like: pj/bin and pj/src/pj/script.py. However, after deployment, the relative paths will look like: pj_dep/deployed/bin and pj_dep/deployed/lib/python2.6/site-packages/pj/script.py Question: Within script.py, I am trying to find the path of a file in the bin directory. This leads to 2 different behaviors in the dev and deployment environment. If I do os.path.join(os.path.dirname(__file__), 'bin') to try to get the path for the dev environment, I will have a different path for the deployment environment. Is there a more generalized way I can find the bin directory so that I do not need to rely on an if statement to determine how many directories to go up based on the current env? This doesn't seem flexible and might cause other issues later on when the code is moved.
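One way to avoid counting directory levels is to walk upward from the script's location until an entry named `bin` turns up. The sketch below is an assumption-laden illustration, not a definitive answer: `find_up` is a hypothetical helper, and the demo recreates both layouts from the question in a temporary directory so it can run anywhere.

```python
import os
import tempfile


def find_up(start, name):
    """Walk upward from `start` and return the first path named `name`, or None."""
    current = os.path.abspath(start)
    while True:
        candidate = os.path.join(current, name)
        if os.path.exists(candidate):  # works for a file or a directory
            return candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent


# Recreate the development and deployment layouts from the question.
root = os.path.realpath(tempfile.mkdtemp())
dev_src = os.path.join(root, "pj", "src", "pj")
dep_src = os.path.join(root, "pj_dep", "deployed", "lib", "python2.6",
                       "site-packages", "pj")
for path in (dev_src, dep_src,
             os.path.join(root, "pj", "bin"),
             os.path.join(root, "pj_dep", "deployed", "bin")):
    os.makedirs(path)

dev_bin = find_up(dev_src, "bin")
dep_bin = find_up(dep_src, "bin")
print(dev_bin)
print(dep_bin)
```

Inside script.py the call would be `find_up(os.path.dirname(__file__), "bin")`. The caveat is that the nearest match wins, so an unrelated `bin` entry closer to the script would be returned instead.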

Functional correctness of the wiki Mersenne Twister pseudocode

- by calccrypto

Can anyone tell me if the Mersenne Twister pseudocode on this page is the same as the code here? If they are not the same, which one is correct?

the error "invalid literal for int() with base 10:" keeps coming up

- by ratce003

I'm trying to write a very simple program. I want to print out the sum of all the multiples of 3 and 5 below 100, but an error keeps occurring, saying "invalid literal for int() with base 10:". My program is as follows:

    sum = ""
    sum_int = int(sum)
    for i in range(1, 101):
        if i % 5 == 0:
            sum += i
        elif i % 3 == 0:
            sum += i
        else:
            sum += ""
    print sum

Any help would be much appreciated.
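The error comes from `int("")`: an empty string is not a valid integer literal, so Python raises ValueError before the loop even starts. A minimal corrected sketch, written for Python 3 (an assumption, since the question uses the Python 2 print statement), starts the accumulator at the integer 0 and avoids shadowing the built-in `sum`:

```python
total = 0  # an integer accumulator, not an empty string

for i in range(1, 100):  # 1..99, i.e. strictly below 100
    if i % 3 == 0 or i % 5 == 0:
        total += i  # non-multiples are simply skipped

print(total)
```

Note that `range(1, 101)` in the original also includes 100 itself, which is not "below 100".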

Django database caching

- by hekevintran

I have a Django form that uses an integer field to look up a model object by its primary key. The form has a save() method that uses the model object referred to by the integer field. The model's manager's get() method is called twice, once in the clean method and once in the save() method:

    class MyForm(forms.Form):
        id_a = fields.IntegerField()

        def clean_id_a(user_id):
            id_a = self.cleaned_data['id_a']
            try:
                # here is the first call to get
                MyModel.objects.get(id=id_a)
            except User.DoesNotExist:
                raise ValidationError('Object does not exist')

        def save(self):
            id_a = self.cleaned_data['id_a']
            # here is the second call to get
            my_model_object = MyModel.objects.get(id=id_a)
            # do other stuff

I wasn't sure whether this hits the database two times or one time, so I returned the object itself in the clean method so that I could avoid a second get() call. Does calling get() hit the database two times? Or is the object cached in the thread?

    class MyForm(forms.Form):
        id_a = fields.IntegerField()

        def clean_id_a(user_id):
            id_a = self.cleaned_data['id_a']
            try:
                # here is my workaround
                return MyModel.objects.get(id=id_a)
            except User.DoesNotExist:
                raise ValidationError('Object does not exist')

        def save(self):
            # looking up the cleaned value returns the model object
            my_model_object = self.cleaned_data['id_a']
            # do other stuff

Non-global middleware in Django

- by hekevintran

In Django there is a settings file that defines the middleware to be run on each request. This middleware setting is global. Is there a way to specify a set of middleware on a per-view basis? I want to have specific urls use a set of middleware different from the global set.

PyEphem (sunrise / sunset time calculation) equivalent in C#

- by dassouki

PyEphem is a neat library that allows easy calculation of sunrise, sunset, dawn, and dusk for a location based on latitude, longitude, and a UTC timestamp. I don't want to go through the calculation myself; rather, I was wondering if there is an existing C# library I could use.

What can be done in CPython that cannot be done in IronPython?

- by WeNeedAnswers

What can be done in CPython that cannot be done in IronPython?

In Elixir or SQLAlchemy, is there a way to also store a comment for a/each field in my entities?

- by kchau

Our project is basically a web interface to several systems of record. We have many tables mapped, and the names of the columns aren't as well chosen and intuitive as we'd like. The users would like to know what data fields are available (i.e. what's been mapped from the database). But it's pointless to just give them column names like USER_REF1, USER_REF2, etc. So I was wondering, is there a way to provide a comment in the declaration of my field? E.g.

    class SegregationCode(Entity):
        using_options(tablename="SEGREGATION_CODES")
        segCode = Field(String(20), colname="CODE", ... primary_key=True) #Have a comment attr too?

If not, any suggestions?

How to search for files that have a known file extension like .py?

- by Rami Jarrar

How do I search for files that have a known file extension like .py?

    fext = raw_input("Put file extension to search: ")
    dir = raw_input("Dir to search in: ")
    ##Search for the files and get the right ones
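One common approach is to walk the directory tree with `os.walk` and keep the file names that end with the wanted extension. The sketch below is written for Python 3 (so `input` would replace `raw_input`), and `find_by_extension` is a hypothetical helper name; the demo builds a small temporary tree so the result is reproducible.

```python
import os
import tempfile


def find_by_extension(top, extension):
    """Yield paths of files under `top` whose names end with `extension`."""
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            if filename.endswith(extension):
                yield os.path.join(dirpath, filename)


# Build a small tree to search: two .py files and one .txt file.
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "pkg"))
for relative in ("a.py", "notes.txt", os.path.join("pkg", "b.py")):
    open(os.path.join(demo_dir, relative), "w").close()

found = sorted(os.path.relpath(path, demo_dir)
               for path in find_by_extension(demo_dir, ".py"))
print(found)
```

With user input, the call would be `find_by_extension(dir_to_search, fext)`, where `fext` includes the leading dot.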

'NoneType' object has no attribute 'data'

- by Bill Jordan

Hello guys, I am sending a SOAP request to my server and getting the response back. A sample of the response string is shown below:

    <?xml version = '1.0' ?>
    <env:Envelope xmlns:env=http:////www.w3.org/2003/05/soap-envelop . .. ..
    <env:Body>
    <epas:get-all-config-resp xmlns:epas="urn:organization:epas:soap"> ^M
    ... ...
    <epas:property name="Tom">12</epas:property>
    <epas:property name="Alice">34</epas:property>
    <epas:property name="John">56</epas:property>
    <epas:property name="Danial">78</epas:property>
    <epas:property name="George">90</epas:property>
    <epas:property name="Luise">11</epas:property>
    ... ^M
    </env:Body>
    </env:Envelope>

What I noticed in the response is that there is an extra character shown in the body, which is "^M". I'm not sure if this could be the issue. I tried parsing the string returned from the server to get the names and values using this code sample:

    elements = minidom.parseString(xmldoc).getElementsByTagName("property")
    myDict = {}
    for element in elements:
        myDict[element.getAttribute('name')] = element.firstChild.data

But I am getting this error: 'NoneType' object has no attribute 'data'. Maybe it has something to do with the "^M" shown in the XML response. Any ideas/comments would be appreciated. Cheers
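One way this error can arise is a property element that has no text content: `firstChild` is then None, and `None.data` fails. The sketch below uses a tiny made-up document (the `resp` wrapper and the empty "Alice" entry are assumptions for illustration, not the real server response) and guards against the empty case. It also looks elements up by namespace with `getElementsByTagNameNS`, since the tags in the response carry the `epas:` prefix.

```python
from xml.dom import minidom

EPAS_NS = "urn:organization:epas:soap"

# Made-up sample: the second property is deliberately empty.
xml_text = (
    '<resp xmlns:epas="urn:organization:epas:soap">'
    '<epas:property name="Tom">12</epas:property>'
    '<epas:property name="Alice"></epas:property>'
    '</resp>'
)

values = {}
document = minidom.parseString(xml_text)
for element in document.getElementsByTagNameNS(EPAS_NS, "property"):
    child = element.firstChild  # None when the element has no content
    values[element.getAttribute("name")] = child.data if child is not None else None

print(values)
```

The "^M" is a carriage return, which an XML parser treats as whitespace, so it may well not be the cause by itself; printing the names of the elements whose `firstChild` is None would show which entries are empty.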

How to clone a mercurial repository over an ssh connection initiated by fabric when http authorizati

- by Monika Sulik

I'm attempting to use fabric for the first time and I really like it so far, but at a certain point in my deployment script I want to clone a mercurial repository. When I get to that point I get an error:

    err: abort: http authorization required

My repository requires http authorization and fabric doesn't prompt me for the user and password. I can get around this by changing my repository address from:

    https://hostname/repository

to:

    https://user:password@hostname/repository

But for various reasons I would prefer not to go this route. Are there any other ways in which I could bypass this problem?

downloading archives response corrupts files

- by panchicore

    wrapper = FileWrapper(file("C:/pics.zip"))
    content_type = mimetypes.guess_type(result.files)[0]
    response = HttpResponse(wrapper, content_type=content_type)
    response['Content-Length'] = os.path.getsize("C:/pics.zip")
    response['Content-Disposition'] = "attachment; filename=pics.zip"
    return response

pics.zip is a valid file with 3 pictures inside. The server responds with the download, but when I try to open the zip, WinRAR says "This archive is either in unknown format or damaged!" If I change the file path and file name to a valid image, C:/pic.jpg, it is downloaded damaged too. What am I missing in this download view?

Selective emboldening of text in a webpage

- by Eknath Iyer

While printing out utf-8 characters onto a webpage, if I encapsulate them with <b> they get emboldened, but with anything else, the page turns blank. Why?

    def main():
        print "Content-type: text/html\r\n\r\n";
        print '<html>'
        print '<head>'
        print '<style type="text/css">'
        print '.highlight { background-color: yellow }'
        print '.color1 { color: green; }'
        print '.color2 { color: blue; }'
        print '.color3 { color: purple; }'
        print '.color4 { color: red; }'
        print '.color5 { color: teal; }'
        print '.color6 { color: yellow; }'
        print '.color7 { color: orange; }'
        print '.color8 { color: violet; }'
        print '</style></head>'
        print '<body>'
        form = cgi.FieldStorage()
        ch = form.getvalue('choice')
        if ch == 'English':
            in_sent = form.getvalue('f1')
            in_sent = in_sent.lower()
            cho=0
        elif ch == 'Hindi':
            in_sent = trans_he(form.getvalue('transl1').decode("utf-8")).strip()
            cho=1
        #cho = 0 for english
        #cho = 1 for hindi
        adict=[]
        print '<center><u> User Input Sentence ==> <b>', in_sent,'</b></u></center><br>'
        in_sent=in_sent.strip().split(' ')
        colordict={}
        counter=1
        for word in in_sent:
            colordict[word]=counter
            counter = counter + 1
        f = open('bidirectional.alignment.txt','rb').read()
        records=f.strip().split('\n\n\n')
        for record in records:
            el=[]
            el2 = []
            #basic file processing is done here.
            record = record.strip().split('\n')
            source = record[cho]
            target = record[(cho+1)%2]
            source_sent = source.split(' # ')[1]
            target_sent = target.split(' # ')[1]
            source_words = source_sent.strip().split(' ')
            target_words = target_sent.strip().split(' ')
            trans_index = source.split(' # ')[2].strip().split(' ')
            for word in in_sent:
                if word in source_words:
                    if int(trans_index[source_words.index(word)]) > 0:
                        tword=target_words[(int(trans_index[source_words.index(word)])-1)]
                        target_sent = target_sent.replace(tword+' ','<b>'+tword+' </b>')
                        # When the <b> tag is used here (for the 'target_sent = ...' statement), it is fine.
                        # But when <b> is replaced by something like in the next line or even <i> or <u>, it doesn't show an output at all
                        source_sent = source_sent.replace(word+' ','<span class="color1">'+word+' </span>')
            el2.append(source_sent)
            el2.append(target_sent)
            el.append(target_sent.count('<b>'))
            el.append(el2)
            if target_sent.count('<b>') > 0:
                adict.append(el)
        print '<table><tr><td><center><h1>SOURCE LANGUAGE</h1></center></td><td><center> <h1>TARGET LANGUAGE</h1></center></td></tr>'
        for entry in adict:
            print '<tr><td>',entry[1][0],'</td><td>',trans_eh(entry[1][1]).encode("utf-8"),'</td> </tr>'
        print '</table></body>'
        print '</html>'

    main()

How to create instances of related models in Django

- by sevennineteen

I'm working on a CMSy app for which I've implemented a set of models which allow for creation of custom Template instances, made up of a number of Fields and tied to a specific Customer. The end goal is that one or more templates with a set of custom fields can be defined through the Admin interface and associated with a customer, so that customer can then create content objects in the format prescribed by the template. I seem to have gotten this hooked up such that I can create any number of Template objects, but I'm struggling with how to create instances (actual content objects) in those templates. For example, I can define a template "Basic Page" for customer "Acme" which has the fields "Title" and "Body", but I haven't figured out how to create Basic Page instances where these fields can be filled in. Here are my (somewhat elided) models...

    class Customer(models.Model):
        ...

    class Field(models.Model):
        ...

    class Template(models.Model):
        label = models.CharField(max_length=255)
        clients = models.ManyToManyField(Customer, blank=True)
        fields = models.ManyToManyField(Field, blank=True)

    class ContentObject(models.Model):
        label = models.CharField(max_length=255)
        template = models.ForeignKey(Template)
        author = models.ForeignKey(User)
        customer = models.ForeignKey(Customer)
        mod_date = models.DateTimeField('Modified Date', editable=False)

        def __unicode__(self):
            return '%s (%s)' % (self.label, self.template)

        def save(self):
            self.mod_date = datetime.datetime.now()
            super(ContentObject, self).save()

Thanks in advance for any advice!

Technique to remove common words(and their plural versions) from a string

- by Jake M

I am attempting to find tags (keywords) for a recipe by parsing a long string of text. The text contains the recipe ingredients, directions and a short blurb. What do you think would be the most efficient way to remove common words from the tag list? By common words, I mean words like 'the', 'at', 'there', 'their', etc. I have 2 methodologies I can use. Which do you think is more efficient in terms of speed, and do you know of a more efficient way I could do this?

Methodology 1:
- Determine the number of times each word occurs (using the collections library).
- Have a list of common words and remove all 'Common Words' from the Counter object by attempting to delete that key from the Counter object if it exists.
- Therefore the speed will be determined by the length of the variable delims.

    from collections import Counter

    delims = ['there','there\'s','theres','they','they\'re']
    # the above will end up being a really long list!
    word_freq = Counter(recipe_str.lower().split())
    for delim in set(delims):
        del word_freq[delim]
    return word_freq.most_common()

Methodology 2:
- For common words that can be plural, look at each word in the recipe string, and check if it partially contains the non-plural version of a common word. E.g. for the string "There's a test", check each word to see if it contains "there" and delete it if it does.

    delims = ['this','at','them'] # words that can't be plural
    partial_delims = ['there','they',] # words that could occur in many forms
    word_freq = Counter(recipe_str.lower().split())
    for delim in set(delims):
        del word_freq[delim]
    # really slow
    for delim in set(partial_delims):
        for word in list(word_freq):
            if word.find(delim) != -1:
                del word_freq[word]
    return word_freq.most_common()
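A set-based filter is one way to sketch the first methodology. Membership tests on a set take roughly constant time, so the cost grows with the number of words in the recipe rather than with the length of the common-word list. The word list, the punctuation stripping, and the sample sentence below are all illustrative assumptions.

```python
from collections import Counter

# Illustrative stop list; a real one would be much longer.
COMMON_WORDS = {"the", "at", "there", "there's", "theres", "their",
                "they", "they're", "a", "and"}


def tag_counts(text, stop_words=COMMON_WORDS):
    """Count the words in `text`, skipping anything in `stop_words`."""
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    words = (word.strip(".,;:!?()\"") for word in text.lower().split())
    return Counter(word for word in words if word and word not in stop_words)


recipe_str = "There's a pan. Heat the pan and add the onions; the onions brown."
counts = tag_counts(recipe_str)
print(counts.most_common())
```

Listing the variant forms explicitly ("there's", "theres") keeps the lookup exact, whereas substring matching as in the second methodology would also drop unrelated words such as "therefore".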

Sending data from one Protocol to another Protocol in Twisted?

- by veb

Hi! One of my protocols is connected to a server, and with the output of that I'd like to send it to the other protocol. I need to access the 'msg' method in ClassA from ClassB but I keep getting: exceptions.AttributeError: 'NoneType' object has no attribute 'write' Actual code: http://pastebin.com/MQPhduSY Any ideas please? :-)

What are some strategies to add spell checking to a Google App Engine program?

- by btol45

I'm working on a Google App Engine program that will require some basic spell checking features. Normally iSpell or its cousins would be options, but I'm not sure they will work in GAE. Are there other strategies/tools that would work in that environment?

Can you separate water from oil [closed]

- by John Tyler

You know all that oil in the gulf coast, can we recover any of it and put it in cars? I mean jesus man, if we can't I am really mad about it. But if we can get it back it's like wutevskies.

Is it good to use django 1.1 on app engine?

- by Software Enthusiastic

Hi, we are planning to build a web application on the Google App Engine platform. My question is: is it good to use the Django 1.1 framework to develop a Google App Engine application? If not, could you please suggest the best option available, one that has good tutorials and learning resources? Thank you very much.

Sort a list of tuples without case sensitivity

- by dound

How can I efficiently and easily sort a list of tuples without being sensitive to case? For example this: [('a', 'c'), ('A', 'b'), ('a', 'a'), ('a', 5)] Should look like this once sorted: [('a', 5), ('a', 'a'), ('A', 'b'), ('a', 'c')] The regular lexicographic sort will put 'A' before 'a' and yield this: [('A', 'b'), ('a', 5), ('a', 'a'), ('a', 'c')]
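A key function is one way to sketch this. The version below is written for Python 3, where comparing a number with a string raises TypeError, so each element is tagged to sort numbers ahead of strings (matching the expected output in the question). `caseless_key` is a hypothetical helper name.

```python
def caseless_key(row):
    """Sort key: numbers first, then strings compared without regard to case."""
    return tuple(
        (1, item.lower()) if isinstance(item, str) else (0, item)
        for item in row
    )


data = [('a', 'c'), ('A', 'b'), ('a', 'a'), ('a', 5)]
result = sorted(data, key=caseless_key)
print(result)
```

Because the key is computed once per tuple, the sort itself stays at the usual O(n log n) comparisons.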

BeautifulSoup, but for CSS?

- by MTsoul

BeautifulSoup parses HTML and offers various ways to manipulate and search within HTML. Is there something similar for CSS? Specifically, I'd like to know if a given HTML text is rendered as bold. Either it has an ancestor that is a <strong> or <b> tag (which can be checked with BeautifulSoup), or it has an ancestor (or is itself an element) whose CSS sets font-weight: bold. Is this possible without resorting to writing my own library?

I am trying to move a rectangle in Pygame using coordinates but it won't work

- by user1821449

This is my code:

    import pygame
    from pygame.locals import *
    import sys

    pygame.init()
    pygame.display.set_caption("*no current mission*")
    size = (1280, 750)
    screen = pygame.display.set_mode(size)
    clock = pygame.time.Clock()
    bg = pygame.image.load("bg1.png")
    guy = pygame.image.load("hero_stand.png")
    rect = guy.get_rect()
    x = 10
    y = 10

    while True:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                sys.exit()
            if event.type == KEYDOWN:
                if event.key == K_RIGHT:    # marked: this part does not work
                    x += 5                  # marked
                    rect.move(x,y)          # marked
        rect.move(x,y)
        screen.blit(bg,(0,0))
        screen.blit(guy, rect)
        pygame.display.flip()

It is just a simple test to see if I can get a rectangle to move. Everything seems to work except the lines I marked (originally in italics).

How to import a.py, not the folder a

- by zjm1126

    zjm_code
    |-----a.py
    |-----a
          |----- __init__.py
    |-----b.py

a.py contains:

    c='ccc'

b.py contains:

    import a
    print dir(a)

When I execute b.py, it shows the following (it imports the 'a' folder):

    ['__builtins__', '__doc__', '__file__', '__name__', '__path__']

When I delete the a folder, it shows the following (it imports a.py):

    ['__builtins__', '__doc__', '__file__', '__name__', 'c']

So my question is: how can I import a.py without deleting the a folder? Thanks
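When a package directory and a module file share a name, the package wins in a plain `import a`. One way around this is to load the file by its path. The sketch below uses `importlib.util`, which is a Python 3 API (an assumption, since the question uses Python 2 syntax); `load_module_from_file` and the module name "a_file" are hypothetical, and the demo recreates the layout in a temporary directory.

```python
import importlib.util
import os
import tempfile


def load_module_from_file(module_name, file_path):
    """Load a module from an explicit file path, bypassing sys.path lookup."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # run the file's code inside the module
    return module


# Recreate the layout from the question: a.py next to a package named a.
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, "a"))
open(os.path.join(project, "a", "__init__.py"), "w").close()
with open(os.path.join(project, "a.py"), "w") as handle:
    handle.write("c = 'ccc'\n")

a_file = load_module_from_file("a_file", os.path.join(project, "a.py"))
print(a_file.c)
```

Renaming either the file or the folder remains the simpler fix where that is possible, since the name clash will confuse every other importer too.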

