Search Results

Search found 13534 results on 542 pages for 'python 3 3'.

Page 361/542 | < Previous Page | 357 358 359 360 361 362 363 364 365 366 367 368  | Next Page >

  • Scraping paginated items from a website using scrapy

    - by Mridang Agarwalla
    I'm using scrapy to scrape items from a site. I'm not being able to implement this scraping pattern. The site I'm trying to scrape is a forum and I scrape the site once a day. Each page has a table containing posts. New posts are added to the top of the table and as more and more posts are posted to the site, the older posts go further into the pages due to pagination. This is a very simple scenario and we will assume that the order of the posts never change. I would like to scrape this site and scrape all the "new" records until the last scraped post from yesterday is encountered. I have configured my spider to paginate endlessly and when it encounters yesterday's last scraped post, it should stop. How can implement this? (My Scrapy installation works with my Django installation using django-dynamic-scraper )

    Read the article

  • How do I get javascript results using selenium?

    - by Seth
    I have the following code: from selenium import selenium selenium = selenium("localhost", 4444, "*chrome", "http://some_site.com/") selenium.start() sel = selenium sel.open("/") sel.type("ctl00_ContentPlaceHolder1_SuburbTownTextBox", "Adelaide,SA,5000") sel.click("ctl00_ContentPlaceHolder1_SearchImageButton") #text = sel.get_body_text() text = sel.get_html_source() print(text) The click executes a javascript file which then produces results on the same page. Obviously print(text) will only print the orignal html source. How do I get to the results of the javascript?

    Read the article

  • Django Grouping Query

    - by Matt
    I have the following (simplified) models: class Donation(models.Model): entry_date = models.DateTimeField() class Category(models.Model): name = models.CharField() class Item(models.Model): donation = models.ForeignKey(Donation) category = models.ForeignKey(Category) I'm trying to display the total number of items, per category, grouped by the donation year. I've tried this: Donation.objects.extra(select={'year': "django_date_trunc('year', %s.entry_date)" % Donation._meta.db_table}).values('year', 'item__category__name').annotate(items=Sum('item__quantity')) But I get a Field Error on item__category__name. I've also tried: Item.objects.extra(select={"year": "django_date_trunc('year', entry_date)"}, tables=["donations_donation"]).values("year", "category__name").annotate(items=Sum("quantity")).order_by() Which generally gets me what I want, but the item quantity count is multiplied by the number of donation records. Any ideas? Basically I want to display this: 2010 - Category 1: 10 items - Category 2: 17 items 2009 - Category 1: 5 items - Category 3: 8 items

    Read the article

  • Combinatorial optimisation of a distance metric

    - by Jose
    I have a set of trajectories, made up of points along the trajectory, and with the coordinates associated with each point. I store these in a 3d array ( trajectory, point, param). I want to find the set of r trajectories that have the maximum accumulated distance between the possible pairwise combinations of these trajectories. My first attempt, which I think is working looks like this: max_dist = 0 for h in itertools.combinations ( xrange(num_traj), r): for (m,l) in itertools.combinations (h, 2): accum = 0. for ( i, j ) in itertools.izip ( range(k), range(k) ): A = [ (my_mat[m, i, z] - my_mat[l, j, z])**2 \ for z in xrange(k) ] A = numpy.array( numpy.sqrt (A) ).sum() accum += A if max_dist < accum: selected_trajectories = h This takes forever, as num_traj can be around 500-1000, and r can be around 5-20. k is arbitrary, but can typically be up to 50. Trying to be super-clever, I have put everything into two nested list comprehensions, making heavy use of itertools: chunk = [[ numpy.sqrt((my_mat[m, i, :] - my_mat[l, j, :])**2).sum() \ for ((m,l),i,j) in \ itertools.product ( itertools.combinations(h,2), range(k), range(k)) ]\ for h in itertools.combinations(range(num_traj), r) ] Apart from being quite unreadable (!!!), it is also taking a long time. Can anyone suggest any ways to improve on this?

    Read the article

  • How do I code this relationship in SQLAlchemy?

    - by Martin Del Vecchio
    I am new to SQLAlchemy (and SQL, for that matter). I can't figure out how to code the idea I have in my head. I am creating a database of performance-test results. A test run consists of a test type and a number (this is class TestRun below) A test suite consists the version string of the software being tested, and one or more TestRun objects (this is class TestSuite below). A test version consists of all test suites with the given version name. Here is my code, as simple as I can make it: from sqlalchemy import * from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.orm import relationship, backref, sessionmaker Base = declarative_base() class TestVersion (Base): __tablename__ = 'versions' id = Column (Integer, primary_key=True) version_name = Column (String) def __init__ (self, version_name): self.version_name = version_name class TestRun (Base): __tablename__ = 'runs' id = Column (Integer, primary_key=True) suite_directory = Column (String, ForeignKey ('suites.directory')) suite = relationship ('TestSuite', backref=backref ('runs', order_by=id)) test_type = Column (String) rate = Column (Integer) def __init__ (self, test_type, rate): self.test_type = test_type self.rate = rate class TestSuite (Base): __tablename__ = 'suites' directory = Column (String, primary_key=True) version_id = Column (Integer, ForeignKey ('versions.id')) version_ref = relationship ('TestVersion', backref=backref ('suites', order_by=directory)) version_name = Column (String) def __init__ (self, directory, version_name): self.directory = directory self.version_name = version_name # Create a v1.0 suite suite1 = TestSuite ('dir1', 'v1.0') suite1.runs.append (TestRun ('test1', 100)) suite1.runs.append (TestRun ('test2', 200)) # Create a another v1.0 suite suite2 = TestSuite ('dir2', 'v1.0') suite2.runs.append (TestRun ('test1', 101)) suite2.runs.append (TestRun ('test2', 201)) # Create another suite suite3 = TestSuite ('dir3', 'v2.0') suite3.runs.append (TestRun ('test1', 102)) suite3.runs.append (TestRun ('test2', 202)) # Create the in-memory database engine = create_engine ('sqlite://') Session = sessionmaker (bind=engine) session = Session() Base.metadata.create_all (engine) # Add the suites in version1 = TestVersion (suite1.version_name) version1.suites.append (suite1) session.add (suite1) version2 = TestVersion (suite2.version_name) version2.suites.append (suite2) session.add (suite2) version3 = TestVersion (suite3.version_name) version3.suites.append (suite3) session.add (suite3) session.commit() # Query the suites for suite in session.query (TestSuite).order_by (TestSuite.directory): print "\nSuite directory %s, version %s has %d test runs:" % (suite.directory, suite.version_name, len (suite.runs)) for run in suite.runs: print " Test '%s', result %d" % (run.test_type, run.rate) # Query the versions for version in session.query (TestVersion).order_by (TestVersion.version_name): print "\nVersion %s has %d test suites:" % (version.version_name, len (version.suites)) for suite in version.suites: print " Suite directory %s, version %s has %d test runs:" % (suite.directory, suite.version_name, len (suite.runs)) for run in suite.runs: print " Test '%s', result %d" % (run.test_type, run.rate) The output of this program: Suite directory dir1, version v1.0 has 2 test runs: Test 'test1', result 100 Test 'test2', result 200 Suite directory dir2, version v1.0 has 2 test runs: Test 'test1', result 101 Test 'test2', result 201 Suite directory dir3, version v2.0 has 2 test runs: Test 'test1', result 102 Test 'test2', result 202 Version v1.0 has 1 test suites: Suite directory dir1, version v1.0 has 2 test runs: Test 'test1', result 100 Test 'test2', result 200 Version v1.0 has 1 test suites: Suite directory dir2, version v1.0 has 2 test runs: Test 'test1', result 101 Test 'test2', result 201 Version v2.0 has 1 test suites: Suite directory dir3, version v2.0 has 2 test runs: Test 'test1', result 102 Test 'test2', result 202 This is not correct, since there are two TestVersion objects with the name 'v1.0'. I hacked my way around this by adding a private list of TestVersion objects, and a function to find a matching one: versions = [] def find_or_create_version (version_name): # Find existing for version in versions: if version.version_name == version_name: return (version) # Create new version = TestVersion (version_name) versions.append (version) return (version) Then I modified my code that adds the records to use it: # Add the suites in version1 = find_or_create_version (suite1.version_name) version1.suites.append (suite1) session.add (suite1) version2 = find_or_create_version (suite2.version_name) version2.suites.append (suite2) session.add (suite2) version3 = find_or_create_version (suite3.version_name) version3.suites.append (suite3) session.add (suite3) Now the output is what I want: Suite directory dir1, version v1.0 has 2 test runs: Test 'test1', result 100 Test 'test2', result 200 Suite directory dir2, version v1.0 has 2 test runs: Test 'test1', result 101 Test 'test2', result 201 Suite directory dir3, version v2.0 has 2 test runs: Test 'test1', result 102 Test 'test2', result 202 Version v1.0 has 2 test suites: Suite directory dir1, version v1.0 has 2 test runs: Test 'test1', result 100 Test 'test2', result 200 Suite directory dir2, version v1.0 has 2 test runs: Test 'test1', result 101 Test 'test2', result 201 Version v2.0 has 1 test suites: Suite directory dir3, version v2.0 has 2 test runs: Test 'test1', result 102 Test 'test2', result 202 This feels wrong to me; it doesn't feel right that I am manually keeping track of the unique version names, and manually adding the suites to the appropriate TestVersion objects. Is this code even close to being correct? And what happens when I'm not building the entire database from scratch, as in this example. If the database already exists, do I have to query the database's TestVersion table to discover the unique version names? Thanks in advance. I know this is a lot of code to wade through, and I appreciate the help.

    Read the article

  • Avoiding nesting two for loops

    - by chavanak
    Hi, Please have a look at the code below: import string from collections import defaultdict first_complex=open( "residue_a_chain_a_b_backup.txt", "r" ) first_complex_lines=first_complex.readlines() first_complex_lines=map( string.strip, first_complex_lines ) first_complex.close() second_complex=open( "residue_a_chain_a_c_backup.txt", "r" ) second_complex_lines=second_complex.readlines() second_complex_lines=map( string.strip, second_complex_lines ) second_complex.close() list_1=[] list_2=[] for x in first_complex_lines: if x[0]!="d": list_1.append( x ) for y in second_complex_lines: if y[0]!="d": list_2.append( y ) j=0 list_3=[] list_4=[] for a in list_1: pass for b in list_2: pass if a==b: list_3.append( a ) kvmap=defaultdict( int ) for k in list_3: kvmap[k]+=1 print kvmap Normally I use izip or izip_longest to club two for loops, but this time the length of the files are different. I don't want a None entry. If I use the above method, the run time becomes incremental and useless. How am I supposed to get the two for loops going? Cheers, Chavanak

    Read the article

  • A graph-based tuple merge?

    - by user1644030
    I have paired values in tuples that are related matches (and technically still in CSV files). Neither of the paired values are necessarily unique. tupleAB = (A####, B###), (A###, B###), (A###, B###)... tupleBC = (B####, C###), (B###, C###), (B###, C###)... tupleAC = (A####, C###), (A###, C###), (A###, C###)... My ideal output would be a dictionary with a unique ID and a list of "reinforced" matches. The way I try to think about it is in a graph-based context. For example, if: tupleAB[x] = (A0001, B0012) tupleBC[y] = (B0012, C0230) tupleAC[z] = (A0001, C0230) This would produce: output = {uniquekey0001, [A0001, B0012, C0230]} Ideally, this would also be able to scale up to more than three tuples (for example, adding a "D" match that would result in an additional three tuples - AD, BD, and CD - and lists of four items long; and so forth). In regards to scaling up to more tuples, I am open to having "graphs" that aren't necessarily fully connected, i.e., every node connected to every other node. My hunch is that I could easily filter based on the list lengths. I am open to any suggestions. I think, with a few cups of coffee, I could work out a brute force solution, but I thought I'd ask the community if anyone was aware of a more elegant solution. Thanks for any feedback.

    Read the article

  • Technique to remove common words(and their plural versions) from a string

    - by Jake M
    I am attempting to find tags(keywords) for a recipe by parsing a long string of text. The text contains the recipe ingredients, directions and a short blurb. What do you think would be the most efficient way to remove common words from the tag list? By common words, I mean words like: 'the', 'at', 'there', 'their' etc. I have 2 methodologies I can use, which do you think is more efficient in terms of speed and do you know of a more efficient way I could do this? Methodology 1: - Determine the number of times each word occurs(using the library Collections) - Have a list of common words and remove all 'Common Words' from the Collection object by attempting to delete that key from the Collection object if it exists. - Therefore the speed will be determined by the length of the variable delims import collections from Counter delim = ['there','there\'s','theres','they','they\'re'] # the above will end up being a really long list! word_freq = Counter(recipe_str.lower().split()) for delim in set(delims): del word_freq[delim] return freq.most_common() Methodology 2: - For common words that can be plural, look at each word in the recipe string, and check if it partially contains the non-plural version of a common word. Eg; For the string "There's a test" check each word to see if it contains "there" and delete it if it does. delim = ['this','at','them'] # words that cant be plural partial_delim = ['there','they',] # words that could occur in many forms word_freq = Counter(recipe_str.lower().split()) for delim in set(delims): del word_freq[delim] # really slow for delim in set(partial_delims): for word in word_freq: if word.find(delim) != -1: del word_freq[delim] return freq.most_common()

    Read the article

  • foreignkey problem

    - by realshadow
    Hey, Imagine you have this model: class Category(models.Model): node_id = models.IntegerField(primary_key = True) type_id = models.IntegerField(max_length = 20) parent_id = models.IntegerField(max_length = 20) sort_order = models.IntegerField(max_length = 20) name = models.CharField(max_length = 45) lft = models.IntegerField(max_length = 20) rgt = models.IntegerField(max_length = 20) depth = models.IntegerField(max_length = 20) added_on = models.DateTimeField(auto_now = True) updated_on = models.DateTimeField(auto_now = True) status = models.IntegerField(max_length = 20) node = models.ForeignKey(Category_info, verbose_name = 'Category_info', to_field = 'node_id' The important part is the foreignkey. When I try: Category.objects.filter(type_id = 15, parent_id = offset, status = 1) I get an error that get returned more than category, which is fine, because it is supposed to return more than one. But I want to filter the results trough another field, which would be type id (from the second Model) Here it is: class Category_info(models.Model): objtree_label_id = models.AutoField(primary_key = True) node_id = models.IntegerField(unique = True) language_id = models.IntegerField() label = models.CharField(max_length = 255) type_id = models.IntegerField() The type_id can be any number from 1 - 5. I am desparately trying to get only one result where the type_id would be number 1. Here is what I want in sql: SELECT c.*, ci.* FROM category c JOIN category_info ci ON (c.node_id = ci.node_id) WHERE c.type_id = 15 AND c.parent_id = 50 AND ci.type_id = 1 Any help is GREATLY appreciated. Regards

    Read the article

  • Process Name with inotify tool

    - by Priyanka
    Hi ALl Is there a way to find out which process is causing a particular event to occur? In the inotify-tool version 3.13, in struct_kernel , pid,uid and gid are used but it is not handled in the source code. Is there any latest patch available for the same. Please let me know.

    Read the article

  • verbose_name for a model's method

    - by mawimawi
    How can I set a verbose_name for a model's method, so that it might be displayed in the admin's change_view form? example: class Article(models.Model): title = models.CharField(max_length=64) created_date = models.DateTimeField(....) def created_weekday(self): return self.created_date.strftime("%A") in admin.py: class ArticleAdmin(admin.ModelAdmin): readonly_fields = ('created_weekday',) fields = ('title', 'created_weekday') Now the label for created_weekday is "Created Weekday", but I'd like it to have a different label which should be i18nable using ugettext_lazy as well. I've tried created_weekday.verbose_name=... after the method, but that did not show any result. Is there a decorator or something I can use, so I could make my own "verbose_name" / "label" / whateverthename is?

    Read the article

  • Generating two thumbnails from the same image in Django

    - by Titus
    Hello, this seems like quite an easy problem but I can't figure out what is going on here. Basically, what I'd like to do is create two different thumbnails from one image on a Django model. What ends up happening is that it seems to be looping and recreating the same image (while appending an underscore to it each time) until it throws up an error that the filename is to big. So, you end up something like: OSError: [Errno 36] File name too long: 'someimg________________etc.jpg' Here is the code: def save(self, *args, **kwargs): if self.image: iname = os.path.split(self.image.name)[-1] fname, ext = os.path.splitext(iname) tlname, tsname = fname + '_thumb_l' + ext, fname + '_thumb_s' + ext self.thumb_large.save(tlname, make_thumb(self.image, size=(250,250))) self.thumb_small.save(tsname, make_thumb(self.image, size=(100,100))) super(Artist, self).save(*args, **kwargs) def make_thumb(infile, size=(100,100)): infile.seek(0) image = Image.open(infile) if image.mode not in ('L', 'RGB'): image.convert('RGB') image.thumbnail(size, Image.ANTIALIAS) temp = StringIO() image.save(temp, 'png') return ContentFile(temp.getvalue()) I didn't show imports for the sake of brevity. Assume there are two ImageFields on the Artist model: thumb_large, and thumb_small. If this isn't the correct way to do it, I'd appreciate any feedback. Thanks!

    Read the article

  • Using Windows 7 taskbar features in PyQt

    - by Matze
    Hi all. I am looking for information on the integration of some of the new Windows 7 taskbar features into my PyQt applications. Specifically if there already exists the possibility to use the new progress indicator (see here) and the quick links (www.petri.co.il/wp-content/uploads/new_win7_taskbar_features_8.gif). If anyone could provide a few links or just a "not implemented yet", I'd be very grateful. Thanks a lot.

    Read the article

  • Programatically Determining Bin Path

    - by Andy
    I'm working on a web app called pj and there is a bin file and a src folder. The relative paths before I deploy the app will look something like: pj/bin and pj/src/pj/script.py. However, after deployment, the relative paths will look like: pj_dep/deployed/bin and pj_dep/deployed/lib/python2.6/site-packages/pj/script.py Question: Within script.py, I am trying to find the path of a file in the bin directory. This leads to 2 different behaviors in the dev and deployment environment. If I do os.path.join(os.path.dirname(__file__), 'bin') to try to get the path for the dev environment, I will have a different path for the deployment environment. Is there a more generalized way I can find the bin directory so that I do not need to rely on an if statement to determine how many directories to go up based on the current env? This doesn't seem flexible and might cause other issues later on when the code is moved.

    Read the article

  • How do I set up my own proxy server?

    - by NJTechGuy
    This website (abc.com) slowed access from our original IP address. How do I implement my own proxy server to hide my IP while browsing abc.com? Do I need special hardware/software combo to achieve this? If I can generate about 5 proxies and alternate amongst those 5 while browsing abc.com would be awesome. Please suggest. Thanks guys!

    Read the article

  • How do I store multiple copies of the same field in Django?

    - by Alistair
    I'm storing OLAC metadata which describes linguistic resources. Many of the elements of the metadata are repeatable -- for example, a resource can have two languages, three authors and four dates associated with it. Is there any way of storing this in one model? It seems like overkill to define a model for each repeatable metadata element -- especially since the models will only have one field: it's value.

    Read the article

  • SQLAlchemy - select for update example

    - by Mark
    I'm looking for a complete example of using select for update in SQLAlchemy, but haven't found one googling. I need to lock a single row and update a column, the following code doesn't work (blocks forever): s = table.select(table.c.user=="test",for_update=True) u = table.update().where(table.c.user=="test") u.execute(email="foo") Do I need a commit? How do I do that? As far as I know you need to: begin transaction select ... for update update commit

    Read the article

  • Import boto from local library

    - by ensnare
    I'm trying to use boto as a downloaded library, rather than installing it globally on my machine. I'm able to import boto, but when I run boto.connect_dynamodb() I get an error: ImportError: No module named dynamodb.layer2 Here's my file structure: project/ project/ __init__.py libraries/ __init__.py flask/ boto/ views/ .... modules/ __init__.py db.py .... templates/ .... static/ .... runserver.py And the contents of the relevant files as follows: project/project/modules/db.py from project.libraries import boto conn = boto.connect_dynamodb( aws_access_key_id='<YOUR_AWS_KEY_ID>', aws_secret_access_key='<YOUR_AWS_SECRET_KEY>') What am I doing wrong? Thanks in advance.

    Read the article

  • Scrapy Not Returning Additonal Info from Scraped Link in Item via Request Callback

    - by zoonosis
    Basically the code below scrapes the first 5 items of a table. One of the fields is another href and clicking on that href provides more info which I want to collect and add to the original item. So parse is supposed to pass the semi populated item to parse_next_page which then scrapes the next bit and should return the completed item back to parse Running the code below only returns the info collected in parse If I change the return items to return request I get a completed item with all 3 "things" but I only get 1 of the rows, not all 5. Im sure its something simple, I just can't see it. class ThingSpider(BaseSpider): name = "thing" allowed_domains = ["somepage.com"] start_urls = [ "http://www.somepage.com" ] def parse(self, response): hxs = HtmlXPathSelector(response) items = [] for x in range (1,6): item = ScrapyItem() str_selector = '//tr[@name="row{0}"]'.format(x) item['thing1'] = hxs.select(str_selector")]/a/text()').extract() item['thing2'] = hxs.select(str_selector")]/a/@href').extract() print 'hello' request = Request("www.nextpage.com", callback=self.parse_next_page,meta={'item':item}) print 'hello2' request.meta['item'] = item items.append(item) return items def parse_next_page(self, response): print 'stuff' hxs = HtmlXPathSelector(response) item = response.meta['item'] item['thing3'] = hxs.select('//div/ul/li[1]/span[2]/text()').extract() return item

    Read the article

  • Which software for intranet CMS - Django or Joomla?

    - by zalun
    In my company we are thinking of moving from wiki style intranet to a more bespoke CMS solution. Natural choice would be Joomla, but we have a specific architecture. There is a few hundred people who will use the system. System should be self explainable (easier than wiki). We use a lot of tools web, applications and integrated within 3rd party software. The superior element which is a glue for all of them is API. In example for the intranet tools we do use Django, but it's used without ORM, kind of limited to templates and url - every application has an adequate methods within our API. We do not use the Django admin interface, because it is hardly dependent on ORM. Because of that Joomla may be hard to integrate. Every employee should be able to edit most of the pages, authentication and privileges have to be managed by our API. How hard is it to plug Joomla to use a different authentication process? (extension only - no hacks) If one knows Django better than Joomla, should Django be used?

    Read the article

< Previous Page | 357 358 359 360 361 362 363 364 365 366 367 368  | Next Page >