Scrapy Not Returning Additional Info from Scraped Link in Item via Request Callback
Posted by zoonosis on Stack Overflow, published on 2012-09-05.
Basically, the code below scrapes the first 5 items of a table. One of the fields is another href, and following that href leads to more info which I want to collect and add to the original item. So parse is supposed to pass the semi-populated item to parse_next_page, which then scrapes the next bit and should return the completed item back to parse.
Running the code below only returns the info collected in parse.
If I change `return items` to `return request`, I get a completed item with all 3 "things", but I only get 1 of the rows, not all 5.
I'm sure it's something simple, I just can't see it.
class ThingSpider(BaseSpider):
    name = "thing"
    allowed_domains = ["somepage.com"]
    start_urls = [
        "http://www.somepage.com"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        items = []
        for x in range(1, 6):
            item = ScrapyItem()
            str_selector = '//tr[@name="row{0}"]'.format(x)
            item['thing1'] = hxs.select(str_selector + '/a/text()').extract()
            item['thing2'] = hxs.select(str_selector + '/a/@href').extract()
            print 'hello'
            request = Request("www.nextpage.com", callback=self.parse_next_page, meta={'item': item})
            print 'hello2'
            request.meta['item'] = item
            items.append(item)
        return items

    def parse_next_page(self, response):
        print 'stuff'
        hxs = HtmlXPathSelector(response)
        item = response.meta['item']
        item['thing3'] = hxs.select('//div/ul/li[1]/span[2]/text()').extract()
        return item
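To illustrate the pattern being attempted, here is a minimal plain-Python simulation of Scrapy's request/callback flow (Python 3, not real Scrapy; the `Request`, `FakeResponse`, and `run` names are invented for this sketch). It shows why each row should yield its own Request carrying the semi-populated item in `meta`, so the callback can finish each item individually instead of `parse` returning the item list directly.

```python
# Plain-Python sketch of the request/meta chaining pattern (hypothetical
# stand-ins for Scrapy's machinery, for illustration only).

class Request:
    """Stand-in for scrapy Request: a URL, a callback, and a meta dict."""
    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta or {}

class FakeResponse:
    """Stand-in for scrapy Response: carries the request's meta through."""
    def __init__(self, request):
        self.meta = request.meta

def parse(response):
    # One Request per row: each carries its own semi-populated item,
    # so all 5 rows get completed, not just the last one.
    for x in range(1, 6):
        item = {'thing1': 'name%d' % x, 'thing2': '/detail/%d' % x}
        yield Request(item['thing2'], callback=parse_next_page,
                      meta={'item': item})

def parse_next_page(response):
    # Retrieve the partial item and add the field from the detail page.
    item = response.meta['item']
    item['thing3'] = 'detail'
    return item

def run(start_response):
    # Crude "engine": dispatch each yielded request to its callback.
    results = []
    for req in parse(start_response):
        results.append(req.callback(FakeResponse(req)))
    return results

items = run(None)
```

In real Scrapy the engine does what `run` does here: `parse` yields one Request per row instead of returning a list, and the framework calls `parse_next_page` with a response whose `meta` still holds the matching partial item.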