zeynel - Developer IT

Scrapy domain_name for spider

- by Zeynel

From the Scrapy tutorial: domain_name: identifies the Spider. It must be unique, that is, you can’t set the same domain name for different Spiders. Does this mean that domain_name must be a valid domain name, like domain_name = 'example.com' Or can I name domain_name = 'ex1' The problem is I had a spider that worked with domain name domain_name = 'whitecase.com' Now I created a new spider as an instance of CrawlSpider and named it domain_name = 'wc2' but I am getting the error "could not find spider for domain "wc2""

Read the article

Scrapy spider is not working

- by Zeynel

Since nothing so far is working I started a new project with python scrapy-ctl.py startproject Nu I followed the tutorial exactly, and created the folders, and a new spider from scrapy.contrib.spiders import CrawlSpider, Rule from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor from scrapy.selector import HtmlXPathSelector from scrapy.item import Item from Nu.items import NuItem from urls import u class NuSpider(CrawlSpider): domain_name = "wcase" start_urls = ['http://www.whitecase.com/aabbas/'] names = hxs.select('//td[@class="altRow"][1]/a/@href').re('/.a\w+') u = names.pop() rules = (Rule(SgmlLinkExtractor(allow=(u, )), callback='parse_item'),) def parse(self, response): self.log('Hi, this is an item page! %s' % response.url) hxs = HtmlXPathSelector(response) item = Item() item['school'] = hxs.select('//td[@class="mainColumnTDa"]').re('(?<=(JD,\s))(.*?)(\d+)') return item SPIDER = NuSpider() and when I run C:\Python26\Scripts\Nu>python scrapy-ctl.py crawl wcase I get [Nu] ERROR: Could not find spider for domain: wcase The other spiders at least are recognized by Scrapy, this one is not. What am I doing wrong? Thanks for your help!

Read the article

Django QuerySet API: How do I join iexact and icontains?

- by Zeynel

Hello, I have this join: lawyers = Lawyer.objects.filter(last__iexact=last_name).filter(first__icontains=first_name) This is the site If you try Last Name: Abbas and First Name: Amr it tells you that amr abbas has 1 schoolmates. But if you try First name only it says that there are no lawyers in the database called amr (obviously there is). If I change (last__iexact=last_name) to (last__icontains=last_name) then leaving Last Name blank works fine and amr is found. But with last__icontains=last_name if you search for "collin" you also get "collins" and "collingwood" which is not what I want. Do you know how I can use iexact and also have it ignored if it is blank? Thanks This is the view function: def search_form(request): if request.method == 'POST': search_form = SearchForm(request.POST) if search_form.is_valid(): last_name = search_form.cleaned_data['last_name'] first_name = search_form.cleaned_data['first_name'] lawyers = Lawyer.objects.filter(last__iexact=last_name).filter(first__icontains=first_name) if len(lawyers)==0: form = SearchForm() return render_to_response('not_in_database.html', {'last': last_name, 'first': first_name, 'form': form}) if len(lawyers)>1: form = SearchForm(initial={'last_name': last_name}) return render_to_response('more_than_1_match.html', {'lawyers': lawyers, 'last': last_name, 'first': first_name, 'form': form}) q_school = Lawyer.objects.filter(last__icontains=last_name).filter(first__icontains=first_name).values_list('school', flat=True) q_year = Lawyer.objects.filter(last__icontains=last_name).filter(first__icontains=first_name).values_list('year_graduated', flat=True) lawyers1 = Lawyer.objects.filter(school__iexact=q_school[0]).filter(year_graduated__icontains=q_year[0]).exclude(last__icontains=last_name) form = SearchForm() return render_to_response('search_results.html', {'lawyers': lawyers1, 'last': last_name, 'first': first_name, 'form': form}) else: form = SearchForm() return render_to_response('search_form.html', {'form': form, })

Developer IT