Parsing with BeautifulSoup, error message TypeError: coercing to Unicode: need string or buffer, NoneType found

Posted by Samsun Knight on Stack Overflow
Published on 2013-06-29T16:18:03Z. Indexed on 2013/06/29 16:21 UTC.

Filed under:

html5lib

I'm trying to scrape an Amazon page for data, and I get an error when I try to parse out where the seller is located. Here's my code:

# imports assumed; they were not shown in the original post
import urllib2
from bs4 import BeautifulSoup

# getting the html
request = urllib2.Request('http://www.amazon.com/gp/offer-listing/0393934241/')
opener = urllib2.build_opener()
# hiding that I'm a webscraper
request.add_header('User-Agent', 'Mozilla/5 (Solaris 10) Gecko')
# opening it up, putting into soup form
html = opener.open(request).read()
soup = BeautifulSoup(html, "html5lib")

# parsing for the seller info
sellers = soup.findAll('div', {'class' : 'a-row a-spacing-medium olpOffer'})
for eachseller in sellers:
    # parsing for price
    price = eachseller.find('span', {'class' : 'a-size-large a-color-price olpOfferPrice a-text-bold'})
    # parsing for shipping costs
    shippingprice = eachseller.find('span', {'class' : 'olpShippingPrice'})
    # parsing for condition
    condition = eachseller.find('span', {'class' : 'a-size-medium'})
    # parsing for seller name
    sellername = eachseller.find('b')
    # parsing for seller location
    location = eachseller.find('div', {'class' : 'olpAvailability'})

    # printing it all out (this is the line that raises the TypeError)
    print "price, " + price.string + ", shipping price, " + shippingprice.string + ", condition," + condition.string + ", seller name, " + sellername.string + ", location, " + location.string

The 'print' statement at the end raises this error: "TypeError: coercing to Unicode: need string or buffer, NoneType found"
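For what it's worth, this is the TypeError Python 2 raises when a unicode string is concatenated with None, so at least one of the .string values in the print line is presumably None. Below is a minimal sketch of a None-tolerant way to build the output line. The function name format_offer and the "n/a" placeholder are hypothetical, and the sample values are made up for illustration.

```python
def format_offer(fields, missing="n/a"):
    """Join (label, value) pairs into one line, tolerating None values."""
    parts = []
    for label, value in fields:
        # A value of None is replaced by a placeholder instead of being
        # concatenated, which is what triggers the TypeError.
        text = missing if value is None else value.strip()
        parts.append(label + ", " + text)
    return ", ".join(parts)


# Made-up sample values; the location is None on purpose.
line = format_offer([("price", " $45.00 "), ("location", None)])
print(line)
```

This only keeps the print from crashing; it does not explain why a value is None in the first place.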

I know the error comes from the location in this line - location = eachseller.find('div', {'class' : 'olpAvailability'}) - because the code works fine without it, and I think I'm getting NoneType because the line isn't finding anything. Here's the HTML from the section I'm looking to parse:

<div class="olpAvailability">
    In Stock. 
        Ships from WI, United States.
    <br/><a href="/gp/aag/details/ref=olp_merch_ship_9/175-0430757-3801038?ie=UTF8&amp;asin=0393934241&amp;seller=A1W2IX7T37FAMZ&amp;sshmPath=shipping-rates#aag_shipping">Domestic shipping rates</a>
         and <a href="/gp/aag/details/ref=olp_merch_return_9/175-0430757-3801038?ie=UTF8&amp;asin=0393934241&amp;seller=A1W2IX7T37FAMZ&amp;sshmPath=returns#aag_returns">return policy</a>.
</div>

I don't see what the problem is with the 'location' line of code, or why it isn't pulling the data I want. Any help?
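One likely explanation, offered as a sketch and not a confirmed diagnosis of the live page: in Beautiful Soup, a tag's .string is None whenever the tag has more than one child, and this div contains text, a br and two links. In that case find() does match the div, but location.string is None, and get_text() would return the nested text. The snippet below uses a simplified copy of the div from the question (hrefs shortened to "#") and the built-in "html.parser" instead of html5lib, both of which are assumptions made to keep the example self-contained.

```python
from bs4 import BeautifulSoup

# Simplified copy of the div from the question (hrefs shortened to "#").
snippet = """
<div class="olpAvailability">
    In Stock.
        Ships from WI, United States.
    <br/><a href="#">Domestic shipping rates</a>
         and <a href="#">return policy</a>.
</div>
"""

# "html.parser" is used here only to avoid the html5lib dependency.
soup = BeautifulSoup(snippet, "html.parser")
location = soup.find("div", {"class": "olpAvailability"})

found = location is not None      # the div itself is matched
string_value = location.string    # None, because the div has several children

# get_text() concatenates all nested text; split/join collapses whitespace.
location_text = " ".join(location.get_text().split())
print(location_text)
```

If only the "Ships from ..." part is wanted, the collapsed text could be trimmed further with ordinary string methods.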

python

web-scraping

beautifulsoup

web-crawler

html5lib
