BeautifulSoup Parser Confusion - HTML
Posted by lyngbym on Stack Overflow
Published on 2011-01-08T20:50:24Z
beautifulsoup
I'm trying to scrape content from another site, and I'm not sure why BeautifulSoup is producing this output. It finds only a blank space inside the matched element, but the real HTML contains a large amount of markup there. I apologize if this is a mistake on my part; I'm new to Python.
Here's my code:
import sys
import os
import mechanize
import re
from BeautifulSoup import BeautifulSoup


def scrape_trails(BASE_URL, data):
    # Get the trail names
    soup = BeautifulSoup(data)
    sitesDiv = soup.findAll("div", attrs={"id" : "sitesDiv"})
    print sitesDiv


def main():
    BASE_URL = "http://www.dnr.state.mn.us/skiing/skipass/list.html"
    br = mechanize.Browser()
    data = br.open(BASE_URL).get_data()
    links = scrape_trails(BASE_URL, data)


if __name__ == '__main__':
    main()
If you follow that URL, you can see that the sitesDiv element contains a lot of markup. I'm not sure whether I'm doing something wrong or whether this is malformed markup that the script can't handle. Thanks!
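One way to narrow this down is to check what the served HTML actually contains inside the div, independently of BeautifulSoup. The sketch below is only a diagnostic under stated assumptions: it uses Python 3 and the standard-library html.parser rather than the BeautifulSoup 3 and mechanize calls in the script above, the names DivContentCounter and inspect_div are hypothetical, and the two sample strings are invented for illustration, not taken from the real page. It also assumes reasonably balanced tags, so badly nested markup could throw off the count. If the count of nested tags is zero for the fetched page source, the markup seen in the browser is likely added after load (for example by JavaScript) rather than dropped by the parser.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DivContentCounter(HTMLParser):
    """Hypothetical helper: counts tags nested inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # 0 means we are outside the target element
        self.nested_tags = 0  # start tags seen inside the target element
        self.text = []        # text fragments seen inside the target element

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.nested_tags += 1
            if tag not in VOID_TAGS:
                self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.text.append(data)


def inspect_div(html, target_id="sitesDiv"):
    """Return (number of nested tags, stripped text) for the element with target_id."""
    parser = DivContentCounter(target_id)
    parser.feed(html)
    parser.close()
    return parser.nested_tags, "".join(parser.text).strip()


# Invented samples: one div that is empty in the served source, one that is populated.
sample_empty = '<div id="sitesDiv"> </div><script>/* fills sitesDiv later */</script>'
sample_full = '<div id="sitesDiv"><ul><li>Trail A</li><li>Trail B</li></ul></div>'

if __name__ == "__main__":
    print(inspect_div(sample_empty))
    print(inspect_div(sample_full))
```

Running inspect_div on the raw page source (the data string in the script above, decoded to text) would show whether the div is already empty before any parser touches it.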