Scraping HTML without unique identifiers using Python
Posted by Nicholas Law on Stack Overflow
        Published on 2013-10-22T21:43:36Z
I would like to write a Python script that scrapes thousands of pages like this one and this one, gathers all the data, and inserts it into a MySQL database. The script will run on a weekly or bi-weekly basis to update the database with any new information added to each individual page.
Ideally I would like a scraper that is easy to work with for table-structured data, but that also handles data without unique identifiers (i.e. no id or class attributes).
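To make the "no unique identifiers" case concrete, here is a minimal sketch of one common approach: instead of selecting a table by id or class, pick it by the text of its header row. It uses only the standard library's html.parser so it runs without extra installs. The names TableExtractor, find_table and SAMPLE_HTML, and the sample markup itself, are assumptions made up for illustration and are not taken from the pages in question. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows (lists of cell text)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # all tables seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def find_table(html, required_headers):
    """Return the data rows (as dicts) of the first table whose
    header row contains every name in required_headers."""
    parser = TableExtractor()
    parser.feed(html)
    for table in parser.tables:
        if table and set(required_headers) <= set(table[0]):
            header = table[0]
            return [dict(zip(header, row)) for row in table[1:]]
    return []


# Hypothetical page: two tables, neither has an id or class attribute.
SAMPLE_HTML = """
<html><body>
<table><tr><td>Home</td><td>About</td></tr></table>
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
</body></html>
"""

rows = find_table(SAMPLE_HTML, ["Name", "Price"])
print(rows)
```

Matching on header text tends to survive layout changes better than matching on position (for example "the second table on the page"), though it still breaks if the site renames its columns.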
Which scraping library should I use: BeautifulSoup, Scrapy, or Mechanize?
Are there any particular tutorials or books I should be looking at to achieve this?
In the long run I will be implementing a mobile app that works with all this data by querying the database.
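For the weekly or bi-weekly update step, the sketch below shows one way to re-run a scrape without creating duplicate rows: key each row on the page it came from and overwrite on conflict. It uses the standard library's sqlite3 as a stand-in so it runs without a database server. This is not MySQL, although MySQL offers comparable statements (REPLACE INTO, or INSERT ... ON DUPLICATE KEY UPDATE). The table name items, its columns, the function save_rows and the example URLs are all hypothetical.

```python
import sqlite3


def save_rows(conn, rows):
    """Insert scraped rows, overwriting any existing row with the same source_url."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " source_url TEXT PRIMARY KEY,"
        " name TEXT,"
        " price TEXT)"
    )
    # INSERT OR REPLACE is SQLite syntax: a row whose primary key already
    # exists is replaced instead of raising an integrity error.
    conn.executemany(
        "INSERT OR REPLACE INTO items (source_url, name, price)"
        " VALUES (:source_url, :name, :price)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")

# First run: one page scraped.
save_rows(conn, [
    {"source_url": "http://example.com/a", "name": "Widget", "price": "9.99"},
])

# Later run: the same page with a changed price, plus a new page.
save_rows(conn, [
    {"source_url": "http://example.com/a", "name": "Widget", "price": "10.49"},
    {"source_url": "http://example.com/b", "name": "Gadget", "price": "24.50"},
])

stored = conn.execute(
    "SELECT source_url, name, price FROM items ORDER BY source_url"
).fetchall()
print(stored)
```

Keying on the source URL assumes one record per page. If a page yields several records, the key would need an extra column that identifies the record within the page.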
© Stack Overflow or respective owner