Writing a PHP web crawler using cron
Posted by Horse on Stack Overflow
Published on 2011-01-11T16:42:58Z
Indexed on 2011/01/11 16:53 UTC
Hi all
I have written a web crawler using simplehtmldom, and the crawl process is working quite nicely. It crawls the start page, adds all the links it finds to a database table, sets a session pointer, and meta-refreshes the page to carry on to the next link. That keeps going until it runs out of links.
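A minimal sketch of the loop described above, for illustration only. It is not the original code: an in-memory array stands in for the database table, `array_shift` stands in for the session pointer, and a crude regex stands in for simplehtmldom's link lookup. The names `extractLinks`, `crawl`, `$fetch` and `$maxPages` are hypothetical, and links are treated as opaque strings (no relative URL resolution).

```php
<?php
// Sketch of a queue-driven crawl loop. $fetch is a hypothetical callable
// (URL in, HTML string or null out), injected so the loop runs without a network.

function extractLinks(string $html): array
{
    // Crude stand-in for simplehtmldom: collect href values from <a> tags.
    preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $matches);

    $links = [];
    foreach ($matches[1] as $href) {
        // Drop any #fragment so the same page is not queued twice.
        $href = explode('#', $href, 2)[0];
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

function crawl(string $startUrl, callable $fetch, int $maxPages = 100): array
{
    $queue   = [$startUrl];          // plays the role of the links table
    $seen    = [$startUrl => true];  // prevents re-queuing a known link
    $visited = [];

    while ($queue && count($visited) < $maxPages) {
        $url  = array_shift($queue); // plays the role of the session pointer
        $html = $fetch($url);
        if ($html === null) {
            continue;                // fetch failed, move on to the next link
        }
        $visited[] = $url;

        foreach (extractLinks($html) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[]     = $link;
            }
        }
    }
    return $visited;
}
```

Because the loop runs in a single process, it needs neither the session nor the meta refresh, which is what would let it run from the command line.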
That works fine, but the crawl time for larger websites is obviously pretty tedious. I would like to speed things up a bit, and possibly run it as a cron job.
Any ideas on making it as quick and efficient as possible, other than raising the memory limit and execution time?
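To make the cron idea concrete, here is one possible shape for it, sketched under assumptions rather than offered as a definitive answer: each cron run processes a bounded batch of queued links, and a lock file stops a new run from starting while the previous one is still going. The function name `runExclusive`, the lock file location and the crontab path are all hypothetical.

```php
<?php
// Sketch: run one crawl batch per cron invocation, skipping the run if the
// previous one still holds the lock. Names and paths are hypothetical.
//
// Hypothetical crontab entry (every 5 minutes):
//   */5 * * * * php /path/to/crawl_batch.php

/**
 * Runs $batch while holding an exclusive lock on $lockFile.
 * Returns the batch's page count, or null if the run was skipped.
 */
function runExclusive(string $lockFile, callable $batch): ?int
{
    $handle = fopen($lockFile, 'c'); // create if missing, never truncate
    if ($handle === false) {
        return null;
    }

    // Non-blocking exclusive lock: fails at once if another run holds it.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }

    try {
        return $batch();
    } finally {
        // Always release the lock, even if the batch throws.
        flock($handle, LOCK_UN);
        fclose($handle);
    }
}
```

A cron script could call `runExclusive(sys_get_temp_dir() . '/crawler.lock', $batch)`, where `$batch` pulls a fixed number of pending links from the table, crawls them, and returns how many it processed. Keeping each run small avoids the need for a large execution time limit.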
© Stack Overflow or respective owner