Search Results

Search found 1 result on 1 page for 'sigpwned'.

  • Do not filter outlinks in Nutch?

    - by sigpwned
    I'm trying to perform a deep crawl of a small list of sites. To do this, I added the domains of the sites I want to scrape to conf/domain-urlfilter.txt, which worked nicely. However, I found that the filter is applied not only to the URLs selected for crawling at each step, but also to the outlinks captured from each crawled page. Is there a way to keep filtering the URLs that get crawled without filtering the captured outlinks?
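
    For context, a conf/domain-urlfilter.txt of the kind described might look roughly like the fragment below. This is only an illustrative sketch: the entries are placeholder domains, not the actual sites from the crawl, and it assumes the usual one-entry-per-line format of the urlfilter-domain plugin with the plugin enabled in plugin.includes.

    ```text
    # conf/domain-urlfilter.txt (hypothetical example)
    # One entry per line: a domain suffix, a domain name, or a hostname.
    # URLs that do not match any entry are rejected by the filter.
    example.com
    example.org
    blog.example.net
    ```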
