Do not filter outlinks in Nutch?

Posted by sigpwned on Stack Overflow
Published on 2013-10-28T03:51:46Z

I'm trying to perform a deep crawl within a small list of sites. To do this, I added the domains of the sites I want to scrape to conf/domain-urlfilter.txt, which restricted the crawl as intended. However, I found that the filter is applied not only to the URLs fetched at each step, but also to the outlinks captured from each crawled page.
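
As a rough illustration only (the domains below are placeholders, not the sites actually being crawled), a conf/domain-urlfilter.txt for this kind of setup might look something like this, with one domain, suffix, or hostname per line:

```text
# Hypothetical entries for illustration; replace with the real sites.
example.com
example.org
blog.example.net
```

With entries like these, any URL whose domain or host is not listed is rejected by the domain filter.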

Is there a way to avoid filtering captured outlinks while still filtering crawled URLs?
