Posting Nutch data into a BASIC auth secured Solr instance
- by mlathe
Hi. I've secured a Solr instance using BASIC auth, much as shown here:
http://blog.comtaste.com/2009/02/securing_your_solr_server_on_t.html
Now I'm trying to update my batch processes to push data into the authenticated instance. The ones using "curl" are easy, but I also have a Nutch crawl that uses the "solrindex" command to push data into Solr. When I run that, I get this error:
  2010-02-22 12:09:28,226 INFO  auth.AuthChallengeProcessor - basic authentication scheme selected
  2010-02-22 12:09:28,229 INFO  httpclient.HttpMethodDirector - No credentials available for BASIC 'Tomcat Manager Application'@ninja:5500
  2010-02-22 12:09:28,236 WARN  mapred.LocalJobRunner - job_local_0001
  org.apache.solr.common.SolrException: Unauthorized

  Unauthorized

  request: http://ninja:5500/solr/foo/update?wt=javabin&version=2.2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:343)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:69)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
  2010-02-22 12:09:29,134 FATAL solr.SolrIndexer - SolrIndexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at org.apache.nutch.indexer.solr.SolrIndexer.indexSolr(SolrIndexer.java:73)
    at org.apache.nutch.indexer.solr.SolrIndexer.run(SolrIndexer.java:95)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.solr.SolrIndexer.main(SolrIndexer.java:104)
Apparently Nutch uses SolrJ to push the content, and after going through the SolrJ code, it's clear that it uses commons-httpclient without exposing a way to set the credentials.
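For context on what the server is actually asking for: BASIC auth comes down to a single `Authorization` request header, which is why the curl jobs are easy (curl builds it from `-u user:password`). Below is a minimal sketch of that header in plain Java, using only the standard library (Java 8+ for `java.util.Base64`). It is not SolrJ or commons-httpclient code; the class name `BasicAuthHeader` and the credentials are hypothetical and only there for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of SolrJ, Nutch, or commons-httpclient.
class BasicAuthHeader {

    // Builds the value of the HTTP "Authorization" header for BASIC auth:
    // the literal "Basic " followed by base64("user:password").
    static String build(String user, String password) {
        String pair = user + ":" + password;
        byte[] raw = pair.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Made-up credentials, for illustration only.
        System.out.println("Authorization: " + build("solr-user", "secret"));
    }
}
```

Whatever mechanism ends up supplying the credentials to the HTTP client, this header is what has to reach Tomcat on each update request.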
Here are my questions:
Is this possible to do, i.e. push from Nutch into a BASIC auth secured Solr instance?
Is it possible to tell commons-httpclient about a credential without explicitly calling _httpclient.getState().setCredentials(...)?
Any other ideas? One idea I had was to use an IP-filtering Valve for just the "update" Solr web services. That would mean update calls could only be made from certain nodes.
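To make the IP-filtering idea concrete, here is a minimal sketch of the decision such a filter would make: requests to an update path are allowed only from listed addresses, and everything else passes through. This is plain Java, not Tomcat's Valve API; the class `UpdateIpFilter`, the path matching rule, and the addresses are all assumptions for illustration. In Tomcat the same check would have to live in a valve or servlet filter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical allow-list check; not a Tomcat Valve implementation.
class UpdateIpFilter {
    private final Set<String> allowedAddrs;

    UpdateIpFilter(String... allowed) {
        this.allowedAddrs = new HashSet<>(Arrays.asList(allowed));
    }

    // Assumed rule: a path is an update request if it ends in "/update"
    // or contains "/update/" (e.g. /solr/foo/update or /solr/foo/update/csv).
    static boolean isUpdatePath(String path) {
        return path.endsWith("/update") || path.contains("/update/");
    }

    // Non-update requests always pass; update requests pass only
    // when the caller's address is on the allow-list.
    boolean permits(String remoteAddr, String path) {
        if (!isUpdatePath(path)) {
            return true;
        }
        return allowedAddrs.contains(remoteAddr);
    }

    public static void main(String[] args) {
        // Made-up crawler node address, for illustration only.
        UpdateIpFilter filter = new UpdateIpFilter("10.0.0.5");
        System.out.println(filter.permits("10.0.0.5", "/solr/foo/update"));
        System.out.println(filter.permits("10.0.0.9", "/solr/foo/update"));
    }
}
```

Note that this restricts by address only, so any process on an allowed node could still post updates.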
Thanks