posting nutch data into a BASIC auth secured Solr instance

Posted by mlathe on Stack Overflow See other posts from Stack Overflow or by mlathe
Published on 2010-02-22T20:48:45Z Indexed on 2010/05/11 10:14 UTC
Read the original article Hit count: 656

Filed under:
|
|
|

Hi. I've secured a solr instance using BASIC auth, kind of how it is shown here: http://blog.comtaste.com/2009/02/securing_your_solr_server_on_t.html

Now i'm trying to update my batch processes to push data into the authenticated instance. The ones using "curl" are easy, but i also have a Nutch crawl that uses the "solrindex" command to push data into Solr. When i do that i get this error:

2010-02-22 12:09:28,226 INFO auth.AuthChallengeProcessor - basic authentication scheme selected 2010-02-22 12:09:28,229 INFO httpclient.HttpMethodDirector - No credentials available for BASIC 'Tomcat Manager Application'@ninja:5500 2010-02-22 12:09:28,236 WARN mapred.LocalJobRunner - job_local_0001 org.apache.solr.common.SolrException: Unauthorized

Unauthorized

request: http://ninja:5500/solr/foo/update?wt=javabin&version=2.2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:343) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183) at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:69) at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170) 2010-02-22 12:09:29,134 FATAL solr.SolrIndexer - SolrIndexer: java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) at org.apache.nutch.indexer.solr.SolrIndexer.indexSolr(SolrIndexer.java:73) at org.apache.nutch.indexer.solr.SolrIndexer.run(SolrIndexer.java:95) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.nutch.indexer.solr.SolrIndexer.main(SolrIndexer.java:104)

Apparently nutch uses SolrJ to push the content, and after going through the solrj code, it's clear that it uses commons-httpclient without providing a way to set the credentials.

Here are my question(s)

  1. Is this possible to do? ie push from nutch into a BASIC auth secured Solr instance?
  2. Is it possible to tell commons-httpclient about a credential without explicitly doing an _httpclient.getState().setCredentials(...)?
  3. Anyother ideas? One idea i had was to use an IPfiltering Valve for just the "update" Solr webservices. That would mean you could only make an update call from certain nodes.

Thanks

© Stack Overflow or respective owner

Related posts about solr

Related posts about nutch