lucene - Page 8 - Developer IT

How to use NGramTokenizerFactory or NGramFilterFactory?

- by user572485

Hi, I've recently been studying how to store and index data with Solr. I want to do a facet.prefix search. With the whitespace tokenizer, "Where are you" is split into three words and indexed, so if I search with facet.prefix="where are", no results are returned. I googled and found that NGramFilterFactory might help. But when I apply this filter factory, the result is "w, h, e, ..., wh, ...", which splits the sentence by character, not by word. I set the parameters minGramSize and maxGramSize to 1 and 3. Is NGramFilterFactory working correctly? Should I add some other parameters? Are there other filter factories that could help me? Thanks!
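
The output described is what character n-grams look like: NGramFilterFactory cuts each token into substrings, so it never produces a term that spans two words, such as "where are". What facet.prefix="where are" needs is word-level n-grams ("shingles"); as far as I know, Solr ships a ShingleFilterFactory for that. The plain-Java sketch below is not Solr code. It only contrasts the two kinds of n-grams, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration (not Solr code): character n-grams vs. word n-grams ("shingles").
final class NGramDemo {

    // Character n-grams of one token: all grams of size minGram, then the next size, and so on.
    static List<String> charNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= token.length(); start++) {
                grams.add(token.substring(start, start + n));
            }
        }
        return grams;
    }

    // Word n-grams: consecutive whitespace-separated words joined by a single space.
    static List<String> wordNGrams(String text, int minWords, int maxWords) {
        String[] words = text.trim().split("\\s+");
        List<String> grams = new ArrayList<>();
        for (int n = minWords; n <= maxWords; n++) {
            for (int start = 0; start + n <= words.length; start++) {
                grams.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(charNGrams("where", 1, 2));         // [w, h, e, r, e, wh, he, er, re]
        System.out.println(wordNGrams("where are you", 1, 2)); // [where, are, you, where are, are you]
    }
}
```

With word n-grams indexed (and lowercased, since facet.prefix is compared literally against the indexed terms), a term such as "where are" exists in the index, so the prefix can match it.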

How to optimize indexing of a large number of DB records using Zend_Lucene and Zend_Paginator

- by jdichev

I have a script that runs from cron on a host and indexes all the records in a database table; the index is later used by both the front end of the site and the back-end operations. After the operation, the index is about 3-4 MB. The problem is that it takes a lot of resources (CPU: 30+, and a good chunk of memory) and slows the machine down. My question is how to optimize the operation described below. First, a select query is built using the Zend Framework API. This query is then passed to a Paginator factory that returns a paginator, which I use to limit how many items are indexed at a time so that I don't iterate over too many items at once. The script iterates over the current items in the paginator object with a foreach loop until it reaches the end, then fetches the items for the next page and starts again. I suspect the overhead is caused by Zend_Lucene, but I have no idea how it could be improved.
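
If the paginator issues LIMIT/OFFSET queries, every later page makes the database read and discard all the skipped rows, so the cost grows with the page number. Keyset pagination (WHERE id > last_seen_id ORDER BY id) avoids that, and flushing the index once per batch rather than once per document (if the indexing library allows it) keeps writes coarse. The Java sketch below only shows the loop shape under those assumptions; the Source interface, the callbacks and the batch size are hypothetical stand-ins for the database query and the Zend_Search_Lucene calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical sketch: keyset-paginated batch indexing with one flush per batch.
final class BatchIndexer {

    // Hypothetical data source: up to `limit` ids greater than `afterId`, in ascending order
    // (think: SELECT id ... WHERE id > ? ORDER BY id LIMIT ?).
    interface Source {
        List<Long> fetchAfter(long afterId, int limit);
    }

    // Indexes every row; returns how many batches (flushes) were performed.
    static int indexAll(Source source, int batchSize, Consumer<Long> addDocument, Runnable flush) {
        long lastId = Long.MIN_VALUE;
        int batches = 0;
        while (true) {
            List<Long> page = source.fetchAfter(lastId, batchSize);
            if (page.isEmpty()) break;
            for (Long id : page) addDocument.accept(id);
            flush.run(); // one flush per batch instead of one per document
            batches++;
            lastId = page.get(page.size() - 1); // resume after the last id seen
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> table = new ArrayList<>();
        for (long i = 1; i <= 25; i++) table.add(i);
        // In-memory stand-in for the database table.
        Source source = (afterId, limit) -> table.stream()
                .filter(id -> id > afterId)
                .limit(limit)
                .collect(Collectors.toList());
        int[] added = {0};
        int batches = indexAll(source, 10, id -> added[0]++, () -> { });
        System.out.println(added[0] + " documents in " + batches + " batches"); // 25 documents in 3 batches
    }
}
```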

Is it possible for lucene to store the index only in one file and no other assisting files shall be

- by Akhil

When we create a Lucene index, various files are created. If we do not optimize the IndexWriter, three files are created: one named _0.cfs, which contains all of the index data, and two other files containing metadata. Is it possible to force Lucene to create only one file instead of three?

Is MongoDB a valid alternative to relational db + lucene?

- by Hugo

Hi! On a new project I need to make heavy use of Lucene for a search implementation. The search component will be a very important (and big) piece of the project. Is it valid or convenient to replace a relational database + Lucene with MongoDB? Thanks.

Remove results below a certain score threshold in Solr/Lucene?

- by snickernet

Hi guys, is there built-in functionality in Solr/Lucene to filter out results that fall below a certain score threshold? Let's say I provide a score threshold of .2; then all documents with a score less than .2 would be removed from my results. My intuition is that this is possible by updating/customizing Solr or Lucene. Could you point me in the right direction on how to do this? Thanks in advance!
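
Raw Lucene scores are not normalized across queries, so a fixed cutoff such as .2 behaves differently from one query to the next. One workaround is to compare each hit with the best score in the same result set. The plain-Java sketch below shows that relative cutoff applied after the search; the ScoredDoc record and the choice of a relative rather than absolute threshold are assumptions, not a Solr feature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hit holder: Lucene document number plus its raw score.
record ScoredDoc(int doc, float score) {}

// Hypothetical post-filter applied to the hits returned by a search.
final class ScoreThreshold {

    // Keeps hits whose score is at least `fraction` of the best score in the list.
    static List<ScoredDoc> keepRelative(List<ScoredDoc> hits, float fraction) {
        float max = 0f;
        for (ScoredDoc h : hits) max = Math.max(max, h.score());
        List<ScoredDoc> kept = new ArrayList<>();
        for (ScoredDoc h : hits) {
            if (max > 0f && h.score() / max >= fraction) kept.add(h);
        }
        return kept;
    }
}
```

Filtering after the search means the number of hits per page can shrink; if exact page sizes matter, the cutoff has to be applied before paging.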

Does Lucene search work on very large documents?

- by shaon-fan

Hi there, I have a problem when searching with Lucene. Indexing itself works well even for huge documents, such as a .pst file (Outlook mail storage): Lucene builds an index that includes all the information in the .pst. The only problem is that the document is sometimes too large and contains a great many words. When I search with Lucene, it only seems to process the front part of the index; if a word appears only in the back part, Lucene can't find it and returns no hits. But when I crudely split the index into several parts while debugging and search each part, it works well. So I want to know: how should I split the index, and what size limit applies to searching? Cheers, and waiting for a reply.

Update: following what Coady said, I set the maximum field length to 2^31-1, but the search results still don't include what I want. To simplify, I convert the document's text to a string array to analyze it; one document has 79,680 words, including spaces and symbols. When I search for a certain word, it returns a count of 300, but there are actually more than 300. Likewise, when I search for a word in the back part of the document, it still isn't found. This is my code; everything else is standard:

    // set the length
    idexwriter.SetMaxFieldLength(2147483647);
    // search
    IndexSearcher searcher = new IndexSearcher(Program.Parameters["INDEX_LOCATION"].ToString());
    Hits hits = searcher.Search(query);

I found this problem when I needed to count the hits for every word in a document, and that is also how I found that words in the back part of the document can't be found. Please help: is there a searcher length setting somewhere? How did you deal with this problem?
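
Two separate effects may be at play. First, in Lucene 2.x/3.x IndexWriter silently kept only the first maxFieldLength terms of each field (10,000 by default), which matches the "back part is not found" symptom; the limit is applied when documents are added, so an index built before SetMaxFieldLength was raised has to be rebuilt. Second, a search returns matching documents, not occurrences, so counting how often a word appears inside one document needs term frequencies rather than the hit count. The plain-Java simulation below (not Lucene code; the class and method names are hypothetical) shows both effects.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of an index that truncates a field at maxFieldLength terms.
final class TruncatingIndex {
    private final Map<String, Integer> termFreq = new HashMap<>();

    TruncatingIndex(List<String> tokens, int maxFieldLength) {
        // Only the first maxFieldLength tokens are indexed; later ones are silently dropped.
        int limit = Math.min(tokens.size(), maxFieldLength);
        for (String t : tokens.subList(0, limit)) termFreq.merge(t, 1, Integer::sum);
    }

    // Document-level match: at most one "hit" per document, however often the term occurs.
    boolean matches(String term) {
        return termFreq.containsKey(term);
    }

    // Occurrences of the term inside the document (its term frequency).
    int occurrences(String term) {
        return termFreq.getOrDefault(term, 0);
    }
}
```

With a limit of 3, the tokens a, b, a, c, tail yield a match for "a" (two occurrences) but none for "tail", exactly like a word near the end of a long .pst document.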

Apache Lucene: Is Relevance Score Always Between 0 and 1?

- by Eamorr

Greetings, I have the following Apache Lucene snippet that's giving me some nice results:

    int numHits = 100;
    int resultsPerPage = 100;
    IndexSearcher searcher = new IndexSearcher(reader);
    TopScoreDocCollector collector = TopScoreDocCollector.create(numHits, true);
    Query q = parser.parse(queryString);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs(0 * resultsPerPage, resultsPerPage).scoreDocs;
    Results r = new Results();
    r.length = hits.length;
    for (int i = 0; i < hits.length; i++) {
        Document doc = searcher.doc(hits[i].doc);
        double distanceKm = getGreatCircleDistance(lucene2double(doc.get("lat")), lucene2double(doc.get("lng")),
                Double.parseDouble(userLat), Double.parseDouble(userLng));
        double newRelevance = ((1 / distanceKm) * Math.log(hits[i].score) / Math.log(2)) * (0 - 1);
        System.out.println(hits[i].doc + "\t" + hits[i].score + "\t" + doc.get("content") + "\t" + "Km=" + distanceKm
                + "\trlvnc=" + String.valueOf(newRelevance));
    }

What I want to know is: is hits[i].score always between 0 and 1? It seems that way, but I can't be sure. I've even checked the Lucene documentation (class ScoreDoc) to no avail. You'll see I'm calculating "newRelevance" from the log of hits[i].score. I need hits[i].score to be between 0 and 1, because if it is below zero I'll get an error, and above 1 the sign of newRelevance flips. I hope some Lucene expert out there can offer me some insight. Many thanks,
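
As far as I know, the raw scores collected by TopScoreDocCollector are not bounded to [0, 1]; with the default similarity they can exceed 1. As I recall, the older Hits class rescaled scores when the top score was above 1, but collectors do not. One option is to divide by the best score of the result set before taking the log, as sketched below in plain Java; the class name and the epsilon guard against log(0) are assumptions.

```java
// Hypothetical helper: maps raw Lucene-style scores into [0, 1] before taking a log.
final class ScoreNormalizer {

    // Divides every score by the maximum, so the best hit becomes exactly 1.0.
    static double[] normalize(float[] scores) {
        float max = 0f;
        for (float s : scores) max = Math.max(max, s);
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = max > 0f ? scores[i] / (double) max : 0.0;
        }
        return out;
    }

    // log2 of a normalized score; clamps to a tiny epsilon so log(0) never yields -Infinity.
    static double log2(double normalizedScore) {
        double safe = Math.max(normalizedScore, 1e-9);
        return Math.log(safe) / Math.log(2);
    }
}
```

Note that the top hit then maps to log2(1) = 0, so newRelevance for it is 0 regardless of distance; if that is undesirable, a small offset can be added before taking the log.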

Using Lucene to index private data, should I have a separate index for each user or a single index

- by Nathan Bayles

I am developing an Azure-based website and I want to provide search capabilities using Lucene. (Structured JSON objects would be indexed and stored in Lucene; other content, such as Word documents, would be indexed in Lucene but stored in blob storage.) I want the search to be secure, such that one user would never see a document belonging to another user. I want to allow ad-hoc searches as typed by the user. Lastly, I want to query programmatically to return predefined sets of data, such as "all notes for user X". I think I understand how to add properties to each document to achieve these three objectives. (I am listing them here so that anyone kind enough to answer will have a better idea of what I am trying to do.) My questions revolve around performance and security. Can I improve document security by having a separate index for each user, or is including the user's ID as a parameter in each search sufficient? Can I improve indexing speed and total throughput of the system by having a separate index for each user? My thinking is that separate indexes would allow me to scale the system by having multiple index writers (perhaps even on different server instances) working at the same time, each on its own index. Any insight would be greatly appreciated. Regards, Nate
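
Whichever index layout is chosen, the usual safeguard in a shared index is a mandatory owner clause that the user's ad-hoc text cannot remove. In Lucene that would be a BooleanQuery with the owner TermQuery added as BooleanClause.Occur.MUST alongside the parsed user query, which is more robust than string concatenation. The plain-Java sketch below shows the same idea in query-parser syntax only; the field name "ownerId" and the id validation rule are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: wrap a user's ad-hoc query in a required owner clause.
final class SecureQuery {
    // Assumption: user ids are plain alphanumeric strings, so they need no query escaping.
    private static final Pattern SAFE_ID = Pattern.compile("[A-Za-z0-9]+");

    static String forUser(String userId, String userQuery) {
        if (!SAFE_ID.matcher(userId).matches()) {
            throw new IllegalArgumentException("unexpected user id: " + userId);
        }
        // "+" marks a required top-level clause: documents without this ownerId cannot match.
        return "+ownerId:" + userId + " +(" + userQuery + ")";
    }
}
```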

Best practices for combining Lucene.NET and a relational database?

- by FlySwat

I'm working on a project where I will have a LOT of data, and it will be searchable through several forms that are expressed very efficiently as SQL queries, but it also needs to be searchable via natural language processing. My plan is to build a Lucene index for this form of search. My question is: if I do this and perform a search, Lucene will return the IDs of the matching documents in the index, and I then have to look up these entities in the relational database. This could be done in two ways (that I can think of so far): (1) N queries (horrible), or (2) passing all the IDs to a stored procedure at once (perhaps as a comma-delimited parameter). The second option has the downside of being limited by the maximum parameter size, plus the slow performance of a UDF that splits the string into a temporary table. I'm almost tempted to mirror everything into Lucene's index, so that I can periodically regenerate the index from the backing store but only need to access the index for the frontend. Advice?
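
A middle ground between N single-row queries and one comma-delimited string is to send the IDs as bound parameters in fixed-size chunks, for example with JDBC placeholders. The sketch below only builds the chunks and the SQL text; the table and column names are hypothetical, and the chunk size is an assumption (SQL Server, for instance, caps a single request at 2,100 parameters, so chunks should stay well below that).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: turn Lucene hit IDs into a few parameterized IN queries.
final class IdChunker {

    // Splits ids into consecutive chunks of at most chunkSize elements (views of the input list).
    static <T> List<List<T>> chunk(List<T> ids, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += chunkSize) {
            chunks.add(ids.subList(i, Math.min(ids.size(), i + chunkSize)));
        }
        return chunks;
    }

    // SQL with one "?" placeholder per id; "Entities" and "Id" are hypothetical names.
    static String selectByIds(int count) {
        return "SELECT * FROM Entities WHERE Id IN (" + String.join(", ", Collections.nCopies(count, "?")) + ")";
    }
}
```

Each chunk would then be bound with PreparedStatement.setObject(i + 1, id). The rows come back in database order, not Lucene's relevance order, so they need to be re-sorted by each ID's position in the hit list.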

Can documents indexed with Solr on JDK 6 be retrieved using only the Lucene API on JDK 1.4?

- by huynhjl

My runtime environment is still on JDK 1.4, but I like the Solr features related to how documents are ingested and indexed. Would I be able to index my documents using Solr offline on a recent version of the JDK, copy the index over, and use it in my runtime environment with an older version of the JDK? Version-wise, Solr 1.4.0 uses Apache Lucene 2.9.1, which is JDK 1.4 compatible (but Solr itself requires JDK 5). Assuming what I'm trying to do is even possible, what features would I lose if I search Solr indexes only with the Lucene API?

How to do query auto-completion/suggestions in Lucene?

- by Mat Mannion

I'm looking for a way to do query auto-completion/suggestions in Lucene. I've Googled around and experimented a bit, but all of the examples I've seen set up filters in Solr. We don't use Solr and aren't planning to move to it in the near future, and Solr is obviously just wrapping Lucene anyway, so I imagine there must be a way to do it! I've looked into using EdgeNGramFilter, and I realise that I'd have to run the filter on the index fields, get the tokens out, and then compare them against the input query... I'm just struggling to turn the connection between the two into code, so help is much appreciated! To be clear on what I'm looking for (I realised I wasn't being very clear, sorry): I'm looking for a solution that, when searching for a term, returns a list of suggested queries. When 'inter' is typed into the search field, it should come back with a list of suggested queries, such as 'internet', 'international', etc.
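
The EdgeNGram idea can be pictured without Solr: at index time, every leading prefix of each term (within a minimum and maximum length) points back to the full term, and at query time the typed text is looked up directly. The in-memory Java sketch below illustrates that mapping; it is not Lucene's EdgeNGram filter, and the class name and gram sizes are hypothetical. In a real Lucene setup the prefix-to-term mapping would live in a dedicated field or a separate suggestion index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory suggester built from edge n-grams (leading prefixes of each term).
final class EdgeNGramSuggester {
    private final Map<String, SortedSet<String>> byPrefix = new HashMap<>();
    private final int minGram;
    private final int maxGram;

    EdgeNGramSuggester(int minGram, int maxGram) {
        this.minGram = minGram;
        this.maxGram = maxGram;
    }

    // Index time: register every leading prefix of the lowercased term between minGram and maxGram chars.
    void add(String term) {
        String t = term.toLowerCase();
        for (int n = minGram; n <= Math.min(maxGram, t.length()); n++) {
            byPrefix.computeIfAbsent(t.substring(0, n), k -> new TreeSet<>()).add(t);
        }
    }

    // Query time: a direct lookup of what the user has typed so far, suggestions in sorted order.
    SortedSet<String> suggest(String typed) {
        return byPrefix.getOrDefault(typed.toLowerCase(), Collections.emptySortedSet());
    }
}
```

After adding 'internet', 'international' and 'interval', suggest("inter") returns all three in alphabetical order; typed text longer than maxGram finds nothing, which is the usual trade-off of bounding the gram size.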

Can a raw Lucene index be loaded by Solr?

- by wynz

Some colleagues of mine have a large Java web app that uses a search system built with Lucene Java. What I'd like to do is have a nice HTTP-based API to access those existing search indexes. I've used Nutch before and really liked how simple the OpenSearch implementation made it to grab results as RSS. I've tried setting Solr's dataDir in solrconfig.xml, hoping it would happily pick up the existing index files, but it seems to just ignore them. My main question is: Can Solr be used to access Lucene indexes created elsewhere? Or might there be a better solution?

Lucene.Net: How can I add a date filter to my search results?

- by rockinthesixstring

I've got my searcher working really well; however, it tends to return results that are obsolete. My site is much like NerdDinner, in that past events become irrelevant. I'm currently indexing like this:

    Public Function AddIndex(ByVal searchableEvent As [Event]) As Boolean Implements ILuceneService.AddIndex
        Dim writer As New IndexWriter(luceneDirectory, New StandardAnalyzer(), False)
        Dim doc As Document = New Document
        doc.Add(New Field("id", searchableEvent.ID, Field.Store.YES, Field.Index.UN_TOKENIZED))
        doc.Add(New Field("fullText", FullTextBuilder(searchableEvent), Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("user", If(searchableEvent.User.UserName = Nothing, "User" & searchableEvent.User.ID, searchableEvent.User.UserName), Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("title", searchableEvent.Title, Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("location", searchableEvent.Location.Name, Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("date", searchableEvent.EventDate, Field.Store.YES, Field.Index.UN_TOKENIZED))
        writer.AddDocument(doc)
        writer.Optimize()
        writer.Close()
        Return True
    End Function

Notice that I have a "date" field that stores the event date. My search then looks like this:

    ' code omitted
    Dim reader As IndexReader = IndexReader.Open(luceneDirectory)
    Dim searcher As IndexSearcher = New IndexSearcher(reader)
    Dim parser As QueryParser = New QueryParser("fullText", New StandardAnalyzer())
    Dim query As Query = parser.Parse(q.ToLower)

    ' We're using 10,000 as the maximum number of results to return
    ' because I have a feeling that we'll never reach that full amount
    ' anyway. And if we do, who in their right mind is going to page
    ' through all of the results?
    Dim topDocs As TopDocs = searcher.Search(query, Nothing, 10000)
    Dim doc As Document = Nothing

    ' loop through the topDocs and grab the appropriate 10 results based
    ' on the submitted page number
    While i <= last AndAlso i < topDocs.totalHits
        doc = searcher.Doc(topDocs.scoreDocs(i).doc)
        IDList.Add(doc.[Get]("id"))
        i += 1
    End While
    ' code omitted

I did try the following, but to no avail (it threw a NullReferenceException):

    While i <= last AndAlso i < topDocs.totalHits
        If Date.Parse(doc.[Get]("date")) >= Date.Today Then
            doc = searcher.Doc(topDocs.scoreDocs(i).doc)
            IDList.Add(doc.[Get]("id"))
            i += 1
        End If
    End While

I also found the following documentation, but I can't make head or tail of it: http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/search/DateFilter.html
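
The NullReferenceException comes from the order of operations in the second loop: doc is still Nothing the first time doc.[Get]("date") is called, and because i only advances inside the If, the loop would never get past an event in the past. Filtering after paging also skews page sizes, so a range query on the date field is usually preferable. That requires the date to be indexed in a sortable form such as yyyyMMdd (as far as I know, Lucene's DateTools can produce such strings), whereas passing EventDate straight to Field probably stores a culture-formatted string that does not sort chronologically. The Java sketch below shows the encoding, the range-query text and a corrected loop; the class name is hypothetical, the "date" and "id" field names come from the question, and plain maps stand in for Lucene documents.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: sortable date strings, a date range query, and a corrected filter loop.
final class UpcomingEvents {

    // yyyyMMdd strings sort lexicographically in chronological order.
    static String encode(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Inclusive range query text in Lucene query syntax: "today or later".
    static String upcomingQuery(LocalDate today) {
        return "date:[" + encode(today) + " TO 99991231]";
    }

    // Corrected loop: fetch the hit first, then test it, and always advance to the next hit.
    static List<String> upcomingIds(List<Map<String, String>> hits, LocalDate today, int limit) {
        String cutoff = encode(today);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < hits.size() && ids.size() < limit; i++) {
            Map<String, String> doc = hits.get(i); // assigned before it is used
            if (doc.get("date").compareTo(cutoff) >= 0) ids.add(doc.get("id"));
        }
        return ids;
    }
}
```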

Read the article

Can I integrate Solr with SharePoint without using the Lucene Connector Framework?

- by Rohan Patil

Can I integrate Solr with SharePoint without using the Lucene Connector Framework? If so, should I have Solr index SharePoint's underlying database? Will this produce successful search results?

Read the article

Is there a way to include a list of exception words in Zend Lucene search?

- by Amit

I have Zend Lucene search implemented in my code. Some keywords don't return any results, and the keywords for which no results are returned are in the list of stopwords. Is there a way to include certain stopwords while searching?
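
Conceptually, the fix is to remove the words that should stay searchable from the stop list, and to use the same reduced list at both index time and query time; an index built with the old list never stored those words, so it has to be rebuilt. In Zend_Search_Lucene this would mean configuring the analyzer's stop-word filter. The plain-Java sketch below only shows the set arithmetic and filtering, with hypothetical names and word lists.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a stop-word filter whose stop list has an exception ("keep") list removed.
final class StopWordsWithExceptions {
    private final Set<String> effectiveStops;

    // Both sets are assumed to contain lowercase words.
    StopWordsWithExceptions(Set<String> defaultStops, Set<String> keep) {
        effectiveStops = new HashSet<>(defaultStops);
        effectiveStops.removeAll(keep); // words in `keep` are never dropped
    }

    // Lowercases each token and drops the ones that are still stop words.
    List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String lower = t.toLowerCase();
            if (!effectiveStops.contains(lower)) out.add(lower);
        }
        return out;
    }
}
```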

Read the article

CF9's Apache Lucene vs SQL Server's full text search?

- by Henry

ColdFusion 9's full text search is now based on Apache Lucene. We also use SQL Server. Which one's better? Which one's easier? Thanks!

Read the article

Hibernate/Lucene/HibernateSearch: find all words that start with specific prefix.

- by Giuseppe

I want to get a list of all words in a database table that start with a specific prefix. I've been looking for a way to query the terms in a Lucene index (I need the terms, I don't care about the documents they are from) but without success. Any ideas?
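
Lucene's term dictionary is sorted, so listing the terms that start with a prefix amounts to seeking to the first term greater than or equal to the prefix and reading forward until a term no longer starts with it. As far as I know, in Lucene 2.x/3.x a TermEnum positioned with IndexReader.terms(Term) allows exactly that walk. In the sketch below a java.util.TreeSet stands in for the term dictionary; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical sketch: prefix listing over a sorted term dictionary.
final class PrefixTerms {

    static List<String> withPrefix(NavigableSet<String> sortedTerms, String prefix, int max) {
        List<String> out = new ArrayList<>();
        // tailSet(prefix, true) plays the role of "seek to the first term >= prefix".
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || out.size() >= max) break; // left the prefix range, or enough
            out.add(term);
        }
        return out;
    }
}
```

Because the walk stops at the first non-matching term, it touches only the matching slice of the dictionary rather than every term.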

Read the article

In Lucene, using a Standard Analyzer, I want to make fields with spaces and special characters searc

- by Matt

In Lucene, using a StandardAnalyzer, I want to make fields with spaces and special characters (underscore, !, @, #, ...) searchable. I set Field.Index to NOT_ANALYZED_NO_NORMS and Field.Store to YES. When I look at my index in Luke, the fields are as I expected, with a value such as 'SKU Number', yet when I search for 'SKU' or 'SKU*', nothing comes up. What am I missing?
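
With NOT_ANALYZED, the whole value "SKU Number" is indexed as a single, case-preserved term, so a query for SKU cannot match it; only the exact full value, or a case-matching prefix of it, can. In addition, as far as I know, QueryParser in Lucene 2.x/3.x lowercased prefix and wildcard terms by default (setLowercaseExpandedTerms), turning SKU* into sku*, which does not match the uppercase term either. The plain-Java simulation below (not Lucene code; names are hypothetical, and the "analyzer" is a rough stand-in) contrasts the two ways of indexing the same value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical simulation: one value indexed as a single untokenized term vs. as analyzed tokens.
final class KeywordVsAnalyzed {

    // NOT_ANALYZED-style: the entire value is one term, case preserved.
    static List<String> keywordTerms(String value) {
        return List.of(value);
    }

    // Rough analyzer stand-in: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // A prefix query matches if any indexed term starts with the (case-sensitive) prefix.
    static boolean prefixMatches(List<String> terms, String prefix) {
        return terms.stream().anyMatch(t -> t.startsWith(prefix));
    }
}
```

One common approach is to index the value twice: once untouched for exact matching, and once analyzed (in a second field) for word-level search.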

Read the article

Lucene query: bla~* (match words that start with something fuzzy), how?

- by Skipperkongen

In the Lucene query syntax, I'd like to combine * and ~ in a valid query, similar to: bla~* (an invalid query). Meaning: please match words that begin with "bla" or with something similar to "bla".
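
As the question notes, bla~* is not valid syntax, but the intended semantics can be stated precisely: accept a word if some prefix of it is within edit distance k of "bla". The brute-force Java sketch below checks exactly that; it is illustrative only (a real implementation would enumerate the index's terms instead of testing words one by one), and the names and the choice of k are assumptions.

```java
// Hypothetical sketch of "fuzzy prefix" matching: bla~* read as "some prefix is within k edits of bla".
final class FuzzyPrefix {

    // Classic dynamic-programming Levenshtein distance, keeping two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // prev now holds the row just computed
        }
        return prev[b.length()];
    }

    // Only prefixes whose length is within k of the query length can be within k edits of it.
    static boolean matches(String word, String query, int k) {
        int from = Math.max(0, query.length() - k);
        int to = Math.min(word.length(), query.length() + k);
        for (int len = from; len <= to; len++) {
            if (levenshtein(word.substring(0, len), query) <= k) return true;
        }
        return false;
    }
}
```

With k = 1, "blah", "black" and "plank" match "bla", while "xyz" does not.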

Read the article

How can I search on a list of values using Solr/Lucene?

- by Mike

Given the following query: (field:value1 OR field:value2 OR field:value3 OR ... OR field:value50) Can this be broken down into something less verbose? Basically I have hundreds of category IDs, and I need to search for items under large groups of category IDs (20-50 at a time). In MySQL, I'd just use field IN(value1, value2, value3) rather than (field = value1 OR field = value2 etc...). Is there a simpler way for Solr/Lucene?
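
Lucene's query parser supports grouping several values under one field, field:(value1 OR value2 OR value3), which is about as close to SQL's IN(...) as the classic syntax gets. The small Java helper below builds that string; the class and method names are hypothetical, and values are assumed to be plain alphanumeric IDs that need no escaping. It still expands to one Boolean clause per value, so Lucene's maximum clause count (1024 by default, exposed in Solr as maxBooleanClauses in solrconfig.xml) applies.

```java
import java.util.List;

// Hypothetical helper: builds field:(v1 OR v2 ...) from a list of plain ID values.
final class FieldIn {

    static String anyOf(String field, List<String> values) {
        if (values.isEmpty()) throw new IllegalArgumentException("need at least one value");
        for (String v : values) {
            // Assumption: IDs are alphanumeric; anything else would need query-syntax escaping.
            if (!v.matches("[A-Za-z0-9]+")) throw new IllegalArgumentException("unsafe value: " + v);
        }
        return field + ":(" + String.join(" OR ", values) + ")";
    }
}
```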

Read the article

Does Lucene use fuzzy matching with Custom Analyzer containing stop word filter?

- by iamrohitbanga

I am noticing strange behavior with Lucene. Could you confirm whether fuzzy matching is applied if I use QueryParser with a custom analyzer that applies StandardFilter, StopFilter and PorterStemFilter?

Read the article

Lucene Search for documents that have a particular field?

- by RP

Lucene.Net: is there a way to query for documents that contain a particular field? Let's say some of my documents have a field 'foo' and some do not. I want to find all documents that have the field 'foo', regardless of its value. How do I do this? Is it some sort of TermQuery?
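
One approach that does not depend on query-syntax support for open-ended ranges: at index time, add a marker term whenever a field is present (for example, a hypothetical has_fields field holding the name 'foo'), then search with a plain TermQuery such as has_fields:foo. The plain-Java sketch below shows only the indexing-side bookkeeping on a map-based document; the marker field name is an assumption, and in Lucene it would be indexed untokenized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: record which fields a document has, so "has field foo" becomes a term lookup.
final class FieldPresence {
    static final String MARKER_FIELD = "has_fields"; // hypothetical marker field name

    // Returns (field, value) pairs to index: the original fields plus one marker term per field name.
    static List<String[]> withMarkers(Map<String, String> doc) {
        List<String[]> pairs = new ArrayList<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            pairs.add(new String[] {e.getKey(), e.getValue()});
            pairs.add(new String[] {MARKER_FIELD, e.getKey()});
        }
        return pairs;
    }

    // Query text that finds every document containing `field`, whatever its value.
    static String hasFieldQuery(String field) {
        return MARKER_FIELD + ":" + field;
    }
}
```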

Read the article

OutOfMemoryError: Java heap space error when starting Solr

- by Hamid

Hi, I started indexing DB articles with Solr, but after adding about 58 million articles (about 113 GB on disk), I get the error message below in the Tomcat log.

Note 1: I have already set the initial memory pool to 256 MB and the maximum memory pool to 1400 MB for the Tomcat server.
Note 2: I can still post or search articles, but I have to wait over 3 minutes for a response.

    8-apr-2010 14:27:07 org.apache.solr.common.SolrException log
    SEVERE: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:89)
        at org.apache.lucene.search.HitQueue.<init>(HitQueue.java:67)
        at org.apache.lucene.search.TopScoreDocCollector.<init>(TopScoreDocCollector.java:113)
        at org.apache.lucene.search.TopScoreDocCollector.<init>(TopScoreDocCollector.java:37)
        at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.<init>(TopScoreDocCollector.java:42)
        at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.<init>(TopScoreDocCollector.java:40)
        at org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:100)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:979)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
        at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574)
        at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527)
        at java.lang.Thread.run(Unknown Source)

What's the problem? Any suggestions? Thanks in advance.
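
The trace shows the heap running out while TopScoreDocCollector.create allocates its HitQueue, whose PriorityQueue array is preallocated to the number of hits requested (roughly start + rows in Solr, capped at the number of documents). A very large rows or start value in the request is therefore one plausible cause worth checking; raising the heap above 1400 MB is the other obvious lever. The plain-Java sketch below shows the bounded min-heap idea behind such collectors: memory grows with k, the number of hits kept, not with the 58 million documents scanned. Names are hypothetical, and this is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: keep only the k best scores with a bounded min-heap.
final class TopK {

    static List<Float> topK(float[] scores, int k) {
        if (k <= 0) return List.of();
        // Min-heap: the root is the weakest of the current top k and is evicted by a better score.
        PriorityQueue<Float> heap = new PriorityQueue<>(k);
        for (float s : scores) {
            if (heap.size() < k) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll();
                heap.add(s);
            }
        }
        List<Float> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder()); // best first
        return result;
    }
}
```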

Lucene.NET vs SQL Server Full-text – Generating a million records and a full-text index

In this article we will take a look at how SQL Server performs with one million records in a table. We will create a quick data pumper program to fill up a table with a million dynamically created rows of data. From there we will take a look at querying the data without any special optimizations. Then we will create a Full-text index and see how that helps our searching capability.

Lucene.NET and searching on multiple fields with specific values...

- by Kieron

Hi, I've created an index with various bits of data for each document I've added; the field names can differ from one document to another. Later on, when I search the index, I need to query it with exact field/value pairs, for example: FieldName1 = X AND FieldName2 = Y AND FieldName3 = Z. What's the best way of constructing this using Lucene.NET? Which analyser is best for this kind of exact match? When a match is retrieved, I only need one specific field to be returned (which I add to each document); should this be the only one stored? Later on I'll need to support keyword searching (so a field can have a list of values and I'll need to do a partial match). The fields and values come from a Dictionary<string, string>; it's not user input, it's constructed in code. Thanks, Kieron
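
For exact field/value matching, the values would typically be indexed untokenized (NOT_ANALYZED) and queried with a BooleanQuery of TermQuery clauses using Occur.MUST, which bypasses analysis entirely. If a query string is preferred, the fields being matched need an analyzer that leaves values whole (KeywordAnalyzer is the usual choice, as far as I know). The Java sketch below turns a dictionary-style map into +field:"value" clauses, quoting each value and escaping backslashes and double quotes; the class name is hypothetical, and field names are assumed to be plain identifiers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turn field -> value pairs into an all-must-match query string.
final class ExactMatchQuery {

    // Quotes a value for query-parser syntax, escaping backslashes first, then double quotes.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds +f1:"v1" +f2:"v2" ...; clause order follows the map's iteration order.
    static String build(Map<String, String> fields) {
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            clauses.add("+" + e.getKey() + ":" + quote(e.getValue()));
        }
        return String.join(" ", clauses);
    }
}
```

Passing a LinkedHashMap keeps the clause order stable, which makes the generated queries easier to log and compare.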

Search Results

Search found 393 results on 16 pages for 'lucene'.
