Search Results

Search found 393 results on 16 pages for 'lucene'.

Page 2/16 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Using Lucene to Query File properties in Windows

- by sneha

Hi All, I am planning to use Apache lucense in one of my projects, I want to index files based on the file properties (I won’t be indexing the data) and I want lucense to query the index so that I can quickly find list of files to based on the properties . E.g: give me all the files with access time greater than 10/10/2005 and access time less than 10/04/2010 and file created by james. Can i use Lucene for these kind of projects ? or i better of using windows search (the foor print is very heavy almost 5 MB :( ) and i have to bundling this as part of my application is seems to tough. Can you please suggest is there any better alternatives here?

Read the article
Indexing large DB's with Lucene/PHP

- by thebluefox

Afternoon chaps, Trying to index a 1.7million row table with the Zend port of Lucene. On small tests of a few thousand rows its worked perfectly, but as soon as I try and up the rows to a few tens of thousands, it times out. Obviously, I could increase the time php allows the script to run, but seeing as 360 seconds gets me ~10,000 rows, I'd hate to think how many seconds it'd take to do 1.7million. I've also tried making the script run a few thousand, refresh, and then run the next few thousand, but doing this clears the index each time. Any ideas guys? Thanks :)

Read the article
My Lucene queries only ever find one hit

- by Bob

I'm getting started with Lucene.Net (stuck on version 2.3.1). I add sample documents with this: Dim indexWriter = New IndexWriter(indexDir, New Standard.StandardAnalyzer(), True) Dim doc = Document() doc.Add(New Field("Title", "foo", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO)) doc.Add(New Field("Date", DateTime.UtcNow.ToString, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO)) indexWriter.AddDocument(doc) indexWriter.Close() I search for documents matching "foo" with this: Dim searcher = New IndexSearcher(indexDir) Dim parser = New QueryParser("Title", New StandardAnalyzer()) Dim Query = parser.Parse("foo") Dim hits = searcher.Search(Query) Console.WriteLine("Number of hits = " + hits.Length.ToString) No matter how many times I run this, I only ever get one result. Any ideas?

Read the article
Lucene .NET IndexWriter lock

- by Pini Salim

My question related to the next code snippet: static void Main(string[] args) { Lucene.Net.Store.Directory d = FSDirectory.Open(new DirectoryInfo(/*my index path*/)); IndexWriter writer = new IndexWriter(d, new WhitespaceAnalyzer()); //Exiting without closing the indexd writer... } In this test, I opened an IndexWriter without closing it - so even after the test exits, the write.lock file still exists in the index directory, so I expected that the next time I open an instance of IndexWriter to that index, a LockObatinFailedException will be thrown. Can someone please explain to me why am I wrong? I mean, does the meaning of the write.lock file is to protect creation of two IndexWriters in the same process only? that doesnt seems the right answer to me...

Read the article
Lucene .Net Searching with TermVector

- by Ashish

in Lucene.Net,i am creating the document for searching a word and want to display before 10 words and after 10 words.i have used TermVector. Lucene.Net.Documents.Field fldContent = new Lucene.Net.Documents.Field("content", content, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.TOKENIZED, Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS); Can anyone help me how to find out the keyword position and extract nearest 15 words. please send some code. Thanks Ashish

Read the article
how do i filter my lucene search results?

- by Andrew Bullock

Say my requirement is "search for all users by name, who are over 18" If i were using SQL, i might write something like: Select * from [Users] Where ([firstname] like '%' + @searchTerm + '%' OR [lastname] like '%' + @searchTerm + '%') AND [age] >= 18 However, im having difficulty translating this into lucene.net. This is what i have so far: var parser = new MultiFieldQueryParser({ "firstname", "lastname"}, new StandardAnalyser()); var luceneQuery = parser.Parse(searchterm) var query = FullTextSession.CreateFullTextQuery(luceneQuery, typeof(User)); var results = query.List<User>(); How do i add in the "where age = 18" bit? I've heard about .SetFilter(), but this only accepts LuceneQueries, and not IQueries. If SetFilter is the right thing to use, how do I make the appropriate filter? If not, what do I use and how do i do it? Thanks! P.S. This is a vastly simplified version of what I'm trying to do for clarity, my WHERE clause is actually a lot more complicated than shown here. In reality i need to check if ids exist in subqueries and check a number of unindexed properties. Any solutions given need to support this. Thanks

Read the article
"no inclosing instance error " while getting top term frequencies for document from Lucene index

- by Julia

Hello ! I am trying to get the most occurring term frequencies for every particular document in Lucene index. I am trying to set the treshold of top occuring terms that I care about, maybe 20 However, I am getting the "no inclosing instance of type DisplayTermVectors is accessible" when calling Comparator... So to this function I pass vector of every document and max top terms i would like to know protected static Collection getTopTerms(TermFreqVector tfv, int maxTerms){ String[] terms = tfv.getTerms(); int[] tFreqs = tfv.getTermFrequencies(); List result = new ArrayList(terms.length); for (int i = 0; i < tFreqs.length; i++) { TermFrq tf = new TermFrq(terms[i], tFreqs[i]); result.add(tf); } Collections.sort(result, new FreqComparator()); if(maxTerms < result.size()){ result = result.subList(0, maxTerms); } return result; } /Class for objects to hold the term/freq pairs/ static class TermFrq{ private String term; private int freq; public TermFrq(String term,int freq){ this.term = term; this.freq = freq; } public String getTerm(){ return this.term; } public int getFreq(){ return this.freq; } } /*Comparator to compare the objects by the frequency*/ class FreqComparator implements Comparator{ public int compare(Object pair1, Object pair2){ int f1 = ((TermFrq)pair1).getFreq(); int f2 = ((TermFrq)pair2).getFreq(); if(f1 > f2) return 1; else if(f1 < f2) return -1; else return 0; } } Explanations and corrections i will very much appreciate, and also if someone else had experience with term frequency extraction and did it better way, I am opened to all suggestions! Please help!!!! Thanx!

Read the article
Lucene Search Returning Extra, Undesired Records

- by Brandon

I have a Lucene index that contains a field called 'Name'. I escape all special characters before inserting a value into my index using QueryParser.Escape(value). In my example I have 2 documents with the following names respectively: Test Test (Test) They get inserted into my index as such (I can confirm this using Luke): [test] [test] [\(test\)] I insert these values as TOKENIZED and using the StandardAnalyzer. When I perform a search, I use the QueryParser.Escape(searchString) against my search string input to escape special characters and then use the QueryParser with my 'Name' field and the StandardAnalyzer to perform my search. When I perform a search for 'Test', I get back both documents in my index (as expected). However, when I perform a search for 'Test (Test)', I am getting back both documents still. I realize that in both examples it matches on the 'test' term in the index, but I am confused in my 2nd example why it would not just pull back the document with the value of 'Test (Test)' because my search should create two terms: [test] and [\(test\)] I would imagine it would perform some sort of boolean operator where BOTH terms must match in that situation so I would get back just one record. Is there something I am missing or a trick to make the search behave as desired?

Read the article
How do I get Lucene (.NET) to highlight correctly with wildcards?

- by Scott Stafford

I am using the Lucene.NET API directly in my ASP.NET/C# web application. When I search using a wildcard, like "fuc*", the highlighter doesn't highlight anything, but when I search for the whole word, like "fuchsia", it highlights fine. Does Lucene have the ability to highlight using the same logic it used to match with? Various maybe-relevant code-snippets below: var formatter = new Lucene.Net.Highlight.SimpleHTMLFormatter( "<span class='srhilite'>", "</span>"); var fragmenter = new Lucene.Net.Highlight.SimpleFragmenter(100); var scorer = new Lucene.Net.Highlight.QueryScorer(query); var highlighter = new Lucene.Net.Highlight.Highlighter(formatter, scorer); highlighter.SetTextFragmenter(fragmenter); and then on each hit... string description = Server.HtmlEncode(doc.Get("Description")); var stream = analyzer.TokenStream("Description", new System.IO.StringReader(description)); string highlighted_text = highlighter.GetBestFragments( stream, description, 1, "..."); And I'm using the QueryParser and the StandardAnalyzer.

Read the article
How do I detect if there is already a similar document stored in Lucene index.

- by Jenea

Hi. I need to exclude duplicates in my database. The problem is that duplicates are not considered exact match but rather similar documents. For this purpose I decided to use FuzzyQuery like follows: var fuzzyQuery = new global::Lucene.Net.Search.FuzzyQuery( new Term("text", queryText), 0.8f, 0); hits = _searcher.Search(query); The idea was to set the minimal similarity to 0.8 (that I think is high enough) so only similar documents will be found excluding those that are not sufficiently similar. To test this code I decided to see if it finds already existing document. To the variable queryText was assigned a value that is stored in the index. The code from above found nothing, in other words it doesn't detect even exact match. Index was build by this code: doc.Add(new global::Lucene.Net.Documents.Field( "text", text, global::Lucene.Net.Documents.Field.Store.YES, global::Lucene.Net.Documents.Field.Index.TOKENIZED, global::Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS)); I followed recomendations from bellow and the results are: TermQuery doesn't return any result. Query contructed with var _analyzer = new RussianAnalyzer(); var parser = new global::Lucene.Net.QueryParsers .QueryParser("text", _analyzer); var query = parser.Parse(queryText); var _searcher = new IndexSearcher (Settings.General.Default.LuceneIndexDirectoryPath); var hits = _searcher.Search(query); Returns several results with the maximum score the document that has exact match and other several documents that have similar content.

Read the article
Lucene: Question of score caculation with PrefixQuery

- by Keven

Hi, I meet some problem with the score caculation with a PrefixQuery. To change score of each document, when add document into index, I have used setBoost to change the boost of the document. Then I create PrefixQuery to search, but the result have not been changed according to the boost. It seems setBoost totally doesn't work for a PrefixQuery. Please check my code below: @Test public void testNormsDocBoost() throws Exception { Directory dir = new RAMDirectory(); IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_CURRENT), true, IndexWriter.MaxFieldLength.LIMITED); Document doc1 = new Document(); Field f1 = new Field("contents", "common1", Field.Store.YES, Field.Index.ANALYZED); doc1.add(f1); doc1.setBoost(100); writer.addDocument(doc1); Document doc2 = new Document(); Field f2 = new Field("contents", "common2", Field.Store.YES, Field.Index.ANALYZED); doc2.add(f2); doc2.setBoost(200); writer.addDocument(doc2); Document doc3 = new Document(); Field f3 = new Field("contents", "common3", Field.Store.YES, Field.Index.ANALYZED); doc3.add(f3); doc3.setBoost(300); writer.addDocument(doc3); writer.close(); IndexReader reader = IndexReader.open(dir); IndexSearcher searcher = new IndexSearcher(reader); TopDocs docs = searcher.search(new PrefixQuery(new Term("contents", "common")), 10); for (ScoreDoc doc : docs.scoreDocs) { System.out.println("docid : " + doc.doc + " score : " + doc.score + " " + searcher.doc(doc.doc).get("contents")); } } The output is : docid : 0 score : 1.0 common1 docid : 1 score : 1.0 common2 docid : 2 score : 1.0 common3

Read the article
Multiple or single index in Lucene?

- by Bruno Reis

I have to index different kinds of data (text documents, forum messages, user profile data, etc) that should be searched together (ie, a single search would return results of the different kinds of data). What are the advantages and disadvantages of having multiple indexes, one for each type of data? And the advantages and disadvantages of having a single index for all kinds of data? Thank you.

Read the article
How to call a Zend lucene search function?

- by stef

I inherited a Zend project devoid of comments and I didn't get to talk to the previous developer. Since I have no Zend experience I'm having some issues :) I'd like to print out some variables inside an function that indexes items from the site using Zend_Search_Lucene because I think something is going wrong here. From what I've read, ::create creates a new index and ::open updates it. So it's in this ::open function I'd like to print out some variables. The name and params of the function are below. Does anyone have any idea how this function can be called so I can run some tests? private function search($category,$string,$page = 1,$itemsByPage = 5) EDIT: OR, is there a way I can nuke the existing index and force it to be rebuilt completely, for example by deleting the index files on the FS and then performing some searches?

Read the article
How do i keep my DB and lucene in sync?

- by acidzombie24

So i can have a transaction in sql. But i am sure its not a good idea to wait in the middle of a transaction for lucene to finish also i am unsure if lucene is permanently saved in the DB until i do something there. Whats the best way to keep my DB and lucene in sync? I am thinking of adding a lucene_queue in my sql db and everytime i make a change i add it into the queue (removing older queue if any) and delete it once it is done. Is this the best way? Also i am unsure how to make lucene permanently keep the changes i made and how frequent i can/should do it.

Read the article
pylucene: install error

- by Pradeep

I am trying to install Pylucene (pylucene-3.3-3-src.tar.gz) on my ubuntu linux 11.10. I have python 2.7.2. I was able to compile JCC (I think) because I didnt see any error when I installed it. When I tried to install Pylucene I get the following error. Can someone help? Thanks. ICU not installed /usr/bin/python -m jcc --shared --jar lucene-java-3.3/lucene/build/lucene-core-3.3.jar --jar lucene-java-3.3/lucene/build/contrib/analyzers/common/lucene-analyzers-3.3.jar --jar lucene-java-3.3/lucene/build/contrib/memory/lucene-memory-3.3.jar --jar lucene-java-3.3/lucene/build/contrib/highlighter/lucene-highlighter-3.3.jar --jar build/jar/extensions.jar --jar lucene-java-3.3/lucene/build/contrib/queries/lucene-queries-3.3.jar --jar lucene-java-3.3/lucene/build/contrib/grouping/lucene-grouping-3.3.jar --package java.lang java.lang.System java.lang.Runtime --package java.util java.util.Arrays java.util.HashMap java.util.HashSet java.text.SimpleDateFormat java.text.DecimalFormat java.text.Collator --package java.util.regex --package java.io java.io.StringReader java.io.InputStreamReader java.io.FileInputStream --exclude org.apache.lucene.queryParser.Token --exclude org.apache.lucene.queryParser.TokenMgrError --exclude org.apache.lucene.queryParser.QueryParserTokenManager --exclude org.apache.lucene.queryParser.ParseException --exclude org.apache.lucene.search.regex.JakartaRegexpCapabilities --exclude org.apache.regexp.RegexpTunnel --exclude org.apache.lucene.analysis.cn.smart.AnalyzerProfile --python lucene --mapping org.apache.lucene.document.Document 'get:(Ljava/lang/String;)Ljava/lang/String;' --mapping java.util.Properties 'getProperty:(Ljava/lang/String;)Ljava/lang/String;' --sequence java.util.AbstractList 'size:()I' 'get:(I)Ljava/lang/Object;' --rename org.apache.lucene.search.highlight.SpanScorer=HighlighterSpanScorer --version 3.3 --module python/collections.py --module python/ICUNormalizer2Filter.py --module python/ICUFoldingFilter.py --module python/ICUTransformFilter.py --files 3 --build /usr/bin/python: No module named jcc make: *** [compile] Error 1 Here is my Makefile configuration which I uncommented PREFIX_PYTHON=/usr ANT=ant PYTHON=$(PREFIX_PYTHON)/bin/python JCC=$(PYTHON) -m jcc --shared NUM_FILES=3

Read the article
How can I get DocId when adding a document in Lucene index?

- by Rohit

I am indexing a row of data from database in Lucene.Net. A row is equivalent of Document. I want to update my database with the DocId, so that I can use the DocId in the results to be able to retrieve rows quickly. I currently first retrive the PK from the result docs which I think should be slower than retriving directly from the database using DocId. How can I find the DocId when adding a document to Lucene?

Read the article
Can zend's lucene implementation be configured to use a mysql database instead of the file system?

- by soulmerge

Is there an option for Zend's lucene implementation (or a third-party plugin) that would allow me to put the lucene dictionary into a [MySQL] database? The reason I need to ask is that the database is the only common resource for our two otherwise independent web servers.

Read the article
Recommended way to perform Lucene search without limit

- by Thomas

The Lucene documents tell me that "Hits" will be removed from the API in Lucene 3.0. Deprecated. Hits will be removed in Lucene 3.0. Use search(Query, Filter, int) instead. The proposed overload limits the number of documents returned to the value of the int. So my question is: what is the recommended way to perform a search in Lucene with no limit on the number of documents to be returned?

Read the article
Installing Apache Lucene for LAMP server

- by Pawan

I have Ubuntu running for LAMP (Linux, Apache, MySQL and PHP) server. To provide better search capability one of my friend recommended to install "Apache Lucene". While reading about it I came to know that "Apache Lucene" required tomcat and java to run. Please let me know if it be feasible to have it or there are other better alternates for LAMP stack. I am looking for some proven solution. Thanks :)

Read the article
Lucene's nested query evaluation regarding negation

- by ponzao

Hi, I am adding Apache Lucene support to Querydsl (which offers type-safe queries for Java) and I am having problems understanding how Lucene evaluates queries especially regarding negation in nested queries. For instance the following two queries in my opinion are semantically the same, but only the first one returns results. +year:1990 -title:"Jurassic Park" +year:1990 +(-title:"Jurassic Park") The simplified object tree in the second example is shown below. query : Query clauses : ArrayList [0] : BooleanClause "MUST" occur : BooleanClause.Occur "year:1990" query : TermQuery [1] : BooleanClause "MUST" occur : BooleanClause.Occur query : BooleanQuery clauses : ArrayList [0] : BooleanClause "MUST_NOT" occur : BooleanClause.Occur "title:"Jurassic Park"" query : TermQuery Lucene's own QueryParser seems to evaluate "AND (NOT" into the same kind of object trees. Is this a bug in Lucene or have I misunderstood Lucene's query evaluation? I am happy to give more information if necessary.

Read the article
How does lucene index documents?

- by Mehdi Amrollahi

Hello, I read some document about Lucene; also I read the document in this link (http://lucene.sourceforge.net/talks/pisa). I don't really understand how Lucene indexes documents and don't understand which algorithms Lucene uses for indexing? On the above link, it says Lucene uses this algorithm for indexing: incremental algorithm: maintain a stack of segment indices create index for each incoming document push new indexes onto the stack let b=10 be the merge factor; M=8 for (size = 1; size < M; size *= b) { if (there are b indexes with size docs on top of the stack) { pop them off the stack; merge them into a single index; push the merged index onto the stack; } else { break; } } How does this algorithm provide optimized indexing? Does Lucene use B-tree algorithm or any other algorithm like that for indexing - or does it have a particular algorithm? Thank you for reading my post.

Read the article
How lucene indexing ?

- by user312140

Hello I read some document about lucene ; also i read the document in this link ( http://lucene.sourceforge.net/talks/pisa ) . I don't really understand how lucene index documents and don't understand lucene work with which algorithm for indexing ? On above link , said lucene use this algorithm for indexing : * incremental algorithm: o maintain a stack of segment indices o create index for each incoming document o push new indexes onto the stack o let b=10 be the merge factor; M=8 for (size = 1; size < M; size *= b) { if (there are b indexes with size docs on top of the stack) { pop them off the stack; merge them into a single index; push the merged index onto the stack; } else { break; } } How this algorithm help us to have an optimize indexing ? Does lucene use B-tree algorithm or any other algorithm like that for indexing or have a paticular algorithm ? Thank you for reading my post .

Read the article
How to sort by a field that has an alternative value if null in lucene?

- by citizenmatt

Hi folks. I want to sort my lucene(.net) search results by a date field (date1), but if date1 is not set, I'd like to use date2. The traditional sort method is to sort by date1, and then sort the values that are the same by date 2. This would mean that whenever I did fall back to date2, these values would be at the top (or bottom) of the result set. I'd like to interleave the date2 values with the date1 values. In other words, I want to sort on (date1 != null ? date1 : date2). Is this possible in lucene? I reckon I could do this in the index creation phase (just put the relevant date value in a new field) but I don't have enough control of the indexing process to be able to do this, so would like a sorting solution. Any ideas? Thanks Matt

Read the article
How to build Lucene / Solr from source code in windows environment in order to add patches

- by Simon

I have successfully implemented Apache’s Solr for free text searching a database driven web site build for windows platforms using Visual Studio in c#. I am trying to get a version Solr working with field collapsing (which is not in the release version). There are patches available from apache and discussions on the web of people successfully doing this for the version I am using but my problem is cannot get the build to work. I am a c# coder on windows platforms so java development is new to me. I understand I need to get the correct source code (and revision) from SVN, add the appropriate patches, then build the war file to deploy to my system. I cannot seem to get the source to build and produce the deployment code including jar (and subsequent war) files. My system is: Windows 7 Ultimate for development Visual Studio 2010 for c# / javascript development MyEclipse 8.6 / Eclipse 3.5 for the java build from source Subecplise 1.6x SVN plugin to get the source from apache’s SVN Apache Solr 1.4.1 So far I have: Found the right patches for the function I need: https://issues.apache.org/jira/browse/SOLR-236 Specifically I need to patch: field_collapsing_1.1.0.patch HTTPS //issues.apache.org/jira/secure/attachment/12357681/field_collapsing_1.1.0.patch and SOLR-236-1_4_1.patch HTTPS //issues.apache.org/jira/secure/attachment/12448216/SOLR-236-1_4_1.patch I downloaded the Lucene trunk version from the day before the patch was released (revision 958303 from 28/6/10) via subeclipse into a java package in myeclipse from: HTTPS //svn.apache.org/repos/asf/lucene/dev/trunk (Solr is the web implementation of Lucene and is in the subfolder solr/) I can apply patches to the solr directory once it has downloaded but the parent Lucene project doesn’t build the war files, copy the jar or other files into the bin folder (it stays empty). The build process starts, but doesn’t do anything apart from creating the folders bin and src. I am building the whole Lucene project, which contains Solr. I have tried building the source without patching and the same happens. If I copy out the Solr directory into a new project, it runs the build and copies all the related files, tests, etc but fails with 4,500 errors and does not produce the jar files or war file, which I assume is because it can’t find the Lucene trunk files which it depends on. I have two interrelated problems 1) I can't get the Lucene downloaded trunk to build 2) The jar, war and associated files are not created Can anyone help with what I am missing to build the war file? I have spent 2 days to get this far as the help online is extremely patchy and I can’t find a walk though tutorial on building a java war file from source in a windows environment. Any help will be much appreciated. Simon

Read the article
How to find similar/related text with Zend Lucene?

- by Arty

Say I need to make searching for related titles just like stackoverflow does before you add your question or digg.com before submitting news. I didn't find a way how to do this with Zend Lucene. There are setSlop method for queries, but as I understand, it doesn't help. Is there any way to do this kind of searches?

Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >