Search Results

Search found 393 results on 16 pages for 'lucene'.

Page 9/16 | < Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16 | Next Page >

How do you boost term relevance in Sql Server Full Text Search like you can in Lucene?

- by Snives

I'm doing a typical full text search using containstable using 'ISABOUT(term1,term2,term3)' and although it supports term weighting that's not what I need. I need the ability to boost the relevancy of terms contained in certain portions of text. For example, it is customary for metatags or page title to be weighted differently than body text when searching web pages. Although I'm not dealing with web pages I do seek the same functionality. In Lucene it's called Document Field Level Boosting. How would one natively do this in Sql Server Full Text Search?

Read the article
Lucene: Fastest way to return the document occurance of a phrase?

- by dont say the kid's name

Hi Guys, I am trying to use Lucene (actually PyLucene!) to find out how many documents contain my exact phrase. My code currently looks like this... but it runs rather slow. Does anyone know a faster way to return document counts? phraseList = ["some phrase 1", "some phrase 2"] #etc, a list of phrases... countsearcher = IndexSearcher(SimpleFSDirectory(File(STORE_DIR)), True) analyzer = StandardAnalyzer(Version.LUCENE_CURRENT) for phrase in phraseList: query = QueryParser(Version.LUCENE_CURRENT, "contents", analyzer).parse("\"" + phrase + "\"") scoreDocs = countsearcher.search(query, 200).scoreDocs print "count is: " + str(len(scoreDocs))

Read the article
How to keep Lucene index synchronized with Mysql database?

- by ?????

I am trying to utilize Lucene to develop full text search in my application, which need to build index based on my mysql database. I was wondering is how to keep these index synchronized with db? I came up with to ways: 1) add extra code in business logic tightly to update the search index . 2) running a separated task to rebuild the index periodically. do you have any other approaches? and what do you think is the best way? Any comments would be appreciate, thanks in advance!

Read the article
Zend_Search_Lucene vs SOLR

- by spacemonkey

Hi, I have recenlty stumbled into Zend Lucene port of Lucene project. I have a little bit experience with SOLR so I would like to know what is the difference between two of them especially from performance and installation side. As much as I know SOLR requires Tomcat serverlet running in web hosting in order to work, what about Zend Lucene library? I am also a bit confused what means "being implemented on the top of Lucene"?

Read the article
How to structure an index for type ahead for extremely large dataset using Lucene or similar?

- by Pete

I have a dataset of 200million+ records and am looking to build a dedicated backend to power a type ahead solution. Lucene is of interest given its popularity and license type, but I'm open to other open source suggestions as well. I am looking for advice, tales from the trenches, or even better direct instruction on what I will need as far as amount of hardware and structure of software. Requirements: Must have: The ability to do starts with substring matching (I type in 'st' and it should match 'Stephen') The ability to return results very quickly, I'd say 500ms is an upper bound. Nice to have: The ability to feed relevance information into the indexing process, so that, for example, more popular terms would be returned ahead of others and not just alphabetical, aka Google style. In-word substring matching, so for example ('st' would match 'bestseller') Note: This index will purely be used for type ahead, and does not need to serve standard search queries. I am not worried about getting advice on how to set up the front end or AJAX, as long as the index can be queried as a service or directly via Java code. Up votes for any useful information that allows me to get closer to an enterprise level type ahead solution

Read the article
How to search Multiple Sites using Lucene Search engine API?

- by Wael Salman

Hope that someone can help me as soon as possible :-) I would like to know how can we search Multiple Sites using Lucene??! (All sites are in one index). I have succeeded to search one website , and to index multiple sites, however I am not able to search all websites. Consider this method that I have: private void PerformSearch() { DateTime start = DateTime.Now; //Create the Searcher object string strIndexDir = Server.MapPath("index") + @"\" + mstrURL; IndexSearcher objSearcher = new IndexSearcher(strIndexDir); //Parse the query, "text" is the default field to search Query objQuery = QueryParser.Parse(mstrQuery, "text", new StandardAnalyzer()); //Create the result DataTable mobjDTResults.Columns.Add("title", typeof(string)); mobjDTResults.Columns.Add("path", typeof(string)); mobjDTResults.Columns.Add("score", typeof(string)); mobjDTResults.Columns.Add("sample", typeof(string)); mobjDTResults.Columns.Add("explain", typeof(string)); //Perform search and get hit count Hits objHits = objSearcher.Search(objQuery); mintTotal = objHits.Length(); //Create Highlighter QueryHighlightExtractor highlighter = new QueryHighlightExtractor(objQuery, new StandardAnalyzer(), "<B>", "</B>"); //Initialize "Start At" variable mintStartAt = GetStartAt(); //How many items we should show? int intResultsCt = GetSmallerOf(mintTotal, mintMaxResults + mintStartAt); //Loop through results and display for (int intCt = mintStartAt; intCt < intResultsCt; intCt++) { //Get the document from resuls index Document doc = objHits.Doc(intCt); //Get the document's ID and set the cache location string strID = doc.Get("id"); string strLocation = ""; if (mstrURL.Substring(0,3) == "www") strLocation = Server.MapPath("cache") + @"\" + mstrURL + @"\" + strID + ".htm"; else strLocation = doc.Get("path") + doc.Get("filename"); //Load the HTML page from cache string strPlainText; using (StreamReader sr = new StreamReader(strLocation, System.Text.Encoding.Default)) { strPlainText = ParseHTML(sr.ReadToEnd()); } //Add result to results datagrid DataRow row = mobjDTResults.NewRow(); if (mstrURL.Substring(0,3) == "www") row["title"] = doc.Get("title"); else row["title"] = doc.Get("filename"); row["path"] = doc.Get("path"); row["score"] = String.Format("{0:f}", (objHits.Score(intCt) * 100)) + "%"; row["sample"] = highlighter.GetBestFragments(strPlainText, 200, 2, "..."); Explanation objExplain = objSearcher.Explain(objQuery, intCt); row["explain"] = objExplain.ToHtml(); mobjDTResults.Rows.Add(row); } objSearcher.Close(); //Finalize results information mTsDuration = DateTime.Now - start; mintFromItem = mintStartAt + 1; mintToItem = GetSmallerOf(mintStartAt + mintMaxResults, mintTotal); } as you can see that I use the site URL 'mstrURL' when I create the search object string strIndexDir = Server.MapPath("index") + @"\" + mstrURL; How can I do the same when I want to search multiple sites?? Actually I am using the code from http://www.keylimetie.com/blog/2005/8/4/lucenenet/

Read the article
Cheatsheet: 2010 04.01 ~ 04.07

- by gOODiDEA

Web Web Performance Best Practices: How masters.com re-designed their site to boost performance – and what that re-design missed What’s wrong with extending the DOM John Resig on Advanced Javascript to Improve your Web App .NET Hammock for REST - a REST library for .NET Programming Windows Phone 7 Series by Charlez Petzold – Free EBook Testing the Lock-Free Queue Some Last-Minute New C# 4.0 Features - while (x --> 0) { Console.WriteLine("x = {0}", x); } Better Coding with Visual Studio 2010 Revisiting Asynchronous ASP.NET Pages Database Understanding RAID for SQL Server – Part 2 Cassandra Jump Start For The Windows Developer Cassandra Internals – Writing - Cassandra Write Operation Performance Explained Cassandra Internals – Reading - Cassandra Reads Performance Explained MongoDB Growing Up: Release 1.4 and Commercial Support by 10gen Why NoSQL Will Not Die How Many Hard Drives Do I Need to Support SQL Server? Other Presentation: CouchDB and Lucene MongoDB Cacti Graphs HBase vs Cassandra: why we moved How to use the DedicatedDumpFile registry value to overcome space limitations on the system drive when capturing a system memory dump

Read the article
Crawling for geotagged data

- by abe3

I have no experience with web crawlers -- but I know that Apache maintains an open source web crawler called "Lucene." How would I go about writing such a crawler to search the web for geo tagged data close to a particular location? What would a general road map look like? How do I pick which slice of the web to crawl? Do I use regular expressions to find things that look like longitudes and latitudes? What does a general sketch of that solution look like?

Read the article
How can I use Lucene for personal name (first name, last name) search?

- by os111

I'm writing a search feature for a database of NFL players. The user enters a search string like "Jason Campbell" or "Campbell" or "Jason". I'm having trouble getting the appropriate results. Which Analyzer should I use when indexing? Which Query when querying? Should I distinguish between first name and last name or just index the full name string? I'd like the following behavior: Query: "Jason Campbell" - Result: exact match for 1 player, Jason Campbell Query: "Campbell" - Result: all players with Campbell in their name Query: "Jason" - Result: all players with Jason in their name Query: "Cambel" [misspelled] - Result: all players with Campbell in their name

Read the article
Compass - Lucene Full text search. Structure and Best Practice.

- by Rob

Hi, I have played about with the tutorial and Compass itself for a bit now. I have just started to ramp up the use of it and have found that the performance slows drastically. I am certain that this is due to my mappings and the relationships that I have between entities and was looking for suggestions about how this should be best done. Also as a side question I wanted to know if a variable is in an @searchableComponent but is not defined as @searchable when the component object is pulled out of Compass results will you be able to access that variable? I have 3 main classes that I want to search on - Provider, Location and Activity. They are all inter-related - a Provider can have many locations and activites and has an address; A Location has 1 provider, many activities and an address; An activity has 1 provider and many locations. I have a join table between activity and Location called ActivityLocation that can later provider additional information about the relationship. My classes are mapped to compass as shown below for provider location activity and address. This works but gives a huge index and searches on it are comparatively slow, any advice would be great. Cheers, Rob @Searchable public class AbstractActivity extends LightEntity implements Serializable { /** * Serialisation ID */ private static final long serialVersionUID = 3445339493203407152L; @SearchableId (name="actID") private Integer activityId =-1; @SearchableComponent() private Provider provider; @SearchableComponent(prefix = "activity") private Category category; private String status; @SearchableProperty (name = "activityName") @SearchableMetaData (name = "activityshortactivityName") private String activityName; @SearchableProperty (name = "shortDescription") @SearchableMetaData (name = "activityshortDescription") private String shortDescription; @SearchableProperty (name = "abRating") @SearchableMetaData (name = "activityabRating") private Integer abRating; private String contactName; private String phoneNumber; private String faxNumber; private String email; private String url; @SearchableProperty (name = "completed") @SearchableMetaData (name = "activitycompleted") private Boolean completed= false; @SearchableProperty (name = "isprivate") @SearchableMetaData (name = "activityisprivate") private Boolean isprivate= false; private Boolean subs= false; private Boolean newsfeed= true; private Set news = new HashSet(0); private Set ActivitySession = new HashSet(0); private Set payments = new HashSet(0); private Set userclub = new HashSet(0); private Set ActivityOpeningTimes = new HashSet(0); private Set Events = new HashSet(0); private User creator; private Set userInterested = new HashSet(0); boolean freeEdit = false; private Integer activityType =0; @SearchableComponent (maxDepth=2) private Set activityLocations = new HashSet(0); private Double userRating = -1.00; Getters and Setters .... @Searchable public class AbstractLocation extends LightEntity implements Serializable { /** * Serialisation ID */ private static final long serialVersionUID = 3445339493203407152L; @SearchableId (name="locationID") private Integer locationId; @SearchableComponent (prefix = "location") private Category category; @SearchableComponent (maxDepth=1) private Provider provider; @SearchableProperty (name = "status") @SearchableMetaData (name = "locationstatus") private String status; @SearchableProperty private String locationName; @SearchableProperty (name = "shortDescription") @SearchableMetaData (name = "locationshortDescription") private String shortDescription; @SearchableProperty (name = "abRating") @SearchableMetaData (name = "locationabRating") private Integer abRating; private Integer boolUseProviderDetails; @SearchableProperty (name = "contactName") @SearchableMetaData (name = "locationcontactName") private String contactName; @SearchableComponent private Address address; @SearchableProperty (name = "phoneNumber") @SearchableMetaData (name = "locationphoneNumber") private String phoneNumber; @SearchableProperty (name = "faxNumber") @SearchableMetaData (name = "locationfaxNumber") private String faxNumber; @SearchableProperty (name = "email") @SearchableMetaData (name = "locationemail") private String email; @SearchableProperty (name = "url") @SearchableMetaData (name = "locationurl") private String url; @SearchableProperty (name = "completed") @SearchableMetaData (name = "locationcompleted") private Boolean completed= false; @SearchableProperty (name = "isprivate") @SearchableMetaData (name = "locationisprivate") private Boolean isprivate= false; @SearchableComponent private Set activityLocations = new HashSet(0); private Set LocationOpeningTimes = new HashSet(0); private Set events = new HashSet(0); @SearchableProperty (name = "adult_cost") @SearchableMetaData (name = "locationadult_cost") private String adult_cost =""; @SearchableProperty (name = "child_cost") @SearchableMetaData (name = "locationchild_cost") private String child_cost =""; @SearchableProperty (name = "family_cost") @SearchableMetaData (name = "locationfamily_cost") private String family_cost =""; private Double userRating = -1.00; private Set costs = new HashSet(0); private String cost_caveats =""; Getters and Setters .... @Searchable public class AbstractActivitylocations implements java.io.Serializable { /** * */ private static final long serialVersionUID = 1365110541466870626L; @SearchableId (name="id") private Integer id; @SearchableComponent private Activity activity; @SearchableComponent private Location location; Getters and Setters..... @Searchable public class AbstractProvider extends LightEntity implements Serializable { private static final long serialVersionUID = 3060354043429663058L; @SearchableId private Integer providerId = -1; @SearchableComponent (prefix = "provider") private Category category; @SearchableProperty (name = "businessName") @SearchableMetaData (name = "providerbusinessName") private String businessName; @SearchableProperty (name = "contactName") @SearchableMetaData (name = "providercontactName") private String contactName; @SearchableComponent private Address address; @SearchableProperty (name = "phoneNumber") @SearchableMetaData (name = "providerphoneNumber") private String phoneNumber; @SearchableProperty (name = "faxNumber") @SearchableMetaData (name = "providerfaxNumber") private String faxNumber; @SearchableProperty (name = "email") @SearchableMetaData (name = "provideremail") private String email; @SearchableProperty (name = "url") @SearchableMetaData (name = "providerurl") private String url; @SearchableProperty (name = "status") @SearchableMetaData (name = "providerstatus") private String status; @SearchableProperty (name = "notes") @SearchableMetaData (name = "providernotes") private String notes; @SearchableProperty (name = "shortDescription") @SearchableMetaData (name = "providershortDescription") private String shortDescription; private Boolean completed = false; private Boolean isprivate = false; private Double userRating = -1.00; private Integer ABRating = 1; @SearchableComponent private Set locations = new HashSet(0); @SearchableComponent private Set activities = new HashSet(0); private Set ProviderOpeningTimes = new HashSet(0); private User creator; boolean freeEdit = false; Getters and Setters... Thanks for reading!! Rob

Read the article
How to index pdf, ppt, xl files in lucene (java based or python or php any of these is fine)?

- by harsha

Also I want to know how to add meta data while indexing so that i can boost some parameters

Read the article
Manipulate score/rank on query results from NHibernate.Search

- by Fernando Figueiredo

I've been working with NHibernate, NHibernate.Search and Lucene.Net to improve the search engine used on the website I develop. Basically, I use it to search contents of corporations specification documents. This is not to be confused with Lucene's notion of documents: in my case, a specification document (which I'll hereafter call a "specdoc") can contain many pages, and the content of these pages are the ones that are actually indexed (thus, the pages themselves are the ones that fall into Lucene's concept of documents). So, the pages belong to a specdoc, that in turn belong to a corporation (so, a corporation can have many specdocs). I'm using NHibernate.Search "IndexEmbedded" and "ContainedIn" attributes to associate the pages with their specdoc and the specdocs to their corporations, so I can query for terms in specdoc pages and have Lucene/NH.Search return either the pages themselves, the specdocs, or the corporations that match the query on the pages. I can query this way and get ranked results, thus presenting results (that is, corporations, specdocs or pages) by relevance, which is great. But now I need something more. Specifically in the case where I query terms and have NH.Search return the corporations that match, I need to manually/artificially tune the score of some of the results, because there are corporations that I want to show up on the top of the result set - think of "sponsored results". I'm thinking of doing it on my application, maybe creating an entity/database table that contain an association to the corporation entity, and a score boost value. But I don't know how to feed this to Lucene and have it boost the results accordingly at search time. Initially I thought about deriving a Similarity class to do this, but it doesn't look like Similarity can be used to modify result sets at search time. As per this page, it looks like what I need is to mess around with weight or scoring. But the docs are a little superficial in that there are no examples on how to implement a custom scoring, let alone integrate it with NH.Search. So, does anyone know how to do this, or point me to some documentation or working example on how to do something similar? Thanks!

Read the article
How to acces a file under WEB-INF from a Java web application

- by John Manak

Hi! Do you have any idea how to access files in WEB-INF/index folder from my application? I'm using OpenCMS for my application and I want to open a Lucene search index (with the help of Lucene IndexReader class) located at WEB-INF/index folder. Lucene jar is stored in WEB-INF/lib folder.

Read the article
free text search engine

- by lata patil

how to install sphinx and lucene on windows xp?

Read the article
Good library for search text tokenization

- by Chris Dutrow

Looking to tokenize some text in the same or similar way in which a search engine would do it. The reason we are doing this is so that we can run some statistical analysis on the tokens. The language we are using is python, so would prefer a library in that language, but could probably set something up to use another language if necessary. Example Original token: We have some great burritos! More simplified: (remove plurals and punctuation) We have some great burrito Even more simplified: (remove superfluous words) great burrito Best: (recognize positive and negative meaning): burrito -positive-

Read the article
How important is index size when searching?

- by Michael K

My company has recently began using Apache Solr to search its data. As we learn how to use it we have gone down the path of indexing multiple fields to get the results we need. Most of these are either N-Grammed or Edge-N-Grammed. Gramming by nature takes up a lot of space, which takes more time to search. Space is cheap, but time is less so. Index time is not too important, since a delta-import (only get the changes since last index) is extremely quick and you only pay a penalty on the first import. What we've not been able to determine is what effect the index size has on query times. Obviously a larger index takes longer to search, but the time added by n-gramming a field is difficult to predict. How do you determine whether a field is worth gramming? Can you predict how much longer a query will take when you gram a field?

Read the article
Hibernate Search Paging + FullTextSearch + Criteria

- by Roy Chan

I am trying to do a search with some criteria FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(finalQuery, KnowledgeBaseSolution.class).setCriteriaQuery(criteria); and then page it //Gives me around 700 results result.setResultCount(fullTextQuery.getResultSize()); //Some pages are empty fullTextQuery.setFirstResult(( (pageNumber - 1) * pageSize )); fullTextQuery.setMaxResults( pageSize ); result.setResults(fullTextQuery.list()); I suspect Lucene return full result of the full text search without taking the criteria into account and then hibernate search applies the criteria after, therefore some page are empty (after filtering by criteria) What is proper way to do fullTextSearch with some criteria, is it possible to apply the criteria before the lucene search? Or do I have to use pure Lucene (if so what's the point of Hibernate Search?) Thanks in advance

Read the article
how to lucene serch in android

- by xyz Sad

Lucen with android logic ..??? public class TestAndroidLuceneActivity extends Activity { @Override public void onCreate(Bundle icicle) { super.onCreate(icicle); setContentView(R.layout.main); try { Directory directory = new RAMDirectory(); Analyzer analyzer = new StandardAnalyzer(); Document doc = new Document(); doc.add(new Field("header", "ABC", Field.Store.YES,Field.Index.TOKENIZED)); indexWriter.addDocument(doc); doc.add(new Field("header", "DEF", Field.Store.YES,Field.Index.TOKENIZED)); indexWriter.addDocument(doc); doc.add(new Field("header", "GHI", Field.Store.YES,Field.Index.TOKENIZED)); indexWriter.addDocument(doc); doc.add(new Field("header", "JKL", Field.Store.YES,Field.Index.TOKENIZED)); indexWriter.addDocument(doc); indexWriter.optimize(); indexWriter.close(); IndexSearcher indexSearcher = new IndexSearcher(directory); QueryParser parser = new QueryParser("header", analyzer); // Query query = parser.parse("(" + "Anil" + ")"); Query query = parser.parse("(" + "ABC" + ")"); Hits hits = indexSearcher.search(query); for (int i = 0; i < hits.length(); i++) { Document hitDoc = hits.doc(i); Log.i("TestAndroidLuceneActivity", "Lucene: " +hitDoc.get("header")); // Toast.makeText(this, hitDoc.get("header"),Toast.LENGTH_LONG).show(); } indexSearcher.close(); directory.close(); } catch (Exception ex) { System.out.println(ex.getMessage()); } } } i have this code but i m not able to understnd plz send me related or modifed and set it main.xml show me some out put plzz..its does not serch after "ABC" plz tell me wat is the problem in logic any thing missing???..

Read the article
MySQL Full-Text Search Across Multiple Tables - Quick/Long Solution?

- by Kerry

Hello all, I have been doing a bit of research on full-text searches as we realized a series of LIKE statements are terrible. My first find was MySQL full-text searches. I tried to implement this and it worked on one table, failed when I was trying to join multiple tables, and so I consulted stackoverflow's articles (look at the end for a list of the ones I've been to) I didn't see anything that clearly answered my questions. I'm trying to get this done literally in an hour or two (quick solution) but I also want to do a better long term solution. Here is my query: SELECT a.`product_id`, a.`name`, a.`slug`, a.`description`, b.`list_price`, b.`price`, c.`image`, c.`swatch`, e.`name` AS industry FROM `products` AS a LEFT JOIN `website_products` AS b ON (a.`product_id` = b.`product_id`) LEFT JOIN ( SELECT `product_id`, `image`, `swatch` FROM `product_images` WHERE `sequence` = 0) AS c ON (a.`product_id` = c.`product_id`) LEFT JOIN `brands` AS d ON (a.`brand_id` = d.`brand_id`) INNER JOIN `industries` AS e ON (a.`industry_id` = e.`industry_id`) WHERE b.`website_id` = 96 AND b.`status` = 1 AND b.`active` = 1 AND MATCH( a.`name`, a.`sku`, a.`description`, d.`name` ) AGAINST ( 'ashley sofa' ) GROUP BY a.`product_id` ORDER BY b.`sequence` LIMIT 0, 9 The error I get is: Incorrect arguments to MATCH If I remove d.name from the MATCH statement it works. I have a full-text index on that column. I saw one of the articles say to use an OR MATCH for this table, but won't that lose the effectiveness of being able to rank them together or match them properly? Other places said to use UNIONs but I don't know how to do that properly. Any advice would be greatly appreciated. In the idea of a long term solution it seems that either Sphinx or Lucene is best. Now by no means and I a MySQL guru, and I heard that Lucene is a bit more complicated to setup, any recommendations or directions would be great. Articles: http://stackoverflow.com/questions/1117005/mysql-full-text-search-across-multiple-tables http://stackoverflow.com/questions/668371/mysql-fulltext-search-across-1-table http://stackoverflow.com/questions/2378366/mysql-how-to-make-multiple-table-fulltext-search http://stackoverflow.com/questions/737275/pros-cons-of-full-text-search-engine-lucene-sphinx-postgresql-full-text-searc http://stackoverflow.com/questions/1059253/searching-across-multiple-tables-best-practices

Read the article
How to count term frequency for set of documents?

- by ManBugra

i have a Lucene-Index with following documents: doc1 := { caldari, jita, shield, planet } doc2 := { gallente, dodixie, armor, planet } doc3 := { amarr, laser, armor, planet } doc4 := { minmatar, rens, space } doc5 := { jove, space, secret, planet } so these 5 documents use 14 different terms: [ caldari, jita, shield, planet, gallente, dodixie, armor, amarr, laser, minmatar, rens, jove, space, secret ] the frequency of each term: [ 1, 1, 1, 4, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1 ] for easy reading: [ caldari:1, jita:1, shield:1, planet:4, gallente:1, dodixie:1, armor:2, amarr:1, laser:1, minmatar:1, rens:1, jove:1, space:2, secret:1 ] What i do want to know now is, how to obtain the term frequency vector for a set of documents? for example: Set<Documents> docs := [ doc2, doc3 ] termFrequencies = magicFunction(docs); System.out.pring( termFrequencies ); would result in the ouput: [ caldari:0, jita:0, shield:0, planet:2, gallente:1, dodixie:1, armor:2, amarr:1, laser:1, minmatar:0, rens:0, jove:0, space:0, secret:0 ] remove all zeros: [ planet:2, gallente:1, dodixie:1, armor:2, amarr:1, laser:1 ] Notice, that the result vetor contains only the term frequencies of the set of documents. NOT the overall frequencies of the whole index! The term 'planet' is present 4 times in the whole index but the source set of documents only contains it 2 times. A naive implementation would be to just iterate over all documents in the docs set, create a map and count each term. But i need a solution that would also work with a document set size of 100.000 or 500.000. Is there a feature in Lucene i can use to obtain this term vector? If there is no such feature, how would a data structure look like someone can create at index time to obtain such a term vector easily and fast? I'm not that Lucene expert so i'am sorry if the solution is obvious or trivial.

Read the article
Scalable Full Text Search With Per User Result Ordering

- by jeremy

What options exist for creating a scalable, full text search with results that need to be sorted on a per user basis? This is for PHP/MySQL (Symfony/Doctrine as well, if relevant). In our case, we have a database of workouts that have been performed by users. The workouts that the user has done before should appear at the top of the results. The more frequently they've done the workout, the higher it should appear in search matches. If it helps, you can assume we know the number of times a user has done a workout in advance. Possible Solutions Sphinx - Use Sphinx to implement full text search, do all the querying and sorting in MySQL. This seems promising (and there's a Symfony Plugin!) but I don't know much about it. Lucene - Use Lucene to perform full text search and put the users' completions into the query. As is suggested in this Stack Overflow thread. Alternatively, use Lucene to retrieve the results, then reorder them in PHP. However, both solutions seem clunky and potentially unscalable as a user may have completed hundreds of workouts. Mysql - No native full text support (InnoDB), so we'd have use LIKE or REGEX, which isn't scalable.

Read the article
Building dictionary of words from large text

- by LiorH

I have a text file containing posts in English/Italian. I would like to read the posts into a data matrix so that each row represents a post and each column a word. The cells in the matrix are the counts of how many times each word appears in the post. The dictionary should consist of all the words in the whole file or a non exhaustive English/Italian dictionary. I know this is a common essential preprocessing step for NLP. Does anyone know of a tool\project that can perform this task? Someone mentioned apache lucene, do you know if lucene index can be serialized to a data-structure similar to my needs?

Read the article
No optimization causes wrong search result

- by KailZhang

I just took over our solr/lucene stuff from my ex-colleague. But there is a weird bug. If there is no optimization after dataimport, actually if there are multiple segment files, the search result then will be wrong. We are using a customized solr searchComponent. As far as I know about lucene, optimization is only optimization which could improve the speed of indexing and should not affect search result. I doubt this may be related to multithreading or unclosed searcher/reader or something. Anybody can help? Thank you.

Read the article
Best way to reuse a Runnable

- by Gandalf

I have a class that implements Runnable and am currently using an Executor as my thread pool to run tasks (indexing documents into Lucene). executor.execute(new LuceneDocIndexer(doc, writer)); My issue is that my Runnable class creates many Lucene Field objects and I would rather reuse them then create new ones every call. What's the best way to reuse these objects (Field objects are not thread safe so I cannot simple make them static) - should I create my own ThreadFactory? I notice that after a while the program starts to degrade drastically and the only thing I can think of is it's GC overhead. I am currently trying to profile the project to be sure this is even an issue - but for now lets just assume it is.

Read the article
Best full text search for mysql?

- by ConroyP

We're currently running MySQL on a LAMP stack and have been looking at implementing a more thorough, full-text search on our site. We've looked at MySQL's own freetext search, but it doesn't seem to cope well with large databases, which makes it far too slow for our needs. Our main requirements are: speed returning results simple updating of index In addition to the above, our "nice to have"s are: ideally not something that requires adding a module to MySQL plays nicely with PHP (majority of our dev work done using PHP) There seems to be quite a few healthy open-source projects to add fast, reliable full-text search to MySQL, so I'm basically looking for recommendations/suggestions on what you've found to be the most useful product out there, easiest to set up, etc. So far, the list of ones we've been starting to play around with are: Sphinx, C++ based, used by craigslist, thepiratebay Lucene, Java-based Apache project, powers zeoh.com and zoomf.com Solr, Java-based offshoot of Lucene, used to power searches on Digg, CNet & AOL Channels Are there any better ones out there that we haven't come across yet? Can you recommend / suggest against any of the options we've gathered so far? Thanks for your help! Update @Cletus suggested Google's Custom Search Engine. We recently trialled this on a couple of projects, and it's an almost-perfect fit for our needs. The problem is that entries on our site are updated quite regularly, and unfortunately the speed at which entries go in/get updated in Google's index was just too slow and erratic for us to rely on, even with the addition of sitemaps and requested crawl rate changes.

Read the article

Search Results

Search found 393 results on 16 pages for 'lucene'.

Page 9/16 | < Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16 | Next Page >

- by Snives

- by dont say the kid's name

- by ?????

- by spacemonkey

- by Pete

- by Wael Salman

- by gOODiDEA

- by abe3

- by os111

- by Rob

- by harsha

- by Fernando Figueiredo

- by John Manak

- by lata patil

- by Chris Dutrow

- by Michael K

- by Roy Chan

- by xyz Sad

- by Kerry

- by ManBugra

- by jeremy

- by LiorH

- by KailZhang

- by Gandalf

- by ConroyP

< Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16 | Next Page >