Search Results

Search found 9 results on 1 pages for 'stopword'.

Page 1/1 | 1 

  • What is a good stopword in full text indexation?

    - by Benoit
    When you go to the Appendix D in Oracle Text Reference they provide lists of stopwords used by Oracle Text when indexing table contents. When I see the English list, nothing puzzles me. But the reason why the French list includes moyennant (French for in view of which) for example is unclear. Oracle has probably thought it through more than once before including it. How would you constitute a list of appropriate stopwords if you were to design an indexer?

    Read the article

  • Filter items from a feed using words in a text file, with Yahoo Pipes

    - by pufferfish
    I have a pipe that filters an RSS feed and removes any item that contains "stopwords" that I've chosen. Currently I've manually created a filter for each stopword in the pipe editor, but the more logical way is to read these from a file. I've figured out how to read the stopwords out of the text file, but how do I apply the filter operator to the feed, once for every stopword? The documentation states explicitly that operators can't be applied within the loop construct, but hopefully I'm missing something here.

    Read the article

  • Yahoo Pipes: filter items in a feed based on words in a text file

    - by pufferfish
    I have a pipe that filters an RSS feed and removes any item that contains "stopwords" that I've chosen. Currently I've manually created a filter for each stopword in the pipe editor, but the more logical way is to read these from a file. I've figured out how to read the stopwords out of the text file, but how do I apply the filter operator to the feed, once for every stopword? The documentation states explicitly that operators can't be applied within the loop construct, but hopefully I'm missing something here.

    Read the article

  • ft_stopword_file not picked up

    - by Alex Holsgrove
    I have a VPS server with a company called Webfusion. I want to remove some or all of the FULLTEXT stopwords because some specific words needs to be searchable with my DB content. I opened /etc/mysql/my.cnf and added the line ft_stopword_file="". I restarted the mysql service, ran a repair table and then tried my MATCH query with no success. I ran SHOW VARIABLES LIKE 'ft_%' and it simply shows (built-in) next to the stopword file. I am running WAMP on my workstation, and whilst I realise this isn't configured the same as a commercial VPS, the above method worked just fine. Couple someone please offer some guidance?

    Read the article

  • Mysql query problem....

    - by Avinash
    I have below values in my database. been Lorem Ipsum and scrambled ever scrambledtexttextofandtooktooktypetexthastheunknownspecimenstandardsincetypesett Here is my query: SELECT nBusinessAdID, MATCH (`sHeadline`) AGAINST ("text" IN BOOLEAN MODE) AS score FROM wiki_businessads WHERE MATCH (`sHeadline`) AGAINST ("text" IN BOOLEAN MODE) AND bDeleted ="0" AND nAdStatus ="1" ORDER BY score DESC, bPrimeListing DESC, dDateCreated DESC It's not fetching first result, why? It should fetch first result because its contain text word in it. I have disabled the stopword filtering. This one is also not working SELECT nBusinessAdID, MATCH (`sHeadline`) AGAINST ('"text"' IN BOOLEAN MODE) AS score FROM wiki_businessads WHERE MATCH (`sHeadline`) AGAINST ('"text"' IN BOOLEAN MODE) AND bDeleted ="0" AND nAdStatus ="1" ORDER BY score DESC, bPrimeListing DESC, dDateCreated DESC Thanks Avinash

    Read the article

  • Removing stopwords,but should return as a line

    - by Sarath R Nair
    My question may appear silly. But as I am a rookie in Python , help me out. I have to pass a line to a stopword removal function. It works fine. But my problem is return of the function is appending the words. I want it as like follows: line = " I am feeling good , but I cant talk" Let "I,but,cant" are stopwords. After passing to the function , my output should be as "am feeling good , talk". What I a getting now is [['am','feeling','good','talk']]. Help me.

    Read the article

  • Removing words from a file

    - by user1765792
    I'm trying to take a regular text file and remove words identified in a separate file (stopwords) containing the words to be removed separated by carriage returns ("\n"). Right now I'm converting both files into lists so that the elements of each list can be compared. I got this function to work, but it doesn't remove all of the words I have specified in the stopwords file. Any help is greatly appreciated. def elimstops(file_str): #takes as input a string for the stopwords file location stop_f = open(file_str, 'r') stopw = stop_f.read() stopw = stopw.split('\n') text_file = open('sample.txt') #Opens the file whose stop words will be eliminated prime = text_file.read() prime = prime.split(' ') #Splits the string into a list separated by a space tot_str = "" #total string i = 0 while i < (len(stopw)): if stopw[i] in prime: prime.remove(stopw[i]) #removes the stopword from the text else: pass i += 1 # Creates a new string from the compilation of list elements # with the stop words removed for v in prime: tot_str = tot_str + str(v) + " " return tot_str

    Read the article

  • Exception when indexing text documents with Lucene, using SnowballAnalyzer for cleaning up

    - by Julia
    Hello!!! I am indexing the documents with Lucene and am trying to apply the SnowballAnalyzer for punctuation and stopword removal from text .. I keep getting the following error :( IllegalAccessError: tried to access method org.apache.lucene.analysis.Tokenizer.(Ljava/io/Reader;)V from class org.apache.lucene.analysis.snowball.SnowballAnalyzer Here is the code, I would very much appreciate help!!!! I am new with this.. public class Indexer { private Indexer(){}; private String[] stopWords = {....}; private String indexName; private IndexWriter iWriter; private static String FILES_TO_INDEX = "/Users/ssi/forindexing"; public static void main(String[] args) throws Exception { Indexer m = new Indexer(); m.index("./newindex"); } public void index(String indexName) throws Exception { this.indexName = indexName; final File docDir = new File(FILES_TO_INDEX); if(!docDir.exists() || !docDir.canRead()){ System.err.println("Something wrong... " + docDir.getPath()); System.exit(1); } Date start = new Date(); PerFieldAnalyzerWrapper analyzers = new PerFieldAnalyzerWrapper(new SimpleAnalyzer()); analyzers.addAnalyzer("text", new SnowballAnalyzer("English", stopWords)); Directory directory = FSDirectory.open(new File(this.indexName)); IndexWriter.MaxFieldLength maxLength = IndexWriter.MaxFieldLength.UNLIMITED; iWriter = new IndexWriter(directory, analyzers, true, maxLength); System.out.println("Indexing to dir..........." + indexName); if(docDir.isDirectory()){ File[] files = docDir.listFiles(); if(files != null){ for (int i = 0; i < files.length; i++) { try { indexDocument(files[i]); }catch (FileNotFoundException fnfe){ fnfe.printStackTrace(); } } } } System.out.println("Optimizing...... "); iWriter.optimize(); iWriter.close(); Date end = new Date(); System.out.println("Time to index was" + (end.getTime()-start.getTime()) + "miliseconds"); } private void indexDocument(File someDoc) throws IOException { Document doc = new Document(); Field name = new Field("name", someDoc.getName(), Field.Store.YES, Field.Index.ANALYZED); Field text = new Field("text", new FileReader(someDoc), Field.TermVector.WITH_POSITIONS_OFFSETS); doc.add(name); doc.add(text); iWriter.addDocument(doc); } }

    Read the article

1