Search Results

Search found 28 results on 2 pages for 'nltk'.

Page 1/2 | 1 2  | Next Page >

  • Unable to import nltk in NetBeans

    - by afs
    Hello all, I am trying to import NLTK in my python code and I get this error: Traceback (most recent call last): File "/home/afs/NetBeansProjects/NER/getNE_followers.py", line 7, in import nltk ImportError: No module named nltk I am using NetBeans: 6.7.1, Python 2.6 NLTK. My NLTK module is installed in /usr/local/lib/python2.6/dist-packages/nltk/ and I have added this in Python paths in Netbeans. What am I missing here? Thanks in advance.

    Read the article

  • Sentiment analysis with NLTK python for sentences using sample data or webservice?

    - by Ke
    I am embarking upon a NLP project for sentiment analysis. I have successfully installed NLTK for python (seems like a great piece of software for this). However,I am having trouble understanding how it can be used to accomplish my task. Here is my task: I start with one long piece of data (lets say several hundred tweets on the subject of the UK election from their webservice) I would like to break this up into sentences (or info no longer than 100 or so chars) (I guess i can just do this in python??) Then to search through all the sentences for specific instances within that sentence e.g. "David Cameron" Then I would like to check for positive/negative sentiment in each sentence and count them accordingly NB: I am not really worried too much about accuracy because my data sets are large and also not worried too much about sarcasm. Here are the troubles I am having: All the data sets I can find e.g. the corpus movie review data that comes with NLTK arent in webservice format. It looks like this has had some processing done already. As far as I can see the processing (by stanford) was done with WEKA. Is it not possible for NLTK to do all this on its own? Here all the data sets have already been organised into positive/negative already e.g. polarity dataset http://www.cs.cornell.edu/People/pabo/movie-review-data/ How is this done? (to organise the sentences by sentiment, is it definitely WEKA? or something else?) I am not sure I understand why WEKA and NLTK would be used together. Seems like they do much the same thing. If im processing the data with WEKA first to find sentiment why would I need NLTK? Is it possible to explain why this might be necessary? I have found a few scripts that get somewhat near this task, but all are using the same pre-processed data. Is it not possible to process this data myself to find sentiment in sentences rather than using the data samples given in the link? Any help is much appreciated and will save me much hair! Cheers Ke

    Read the article

  • Is there any way to add a new location to the list of places where nltk looks for the wordnet corpus?

    - by Programming Noob
    I can't use the nltk wordnet lemmatizer because I can't download the wordnet corpus on my university computer due to access rights issues. I get the following error when I try to do so: ********************************************************************** Resource 'corpora/wordnet' not found. Please use the NLTK Downloader to obtain the resource: >>> nltk.download() Searched in: - '/home/XX/nltk_data' - '/usr/share/nltk_data' - '/usr/local/share/nltk_data' - '/usr/lib/nltk_data' - '/usr/local/lib/nltk_data' ********************************************************************** When I had the same issue at home, I could resolve it by two ways: Using nltk.download(), the standard way and Creating a new folder at location /home/XX/nltk_data and just pasting the corpus directory inside it. Now at the university I only have access to /home/XX/bin and not /home/XX directly. So is there anyway I could paste the wordnet corpus into /home/XX/bin and then somehow make nltk look for the corpus in that folder?

    Read the article

  • Natural language processing - Ideas for beginner's projects

    - by Microkernel
    Hi guys, I am a beginner in NLP and NLTK. I am very interested in NLP and hence joined a weekend course on AI in some local institution, which requires me to do a project for completion of the course, and I decided to do it in NLP. The problem is,the instructor is not good at all for this course (According to me she is just a charlatan) (or may not be very interested in teaching as this is her last batch here after which the institute is going to send her out). So I am stuck in a situation where where I got to finish this project in a month to one and half months period, but as a naive person in the field I am feeling it very difficult to comprehend the things required to decide on project. (Also, as I am working full time, I am not finding enough time to dedicate on this). I considered using NLTK toolkit in python for the project for following reasons. (1) Python is famous for ease of use, rapid prototyping and very active community (considering very short span of time I have, and as I am a C programmer by profession, I need a language that I can learn fast and is simple to use). (2) NLTk has good review, and extensive documentation and a very active community. So the problem is what project should I take up, so that I can learn something and will be able to finish project in time. (I know almost nothing in NLP, don't even know what exactly corpora is... :( ) So, please suggest me some topics that I should consider for the project. Regards, MicroKernel :)

    Read the article

  • Efficient Context-Free Grammar parser, preferably Python-friendly

    - by Max Shawabkeh
    I am in need of parsing a small subset of English for one of my project, described as a context-free grammar with (1-level) feature structures (example) and I need to do it efficiently . Right now I'm using NLTK's parser which produces the right output but is very slow. For my grammar of ~450 fairly ambiguous non-lexicon rules and half a million lexical entries, parsing simple sentences can take anywhere from 2 to 30 seconds, depending it seems on the number of resulting trees. Lexical entries have little to no effect on performance. Another problem is that loading the (25MB) grammar+lexicon at the beginning can take up to a minute. From what I can find in literature, the running time of the algorithm used to parse such a grammar (Earley or CKY) should be linear to the size of the grammar and cubic to the size of the input token list. My experience with NLTK indicates that ambiguity is what hurts the performance most, not the absolute size of the grammar. So now I'm looking for a CFG parser to replace NLTK. I've been considering PLY but I can't tell whether it supports feature structures in CFGs, which are required in my case, and the examples I've seen seem to be doing a lot of procedural parsing rather than just specifying a grammar. Can anybody show me an example of PLY both supporting feature structs and using a declarative grammar? I'm also fine with any other parser that can do what I need efficiently. A Python interface is preferable but not absolutely necessary.

    Read the article

  • Text mining with PHP

    - by garyc40
    Hi, I'm doing a project for a college class I'm taking. I'm using PHP to build a simple web app that classify tweets as "positive" (or happy) and "negative" (or sad) based on a set of dictionaries. The algorithm I'm thinking of right now is Naive Bayes classifier or decision tree. However, I can't find any PHP library that helps me do some serious language processing. Python has NLTK (http://www.nltk.org). Is there anything like that for PHP? I'm planning to use WEKA as the back end of the web app (by calling Weka in command line from within PHP), but it doesn't seem that efficient. Do you have any idea what I should use for this project? Or should I just switch to Python? Thanks

    Read the article

  • installing NUMPY for Mac OSX 10.7. (Lion) for use with Python

    - by user1744871
    I Need to use nltk and numpy with Python. I am a newbie to Python and initially used the python 2.7.3 that came with my mac (currently running OSX 10.7.5). I learned that the apple version of python may not be robust so I downloaded the standard version from python.org. I have downloaded the nltk program for Mac OSX 10.7. This seemed to install fine. I am trying to download and install numpy using the instructions from the scipy website. http://www.scipy.org/Installing_SciPy/Mac_OS_X. I cloned numpy from github but when I tried to build it using the following command $ python setup.py build I received the following error MacOS/Python: can't open file 'setup.py': [Errno 2] No such file or directory I also tried to build it using the scons command $ python setupscons.py scons --jobs=2 and received the following error /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python: can't open file 'setupscons.py': [Errno 2] No such file or directory Can anyone think of a possible workaround?

    Read the article

  • How to extract common / significant phrases from a series of text entries

    - by arronsky
    I have a series of text items- raw HTML from a MYSQL database. I want to find the most common phrases in these entries (not the single most common phrase, and ideally, not enforcing word-for-word matching). My example is any review on Yelp.com, that shows 3 snippets from hundreds of reviews of a given restaurant, in the format: "Try the hamburger" (in 44 reviews) e.g., the "Review Highlights" section of this page: http://www.yelp.com/biz/sushi-gen-los-angeles/ I have NLTK installed and I've played around with it a bit, but am honestly overwhelmed by the options. This seems like a rather common problem and I haven't been able to find a straightforward solution by searching here. Thanks in advance for any help.

    Read the article

  • Java or Python distributed compute job (on a student budget)?

    - by midget_sadhu
    I have a large dataset (c. 40G) that I want to use for some NLP (largely embarrassingly parallel) over a couple of computers in the lab, to which i do not have root access, and only 1G of user space. I experimented with hadoop, but of course this was dead in the water-- the data is stored on an external usb hard drive, and i cant load it on to the dfs because of the 1G user space cap. I have been looking into a couple of python based options (as I'd rather use NLTK instead of Java's lingpipe if I can help it), and it seems distributed compute options look like: Ipython DISCO After my hadoop experience, i am trying to make sure i try and make an informed choice -- any help on what might be more appropriate would be greatly appreciated. Amazon's EC2 etc not really an option, as i have next to no budget.

    Read the article

  • How to identify ideas and concepts in a given text

    - by Nick
    I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained: Maybe if you tell me a little more about who Mr Balzac is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph? It'd be great to be able to detect that the person has asked for a photograph of Mr Balzac. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like: Please, never send me a photo of Mr Balzac. Does anyone know where to start with this? Is it even possible? I've looked into things like nltk, but I've yet to find an example of someone doing something similar and am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great. Thanks!

    Read the article

  • How to classify NN/NNP/NNS obtained from POS tagged document as a product feature

    - by Shweta .......
    I'm planning to perform sentiment analysis on reviews of product features (collected from Amazon dataset). I have extracted review text from the dataset and performed POS tagging on that. I'm able to extract NN/NNP as well. But my doubt is how do I come to know that extracted words classify as features of the products? I know there are classifiers in nltk but I don't know how I should use it for my project. I'm assuming there are 2 ways of finding whether the extracted word is a product feature or not. One is to compare with a bag of words and find out if my word exists in that. Doubt: How do I create/get bag of words? Second way is to implement some kind of apriori algorithm to find out frequently occurring words as features. I would like to know which method is good and how to go about implementing it. Some pointers to available softwares or code snippets would be helpful! Thanks!

    Read the article

  • difference between a lexical, morphological and semantic mistakes?

    - by AnH
    Hi there, I just want to make sure that I understood the differences between a lexical, morphological and semantic mistakes correctly. If am not mistaken semantic mistakes deal with problems concerning meanings, for example writing a sentence that is correct grammar wise but doesn't make any sense is a semantic mistake, morphological mistakes deals more or less on how the word should look like, for example childrenS is a morphological mistake, so that leaves lexical mistakes, what are those exactly?? can someone sum up the differences between those 3 mistakes please so that I may know for sure that I got them down correctly? Thank you in advance

    Read the article

  • problem using pydoc in python

    - by rohanag
    I'm using pydoc in python 2.7.3 to generate documentation for a file called PreProcessingAPI.py which contains a class called PreProcessingAPI In PreProcessingAPI.py, I have the following import in the beginning of the file: from __future__ import division from re import * from nltk.stem import porter The problem is, in the documentation generated by pydoc, nltk.stem.porter is shown as a Module. There is also a DATA heading with all sorts of variables I do not know about. Is there a way to avoid these variables and avoid showing nltk.stem.porter in the modules? I'm running the following command to generate documentation python pydoc.py -w PreProcessingAPI.py I've put the file pydoc.py in the directory containing my file. Here is the file generated: https://www.dropbox.com/s/4rb6ut99o25mwly/PreProcessingAPI.html

    Read the article

  • Pip installs on Archlinux fails to build egg

    - by stmfunk
    I was trying to install nltk on my Archlinux server but it repeatedly fails with the following error output /usr/lib/python3.3/distutils/dist.py:257: UserWarning: Unknown distribution option: 'entry_points' warnings.warn(msg) /usr/lib/python3.3/distutils/dist.py:257: UserWarning: Unknown distribution option: 'zip_safe' warnings.warn(msg) /usr/lib/python3.3/distutils/dist.py:257: UserWarning: Unknown distribution option: 'test_suite' warnings.warn(msg) usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...] or: setup.py --help [cmd1 cmd2 ...] or: setup.py --help-commands or: setup.py cmd --help error: invalid command 'bdist_egg' /tmp/pip_build_root/nltk/distribute-0.6.21-py3.3.egg Traceback (most recent call last): File "./distribute_setup.py", line 143, in use_setuptools raise ImportError ImportError During handling of the above exception, another exception occurred: Traceback (most recent call last): File "", line 16, in File "/tmp/pip_build_root/nltk/setup.py", line 23, in distribute_setup.use_setuptools() File "./distribute_setup.py", line 145, in use_setuptools return _do_download(version, download_base, to_dir, download_delay) File "./distribute_setup.py", line 125, in _do_download _build_egg(egg, tarball, to_dir) File "./distribute_setup.py", line 116, in _build_egg raise IOError('Could not build the egg.') OSError: Could not build the egg. ---------------------------------------- Cleaning up... Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_root/nltk Storing complete log in /root/.pip/pip.log This error is also occurring for matplotlib buts thats the only other library I found it to fail on so far. pyyaml installs fine. The install works perfectly under virtualenv on my mac which is using python 2.7 but the server is using python 3.3. Any help is appreciated.

    Read the article

  • running a python script where dependencies are not avail: distributed computing

    - by sadhu_
    Hi, I have access to a grid (running condor) that would (potentially) allow to very substantially reduce how long by nltk based nlp tasks take. unfortunately, i dont have root access on the cluster so cannot install new packages, only run whatever is available on the linux boxes. python is of course available, but nltk isnt - i was wondering however, if there might be a way around this somehow ? is there a way i can somehow still distribute the task in a self-contained 'package' of some sort? Thanks for your hel

    Read the article

  • Is sticking to one language a good practice?

    - by Ans
    I'm developing a pipeline for processing text that will go into production. The question I keep asking myself is: should I stick to one language when looking for a tool to do a particular task (e.g. NLTK, PDFMiner, CLD, CRFsuite, etc.)? Or is it OK to mix and match looking for the best tool regardless of what language it's written in (e.g. OpenNLP, ParsCit, poppler, CFR++, etc.) and warp my code around them?

    Read the article

  • How would I go about measuring the impact an article has on the internet?

    - by Jimbo Mombasa
    For an application of mine, I analyze the sentiment of articles, using NLTK, to display sentiment trends. But right now all articles weigh the same amount. This does not show a very accurate picture because some articles have a higher impact on the internet than others. For example, a blog post from some unknown blog should not weigh the same amount as an article from the New York Times. How can I determine their impact?

    Read the article

  • Is sticking to one language on a particular project a good practice?

    - by Ans
    I'm developing a pipeline for processing text that will go into production. The question I keep asking myself is: should I stick to one language for the project when I'm looking for a tool to do a particular task (e.g. NLTK, PDFMiner, CLD, CRFsuite, etc.)? Or is it OK to mix and match languages on the project? So I pick the best tool regardless of what language it's written in (e.g. OpenNLP, ParsCit, poppler, CFR++, etc.) and warp (wrap) my code around it? Note, I am not asking about should a developer stick to just one language for their career.

    Read the article

  • Russian-to-English Parallel Word Corpus?

    - by Cygorger
    Hi: I am looking for a simple Russian to English word corpus. It can be as simple as a csv that lists a russian word in the first column and the equivalent English word in the second. Any ideas where I can find such a thing? Does the NLTK toolkit have something like this? Thanks

    Read the article

  • Any Naive Bayesian Classifier in python?

    - by asldkncvas
    Dear Everyone I have tried the Orange Framework for Naive Bayesian classification. The methods are extremely unintuitive, and the documentation is extremely unorganized. Does anyone here have another framework to recommend? I use mostly NaiveBayesian for now. I was thinking of using nltk's NaiveClassification but then they don't think they can handle continuous variables. What are my options?

    Read the article

  • Problem with python class

    - by Tasbeer
    Hi I am new to Python and as a part of my assignment I have written the following class import nltk.stem.api class BanglaStemmer(nltk.stem.api.StemmerI): suffixList = ['\xef\xbb\xbf\xe0\xa6\xbf\xe0\xa7\x9f\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa6\xbe\xe0\xa6\xae\n', '\xe0\xa6\xbf\xe0\xa7\x9f\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa6\xbf\xe0\xa7\x9f\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa7\x87\n', '\xe0\xa6\xbf\xe0\xa7\x9f\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa6\xbf\n', '\xe0\xa6\xbf\xe0\xa7\x9f\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\n', '\xe0\xa6\xbf\xe0\xa7\x9f\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa6\xbf\xe0\xa7\x9f\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb8\n', '\xe0\xa6\xbf\xe0\xa7\x9f\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\n', '\xe0\xa6\xbf\xe0\xa7\x9f\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa7\x87\n', '\xe0\xa6\xbf\xe0\xa7\x9f\xe0\xa7\x87\xe0\xa6\x9b\n', '\xe0\xa6\xbf\xe0\xa7\x9f\xe0\xa7\x8b\n', '\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa6\xbe\xe0\xa6\xae\n', '\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa7\x87\n', '\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa6\xbf\n', '\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\n', '\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb8\n', '\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa6\xbf\n', '\xe0\xa7\x87\xe0\xa6\x9b\xe0\xa7\x87\n', '\xe0\xa7\x87\xe0\xa6\x9b\n', '\xe0\xa6\xa4\xe0\xa7\x87\n', '\xe0\xa6\x9a\xe0\xa7\x8d\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa6\xbe\xe0\xa6\xae\n', '\xe0\xa6\x9a\xe0\xa7\x8d\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa6\x9a\xe0\xa7\x8d\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa7\x87\n', '\xe0\xa6\x9a\xe0\xa7\x8d\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa6\xbf\n', '\xe0\xa6\x9a\xe0\xa7\x8d\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\n', '\xe0\xa6\x9a\xe0\xa7\x8d\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb8\n', '\xe0\xa6\x9a\xe0\xa7\x8d\xe0\xa6\x9b\xe0\xa6\xbf\n', '\xe0\xa6\x9a\xe0\xa7\x8d\xe0\xa6\x9b\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa6\x9a\xe0\xa7\x8d\xe0\xa6\x9b\xe0\xa7\x87\n', '\xe0\xa6\x9a\xe0\xa7\x8d\xe0\xa6\x9b\n', '\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa6\xbe\xe0\xa6\xae\n', '\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa7\x87\n', '\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\xe0\xa6\xbf\n', '\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb2\n', '\xe0\xa6\x9b\xe0\xa6\xbf\xe0\xa6\xb8\n', '\xe0\xa6\x9b\xe0\xa6\xbf\n', '\xe0\xa6\x9b\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa6\x9b\n', '\xe0\xa6\xa4\xe0\xa6\xbf\xe0\xa6\xb8\n', '\xe0\xa6\xa4\xe0\xa6\xbe\xe0\xa6\xae\n', '\xe0\xa6\xb2\xe0\xa6\xbe\xe0\xa6\xae\n', '\xe0\xa6\xb2\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa6\xa4\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa6\xac\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa7\x87\xe0\xa6\xa8\n', '\xe0\xa6\xbf\xe0\xa6\xb8\n', '\xe0\xa7\x81\xe0\xa6\xa8\n', '\xe0\xa7\x81\xe0\xa6\x95\n', '\xe0\xa6\xb2\xe0\xa7\x87\n', '\xe0\xa6\xac\xe0\xa7\x87\n', '\xe0\xa6\xb2\xe0\xa6\xbf\n', '\xe0\xa6\xac\xe0\xa6\xbf\n', '\xe0\xa6\xa4\xe0\xa6\xbf\n', '\xe0\xa6\xb2\n', '\xe0\xa6\xa4\n', '\xe0\xa7\x8b\n', '\xe0\xa6\xbf\n', '\xe0\xa7\x87\n', '\xe0\xa7\x8d\n', '\xe0\xa6\x87\n', '\xe0\xa6\xac\n', '\xe0\xa6\xb8\n', '\xe0\xa6\xa8\n', '\xe0\xa6\x95\n', '\xe0\xa6\x93\n', '\xe0\xa7\x9f\n'] def stem(self,token): for suffix in suffixList: if token.endswith(suffix): return token[:-len(suffix)] return token The problem is that when I try to compile run it by creating an instance and calling the stem() function with a parameter , it says that the suffixList is not defined. Couldn't figure out what's the problem. Is there a different way in which the class variables have to be declared ? please help

    Read the article

  • Conventions for search result scoring

    - by DeaconDesperado
    I assume this type of question is more on-topic here than on regular SO. I have been working on a search feature for my team's web application and have had a lot of success building a multithreaded, "divide and conquer" processing system to work through a large amount of fulltext. Our problem domain is pretty specific. Users of the app generate posts, and as a general rule, posts that are more recent are considered to be of greater relevance. Some of the data we are trying to extract from search is very specific (user's feelings about specific items or things) and we are using python nltk to do named-entity extraction to find interesting likely query terms. Essentially we look for descriptive adjective-noun pairs and generate a general picture of a user's expressed sentiment as a list of tokens. This search is intended as an internal tool for our team to draw out a local picture of sentiments like "soggy pizza." There's some machine learning in there too to do entity resolution on terms like "soggy" to all manner of adjectives expressing nastiness. My problem is I am at a loss for how to go about scoring these results. The text being searched is split up into tokens in a list, so my initial approach would be to normalize a float score between 0.0-1.0 generated off of how far into the list the terms appear and how often they are repeated (a later mention of the term being worth less, earlier more, greater frequency-greater score, etc.) A certain amount of weight could be given to the timestamp as well, though I am not certain how to calculate this. I am curious if anyone has had to solve a similar problem in a search relevance grading between appreciable metrics (frequency, term location/colocation, recency) and if there are and guidelines for how to weight each. I should mention as well that the final fallback procedure in the search is to pipe the query to Sphinx, which has its own scoring practices. Sphinx operates as the last resort in case our application specific processing can't find any eligible candidates.

    Read the article

1 2  | Next Page >