Search Results

Search found 3 results on 1 pages for 'adi92'.

Page 1/1 | 1 

  • Ngram IDF smoothing

    - by adi92
    I am trying to use IDF scores to find interesting phrases in my pretty huge corpus of documents. I basically need something like Amazon's Statistically Improbable Phrases, i.e. phrases that distinguish a document from all the others The problem that I am running into is that some (3,4)-grams in my data which have super-high idf actually consist of component unigrams and bigrams which have really low idf.. For example, "you've never tried" has a very high idf, while each of the component unigrams have very low idf.. I need to come up with a function that can take in document frequencies of an n-gram and all its component (n-k)-grams and return a more meaningful measure of how much this phrase will distinguish the parent document from the rest. If I were dealing with probabilities, I would try interpolation or backoff models.. I am not sure what assumptions/intuitions those models leverage to perform well, and so how well they would do for IDF scores. Anybody has any better ideas?

    Read the article

  • Converting an empty string into nil in Ruby

    - by adi92
    I have a string called word and a function called infinitive such that word.infinitive would return another string on some occasions and an empty string otherwise I am trying to find an elegant ruby one line expression for the code-snippet below if word.infinitive == "" return word else return word.infinitive Had infinitive returned nil instead of "", I could have done something like (word.infinitive or word) But since it does not, I can't take advantage of the short-circuit OR Ideally I would want 1) a single expression that I could easily embed in other code 2) the function infinitive being called only once 3) to not add any custom gems or plugins into my code

    Read the article

  • Making fscanf Ignore Optional Parameter

    - by adi92
    I am using fscanf to read a file which has lines like Number <-whitespace- string <-whitespace- optional_3rd_column I wish to extract the number and string out of each column, but ignore the 3rd_column if it exists Example Data: 12 foo something 03 bar 24 something #randomcomment I would want to extract 12,foo; 03,bar; 24, something while ignoring "something" and "#randomcomment" I currently have something like while(scanf("%d %s %*s",&num,&word)>=2) { assign stuff } However this does not work with lines with no 3rd column. How can I make it ignore everything after the 2nd string?

    Read the article

1