Search Results

Search found 128 results on 6 pages for 'nlp'.

Page 1/6 | 1 2 3 4 5 6  | Next Page >

  • Persisting NLP parsed data

    - by tjb1982
    I've recently started experimenting with NLP using Stanford's CoreNLP, and I'm wondering: what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (Postgres supports this and I've found it works really well). Something like this:

        Component ( id, POS, parent_id )
        Word ( id, raw, lemma, POS, NER )
        CW_Map ( component_id, word_id, position int )

    But I assume there are probably many standard ways to do this, depending on what kind of analysis is being done, that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data, and how are they used?
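
    A minimal sketch of how the adjacency-list idea plays out in practice, assuming the schema above has been populated from a parse; a single recursive CTE walks a component's whole subtree (the connection string and the root id are illustrative):

        import java.sql.*;

        public class SubtreeQuery {
            public static void main(String[] args) throws SQLException {
                // Illustrative connection string; adjust for your database.
                try (Connection conn = DriverManager.getConnection(
                        "jdbc:postgresql://localhost/nlp", "user", "pass")) {

                    // Recursively collect a component and all of its descendants,
                    // following the parent_id adjacency list from the schema above.
                    String sql =
                        "WITH RECURSIVE subtree AS ( " +
                        "  SELECT id, pos, parent_id FROM component WHERE id = ? " +
                        "  UNION ALL " +
                        "  SELECT c.id, c.pos, c.parent_id " +
                        "  FROM component c JOIN subtree s ON c.parent_id = s.id " +
                        ") SELECT id, pos FROM subtree";

                    try (PreparedStatement ps = conn.prepareStatement(sql)) {
                        ps.setInt(1, 1); // id of the root component to expand
                        try (ResultSet rs = ps.executeQuery()) {
                            while (rs.next()) {
                                System.out.println(rs.getInt("id") + " " + rs.getString("pos"));
                            }
                        }
                    }
                }
            }
        }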

    Read the article

  • Starting out NLP - Python + large data set

    - by pencilNero
    Hi, I've been wanting to learn Python and do some NLP, so I have finally gotten round to starting. I downloaded the English Wikipedia mirror for a nice chunky dataset to start on, and have been playing around a bit; at this stage I'm just getting some of it into a SQLite db (I haven't worked with dbs in the past, unfortunately). But I'm guessing SQLite is not the way to go for a full-blown NLP project (/experiment :) - what sort of things should I look at? HBase (and Hadoop) seem interesting; I guess I could run them in Java, prototype in Python, and maybe migrate the really slow bits to Java. Alternatively, I could just run MySQL - but the dataset is 12 GB, and I wonder if that will be a problem? I also looked at Lucene, but I'm not sure how (other than breaking the wiki articles into chunks) I'd get that to work. What comes to mind for a really flexible NLP platform (I don't really know at this stage WHAT I want to do - I just want to learn large-scale language analysis, tbh)? Many thanks.

    Read the article

  • NLP Library in Java

    - by user337962
    Hi, I need a simple natural language processing library written in Java which can be used to process a search query/question. What I actually want is to separate out the main subject being searched for in a query. For example, given a query like "What is an apple?", it would be perfect if the main search word, apple, could be extracted. This is for a semantic search engine development purpose. Can anyone please suggest a suitable NLP library for this? Thank you!

    Read the article

  • NLP with greatly constrained input and abilities

    - by Mike F
    Hat in hand here. I'm a seasoned developer and I would be grateful for a bit of help. I don't have time to read or digest long intricate discussions on theoretical concepts around NLP (or go get my PhD). That said, I have read a few and it's a damn interesting field. The problem is I need real-world solutions, for real-world products, in real-world time frames.

    The problem I'm having is that right now I'm not sure what the right questions are to ask to get started implementing. I believe this is mostly related to vocabulary. I'll read something somewhere - a blog post, a forum post, a whitepaper - and it says "I'm doing flooping with the blargy blarg method", and I go google flooping and blargy blarg, and I get references to more obscurity. It seemingly never ends.

    So, my question is multi-phased. First, more generally, how do I become passingly educated on this quickly? Just-in-time educated. I only need to know what I need to know to take the next step. I've spent 20 years writing code. Explain quick. I'll get it. (I mean, provide a reference to something that explains quickly, of course.) I'm happy to read the right book, but I don't want to read a book where I read the chapter introduction that explains what floopy floop is and then skip over the rest of the chapter with examples of floopy flooping (because now I get what it is). I also don't want to read a book that goes into too much detail on theoretical underpinnings or history. For example, the Jurafsky book seems like way more than I need: http://www.amazon.com/gp/product/0131873210. But I will read it if this is the right book to read. (It's also dang expensive!) I need the root node of the expedited learning tree here, if you will. Point me in the right direction and I'll be quite grateful. I'm expecting quite a lot of firehose drinking - I just need the right firehose.

    Second, what I need to do is take a single sentence, with a very reduced vocabulary, and get a grammar tree (sorry if this is the wrong terminology) that I can do something with. I know I could easily write this command-line-input style in C in a more conventional manner, but I need it to be way better than that. But I don't need a chatterbot either. What I'm doing needs to live in a constrained environment. I can't use Python (unfortunately). I can't ship with gigabytes of corpora. I need any libraries I use to be in C/C++. If I have to write this myself, I will. Hopefully, it will be achievable considering the reduced problem set. Maybe, probably, that's just naive. If so, let me know. :-) Thanks in advance - Mike

    Read the article

  • Stanford POS tagger runs out of memory?

    - by goh
    My Stanford tagger ran out of memory. Is it because the text has to be properly formatted? I use it to tag HTML contents with the tags stripped, but there may be quite an excessive amount of newlines. Here is the error:

        WARNING: Untokenizable: ? (char in decimal: 9829)
        Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
            at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequenceNew(ExactBestSequenceFinder.java:175)
            at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(ExactBestSequenceFinder.java:98)
            at edu.stanford.nlp.tagger.maxent.TestSentence.runTagInference(TestSentence.java:277)
            at edu.stanford.nlp.tagger.maxent.TestSentence.testTagInference(TestSentence.java:258)
            at edu.stanford.nlp.tagger.maxent.TestSentence.tagSentence(TestSentence.java:110)
            at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagSentence(MaxentTagger.java:825)
            at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1319)
            at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1225)
            at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1183)
            at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1358)
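
    A common cause of this is feeding the tagger one enormous "sentence" (stripped HTML with no sentence boundaries), since inference memory grows with sentence length. A sketch of one workaround, assuming the usual MaxentTagger API and an illustrative model path: give the JVM more heap and tag sentence by sentence:

        import edu.stanford.nlp.tagger.maxent.MaxentTagger;

        public class TagBySentence {
            public static void main(String[] args) throws Exception {
                // Run the JVM with a larger heap, e.g.: java -Xmx1g TagBySentence
                // Illustrative model path; point this at your own tagger model.
                MaxentTagger tagger = new MaxentTagger("models/left3words-wsj-0-18.tagger");

                String stripped = "First sentence here. Second sentence here.";

                // Collapse runs of whitespace so stripped HTML doesn't become one
                // huge token run, then tag each sentence on its own to bound memory.
                for (String sent : stripped.replaceAll("\\s+", " ").split("(?<=[.!?])\\s+")) {
                    System.out.println(tagger.tagString(sent));
                }
            }
        }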

    Read the article

  • NLP - Queries using semantic wildcards in full text searching, maybe with Lucene?

    - by Zsolt
    Let's say I have a big corpus (for example in English or an arbitrary language), and I want to perform some semantic search on it. For example, I have the query: "Be careful: [art] armada of [sg] is coming to [do sg]!" And the corpus contains the following sentence: "Be careful: an armada of alien ships is coming to destroy our planet!" As can be seen, my query string can contain "semantic placeholders", such as:

        [art] - a placeholder for articles (for example a / an in English)
        [sg], [do sg] - placeholders for NPs and VPs (subjects and predicates)

    I would like to develop a library which would be capable of handling these queries efficiently. I suspect that some kind of POS tagging would be necessary for parsing the text, but because I don't want to fully reimplement an already existing full-text search engine to make this work, I'm wondering how I could integrate this behaviour into a search engine like Lucene. I know there are SpanQueries which can behave similarly in some cases, but as far as I can see, Lucene doesn't do any semantic processing of stored text. Is it possible to implement a behaviour like this? Or do I have to write my own search engine?
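
    One way to approximate the placeholders without abandoning Lucene, sketched below: anchor the literal words of the pattern and let SpanNearQuery's slop stand in for the wildcard slots (a field named "body" and the classic span API are assumed; this approximates, rather than fully implements, the semantic match):

        import org.apache.lucene.index.Term;
        import org.apache.lucene.search.spans.SpanNearQuery;
        import org.apache.lucene.search.spans.SpanQuery;
        import org.apache.lucene.search.spans.SpanTermQuery;

        public class SemanticWildcardSketch {
            public static void main(String[] args) {
                // "armada of [sg] is coming": keep the literal anchors and allow a
                // bounded gap (slop) where the [sg] placeholder may match anything.
                SpanQuery armada = new SpanTermQuery(new Term("body", "armada"));
                SpanQuery of     = new SpanTermQuery(new Term("body", "of"));
                SpanQuery coming = new SpanTermQuery(new Term("body", "coming"));

                SpanQuery query = new SpanNearQuery(
                        new SpanQuery[] { armada, of, coming },
                        4,     // slop: up to 4 intervening positions for the placeholder
                        true); // require the anchors in order

                System.out.println(query);
            }
        }

    Constraining the gap to a POS category (so [sg] only matches an NP) would additionally need the tags indexed, e.g. as a parallel field or as token payloads written at analysis time.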

    Read the article

  • NLP - Word Alignment

    - by mgj
    Hi :) I am looking for word alignment tools and algorithms. I am dealing with bilingual English-Hindi text; currently I am working with the DTW (Dynamic Time Warping) algorithm, the CLA (Competitive Linking Algorithm), NATool, and Giza++. Could you please suggest any other language-independent algorithm/tool which could achieve statistical word alignment for a parallel English-Hindi corpus, and its evaluation? Some tools are said to work best for certain languages - could you please tell me how true that is, and if so, give me an example of what would suit Asian languages like Hindi better, and what I shouldn't use for such languages? I have also heard a bit about the Uplug word aligner - could you tell me if I could use it as a tool for my purpose? Thank you :)

    Read the article

  • How to get wanted nodes from the Stanford Parser (NLP)

    - by vitaly
    Hello all! My main problem is that I don't know how to extract nodes from a GrammaticalStructure. I am using englishPCFG.ser in Java (NetBeans). My goal is to know the quality of the screen, as in: "the screen of iphone 4 is great". I want to extract "screen" and "great". How can I extract the NN (screen) and the VP (great)? The code that I wrote is:

        LexicalizedParser lp = new LexicalizedParser("C:\\englishPCFG.ser");
        lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"});
        // Pre-tokenized words; passing one untokenized String to Arrays.asList
        // would make the whole sentence a single "word".
        String[] sent = { "the", "screen", "is", "very", "good", "." };
        Tree parse = (Tree) lp.apply(Arrays.asList(sent));
        parse.pennPrint();
        System.out.println();
        TreebankLanguagePack tlp = new PennTreebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        Collection tdl = gs.typedDependenciesCollapsed();
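
    To pull out the attribute/opinion pair, one option is to walk the typed dependencies rather than the raw tree: for "the screen is very good", the collapsed dependencies contain something like nsubj(good, screen), so matching on the relation name yields both words. A sketch under that assumption (relation names follow the Stanford dependencies scheme; TypedDependency is in edu.stanford.nlp.trees):

        // Continuing from the code above: scan the typed dependencies for the
        // nsubj relation, whose governor is the predicate ("good") and whose
        // dependent is the subject noun ("screen").
        for (Object o : tdl) {
            TypedDependency td = (TypedDependency) o;
            if (td.reln().getShortName().equals("nsubj")) {
                System.out.println("attribute: " + td.dep().value());
                System.out.println("opinion:   " + td.gov().value());
            }
        }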

    Read the article

  • Why is Prolog associated with Natural Language Processing?

    - by kyphos
    I have recently started learning about NLP with Python, and NLP seems to be based mostly on statistics/machine learning. What does a logic programming language bring to the table with respect to NLP? Is the declarative nature of Prolog used to define grammars? Is it used to define associations between words? That is, to somehow mine logical relationships between words (this, I imagine, would be pretty hard to do)? Any examples of what Prolog uniquely brings to NLP would be highly appreciated.

    Read the article

  • using Dependency Parser in Stanford coreNLP

    - by Eddie Dovzhik
    I am using the Stanford coreNLP ( http://nlp.stanford.edu/software/corenlp.shtml ) in order to parse sentences and extract dependencies between the words. I have managed to create the dependencies graph like in the example in the supplied link, but I don't know how to work with it. I can print the entire graph using the toString() method, but the problem I have is that the methods that search for certain words in the graph, such as getChildList, require an IndexedWord object as a parameter. Now, it is clear why they do, because the nodes of the graph are of IndexedWord type, but it's not clear to me how to create such an object in order to search for a specific node. For example: I want to find the children of the node that represents the word "problem" in my sentence. How do I create an IndexedWord object that represents the word "problem" so I can search for it in the graph?
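
    A sketch of one way around this, assuming the SemanticGraph API (import paths vary by CoreNLP version): rather than constructing an IndexedWord by hand, look the node up in the graph itself, either by word pattern or by scanning the vertex set:

        import edu.stanford.nlp.ling.IndexedWord;
        import edu.stanford.nlp.semgraph.SemanticGraph;

        public class FindNode {
            // Assuming 'graph' is the SemanticGraph CoreNLP produced for the sentence.
            static void printChildren(SemanticGraph graph) {
                // Option 1: look the node up by a regex over the word itself.
                IndexedWord problem = graph.getNodeByWordPattern("problem");

                // Option 2: scan the vertices, useful for more complex matching.
                if (problem == null) {
                    for (IndexedWord w : graph.vertexSet()) {
                        if (w.word().equalsIgnoreCase("problem")) {
                            problem = w;
                            break;
                        }
                    }
                }

                if (problem != null) {
                    for (IndexedWord child : graph.getChildList(problem)) {
                        System.out.println(child.word());
                    }
                }
            }
        }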

    Read the article

  • I have a list of names, some of them are fake, I need to use NLP and Python 3.1 to keep the real names

    - by Sho Minamimoto
    I have no clue where to start on this. I've never done any NLP and have only programmed in Python 3.1, which I have to use. I'm looking at the site http://www.linkedin.com and I have to gather all of the public profiles; some of them have very fake names, like 'aaaaaa k dudujjek'. I've been told I can use NLP to find the real names - where would I even start?

    Read the article

  • Simple NLP: How to use ngram to do word similarity?

    - by sadawd
    Dear Everyone, I hear that Google uses up to 7-grams for their own data. I am interested in finding words that are similar in context (i.e. cat and dog), and I was wondering how to compute the similarity of two words on an n-gram model, given that n > 2. For example, given a sample set like this: (I, love, cats), (cats, loves, dogs), (dogs, hate, human) - what is a good way to compare the similarity of the pair (I, cats)? Also, does anyone know of any way to do levels for NLP, like: Army - Military - Soldier?
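
    A common way to frame this, sketched below under the distributional hypothesis: treat every n-gram as evidence that its words share a context, build a context-count vector per word from the n-grams, and compare two words by the cosine of their vectors. The tiny corpus here is just the sample set from the question:

        import java.util.*;

        public class NgramCosine {
            public static void main(String[] args) {
                String[][] trigrams = {
                    {"I", "love", "cats"}, {"cats", "loves", "dogs"}, {"dogs", "hate", "human"}
                };

                // For each word, count the other words it co-occurs with inside an n-gram.
                Map<String, Map<String, Integer>> contexts = new HashMap<>();
                for (String[] gram : trigrams)
                    for (String w : gram)
                        for (String c : gram)
                            if (!w.equals(c))
                                contexts.computeIfAbsent(w, k -> new HashMap<>())
                                        .merge(c, 1, Integer::sum);

                // "I" and "cats" both co-occur with "love", so the score is nonzero.
                System.out.println(cosine(contexts.get("I"), contexts.get("cats")));
            }

            // Cosine similarity between two sparse count vectors.
            static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
                double dot = 0, na = 0, nb = 0;
                for (Map.Entry<String, Integer> e : a.entrySet()) {
                    na += e.getValue() * e.getValue();
                    Integer bv = b.get(e.getKey());
                    if (bv != null) dot += e.getValue() * bv;
                }
                for (int v : b.values()) nb += v * v;
                return dot / (Math.sqrt(na) * Math.sqrt(nb));
            }
        }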

    Read the article

  • NLP: any easy and good methods to find semantic similarity between words?

    - by sadawd
    Dear Everyone, I don't know whether Stack Overflow covers NLP, so I am going to give this a shot. I am interested in finding the semantic relatedness of two words from a specific domain, e.g. "image quality" and "noise". I am doing some research to determine whether reviews of cameras are positive or negative for a particular attribute of the camera (like the image quality in each of the reviews). However, not everybody uses the exact wording "image quality" in their posts, so I want to see if there is a way for me to build something like "image quality" that includes ("noise", "color", "sharpness", etc.), so I can wrap everything under one big umbrella. I am doing this for another language, so WordNet is not necessarily helpful. And no, I do not work for Google or Microsoft, so I do not have data from people's clicking behaviour as input either. However, I do have a lot of text that is POS-tagged, segmented, etc. Thanks

    Read the article

  • Which programming language could I use for Natural Language Processing to extract clinical words?

    - by MACEE
    I am going to do entity extraction (like named entity recognition) from clinical free text (unstructured raw text such as discharge summaries), and these entities could be any medical problem, medical test or treatment. I am going to use one of the i2b2 datasets (https://www.i2b2.org/), in case you are familiar with that. I am new to the NLP (natural language processing) field, and I need a programming language that supports NLP tasks and also easily connects to the available libraries of machine learning algorithms like CRF. I don't know much Java, and I have heard about Python, Perl and Scala, but I am not sure which one would be the best option for this task.

    Read the article

  • How to sift idioms and set phrases apart from other common phrases using NLP techniques?

    - by hippietrail
    What techniques exist that can tell the difference between plain common phrases such as "to the" and "and the", and set phrases and idioms which have their own lexical meanings, such as "pick up", "fall in love", "red herring", "dead end"? Are there techniques which are successful even without a dictionary - statistical methods such as HMMs trained on large corpora, for instance? Or are there heuristics, such as ignoring or down-weighting "promiscuous" words which can co-occur with just about any word, versus words which occur either alone or in a specific limited set of idiomatic phrases? If there are such heuristics, how do we take into account set phrases and verbal phrases which do incorporate promiscuous words, such as "up" in "beat up", "eat up", "sit up", "think up"? UPDATE: I've found an interesting paper online: Unsupervised Type and Token Identification of Idiomatic Expressions.
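
    One standard statistical baseline here, sketched below: score each bigram by pointwise mutual information (PMI), log( p(x,y) / (p(x) p(y)) ), estimated from corpus counts. Set phrases like "red herring" co-occur far more often than their parts predict and score high; free combinations like "to the" score near zero. This is a minimal sketch assuming you already have bigram and unigram counts; the counts below are illustrative, not real corpus statistics:

        public class PmiScore {
            // Pointwise mutual information of a bigram (x, y):
            //   PMI = log( p(x,y) / (p(x) * p(y)) )
            // estimated from raw corpus counts. High PMI means the pair occurs
            // together far more often than chance, a hint of a set phrase.
            static double pmi(long pairCount, long xCount, long yCount,
                              long totalBigrams, long totalWords) {
                double pxy = (double) pairCount / totalBigrams;
                double px = (double) xCount / totalWords;
                double py = (double) yCount / totalWords;
                return Math.log(pxy / (px * py));
            }

            public static void main(String[] args) {
                // Illustrative counts: "red herring" scores ~7.8, "to the" ~1.8.
                System.out.println("red herring: " + pmi(150, 2000, 300, 10_000_000, 10_000_000));
                System.out.println("to the:      " + pmi(90_000, 250_000, 600_000, 10_000_000, 10_000_000));
            }
        }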

    Read the article

  • Construct sentences from tabular data

    - by Sumeet
    I have a huge set of HTML files and have to retrieve the meaningful information from them. Most of the task is accomplished; now the problem is with HTML tables. I have some literature on how to extract meaningful tables from HTML, but my problem is with creating meaningful sentences from the tabular data (or the attribute-value pairs extracted from a table). Are there any NLP/machine learning techniques to do this? Here is what I expect. Suppose below is a sample table:

        col_Name: Sumeet
        col_year: 2011
        col_winner: quiz

    Can this be made into something meaningful, like "Sumeet won quiz in 2011"?
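
    The simplest workable approach is template-based generation, sketched below: map each table schema (its set of column names) to a sentence template with slots, then fill the slots from the attribute-value pairs. Statistical NLG systems exist, but for a known set of table layouts, templates are the standard starting point; the template string here is illustrative:

        import java.util.*;

        public class TableToSentence {
            public static void main(String[] args) {
                // Attribute-value pairs extracted from one table.
                Map<String, String> row = new LinkedHashMap<>();
                row.put("col_Name", "Sumeet");
                row.put("col_year", "2011");
                row.put("col_winner", "quiz");

                // One hand-written template per known schema; slots name the columns.
                String template = "{col_Name} won {col_winner} in {col_year}.";

                String sentence = template;
                for (Map.Entry<String, String> e : row.entrySet()) {
                    sentence = sentence.replace("{" + e.getKey() + "}", e.getValue());
                }
                System.out.println(sentence); // Sumeet won quiz in 2011.
            }
        }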

    Read the article

  • How to create a Semantic Network like wordnet based on Wikipedia?

    - by Forbidden Overseer
    I am an undergraduate student and I have to create a semantic network based on Wikipedia. This semantic network would be similar to WordNet (except that it is based on Wikipedia and is concerned with "streams of text/topics" rather than simple words, etc.), and I am thinking of using the Wikipedia XML dumps for the purpose. I guess I need to learn to parse XML and "some other things" related to NLP, and probably machine learning, but I am in no way sure about anything involved herein after the XML parsing. Is the starting step - parsing the XML dump into text - a good idea/step? Any alternatives? What would be the steps involved after parsing the XML into text to create a functional semantic network? What are the things/concepts I should learn in order to do them? I am not directly asking for book recommendations, but if you have read a book/article that teaches anything related/helpful, please mention it. This may include a reference to already existing implementations regarding the subject. Please correct me if I was wrong somewhere. Thanks!
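
    On the first step: a streaming parser is the usual choice, since the dump is far too large for DOM. A minimal sketch using the JDK's built-in StAX API, assuming the standard dump layout where each article sits in a <page> element with <title> and <text> children (the filename is illustrative, and a real reader would buffer split character events):

        import javax.xml.stream.XMLInputFactory;
        import javax.xml.stream.XMLStreamConstants;
        import javax.xml.stream.XMLStreamReader;
        import java.io.FileInputStream;

        public class DumpReader {
            public static void main(String[] args) throws Exception {
                XMLInputFactory factory = XMLInputFactory.newInstance();
                XMLStreamReader r = factory.createXMLStreamReader(
                        new FileInputStream("enwiki-latest-pages-articles.xml"));

                String current = null;
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        current = r.getLocalName();
                    } else if (event == XMLStreamConstants.CHARACTERS && "title".equals(current)) {
                        System.out.println("TITLE: " + r.getText());
                        // "text" elements hold the wiki markup of the article body;
                        // hand those to the next stage (markup stripping, NLP, ...).
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        current = null;
                    }
                }
            }
        }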

    Read the article

  • natural language processing internships

    - by user552127
    Hi all, please can someone guide me in finding paid grad internships in natural language processing over the summer? I am really interested in NLP/ML and took the excellent course offered at my school in the fall. I would be glad to work for passionate startups that do actual NLP tasks, such as semantic extraction (and not just information retrieval), etc. I have worked with Java and am teaching myself Python for all NLP tasks. Thanks, Sanjay

    Read the article

  • Natural language processing - Ideas for beginner's projects

    - by Microkernel
    Hi guys, I am a beginner in NLP and NLTK. I am very interested in NLP and hence joined a weekend course on AI at a local institution, which requires me to do a project to complete the course, and I decided to do it in NLP. The problem is, the instructor is not good at all for this course (according to me she is just a charlatan), or she may not be very interested in teaching, as this is her last batch here, after which the institute is going to send her out. So I am stuck in a situation where I have to finish this project in one to one and a half months, but as a naive person in the field I am finding it very difficult to comprehend the things required to decide on a project. (Also, as I am working full time, I am not finding enough time to dedicate to this.) I considered using the NLTK toolkit in Python for the project, for the following reasons: (1) Python is famous for ease of use, rapid prototyping and a very active community (considering the very short span of time I have, and as I am a C programmer by profession, I need a language that I can learn fast and that is simple to use). (2) NLTK has good reviews, extensive documentation and a very active community. So the problem is what project I should take up, so that I can learn something and will be able to finish the project in time. (I know almost nothing in NLP; I don't even know what exactly a corpus is... :( ) So, please suggest some topics that I should consider for the project. Regards, MicroKernel :)

    Read the article

  • Sentence Tree v/s Words List

    - by Rohit Jose
    I was recently tasked with building a Named Entity Recognizer as part of a project. The objective was to parse a given sentence and come up with all the possible combinations of the entities. One approach that was suggested was to keep a lookup table for all the known connector words, like articles and conjunctions, and remove them from the word list after splitting the sentence on spaces. This would leave the named entities in the sentence. A lookup is then done for these identified entities in another lookup table that associates them with an entity type. For example, if the sentence was "Remember the Titans was a movie directed by Boaz Yakin", the possible outputs would be:

        {Remember the Titans,Movie} was {a movie,Movie} directed by {Boaz Yakin,director}
        {Remember the Titans,Movie} was a movie directed by Boaz Yakin
        {Remember the Titans,Movie} was {a movie,Movie} directed by Boaz Yakin
        {Remember the Titans,Movie} was a movie directed by {Boaz Yakin,director}
        Remember the Titans was {a movie,Movie} directed by Boaz Yakin
        Remember the Titans was {a movie,Movie} directed by {Boaz Yakin,director}
        Remember the Titans was a movie directed by {Boaz Yakin,director}
        Remember the {the titans,Movie,Sports Team} was {a movie,Movie} directed by {Boaz Yakin,director}
        Remember the {the titans,Movie,Sports Team} was a movie directed by Boaz Yakin
        Remember the {the titans,Movie,Sports Team} was {a movie,Movie} directed by Boaz Yakin
        Remember the {the titans,Movie,Sports Team} was a movie directed by {Boaz Yakin,director}

    The entity lookup table here would contain the following data:

        Remember the Titans=Movie
        a movie=Movie
        Boaz Yakin=director
        the Titans=Movie
        the Titans=Sports Team

    The alternative logic that was put forward was to build a crude sentence tree with the connector words from the lookup table as parent nodes, and to do a lookup in the entity table for the leaf nodes that might contain the entities. The question I am faced with is the relative benefit of the two approaches: should I go for the tree approach to represent the sentence parsing, since it provides a more semantic structure? Is there a better approach I should be going for to solve this?
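
    A sketch of the lookup-table half of this, under the stated assumptions: scan the sentence for every contiguous word span that appears in the entity table, so overlapping candidates like "Remember the Titans" and "the Titans" are both kept, and the output combinations can then be enumerated from the matches:

        import java.util.*;

        public class EntityLookup {
            public static void main(String[] args) {
                // The entity lookup table from the question; a phrase may map to
                // more than one type ("the Titans" is both a Movie and a Sports Team).
                Map<String, List<String>> table = new HashMap<>();
                table.put("remember the titans", List.of("Movie"));
                table.put("a movie", List.of("Movie"));
                table.put("boaz yakin", List.of("director"));
                table.put("the titans", List.of("Movie", "Sports Team"));

                String sentence = "Remember the Titans was a movie directed by Boaz Yakin";
                String[] words = sentence.toLowerCase().split(" ");

                // Every contiguous word span found in the table is a candidate entity.
                for (int i = 0; i < words.length; i++) {
                    StringBuilder span = new StringBuilder();
                    for (int j = i; j < words.length; j++) {
                        if (j > i) span.append(' ');
                        span.append(words[j]);
                        List<String> types = table.get(span.toString());
                        if (types != null)
                            System.out.println("{" + span + "," + String.join(",", types) + "}");
                    }
                }
            }
        }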

    Read the article

  • What algorithm(s) can be used to achieve reasonably good next word prediction?

    - by yati sagade
    What is a good way of implementing "next-word prediction"? For example, the user types "I am" and the system suggests "a" and "not" (or possibly others) as the next word. I am aware of a method that uses Markov chains and some training text (obviously) to more or less achieve this. But I read somewhere that this method is very restrictive and applies only to very simple cases. I understand the basics of neural networks and genetic algorithms (though I have never used them in a serious project), and maybe they could be of some help. I wonder if there are any algorithms that, given appropriate training text (e.g., newspaper articles, and the user's own typing), can come up with reasonably appropriate suggestions for the next word. If not algorithms, then (links to) general high-level methods to attack this problem are welcome.
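
    For reference, the Markov-chain baseline mentioned above is only a few lines: count how often each word follows each other word in the training text, then suggest the most frequent followers. Real systems extend this with longer contexts and smoothing, but this minimal bigram sketch shows the core idea:

        import java.util.*;
        import java.util.stream.Collectors;

        public class BigramPredictor {
            // For each word, how often each following word was seen in training text.
            private final Map<String, Map<String, Integer>> counts = new HashMap<>();

            void train(String text) {
                String[] w = text.toLowerCase().split("\\s+");
                for (int i = 0; i + 1 < w.length; i++)
                    counts.computeIfAbsent(w[i], k -> new HashMap<>())
                          .merge(w[i + 1], 1, Integer::sum);
            }

            // The k most frequent words ever seen after 'word'.
            List<String> suggest(String word, int k) {
                Map<String, Integer> next = counts.getOrDefault(word.toLowerCase(), Map.of());
                return next.entrySet().stream()
                        .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                        .limit(k)
                        .map(Map.Entry::getKey)
                        .collect(Collectors.toList());
            }

            public static void main(String[] args) {
                BigramPredictor p = new BigramPredictor();
                p.train("i am a developer . i am not sure . i am a reader .");
                System.out.println(p.suggest("am", 2)); // [a, not]
            }
        }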

    Read the article

  • SOLR and Natural Language Parsing - Can I use it?

    - by andy
    Hey guys, my requirements are pretty similar to those in this question: http://stackoverflow.com/questions/90580/word-frequency-algorithm-for-natural-language-processing

    Using Solr: While the answer to that question is excellent, I was wondering if I could make use of all the time I spent getting to know Solr for my NLP. I thought of Solr because: it's got a bunch of tokenizers and performs a lot of NLP; it's pretty easy to use out of the box; it's a RESTful distributed app, so it's easy to hook up; and I've spent some time with it, so using it could save me time.

    Can I use Solr? Although the above reasons are good, I don't know Solr THAT well, so I need to know if it would be appropriate for my requirements.

    Ideal usage: Ideally, I'd like to configure Solr, and then be able to send Solr some text and retrieve the indexed, tokenized content.

    Context: So you guys know, I'm working on a small component of a bigger recommendation engine.
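
    On the "send text, get tokens back" part: Solr exposes its analysis chains over HTTP, and the same tokenizers can also be used directly as a library through Lucene. A minimal sketch of the library route, assuming the classic analyzer/TokenStream API (constructor and package details vary across Lucene versions):

        import java.io.StringReader;
        import org.apache.lucene.analysis.Analyzer;
        import org.apache.lucene.analysis.TokenStream;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

        public class TokenizeText {
            public static void main(String[] args) throws Exception {
                Analyzer analyzer = new StandardAnalyzer();
                TokenStream ts = analyzer.tokenStream("body",
                        new StringReader("Word frequency for natural language processing."));

                // Walk the stream and print each token the analysis chain produces.
                CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
                ts.reset();
                while (ts.incrementToken()) {
                    System.out.println(term.toString());
                }
                ts.end();
                ts.close();
            }
        }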

    Read the article

1 2 3 4 5 6  | Next Page >