Search Results

Search found 128 results on 6 pages for 'nlp'.

Page 4/6 | < Previous Page | 1 2 3 4 5 6  | Next Page >

  • How to estimate the quality of a web page?

    - by roddik
    Hello, I'm doing a university project, that must gather and combine data on a user provided topic. The problem I've encountered is that Google search results for many terms are polluted with low quality autogenerated pages and if I use them, I can end up with wrong facts. How is it possible to estimate the quality/trustworthiness of a page? You may think "nah, Google engineers are working on the problem for 10 years and he's asking for a solution", but if you think about it, SE must provide up-to-date content and if it marks a good page as a bad one, users will be dissatisfied. I don't have such limitations, so if the algorithm accidentally marks as bad some good pages, that wouldn't be a problem.

    Read the article

  • Python/YACC Lexer: Token priority?

    - by Rosarch
    I'm trying to use reserved words in my grammar: reserved = { 'if' : 'IF', 'then' : 'THEN', 'else' : 'ELSE', 'while' : 'WHILE', } tokens = [ 'DEPT_CODE', 'COURSE_NUMBER', 'OR_CONJ', 'ID', ] + list(reserved.values()) t_DEPT_CODE = r'[A-Z]{2,}' t_COURSE_NUMBER = r'[0-9]{4}' t_OR_CONJ = r'or' t_ignore = ' \t' def t_ID(t): r'[a-zA-Z_][a-zA-Z_0-9]*' if t.value in reserved.values(): t.type = reserved[t.value] return t return None However, the t_ID rule somehow swallows up DEPT_CODE and OR_CONJ. How can I get around this? I'd like those two to take higher precedence than the reserved words.

    Read the article

  • POS tagger in SharpNLP

    - by C.
    I am using SharpNLP for my POS tagging: EnglishMaximumEntropyPosTagger posTagger = new EnglishMaximumEntropyPosTagger(mModelPath); String tagSentence = posTagger.TagSentence(question); I only have 3 tags. How can I load a set of Penn treebank or some other tagging tree banks to use? Thanks :)

    Read the article

  • Measuring the performance of classification algorithm

    - by Silver Dragon
    I've got a classification problem in my hand, which I'd like to address with a machine learning algorithm ( Bayes, or Markovian probably, the question is independent on the classifier to be used). Given a number of training instances, I'm looking for a way to measure the performance of an implemented classificator, with taking data overfitting problem into account. That is: given N[1..100] training samples, if I run the training algorithm on every one of the samples, and use this very same samples to measure fitness, it might stuck into a data overfitting problem -the classifier will know the exact answers for the training instances, without having much predictive power, rendering the fitness results useless. An obvious solution would be seperating the hand-tagged samples into training, and test samples; and I'd like to learn about methods selecting the statistically significant samples for training. White papers, book pointers, and PDFs much appreciated!

    Read the article

  • How to write efficient code for extracting Noun phrases?

    - by Arun Abraham
    I am trying to extract phrases using rules such as the ones mentioned below on text which has been POS tagged 1) NNP - NNP (- indicates followed by) 2) NNP - CC - NNP 3) VP - NP etc.. I have written code in this manner, Can someone tell me how i can do in a better manner. List<String> nounPhrases = new ArrayList<String>(); for (List<HasWord> sentence : documentPreprocessor) { //System.out.println(sentence.toString()); System.out.println(Sentence.listToString(sentence, false)); List<TaggedWord> tSentence = tagger.tagSentence(sentence); String lastTag = null, lastWord = null; for (TaggedWord taggedWord : tSentence) { if (lastTag != null && taggedWord.tag().equalsIgnoreCase("NNP") && lastTag.equalsIgnoreCase("NNP")) { nounPhrases.add(taggedWord.word() + " " + lastWord); //System.out.println(taggedWord.word() + " " + lastWord); } lastTag = taggedWord.tag(); lastWord = taggedWord.word(); } } In the above code, i have done only for NNP followed by NNP extraction, how can i generalise it so that i can add other rules too. I know that there are libraries available for doing this , but wanted to do this manually.

    Read the article

  • Searching Natural Language Sentence Structure

    - by Cerin
    What's the best way to store and search a database of natural language sentence structure trees? Using OpenNLP's English Treebank Parser, I can get fairly reliable sentence structure parsings for arbitrary sentences. What I'd like to do is create a tool that can extract all the doc strings from my source code, generate these trees for all sentences in the doc strings, store these trees and their associated function name in a database, and then allow a user to search the database using natural language queries. So, given the sentence "This uploads files to a remote machine." for the function upload_files(), I'd have the tree: (TOP (S (NP (DT This)) (VP (VBZ uploads) (NP (NNS files)) (PP (TO to) (NP (DT a) (JJ remote) (NN machine)))) (. .))) If someone entered the query "How can I upload files?", equating to the tree: (TOP (SBARQ (WHADVP (WRB How)) (SQ (MD can) (NP (PRP I)) (VP (VB upload) (NP (NNS files)))) (. ?))) how would I store and query these trees in a SQL database? I've written a simple proof-of-concept script that can perform this search using a mix of regular expressions and network graph parsing, but I'm not sure how I'd implement this in a scalable way. And yes, I realize my example would be trivial to retrieve using a simple keyword search. The idea I'm trying to test is how I might take advantage of grammatical structure, so I can weed-out entries with similar keywords, but a different sentence structure. For example, with the above query, I wouldn't want to retrieve the entry associated with the sentence "Checks a remote machine to find a user that uploads files." which has similar keywords, but is obviously describing a completely different behavior.

    Read the article

  • Online job-searching is tedious. Help me automate it.

    - by ehsanul
    Many job sites have broken searches that don't let you narrow down jobs by experience level. Even when they do, it's usually wrong. This requires you to wade through hundreds of postings that you can't apply for before finding a relevant one, quite tedious. Since I'd rather focus on writing cover letters etc., I want to write a program to look through a large number of postings, and save the URLs of just those jobs that don't require years of experience. I don't require help writing the scraper to get the html bodies of possibly relevant job posts. The issue is accurately detecting the level of experience required for the job. This should not be too difficult as job posts are usually very explicit about this ("must have 5 years experience in..."), but there may be some issues with overly simple solutions. In my case, I'm looking for entry-level positions. Often they don't say "entry-level", but inclusion of the words probably means the job should be saved. Next, I can safely exclude a job the says it requires "5 years" of experience in whatever, so a regex like /\d\syears/ seems reasonable to exclude jobs. But then, I realized some jobs say they'll take 0-2 years of experience, matches the exclusion regex but is clearly a job I want to take a look at. Hmmm, I can handle that with another regex. But some say "less than 2 years" or "fewer than 2 years". Can handle that too, but it makes me wonder what other patterns I'm not thinking of, and possibly excluding many jobs. That's what brings me here, to find a better way to do this than regexes, if there is one. I'd like to minimize the false negative rate and save all the jobs that seem like they might not require many years of experience. Does excluding anything that matches /[3-9]\syears|1\d\syears/ seem reasonable? Or is there a better way? Training a bayesian filter maybe?

    Read the article

  • Python: How best to parse a simple grammar?

    - by Rosarch
    Ok, so I've asked a bunch of smaller questions about this project, but I still don't have much confidence in the designs I'm coming up with, so I'm going to ask a question on a broader scale. I am parsing pre-requisite descriptions for a course catalog. The descriptions almost always follow a certain form, which makes me think I can parse most of them. From the text, I would like to generate a graph of course pre-requisite relationships. (That part will be easy, after I have parsed the data.) Some sample inputs and outputs: "CS 2110" => ("CS", 2110) # 0 "CS 2110 and INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, 3300, 3140" => [("CS", 2110), ("CS", 3300), ("CS", 3140)] # 1 "CS 2110 or INFO 3300" => [[("CS", 2110)], [("INFO", 3300)]] # 2 "MATH 2210, 2230, 2310, or 2940" => [[("MATH", 2210), ("MATH", 2230), ("MATH", 2310)], [("MATH", 2940)]] # 3 If the entire description is just a course, it is output directly. If the courses are conjoined ("and"), they are all output in the same list If the course are disjoined ("or"), they are in separate lists Here, we have both "and" and "or". One caveat that makes it easier: it appears that the nesting of "and"/"or" phrases is never greater than as shown in example 3. What is the best way to do this? I started with PLY, but I couldn't figure out how to resolve the reduce/reduce conflicts. The advantage of PLY is that it's easy to manipulate what each parse rule generates: def p_course(p): 'course : DEPT_CODE COURSE_NUMBER' p[0] = (p[1], int(p[2])) With PyParse, it's less clear how to modify the output of parseString(). I was considering building upon @Alex Martelli's idea of keeping state in an object and building up the output from that, but I'm not sure exactly how that is best done. def addCourse(self, str, location, tokens): self.result.append((tokens[0][0], tokens[0][1])) def makeCourseList(self, str, location, tokens): dept = tokens[0][0] new_tokens = [(dept, tokens[0][1])] new_tokens.extend((dept, tok) for tok in tokens[1:]) self.result.append(new_tokens) For instance, to handle "or" cases: def __init__(self): self.result = [] # ... self.statement = (course_data + Optional(OR_CONJ + course_data)).setParseAction(self.disjunctionCourses) def disjunctionCourses(self, str, location, tokens): if len(tokens) == 1: return tokens print "disjunction tokens: %s" % tokens How does disjunctionCourses() know which smaller phrases to disjoin? All it gets is tokens, but what's been parsed so far is stored in result, so how can the function tell which data in result corresponds to which elements of token? I guess I could search through the tokens, then find an element of result with the same data, but that feel convoluted... What's a better way to approach this problem?

    Read the article

  • Keyword sorting algorithm

    - by Nai
    I have over 1000 surveys, many of which contains open-ended replies. I would like to be able to 'parse' in all the words and get a ranking of the most used words (disregarding common words) to spot a trend. How can I do this? Is there a program I can use? EDIT If a 3rd party solution is not available, it would be great if we can keep the discussion to microsoft technologies only. Cheers.

    Read the article

  • A StringToken Parser which gives Google Search style "Did you mean:" Suggestions

    - by _ande_turner_
    Seeking a method to: Take whitespace separated tokens in a String; return a suggested Word ie: Google Search can take "fonetic wrd nterpreterr", and atop of the result page it shows "Did you mean: phonetic word interpreter" A solution in any of the C* languages or Java would be preferred. Are there any existing Open Libraries which perform such functionality? Or is there a way to Utilise a Google API to request a suggested word?

    Read the article

  • Python: Trouble with YACC

    - by Rosarch
    I'm parsing sentences like: "CS 2310 or equivalent experience" The desired output: [[("CS", 2310)], ["equivalent experience"]] YACC tokenizer symbols: tokens = [ 'DEPT_CODE', 'COURSE_NUMBER', 'OR_CONJ', 'MISC_TEXT', ] t_DEPT_CODE = r'[A-Z]{2,}' t_COURSE_NUMBER = r'[0-9]{4}' t_OR_CONJ = r'or' t_ignore = ' \t' terms = {'DEPT_CODE': t_DEPT_CODE, 'COURSE_NUMBER': t_COURSE_NUMBER, 'OR_CONJ': t_OR_CONJ} for name, regex in terms.items(): terms[name] = "^%s$" % regex def t_MISC_TEXT(t): r'\S+' for name, regex in terms.items(): # print "trying to match %s with regex %s" % (t.value, regex) if re.match(regex, t.value): t.type = name return t return t (MISC_TEXT is meant to match anything not caught by the other terms.) Some relevant rules from the parser: precedence = ( ('left', 'MISC_TEXT'), ) def p_statement_course_data(p): 'statement : course_data' p[0] = p[1] def p_course_data(p): 'course_data : course' p[0] = p[1] def p_course(p): 'course : DEPT_CODE COURSE_NUMBER' p[0] = make_course(p[1], int(p[2])) def p_or_phrase(p): 'or_phrase : statement OR_CONJ statement' p[0] = [[p[1]], [p[3]]] def p_misc_text(p): '''text_aggregate : MISC_TEXT MISC_TEXT | MISC_TEXT text_aggregate | text_aggregate MISC_TEXT ''' p[0] = "%s %s" % (p[0], [1]) def p_text_aggregate_statement(p): 'statement : text_aggregate' p[0] = p[1] Unfortunately, this fails: # works as it should >>> token_list("CS 2110 or equivalent experience") [LexToken(DEPT_CODE,'CS',1,0), LexToken(COURSE_NUMBER,'2110',1,3), LexToken(OR_CONJ,'or',1,8), LexToken(MISC_TEXT,'equivalent',1,11), LexToken(MISC_TEXT,'experience',1,22)] # fails. bummer. >>> parser.parse("CS 2110 or equivalent experience") Syntax error in input: LexToken(MISC_TEXT,'equivalent',1,11) What am I doing wrong? I don't fully understand how to set precedence rules. Also, this is my error function: def p_error(p): print "Syntax error in input: %s" % p Is there a way to see which rule the parser was trying when it failed? Or some other way to make the parser print which rules its trying?

    Read the article

  • Agile methodologies. Is it a by-product of mind control techniques as NLP / Scientology?

    - by Bobb
    The more I read about contemporary methods combinging scrum, tdd and xp, the more I feel like I already seen the methods. I am not arguing that agile approach is much more progressive than older rigid structures like waterfall, what I am saying is that it seems to me that agile methodologies are ideal to be used as a nest for a brainwashing business. I read few articles which kept referring to authors which I checked afterwards and they call themselves - coaches, trainers (usual thing when NLP specialists are involved) with no apparent software development history. Also I met a guy who is a scrum faciltator (term widly used in relation to scientology) in a high profile company. I talked to him less than 5 min but I got complete feeling that he is either on drugs or he has been programmed by a powerful NLP specialist. The way to talk and his body movements witnessed he is not an average normal person (in terms of normal distribution :))... Please dont get me wrong. I am not a fun of conspiracy theories. But I had an experience with a member of church of scientology tried to invade a commercial firm and actually went half way through to very top in just 3 weeks. I saw his work. For now I have complete impression is that psycho manipulators are now invading IT industry through the convenient door of agile techniques. Anyone has the same feeling/thoughts?

    Read the article

  • How to disable log4j logging from Java code

    - by Erel Segal Halevi
    I use a legacy library that writes logs using log4j. My default log4j.properties file directs the log to the console, but in some specific functions of my main program, I would like to disable logging altogether (from all classes). I tried this: Logger.getLogger(BasicImplementation.class.getName()).setLevel(Level.OFF); where "BasicImplementation" is one of the main classes that does logging, but it didn't work - the logs are still written to the console. Here is my log4j.properties: log4j.rootLogger=warn, stdout log4j.logger.ac.biu.nlp.nlp.engineml=info, logfile log4j.logger.org.BIU.utils.logging.ExperimentLogger=warn log4j.appender.stdout = org.apache.log4j.ConsoleAppender log4j.appender.stdout.layout = org.apache.log4j.PatternLayout log4j.appender.stdout.layout.ConversionPattern = %-5p %d{HH:mm:ss} [%t]: %m%n log4j.appender.logfile = ac.biu.nlp.nlp.log.BackupOlderFileAppender log4j.appender.logfile.append=false log4j.appender.logfile.layout = org.apache.log4j.PatternLayout log4j.appender.logfile.layout.ConversionPattern = %-5p %d{HH:mm:ss} [%t]: %m%n log4j.appender.logfile.File = logfile.log

    Read the article

  • Mixed Emotions: Humans React to Natural Language Computer

    - by Applications User Experience
    There was a big event in Silicon Valley on Tuesday, November 15. Watson, the natural language computer developed at IBM Watson Research Center in Yorktown Heights, New York, and its inventor and principal research investigator, David Ferrucci, were guests at the Computer History Museum in Mountain View, California for another round of the television game Jeopardy. You may have read about or watched on YouTube how Watson beat Ken Jennings and Brad Rutter, two top Jeopardy competitors, last February. This time, Watson swept the floor with two Silicon Valley high-achievers, one a venture capitalist with a background  in math, computer engineering, and physics, and the other a technology and finance writer well-versed in all aspects of culture and humanities. Watson is the product of the DeepQA research project, which attempts to create an artificially intelligent computing system through advances in natural language processing (NLP), among other technologies. NLP is a computing strategy that seeks to provide answers by processing large amounts of unstructured data contained in multiple large domains of human knowledge. There are several ways to perform NLP, but one way to start is by recognizing key words, then processing  contextual  cues associated with the keyword concepts so that you get many more “smart” (that is, human-like) deductions,  rather than a series of “dumb” matches.  Jeopardy questions often require more than key word matching to get the correct answer; typically several pieces of information put together, often from vastly different categories, to come up with a satisfactory word string solution that can be rephrased as a question.  Smarter than your average search engine, but is it as smart as a human? Watson was especially fast at descrambling mixed-up state capital names, and recalling and pairing movie titles where one started and the other ended in the same word (e.g., Billion Dollar Baby Boom, where both titles used the word Baby). David said they had basically removed the variable of how fast Watson hit the buzzer compared to human contestants, but frustration frequently appeared on the faces of the contestants beaten to the punch by Watson. David explained that top Jeopardy winners like Jennings achieved their success with a similar strategy, timing their buzz to the end of the reading of the clue,  and “running the board”, being first to respond on about 60% of the clues.  Similar results for Watson. It made sense that Watson would be good at the technical and scientific stuff, so I figured the venture capitalist was toast. But I thought for sure Watson would lose to the writer in categories such as pop culture, wines and foods, and other humanities. Surprisingly, it held its own. I was amazed it could recognize a word definition of a syllogism in the category of philosophy. So what was the audience reaction to all of this? We started out expecting our formidable human contestants to easily run some of their categories; however, they started off on the wrong foot with the state capitals which Watson could unscramble so efficiently. By the end of the first round, contestants and the audience were feeling a little bit, well, …. deflated. Watson was winning by about $13,000, and the humans had gone into negative dollars. The IBM host said he was going to “slow Watson down a bit,” and the humans came back with respectable scores in Double Jeopardy. This was partially thanks to a very sympathetic audience (and host, also a human) providing “group-think” on many questions, especially baseball ‘s most valuable players, which by the way, couldn’t have been hard because even I knew them.  Yes, that’s right, the humans cheated. Since Watson could speak but not hear us (it didn’t have speech recognition capability), it was probably unaware of this. In Final Jeopardy, the single question had to do with law. I was sure Watson would blow this one, but all contestants were able to answer correctly about a copyright law. In a career devoted to making computers more helpful to people, I think I may have seen how a computer can do too much. I’m not sure I’d want to work side-by-side with a Watson doing my job. Certainly listening and empathy are important traits we humans still have over Watson.  While there was great enthusiasm in the packed room of computer scientists and their friends for this standing-room-only show, I think it made several of us uneasy (especially the poor human contestants whose egos were soundly bashed in the first round). This computer system, by the way , only took 4 years to program. David Ferrucci mentioned several practical uses for Watson, including medical diagnoses and legal strategies. Are you “the expert” in your job? Imagine NLP computing on an Oracle database.   This may be the user interface of the future to enable users to better process big data. How do you think you’d like it? Postscript: There were three little boys sitting in front of me in the very first row. They looked, how shall I say it, … unimpressed!

    Read the article

  • Generate or update a PDF to include an encrypted, hidden watermark?

    - by Dave Jarvis
    Background Using LaTeX to write a book. When a user purchases the book, the PDF will be generated automatically. Problem The PDF should have a watermark that includes the person's name and contact information. Question What software meets the following criteria: Applies encrypted, invisible watermarks to a PDF Open Source Platform independent (Linux, Windows) Fast (marks a 200 page PDF in under 1 second) Batch processing (exclusively command-line driven) Collusion-attack resistant Non-fragile (e.g., PDF - EPS - PDF still contains the watermark) Well documented (shows example usages) Ideas & Resources Some thoughts and findings: Natural language processing (NLP) watermarks. Apply steganography on a randomly selected image. http://openstego.sourceforge.net/cmdline.html The problem with NLP is that grammatical errors can be introduced. The problem with steganography is that the images are sourced from an image cache, and so recreating that cache with watermarked images will impart a delay when generating the PDF (I could just delete one image from the cache, but that's not an elegant solution). Thank you!

    Read the article

  • How to generate user-specific PDF with encrypted hidden watermark?

    - by Dave Jarvis
    Background Using LaTeX to write a book. When a user purchases the book, the PDF will be generated automatically. Problem The PDF should have a watermark that includes the person's name and contact information. Question What software meets the following criteria: Applies encrypted, undetectable watermarks to a PDF Open Source Platform independent (Linux, Windows) Fast (marks a 200 page PDF in under 1 second) Batch processing (exclusively command-line driven) Collusion-attack resistant Non-fragile (e.g., PDF - EPS - PDF still contains the watermark) Well documented (shows example usages) Ideas & Resources Some thoughts and findings: Natural language processing (NLP) watermarks. Apply steganography on a randomly selected image. http://openstego.sourceforge.net/cmdline.html The problem with NLP is that grammatical errors can be introduced. The problem with steganography is that the images are sourced from an image cache, and so recreating that cache with watermarked images will impart a delay when generating the PDF (I could just delete one image from the cache, but that's not an elegant solution). Thank you!

    Read the article

  • PDF Encrypted, Hidden Watermark

    - by Dave Jarvis
    Background Using LaTeX to write a book. When a user purchases the book, the PDF will be generated automatically. Problem The PDF should have a watermark that includes the person's name and contact information. Question What software meets the following criteria: Applies encrypted, undetectable watermarks to a PDF Open Source Platform independent (Linux, Windows) Fast (marks a 200 page PDF in under 1 second) Batch processing (exclusively command-line driven) Collusion-attack resistant Non-fragile (e.g., PDF - EPS - PDF still contains the watermark) Well documented (shows example usages) Ideas & Resources Some thoughts and findings: Natural language processing (NLP) watermarks. Apply steganography on a randomly selected image. http://openstego.sourceforge.net/cmdline.html The problem with NLP is that grammatical errors can be introduced. The problem with steganography is that the images are sourced from an image cache, and so recreating that cache with watermarked images will impart a delay when generating the PDF (I could just delete one image from the cache, but that's not an elegant solution). Thank you!

    Read the article

  • when does a software become "proprietary" ?

    - by wefwgeweg
    say a company is using Open source libraries, or programs, and packaging it into a proprietary solution. or perhaps, the engineers have copy pasted certain section of those open source libraries and have compiled it now, into a very useful "proprietary" software suite. what legal troubles will this company face if any ? are you allowed to do this ? i mean the customer doesn't see the source codes, only runs the binary files on their computer. for example, i find an excellent NLP library in python, and decide to use it in my program that i am selling for $4000 USD (i write like 10 lines of code and let the library do the work). could i get into trouble ? would i need to write the NLP library myself from scratch to be considered "proprietary" ? danke

    Read the article

  • Amazon s'associe à Nokia pour créer son propre service de cartographie, un autre acteur majeur du mobile tourne le dos à Google

    Amazon s'associe à Nokia pour créer son propre service de cartographie Un autre acteur majeur du mobile tourne le dos à Google Maps Après Apple qui lâchera définitivement Google Maps dès la sortie imminente d'iOS 6, c'est maintenant au tour d'Amazon de lancer son propre service de cartographie sur ses tablettes Kindle Fire et Kindle Fire HD. Dans un communiqué adressé à la presse, le porte-parole de Nokia Dr Sebastian Kurme affirme que la société Amazon s'associe à Nokia et se base sur sa plateforme de localisation NLP pour créer un service de cartographie ...

    Read the article

  • Update a PDF to include an encrypted, hidden, unique identifier?

    - by Dave Jarvis
    Background The idea is this: Person provides contact information for online book purchase Book, as a PDF, is marked with a unique hash Person downloads book PDF passwords are annoying and extremely easy to circumvent. The ideal process would be something like: Generate hash based on contact information Store contact information and hash in database Acquire book lock Update an "include" file with hash text Generate book as PDF (using pdflatex) Apply hash to book Release book lock Send email with book download link Technologies The following technologies can be used (other programming languages are possible, but libraries will likely be limited to those supplied by the host): C, Java, PHP LaTeX files PDF files Linux Question What programming techniques (or open source software) should I investigate to: Embed a unique hash (or other mark) to a PDF Create a collusion-attack resistant mark Develop a non-fragile (e.g., PDF -> EPS -> PDF still contains the mark) solution Research I have looked at the following possibilities: Steganography Natural Language Processing (NLP) Convert blank pages in PDF to images; mark those images; reassemble PDF LaTeX watermark package ImageMagick Steganograhy requires keeping a master copy of the images, and I'm not sure if the watermark would survive PDF -> EPS -> PDF, or other types of conversion. LaTeX creates an image cache, so any steganographic process would have to intercept that process somehow. NLP introduces grammatical errors. Inserting blank pages as images is immediately suspect; it is easy to replace suspicious blank pages. The LaTeX watermark package draws visible marks. ImageMagick draws visible marks. What other solutions are possible? Related Links http://www.tcpdf.org/ invisible watermarks in images Thank you!

    Read the article

  • Automatically Organize Tags in Tax/Folksonomy

    - by Rob Wilkerson
    I'm working on a process that will perform natural language processing (NLP) on one--and potentially several--of our content rich sites. What I'd like to do once the NLP is complete is to automatically organize the output (generally a set of terms that you might think of as tags given the prevalence of that metaphor) into some kind of standard or generally accepted organizational structure. In a perfect world, I'd really like this to be crowd sourced under the folksonomy concept (as opposed to a taxonomy) since the ultimate goal is to target/appeal to real people rather than "domain experts", but I'm open to ideas and best practices. For the obvious purpose of scalability, I'd like to automate the population of this tax/folksonomy so that "some guy" in the team/organization isn't responsible for looking at a bunch of words (with or without context) and arbitrarily fleshing out the contextual components of the tree. I have a few ideas for doing this that require some research to establish viability, but I have exactly zero practical experience with this sort of thing so the ideas really just boil down to stuff I made up that might perform some role in accomplishing the task. Imagining that others have vastly more experience with this sort of thing, I'm hoping that I can stand on your shoulders. Thanks for your thoughts and insights.

    Read the article

  • Thought Oracle Usability Advisory Board Was Stuffy? Wrong. Justification for Attending OUAB: ROI

    - by ultan o'broin
    Looking for reasons tell your boss why your organization needs to join the Oracle Usability Advisory Board or why you need approval to attend one of its meetings (see the requirements)? Try phrases such as "Continued Return on Investment (ROI)", "Increased Productivity" or "Happy Workers". With OUAB your participation is about realizing and sustaining ROI across the entire applications life-cycle from input to designs to implementation choices and integration, usage and performance and on measuring and improving the onboarding and support experience. If you think this is a boring meeting of middle-aged people sitting around moaning about customizing desktop forms and why the BlackBerry is here to stay, think again! How about this for a rich agenda, all designed to engage the audience in a thought-provoking and feedback-illiciting day of swirling interactions, contextual usage, global delivery, mobility, consumerizationm, gamification and tailoring your implementation to reflect real users doing real work in real environments.  Foldable, rollable ereader devices provide a newspaper-like UK for electronic news. Or a way to wrap silicon chips, perhaps. Explored at the OUAB Europe Meeting (photograph from Terrace Restaurant in TVP. Nom.) At the 7 December 2012 OUAB Europe meeting in Oracle Thames Valley Park, UK, Oracle partners and customers stepped up to the mic and PPT decks with a range of facts and examples to astound any UX conference C-level sceptic. Over the course of the day we covered much ground, but it was all related in a contextual, flexibile, simplication, engagement way aout delivering results for business: that means solving problems. This means being about the user and their tasks and how to make design and technology transforms work into a productive activity that users and bean counters will be excited by. The sessions really gelled for me: 1. Mobile design patterns and the powerful propositions for customers and partners offered by using the design guidance with Oracle ADF Mobile. Customers' and partners' developers existing ADF developers are now productive, efficient ADF Mobile developers applying proven UX guidance using ADF Mobile components and other Oracle Fusion Middleware in the development toolkit. You can find the Mobile UX Design Patterns and Guidance on Building Mobile Apps on OTN. 2. Oracle Voice and Apps. How this medium offers so much potentual in the enterprise and offers a window in Fusion Apps cloud webservices, Oracle RightNow NLP and Nuance technology. Exciting stuff, demoed live on a mobile phone. Stay tuned for more features and modalities and how you can tailor your own apps experience.  3. Oracle RightNow Natural Language Processing (NLP) Virtual Assistant technology (Ella): how contextual intervention and learning from users sessions delivers a great personalized UX for users interacting with Ella, a fifth generation VA to solve problems and seek knowledge. 4. BYOD Keynote: A balanced keynote address contrasting Fujitsu's explaining of the conceprt, challenges, and trends and setting the expectation that BYOD must be embraced in a flexible way,  with the resolute, crafted high security enterprise requirements that nuancing the BYOD concept and proposals with the realities of their world of water tight information and device sharing policies. Fascinating stuff, as well providing anecdotes to make us thing about out own DYOD Deployments. One size does not fit all. 5. Icon Cultural Surveys Results and Insights Arising: Ever wondered about the cultural appropriateness of icons used in software UIs and how these icons assessed for global use? Or considered that social media "Like" icons might be  unacceptable hand gestures in culture or enterprise? Or do the old world icons like Save floppy disk icons still find acceptable? Well the survey results told you. Challenges must be tested, over time, and context of use is critical now, including external factors such as the internet and social media adoption. Indeed the fears about global rejection of the face and hand icons was not borne out, and some of the more anachronistic icons (checkbooks, microphones, real-to-real tape decks, 3.5" floppies for "save") have become accepted metaphors for current actions. More importantly the findings brought into focus the reason for OUAB - engage with and illicit feedback though working groups before we build anything. 6. EReaders and Oracle iBook: What is the uptake and trends of ereaders? And how about a demo of an iBook with enterprise apps content?  Well received by the audience, the session included a live running poll of ereader usage. 7. Gamification Design Jam: Fun, hands on event for teams of Oracle staff, partners and customers, actually building gamified flows, a practice that can be applied right away by customers and partners.  8. UX Direct: A new offering of usability best practices, coming to an external website for you in 2013. FInd a real user, observe their tasks, design and approve, build and measure. Simple stuff to improve apps implications no end. 9. FUSE (an internal term only, basically Fusion Simplified Experience): demo of the new Face of Fusion Applications: inherently mobile, simple to use, social, personalizable and FAST, three great demos from the HCM, CRM and ICT world on how these UX designs can be used in different ways. So, a powerful breadth and depth of UX solutions and opporunities for customers and partners to engage with and explore how they can make their users happy and benefit their business reaping continued ROI from those apps investments. Find out more about the OUAB and how to get involved here ... 

    Read the article

  • details on the following Natural Language Processing terms ?

    - by wefwgeweg
    Named Entity Extraction (extract ppl, cities, organizations) Content Tagging (extract topic tags by scanning doc) Structured Data Extraction Topic Categorization (taxonomy classification by scanning doc....bayesian ) Text extraction (HTML page cleaning) are there libraries that i can use to do any of the above functions of NLP ? dont really feel like forking out cash to AlchemyAPI

    Read the article

< Previous Page | 1 2 3 4 5 6  | Next Page >