Search Results

Search found 13378 results on 536 pages for 'natural language'.


What is PHP like as a programming language?

- by seanlinmt

I am not really familiar with PHP, but I get the impression that it is like JavaScript (syntax-wise). What are the benefits of a dynamically typed language compared to a statically typed language like C# or Java, and how would this help in the context of web development? What makes a dynamically typed language so attractive? Or does the popularity of PHP have more to do with it being free? Okay, I think I had better give a little more background to get more meaningful answers, because I don't want a flame war. I come from a C background, and then moved into C# and Visual Studio. Having code completion, integration with an SQL database, huge existing class libraries and easy-to-access documentation, as well as new tools such as LINQ and ReSharper, was like heaven. I didn't enjoy JavaScript before jQuery, but now I love it as well. Recently, I ported a PHP project over to C#, and I used Zend to help me debug and understand more while porting, instead of maintaining two code streams. That also cut down on the cost of the server and maintenance. Getting into PHP would be nice. I think that Visual Studio has spoiled me, but then again Eclipse is equally spoiling. It would be nice to have an answer from someone who has experience developing under both PHP and .NET.

Hashing words to numbers with respect to definition

- by thornate

As part of a larger project, I need to read in text and represent each word as a number. For example, if the program reads in "Every good boy deserves fruit", then I would get a table that converts 'every' to '1742', 'good' to '977513', etc. Now, obviously I can just use a hashing algorithm to get these numbers. However, it would be more useful if words with similar meanings had numerical values close to each other, so that 'good' becomes '6827' and 'great' becomes '6835', etc. As another option, instead of a simple integer representing each word, it would be even better to have a vector made up of multiple numbers, e.g. (lexical_category, tense, classification, specific_word), where lexical_category is noun/verb/adjective/etc., tense is future/past/present, classification defines a wide set of general topics, and specific_word is much the same as the single number described above. Does any such algorithm exist? If not, can you give me any tips on how to get started on developing one myself? I code in C++.

Sentiment analysis for Twitter in Python

- by Ran

I'm looking for an open source implementation, preferably in Python, of Textual Sentiment Analysis (http://en.wikipedia.org/wiki/Sentiment_analysis). Is anyone familiar with such an open source implementation that I can use? I'm writing an application that searches Twitter for some search term, say "youtube", and counts "happy" tweets vs. "sad" tweets. I'm using Google's App Engine, so it's in Python. I'd like to be able to classify the search results returned from Twitter, and I'd like to do that in Python. I haven't been able to find such a sentiment analyzer so far, specifically not in Python. Preferably the implementation is already in Python, but if not, hopefully I can translate it to Python. Note that the texts I'm analyzing are VERY short: they are tweets. So ideally, this classifier is optimized for such short texts. BTW, Twitter does support the ":)" and ":(" operators in search, which aim to do just this, but unfortunately the classification they provide isn't that great, so I figured I might give this a try myself. Thanks! BTW, an early demo is here and the code I have so far is here, and I'd love to open source it with any interested developer.
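For what it's worth, the "count happy vs. sad tweets" idea can be sketched with a plain word-list approach. This is only a toy baseline, not a real sentiment analyzer: the `HAPPY` and `SAD` word lists and the function names `classify` and `tally` are made up for illustration, and a lexicon this small will misjudge most real tweets.

```python
from collections import Counter

# Hypothetical, deliberately tiny cue lists; a real classifier needs far more.
HAPPY = {"love", "great", "awesome", "happy", ":)", ":-)"}
SAD = {"hate", "awful", "terrible", "sad", ":(", ":-("}
CUES = HAPPY | SAD


def classify(tweet):
    """Label one tweet as 'happy', 'sad' or 'neutral' by counting cue tokens."""
    tokens = []
    for token in tweet.lower().split():
        # Leave emoticons such as ':)' alone; strip punctuation from other tokens.
        tokens.append(token if token in CUES else token.strip(".,!?\"'"))
    score = sum(t in HAPPY for t in tokens) - sum(t in SAD for t in tokens)
    if score > 0:
        return "happy"
    if score < 0:
        return "sad"
    return "neutral"


def tally(tweets):
    """Count how many tweets fall into each label."""
    return Counter(classify(t) for t in tweets)


if __name__ == "__main__":
    sample = ["I love youtube :)", "youtube is awful :(", "watching youtube"]
    print(tally(sample))
```

Because tweets are so short, a single cue word or emoticon decides the label here, which is also why this kind of counting is fragile (it ignores negation and sarcasm entirely).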

How to implement a SIMPLE "You typed ACB, did you mean ABC?"

- by marcgg

I know this is not a straight-up question, so if you need me to provide more information about its scope, let me know. There are a bunch of questions that address almost the same issue (they are linked here), but never the exact same one with the same kind of scope and objective, at least as far as I know.

Context:
I have an MP3 file with ID3 tags for artist name and song title.
I have two tables, Artists and Songs.
The ID3 tags might be slightly off (e.g. Mikaell Jacksonne).
I'm using ASP.NET + C# and an MSSQL database.

I need to synchronize the MP3s with the database. Meaning:
The user launches a script.
The script browses through all the MP3s.
The script says "Is 'Mikaell Jacksonne' 'Michael Jackson' YES/NO".
The user picks and we start over.

Examples of what the system could find:
In the database...
SONGS = {"This is a great song title", "This is a song title"}
ARTISTS = {"Michael Jackson"}
Outputs...
"This is a grt song title" did you mean "This is a great song title"?
"This is song title" did you mean "This is a song title"?
"This si a song title" did you mean "This is a song title"?
"This si song a title" did you mean "This is a song title"?
"Jackson, Michael" did you mean "Michael Jackson"?
"JacksonMichael" did you mean "Michael Jackson"?
"Michael Jacksno" did you mean "Michael Jackson"?
etc.

I read some documentation from this /how-do-you-implement-a-did-you-mean and it is not exactly what I need, since I don't want to check an entire dictionary. I also can't really use a web service, since it depends a lot on what I already have in my database. If possible, I'd also like to avoid dealing with distances and other complicated things. I could use the Google API (or something similar) to do this, meaning that the script would try spell checking and test the result against the database, but I feel there could be a better solution, since my database might end up being really specific, with weird songs and artists, making spell checking useless.
I could also try something like what has been explained in this post, using Soundex for C#. Using a regular spell checker won't work because I won't be using words but names and 'titles'. So my question is: is there a relatively simple way of doing this, and if so, what is it? Any kind of help would be appreciated. Thanks!
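One fairly simple way to get this behaviour, sketched here in Python rather than C#, is to compare each tag only against the names already in the database. The sketch assumes the candidate list fits in memory; `normalize` and `did_you_mean` are made-up names, and the 0.6 cutoff is just the default of `difflib.get_close_matches`, not a tuned value. Note that `difflib` still computes a similarity ratio internally, so the "distance" is hidden rather than avoided.

```python
import difflib
import re


def normalize(name):
    """Lowercase, drop punctuation, and sort the words so word order is ignored."""
    words = re.findall(r"\w+", name.lower())
    return " ".join(sorted(words))


def did_you_mean(tag, known, cutoff=0.6):
    """Return the closest entry in `known` for a possibly misspelled tag, or None."""
    lookup = {normalize(k): k for k in known}
    hits = difflib.get_close_matches(normalize(tag), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


if __name__ == "__main__":
    artists = ["Michael Jackson"]
    for tag in ["Mikaell Jacksonne", "Jackson, Michael", "JacksonMichael", "Michael Jacksno"]:
        print(repr(tag), "did you mean", repr(did_you_mean(tag, artists)), "?")
```

Sorting the words before comparing is what lets "Jackson, Michael" line up with "Michael Jackson"; the trade-off is that two titles made of the same words in a different order become indistinguishable.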

What's the next big thing after LINQ?

- by Leniel Macaferi

I started using LINQ (Language Integrated Query) when it was still in beta, more specifically Microsoft .NET LINQ Preview (May 2006). Almost 4 years have passed, and here we are using LINQ in a lot of projects for the most diverse tasks. I even wrote my final college project based on LINQ. You can see how much I like it. LINQ and, more recently, PLINQ (Parallel LINQ) give our jobs a great boost: more programming power and fewer lines of code, leading to more expressive and readable code. I keep wondering what the next big language improvement for C# after LINQ could be. I know there are some promising language features coming, such as Code Contracts, but nothing with the impact that LINQ had. What do you think could be the next big thing?

Stanford Parser - Traversing the typed dependencies graph

- by pns

Hello! Basically I want to find a path between two NP tokens in the dependencies graph. However, I can't seem to find a good way to do this in the Stanford Parser. Any help? Thank You Very Much

Perl Lingua giving weird error on install

- by user299306

I am trying to install Perl Lingua onto a Unix system (Ubuntu, latest version). Of course I am root. When I go into the package to install using 'perl Makefile.PL', I get this dumb error:

    [root@csisl27 Lingua-Lid-0.01]# perl Makefile.PL
    /opt/ls//lib does not exist at Makefile.PL line 48.

I have tried playing with the path on line 48, but nothing changes. Here is what lines 48-50 look like:

    Line 48: die "$BASE/lib does not exist" unless -d "$BASE/lib";
    Line 49: die "$BASE/include does not exist" unless -d "$BASE/include";
    Line 50: die "lid.h is missing in $BASE/include" unless -e "$BASE/includ/lid.h";

The variable $BASE is declared like this:

    $BASE = "/opt/ls/" if ($^O eq "linux" or $^O eq "solaris");
    $BASE = "/usr/local/" if ($^O eq "freebsd");
    $BASE = $ENV{LID_BASE_DIR} if (defined $ENV{LID_BASE_DIR});

The Perl program I am trying to write simply looks like this (just my base):

    #!/usr/bin/perl
    use Lingua::LinkParser;
    use strict;
    print "Hello world!\n";

When I run this trying to use Lingua, here is my error:

    [root@csisl27 assign4]# ./perl_parser_1.pl
    Can't locate Lingua/LinkParser.pm in @INC (@INC contains: /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.10.0 /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.10.0 /usr/lib/perl5/5.10.0/x86_64-linux-thread-multi /usr/lib/perl5/5.10.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl .) at ./perl_parser_1.pl line 3.
    BEGIN failed--compilation aborted at ./perl_parser_1.pl line 3.

I tried installing this from CPAN, but it still doesn't work properly.

Latent Dirichlet Allocation, pitfalls, tips and programs

- by Gregg Lind

I'm experimenting with Latent Dirichlet Allocation for topic disambiguation and assignment, and I'm looking for advice.
Which program is the "best", where best is some combination of easiest to use, best prior estimation, and fast?
How do I incorporate my intuitions about topicality? Let's say I think I know that some items in the corpus are really in the same category, like all articles by the same author. Can I add that into the analysis?
Are there any unexpected pitfalls or tips I should know before embarking?
I'd prefer it if there were R or Python front ends for whatever program, but I expect (and accept) that I'll be dealing with C.

Canadian to US English

- by Tinku

Is there something like a Canadian to US English e-dictionary that I can use in my application?

How to make concept representation with the help of bag of words

- by agazerboy

Hi all, thanks for stopping to read my question :) This is a very sweet place full of GREAT people! I have a question about "creating sentences with words". NO NO, it is not about English grammar :) Let me explain. Suppose I have a bag of words like "person apple apple person person a eat person will apple eat hungry apple hungry", and from it I want to generate a sentence such as "hungry person eat apple". I don't know which field this topic relates to, or where I should try to find an answer. I tried searching Google, but I only found English grammar stuff :) Can anybody tell me which algorithm or program could work for this problem? Thanks. P.S.: It is not an assignment :) If it were, I would ask for source code! I don't even know which field I should look in :)

English Grammar Parsing in PHP (Link Grammar)

- by Chris T

Is there any way to use the Link Grammar or AbiSource grammar checker in PHP (or C#, but I'd prefer PHP)? I need to have a tree structure for English sentences. Any ideas? The only things I found were in C, and I can't use them on a shared host.

Determining whether values can potentially match a regular expression, given more input

- by Andreas Grech

I am currently writing an application in JavaScript where I'm matching input to regular expressions, but I also need to find a way to match strings to parts of the regular expressions. For example:

    var invalid = "x",
        potentially = "g",
        valid = "ggg",
        gReg = /^ggg$/;

    gReg.test(invalid); //returns false (correct)
    gReg.test(valid);   //returns true (correct)

Now I need to find a way to somehow determine that the value of the potentially variable doesn't exactly match the /^ggg$/ expression, BUT with more input, it potentially can! So in this case, the potentially variable is g, but if two more g's are appended to it, it will match the regular expression /^ggg$/. In the case of invalid, however, it can never match the /^ggg$/ expression, no matter how many characters you append to it. So how can I determine whether a string does or doesn't have the potential to match a particular regular expression?

Building dictionary of words from large text

- by LiorH

I have a text file containing posts in English/Italian. I would like to read the posts into a data matrix so that each row represents a post and each column a word. The cells in the matrix are the counts of how many times each word appears in the post. The dictionary should consist of either all the words in the whole file or a non-exhaustive English/Italian dictionary. I know this is a common, essential preprocessing step for NLP. Does anyone know of a tool/project that can perform this task? Someone mentioned Apache Lucene; do you know if a Lucene index can be serialized to a data structure similar to what I need?
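The matrix described above (one row per post, one column per word, cells holding counts) can be sketched with the standard library alone. This is a minimal illustration, assuming the posts are already split into a list of strings; `tokenize` and `build_matrix` are made-up names, and the `\w+` tokenizer is a simplification that ignores apostrophes and hyphenation.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (accented letters are kept)."""
    return re.findall(r"\w+", text.lower())


def build_matrix(posts):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in post i."""
    counts = [Counter(tokenize(post)) for post in posts]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


if __name__ == "__main__":
    vocab, rows = build_matrix(["The cat sat", "Il gatto e il cane"])
    print(vocab)
    for row in rows:
        print(row)
```

For a large file the matrix is mostly zeros, so keeping the per-post `Counter` objects (a sparse representation) is usually cheaper than materializing the full rows.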

Special Occasion parser in Java

- by Pranav

Hey guys, I am working on a date parser in Java. I just wanted to know whether there is any Java library that can parse special occasions. For example, if I give "Christmas" or "New Year" as input, it returns a date for it. Thanks in advance. Regards, Pranav

Are there any well known algorithms to detect the presence of names?

- by Rhubarb

For example, given a string: "Bob went fishing with his friend Jim Smith." Bob and Jim Smith are both names, but bob and smith are both words. Were it not for them being capitalized, there would be less indication of this beyond our own understanding of the sentence. Without doing grammar analysis, are there any well-known algorithms for detecting the presence of names, at least Western names?

Dependency parsing

- by C.

Hi, I particularly like the transduce feature offered by AGFL in their EP4IR: http://www.agfl.cs.ru.nl/EP4IR/english.html The download page is here: http://www.agfl.cs.ru.nl/download.html Is there any way I can make use of this in a C# program? Do I need to convert classes to C#? Thanks :)

What should every programmer know?

- by Matt Lacey

Regardless of the programming language(s) or operating system(s) they use or the environment they develop for, what should every programmer know? Some background: I'm interested in becoming the best programmer I can. As part of this process, I'm trying to understand what I don't know that would benefit me a lot if I did. While there are loads of lists around along the lines of "n things every [insert programming language] developer should know", I have yet to find anything similar that isn't limited to a specific language. I also expect this information to be of interest and benefit to others.

How to identify ideas and concepts in a given text

- by Nick

I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained:

"Maybe if you tell me a little more about who Mr Balzac is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph?"

It'd be great to be able to detect that the person has asked for a photograph of Mr Balzac. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like:

"Please, never send me a photo of Mr Balzac."

Does anyone know where to start with this? Is it even possible? I've looked into things like nltk, but I've yet to find an example of someone doing something similar, and I am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great. Thanks!

English dictionary as txt or xml file with support of synonyms

- by Simon

Can someone point me to where I can download an English dictionary as a txt or xml file? I am building a simple app for myself and am looking for something that I could start using immediately without learning a complex API. Support for synonyms would be great; that is, it should be easy to retrieve all the synonyms for a particular word. It would be absolutely fantastic if the dictionary listed both the British and American spellings of words where they differ. Even a small dictionary (a few thousand words) is OK; I only need it for a small project. I would even be willing to buy one if the price is reasonable and the dictionary is easy to use. Simple xml would be great. Any directions, please.

Generating easy-to-remember random identifiers

- by Carl Seleborg

Hi all, As all developers do, we constantly deal with some kind of identifiers as part of our daily work. Most of the time, it's about bugs or support tickets. Our software, upon detecting a bug, creates a package that has a name formatted from a timestamp and a version number, which is a cheap way of creating reasonably unique identifiers to avoid mixing packages up. Example: "Bug Report 20101214 174856 6.4b2". My brain just isn't that good at remembering numbers. What I would love to have is a simple way of generating alpha-numeric identifiers that are easy to remember. Examples would be "azil3", "ulmops", "fel2way", etc. I just made these up, but they are much easier to recognize when you see many of them at once. I know of algorithms that perform trigram analysis on text (say you feed them a whole book in German) and that can generate strings that look and feel like German words. This requires lots of data, though, and makes it slightly less suitable for embedding in an application just for this purpose. Do you know of anything else? Thanks! Carl
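One lightweight alternative to trigram analysis, which needs no training text at all, is to alternate consonants and vowels so the result is always pronounceable. The sketch below is only an illustration of that idea: the letter sets, the function name `memorable_id`, and the default of three syllables are arbitrary choices, not taken from any existing tool.

```python
import random

# Letters chosen to avoid easily confused sounds; the exact sets are arbitrary.
CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def memorable_id(syllables=3, rng=None):
    """Build a pronounceable identifier from consonant+vowel syllables."""
    if rng is None:
        rng = random.Random()
    return "".join(
        rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables)
    )


if __name__ == "__main__":
    for _ in range(5):
        print(memorable_id())
```

With 14 consonants and 5 vowels there are 70 possible syllables, so three syllables give 70 ** 3 = 343,000 distinct identifiers. That is far fewer than a timestamp offers, so collisions would still have to be checked against the identifiers already issued.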

Mapping words to numbers with respect to definition

- by thornate

As part of a larger project, I need to read in text and represent each word as a number. For example, if the program reads in "Every good boy deserves fruit", then I would get a table that converts 'every' to '1742', 'good' to '977513', etc. Now, obviously I can just use a hashing algorithm to get these numbers. However, it would be more useful if words with similar meanings had numerical values close to each other, so that 'good' becomes '6827' and 'great' becomes '6835', etc. As another option, instead of a simple integer representing each word, it would be even better to have a vector made up of multiple numbers, e.g. (lexical_category, tense, classification, specific_word), where lexical_category is noun/verb/adjective/etc., tense is future/past/present, classification defines a wide set of general topics, and specific_word is much the same as the single number described above. Does any such algorithm exist? If not, can you give me any tips on how to get started on developing one myself? I code in C++.

Problems with noobs putting my GA code into their sites

- by dclowd9901

I don't mean for the title to be derogatory, but this is a rather frustrating problem, and I'm looking for a good workaround, given a language barrier involved. I have a site set up for a plugin I wrote, and, rather than use the site's resources to write their own code, I've had people simply rip the code from the samples on the site. Normally, this wouldn't be any issue at all, but they are also taking my Google Analytics instantiation, so my Analytics data is getting very skewed by incorporating visitation data from their websites. I've been able to contact the English-speaking site owners with little issue. The problem lies in the Japanese language sites that are yanking the code. I have no idea how to ask them to take down the analytics portion. Long-term, I'm providing a package that streamlines the learning-to-use process, but in the meantime, what can I do about this language barrier? Is there a way around this problem that I haven't thought of?

Algorithm for sentence analysis and tokenization

- by Andrea Nagar

I need to analyze a document and compile statistics on how many times each sequence of words is used (so the analysis is not of single words but of batches of recurring words). I read that compression algorithms do something similar to what I want: creating dictionaries of blocks of text, each with a piece of information reporting its frequency. It should be something similar to http://www.codeproject.com/KB/recipes/Patterns.aspx Do you have anything written in C#?
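Counting recurring word sequences is usually done with n-grams (sliding windows of n consecutive words). The following is a minimal sketch in Python rather than C#, assuming the whole document fits in memory; the function name `ngram_counts` and the `\w+` tokenizer are illustrative choices only.

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count every run of n consecutive words in the text."""
    words = re.findall(r"\w+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat slept"
    for phrase, count in ngram_counts(text, n=2).most_common(3):
        print(" ".join(phrase), count)
```

Running it for several values of n (say 2 to 5) and keeping only sequences seen more than once gives the recurring-phrase statistics described above.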

How does Amazon's Statistically Improbable Phrases work?

- by ??iu

How does something like Statistically Improbable Phrases work? According to Amazon:

"Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book. SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements."

For instance, for Joel's first book, the SIPs are: leaky abstractions, antialiased text, own dog food, bug count, daily builds, bug database, software schedules.

One interesting complication is that these are phrases of either 2 or 3 words. This makes things a little more interesting because these phrases can overlap with or contain each other.
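Amazon's actual scoring is not given in the description above, so the following is only a guess at the general shape: count 2- and 3-word phrases in one book, count them across the whole collection, and rank by how much more frequent each phrase is in the book than in the collection. The names `phrases` and `improbable_phrases`, and the add-one smoothing in the score, are assumptions made for this sketch.

```python
import re
from collections import Counter


def phrases(text, sizes=(2, 3)):
    """Count all 2- and 3-word phrases in a text."""
    words = re.findall(r"\w+", text.lower())
    return Counter(
        " ".join(words[i:i + n])
        for n in sizes
        for i in range(len(words) - n + 1)
    )


def improbable_phrases(book, corpus, top=5):
    """Rank a book's phrases by frequency in the book relative to the corpus.

    `corpus` is a list of all texts in the collection, including `book`.
    """
    book_counts = phrases(book)
    corpus_counts = Counter()
    for text in corpus:
        corpus_counts.update(phrases(text))
    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())

    def score(phrase):
        book_rate = book_counts[phrase] / book_total
        # Add-one smoothing (an assumption) keeps the ratio finite.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        return book_rate / corpus_rate

    return sorted(book_counts, key=score, reverse=True)[:top]


if __name__ == "__main__":
    book = "leaky abstractions hurt. leaky abstractions leak. the end of the day"
    corpus = [book, "at the end of the day we went home", "the end of the day came"]
    print(improbable_phrases(book, corpus, top=3))
```

The overlap issue mentioned in the question shows up here too: a 3-word phrase and the 2-word phrases it contains are scored independently, so a real system would need some rule for collapsing them.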

Entity Framework and associations between string keys

- by fredrik

Hi, I am new to Entity Framework, and to ORMs for that matter. In the project that I'm involved in, we have a legacy database with all its keys as case-insensitive strings. We are converting to MSSQL and want to use EF as the ORM, but have run into a problem. Here is an example that illustrates it: TableA has a primary string key, and TableB has a reference to this primary key. In LINQ we write something like:

    var result = from t in context.TableB select t.TableA;
    foreach( var r in result )
        Console.WriteLine( r.someFieldInTableA );

Suppose TableA contains a primary key that reads "A", and TableB contains two rows that reference TableA but with different cases in the referencing field, "a" and "A". In our project we want both of the rows to end up in the result, but only the one with the matching case does. Using the SQL Profiler, I have noticed that both of the rows are selected. Is there a way to tell Entity Framework that the keys are case-insensitive?

Edit: We have now tested this with NHibernate and come to the conclusion that NHibernate works with case-insensitive keys, so NHibernate might be a better choice for us. I am, however, still interested in finding out if there is any way to change the behaviour of Entity Framework.

