Search Results

Search found 221 results on 9 pages for 'ocr'.

  • Why is OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true) causing an OCR running error?

    - by Ian Wells
    Hi folks, I am using MODI to read TIFF images and do what I need to do with the text. Some images work fine, but other TIFF images always cause the method OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true) to fail. I have researched this and tried different variations such as 'false', 'false' in the parameter list. I have also tried SYSDEFAULT instead of English, but I still get the error. Can anyone please tell me why it would fail on some TIFF images and not on others? I have done some research and found this answer: one possible cause is MODI trying to process a file without any recognisable text. A blank document, or one which has only drawings/scribbles and is effectively blank, will cause this exception. Obviously this is not good enough, as there is no way I can have an app that decides to OCR some images and not others. I handle the exception, but the OCR object is not then initialised, so I can't do what I need to do from there. This is a bloody nightmare! Why can't the method just do its bloody job, and if the image has some unreadable pages, just ignore them? I am using Windows 7 Ultimate and Office 2007 Ultimate. Visual Studio version is 2008. Thanks, IW
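    For reference, a minimal sketch of one way to wrap the per-file call so that blank or unreadable images are skipped instead of aborting the whole run, assuming the usual MODI interop types (MODI.Document, MODI.Image, Layout.Text); the exact exception type thrown for blank pages may differ from the generic COMException caught here.

```csharp
// Hedged sketch: wrap the MODI call from the question so files that throw the
// "no recognisable text" exception are treated as empty rather than fatal.
using System;
using System.Runtime.InteropServices;
using System.Text;

static class ModiOcrSketch
{
    public static string TryOcrTiff(string path)
    {
        var doc = new MODI.Document();
        try
        {
            doc.Create(path);
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            var text = new StringBuilder();
            foreach (MODI.Image page in doc.Images)
                text.Append(page.Layout.Text);     // recognised text, page by page
            return text.ToString();
        }
        catch (COMException)
        {
            // MODI throws here for blank or effectively blank images; treat as "no text found"
            return string.Empty;
        }
        finally
        {
            doc.Close(false);                      // release the file handle
            Marshal.ReleaseComObject(doc);
        }
    }
}
```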

    Read the article

  • Are there any programs for showing tooltips via OCR?

    - by Casper
    Edited for clarification based on comments: At work we have an old in-house program where I need to navigate a very long dropdown list containing only numbered codes. To select the correct code, I manually look at an Excel list that shows each code and the name it corresponds to. To make matters worse, it is not possible to type in the dropdown box (only the first character is recognized when typing). So my question is: is there any program which can show text from a predefined list when I mouse over the various options in the dropdown list? The purpose would be to identify the name which corresponds to each code without having to look at the Excel list. Thanks in advance!

    Read the article

  • Simple OCR programming tutorials/articles

    - by daddz
    I'm interested in simple OCR methods and algorithms. And by simple I mean simple! Best would be a tutorial/article/documentation without dependencies on 3rd-party libraries, if that's even possible. I would really like to build up my knowledge from the ground up. The programming language doesn't matter. Thanks in advance! Edit: an interesting link I found in another question: OCR and Neural Nets in JavaScript. I especially like the Python implementation.

    Read the article

  • OCR an RSA key fob (security token)

    - by user130582
    I put together a quick WinForm/embedded IE browser control which logs into our company's bank website each morning and scrapes/exports the desired deposit information (the bank is a smallish regional bank). Since we have a few dozen "pseudoaccounts" that draw from the same master account, this actually takes 10-15 minutes to retrieve. Anyway, the only problem is that our business bank account requires an RSA security token (http://www.rsa.com/node.aspx?id=1156)--if you are not familiar, it is a small device which shows a random 6-digit number every 15(?) seconds, so I have to prompt for this value before starting. This is on top of the website's login-based security model, so even if you create a read-only account that can't do anything, you still have to put the RSA number in. We have 5 of these tokens for different people in the company. From our perspective this is nuisance security. I was joking about using a web camera to OCR the digits from the key fob so they didn't have to type it in -- mainly so that the scraping/export would be done before anyone arrives in the morning. Well, they asked if I could really do it. So now I ask you, how hard (how many hours) do you think it would take to OCR these digits reliably from a JPEG image produced by the camera? I already know I can get the JPEG easily. I think you get 3 tries to log in, so it really needs to hit a 99% accuracy rate. I could work on this on my off time, but they don't want me to put more than a few hours into it, so I want to leverage as much existing code as possible. This is a 7-segment display (like an alarm clock), so it's not exactly text that an OCR package is used to seeing. Also--there is a countdown timer on the side of the display; typically when it is down to 1 bar, you wait until the next number appears and it starts over at 5 bars (like signal strength on your cell phone). So this would need to be OCRed as well, but it is not text. Anyway, the more I think about it as I type this, the less convinced I am that I can truly get this right, so maybe I should just work on it in my spare time?
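    Because the token digits are a 7-segment display rather than ordinary text, a common shortcut is to skip general OCR entirely and just sample the seven segment regions of each digit. The sketch below only illustrates that idea; the segment coordinates, darkness threshold, and digit bounding boxes are made-up values that would have to be calibrated against real photos of the fob.

```csharp
// Illustrative sketch: classify one 7-segment digit by checking whether each
// segment region is "lit". Segment positions are fractions of the digit's
// bounding box (hypothetical values; calibrate against the actual camera image).
using System;
using System.Collections.Generic;
using System.Drawing;

class SevenSegmentReader
{
    // centre points of segments A..G, as (x, y) fractions of the digit box
    static readonly (float x, float y)[] SegmentCentres =
    {
        (0.50f, 0.08f), // A: top
        (0.88f, 0.28f), // B: top-right
        (0.88f, 0.72f), // C: bottom-right
        (0.50f, 0.92f), // D: bottom
        (0.12f, 0.72f), // E: bottom-left
        (0.12f, 0.28f), // F: top-left
        (0.50f, 0.50f), // G: middle
    };

    // which segments are lit for each digit, as a 7-bit mask (A = bit 0 .. G = bit 6)
    static readonly Dictionary<int, char> DigitBySegments = new Dictionary<int, char>
    {
        { 0b0111111, '0' }, { 0b0000110, '1' }, { 0b1011011, '2' }, { 0b1001111, '3' },
        { 0b1100110, '4' }, { 0b1101101, '5' }, { 0b1111101, '6' }, { 0b0000111, '7' },
        { 0b1111111, '8' }, { 0b1101111, '9' },
    };

    // digitBox is the bounding rectangle of one digit inside the photo
    public static char? ReadDigit(Bitmap image, Rectangle digitBox, int darkThreshold = 96)
    {
        int mask = 0;
        for (int i = 0; i < SegmentCentres.Length; i++)
        {
            int px = digitBox.Left + (int)(SegmentCentres[i].x * digitBox.Width);
            int py = digitBox.Top + (int)(SegmentCentres[i].y * digitBox.Height);
            Color c = image.GetPixel(px, py);
            bool lit = (c.R + c.G + c.B) / 3 < darkThreshold;   // dark pixel => segment on
            if (lit) mask |= 1 << i;
        }
        return DigitBySegments.TryGetValue(mask, out char d) ? d : (char?)null;
    }
}
```

    Since an unrecognised segment pattern returns null rather than a guess, the caller can simply wait for the next 6-digit code instead of risking one of the three login attempts on a misread.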

    Read the article

  • Online OCR website for processing an entire PDF file at one time?

    - by Tim
    I am looking for an online OCR website for processing a multi-page PDF file at one time, preferably free. I know http://www.newocr.com/. If I am correct, it can only OCR one page at a time, by manually clicking "Preview" and then clicking "OCR" for each page. After each page is OCRed, I have to copy out the text result manually too. If my PDF file has ~30 pages, it will be tedious to repeat the above process for each page. I wonder if there are other online websites that OCR a whole PDF file without requiring manual operation for each page? Thanks!

    Read the article

  • OCR Web Service

    - by sdfx
    I am searching for an OCR web service (ideally open source, preferably free) that simply receives an image and returns the text it contains. I've looked at Tesseract, OCRopus and GOCR, but the only open server I could find is WeOCR. Unfortunately the detection rates (at least during my tests) are sub-par, and the speed is not much better. Does anyone have any experience with OCR web services? I guess the license of Tesseract allows the operation of such a service; are there any out there?

    Read the article

  • How to OCR PDF files with Old German Gothic (Fraktur) text?

    - by Jason
    I have been successfully using Adobe Acrobat X to OCR many scanned documents which I use for my research. However, I have begun studying old German documents which use the Fraktur script, also known as Gothic. SuperUser won't let me post an image of it, but you can find examples of what it looks like in the Wikipedia article (linked above). I have read about special programs which OCR such text, such as ABBYY FineReader für Fraktur, but first, it only works on Windows (and I use a Mac), and second, I'd like to find a Fraktur plugin for Acrobat to fit into my existing workflow. Are there any Fraktur OCR plugins for Acrobat? Generally, where should one look for Acrobat OCR plugins?

    Read the article

  • OCR combined with font recognition?

    - by Adam
    I have a bold idea where a user could take an image like the following and, in a few seconds of processing, be able to edit a document which looks roughly the same. The software would use WhatTheFont (or something similar) to recognize the fonts used, and OCR and other software to handle the font size, color, line spacing, and of course the text content itself. In the case of the example image, there would be three separate "textboxes" produced, each starting at the upper-left corner of the text and extending as far to the bottom right as it could before running into another text box. So the user would then see something like this: (the rectangles are just used to show the boundaries of each textbox). From here, the user would be able to edit the text in each of these boxes to create a new document. Of course there are tons of obvious uses for such an application, especially on a mobile phone with a built-in camera. So my questions are the following: I doubt the answer is yes, but does anything do this already? If I'm going to try to build this, what should I write it in? Can I use Python? What would be the best OCR libraries to start with? Is there a service other than WhatTheFont for font recognition that has better API support? Anybody want to help me build it? :) etc. etc. Update: one thing I wanted to mention (but forgot) is that I would also like the background to be preserved. In other words, if the example above had an image behind the text, I'd like the document to use that image with the text removed. I know this complicates things a lot because that would require some image editing techniques too (something akin to Photoshop CS5's "content-aware fill"). But if we can solve diminished reality on iPhones, I think we can figure this out!

    Read the article

  • Tesseract OCR in C#

    - by Tom Allen
    Just wondering if anyone has a sample project or compiled DLL of the Tesseract OCR engine running in C#? I have tried going through the tessnet2 demo (here), but for some reason I can't install the C++ components in my current VS2008 installation, so I can't build it. Thanks!
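    For orientation, typical tessnet2 usage (as shown in the wrapper's demo) looks roughly like the sketch below. The type and method names (tessnet2.Tesseract, Init, DoOCR, Word) are recalled from that demo and should be checked against the actual release being used; the tessdata path, image path, and the numeric-only flag passed to Init are placeholders.

```csharp
// Rough sketch of tessnet2 usage, assuming the wrapper's demo API; verify the
// exact signatures against the tessnet2 build you download. Paths are placeholders.
using System;
using System.Collections.Generic;
using System.Drawing;

class TesseractSketch
{
    static void Main()
    {
        using (var image = new Bitmap(@"C:\scans\page.tif"))
        {
            var ocr = new tessnet2.Tesseract();
            ocr.Init(@"C:\tessdata", "eng", false);   // tessdata folder, language, numeric-only flag
            List<tessnet2.Word> words = ocr.DoOCR(image, Rectangle.Empty);
            foreach (tessnet2.Word word in words)
                Console.WriteLine("{0} ({1}%)", word.Text, word.Confidence);
        }
    }
}
```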

    Read the article

  • OCR for Devanagari (Hindi / Marathi / Sanskrit)

    - by Egon
    Does anybody have any idea about recent work being done on optical character recognition for Indian scripts using modern machine learning techniques? I know of some research being done at ISI, Calcutta, but nothing new has come up in the last 3-4 years to the best of my knowledge, and OCR for Devanagari is sadly lacking!

    Read the article

  • Java OCR package

    - by pstanton
    What is the best OCR package you've used with Java? I need to parse mainly numbers in a single font and am looking for a well-architected, lightweight library to use. Thanks.

    Read the article

  • I need help with OCR

    - by miplz
    Hi, I need to create a program in VB using OCR, but I can't find any example code. Can anyone tell me where to find sample code, or does anyone have some they can share? Please help me. Thank you.

    Read the article

  • Optimized OCR black/white pixel algorithm

    - by eagle
    I am writing a simple OCR solution for a finite set of characters. That is, I know exactly how all 26 letters of the alphabet will look. I am using C# and am able to easily determine if a given pixel should be treated as black or white. I am generating a matrix of black/white pixels for every single character. So for example, the letter I (capital i) might look like the following: 01110 00100 00100 00100 01110 Note: all points, which I use later in this post, assume that the top left pixel is (0, 0), bottom right pixel is (4, 4). 1's represent black pixels, and 0's represent white pixels. I would create a corresponding matrix in C# like this: CreateLetter("I", new List<List<bool>>() { new List<bool>() { false, true, true, true, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, true, true, true, false } }); I know I could probably optimize this part by using a multi-dimensional array instead, but let's ignore that for now; this is for illustrative purposes. Every letter is exactly the same dimensions, 10px by 11px (10px by 11px are the actual dimensions of a character in my real program. I simplified this to 5px by 5px in this posting since it is much easier to "draw" the letters using 0's and 1's on a smaller image). Now when I give it a 10px by 11px part of an image to analyze with OCR, it would need to run on every single letter (26) for every single pixel (10 * 11 = 110), which would mean 2,860 (26 * 110) iterations (in the worst case) for every single character. I was thinking this could be optimized by defining the unique characteristics of every character. So, for example, let's assume that the set of characters only consists of 5 distinct letters: I, A, O, B, and L. These might look like the following: 01110 00100 00100 01100 01000 00100 01010 01010 01010 01000 00100 01110 01010 01100 01000 00100 01010 01010 01010 01000 01110 01010 00100 01100 01110 After analyzing the unique characteristics of every character, I can significantly reduce the number of tests that need to be performed to test for a character. For example, for the "I" character, I could define its unique characteristics as having a black pixel in the coordinate (3, 0), since no other characters have that pixel as black. So instead of testing 110 pixels for a match on the "I" character, I reduced it to a 1-pixel test. This is what it might look like for all these characters: var LetterI = new OcrLetter() { Name = "I", BlackPixels = new List<Point>() { new Point (3, 0) } } var LetterA = new OcrLetter() { Name = "A", WhitePixels = new List<Point>() { new Point(2, 4) } } var LetterO = new OcrLetter() { Name = "O", BlackPixels = new List<Point>() { new Point(3, 2) }, WhitePixels = new List<Point>() { new Point(2, 2) } } var LetterB = new OcrLetter() { Name = "B", BlackPixels = new List<Point>() { new Point(3, 1) }, WhitePixels = new List<Point>() { new Point(3, 2) } } var LetterL = new OcrLetter() { Name = "L", BlackPixels = new List<Point>() { new Point(1, 1), new Point(3, 4) }, WhitePixels = new List<Point>() { new Point(2, 2) } } This is challenging to do manually for 5 characters and gets much harder as more letters are added. You also want to guarantee that you have the minimum set of unique characteristics of a letter, since you want it to be optimized as much as possible.
    I want to create an algorithm that will identify the unique characteristics of all the letters and generate similar code to that above. I would then use this optimized black/white matrix to identify characters. How do I take the 26 letters that have all their black/white pixels filled in (e.g. the CreateLetter code block) and convert them to an optimized set of unique characteristics that define a letter (e.g. the new OcrLetter() code block)? And how would I guarantee that it is the most efficient set of unique characteristics (e.g. instead of defining 6 points as the unique characteristics, there might be a way to do it with 1 or 2 points, as the letter "I" in my example was able to)? An alternative solution I've come up with is using a hash table, which will reduce it from 2,860 iterations to 110 iterations, a 26-fold reduction. This is how it might work: I would populate it with data similar to the following: Letters["01110 00100 00100 00100 01110"] = "I"; Letters["00100 01010 01110 01010 01010"] = "A"; Letters["00100 01010 01010 01010 00100"] = "O"; Letters["01100 01010 01100 01010 01100"] = "B"; Now when I reach a location in the image to process, I convert it to a string such as "01110 00100 00100 00100 01110" and simply find it in the hash table. This solution seems very simple; however, it still requires 110 iterations to generate this string for each letter. In big O notation, the algorithm is the same, since O(110N) = O(2860N) = O(N) for N letters to process on the page. However, it is still improved by a constant factor of 26, a significant improvement (e.g. instead of it taking 26 minutes, it would take 1 minute). Update: Most of the solutions provided so far have not addressed the issue of identifying the unique characteristics of a character, but rather provide alternative solutions. I am still looking for this solution, which, as far as I can tell, is the only way to achieve the fastest OCR processing. I just came up with a partial solution: For each pixel in the grid, store the letters that have it as a black pixel. Using these letters: I A O B L 01110 00100 00100 01100 01000 00100 01010 01010 01010 01000 00100 01110 01010 01100 01000 00100 01010 01010 01010 01000 01110 01010 00100 01100 01110 You would have something like this: CreatePixel(new Point(0, 0), new List<Char>() { }); CreatePixel(new Point(1, 0), new List<Char>() { 'I', 'B', 'L' }); CreatePixel(new Point(2, 0), new List<Char>() { 'I', 'A', 'O', 'B' }); CreatePixel(new Point(3, 0), new List<Char>() { 'I' }); CreatePixel(new Point(4, 0), new List<Char>() { }); CreatePixel(new Point(0, 1), new List<Char>() { }); CreatePixel(new Point(1, 1), new List<Char>() { 'A', 'B', 'L' }); CreatePixel(new Point(2, 1), new List<Char>() { 'I' }); CreatePixel(new Point(3, 1), new List<Char>() { 'A', 'O', 'B' }); // ... CreatePixel(new Point(2, 2), new List<Char>() { 'I', 'A', 'B' }); CreatePixel(new Point(3, 2), new List<Char>() { 'A', 'O' }); // ... CreatePixel(new Point(2, 4), new List<Char>() { 'I', 'O', 'B', 'L' }); CreatePixel(new Point(3, 4), new List<Char>() { 'I', 'A', 'L' }); CreatePixel(new Point(4, 4), new List<Char>() { }); Now for every letter, in order to find the unique characteristics, you need to look at which buckets it belongs to, as well as the number of other characters in each bucket. So let's take the example of "I". We go to all the buckets it belongs to (1,0; 2,0; 3,0; ...; 3,4) and see that the one with the fewest other characters is (3,0).
    In fact, it only has 1 character, meaning it must be an "I" in this case, and we have found our unique characteristic. You can also do the same for pixels that would be white. Notice that bucket (2,0) contains all the letters except for "L"; this means that it could be used as a white-pixel test. Similarly, (2,4) doesn't contain an 'A'. Buckets that either contain all the letters or none of the letters can be discarded immediately, since these pixels can't help define a unique characteristic (e.g. 1,1; 4,0; 0,1; 4,4). It gets trickier when you don't have a 1-pixel test for a letter, for example in the case of 'O' and 'B'. Let's walk through the test for 'O'... It's contained in the following buckets: // Bucket Count Letters // 2,0 4 I, A, O, B // 3,1 3 A, O, B // 3,2 2 A, O // 2,4 4 I, O, B, L Additionally, we also have a few white-pixel tests that can help (I only listed those that are missing at most 2). The Missing Count was calculated as (5 - Bucket.Count). // Bucket Missing Count Missing Letters // 1,0 2 A, O // 1,1 2 I, O // 2,2 2 O, L // 3,4 2 O, B So now we can take the shortest black-pixel bucket (3,2) and see that when we test for (3,2) we know it is either an 'A' or an 'O'. So we need an easy way to tell the difference between an 'A' and an 'O'. We could either look for a black-pixel bucket that contains 'O' but not 'A' (e.g. 2,4) or a white-pixel bucket that contains an 'O' but not an 'A' (e.g. 1,1). Either of these could be used in combination with the (3,2) pixel to uniquely identify the letter 'O' with only 2 tests. This seems like a simple algorithm when there are 5 characters, but how would I do this when there are 26 letters and a lot more pixels overlapping? For example, let's say that after the (3,2) pixel test, it found 10 different characters that contain the pixel (and this was the fewest from all the buckets). Now I need to find differences from 9 other characters instead of only 1 other character. How would I achieve my goal of using as few checks as possible, and ensure that I am not running extraneous tests?
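    One way to turn the bucket idea into something that scales to 26 letters is a greedy, decision-tree style search: at each step, test the pixel whose black/white split divides the remaining candidate letters most evenly, then recurse on each half until a single candidate remains. The sketch below is not from the original post; the type names (LetterPattern, PixelTestNode) are invented for illustration, and the greedy choice is a heuristic, not a guaranteed-minimal set of tests.

```csharp
// Hedged sketch of a greedy discrimination tree over boolean letter grids.
// One plausible way to derive a near-minimal set of pixel tests; not optimal
// or canonical, and all type/method names here are made up for illustration.
using System;
using System.Collections.Generic;
using System.Linq;

class LetterPattern
{
    public char Name;
    public bool[,] Pixels;   // [row, column], true = black
}

class PixelTestNode
{
    public int X, Y;                     // pixel to test; unused in leaves
    public char? Letter;                 // set on leaves only
    public PixelTestNode IfBlack, IfWhite;
}

static class DiscriminationTree
{
    public static PixelTestNode Build(List<LetterPattern> candidates, int width, int height)
    {
        if (candidates.Count == 1)
            return new PixelTestNode { Letter = candidates[0].Name };

        // Greedy choice: the pixel whose black/white split is most balanced
        // separates the remaining candidates fastest (roughly log2(N) tests).
        int bestX = -1, bestY = -1, bestScore = int.MaxValue;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
            {
                int black = candidates.Count(c => c.Pixels[y, x]);
                if (black == 0 || black == candidates.Count) continue;   // useless pixel
                int score = Math.Abs(candidates.Count - 2 * black);      // 0 = perfectly balanced
                if (score < bestScore) { bestScore = score; bestX = x; bestY = y; }
            }

        if (bestX < 0)
            throw new InvalidOperationException("Two candidates have identical grids.");

        return new PixelTestNode
        {
            X = bestX,
            Y = bestY,
            IfBlack = Build(candidates.Where(c => c.Pixels[bestY, bestX]).ToList(), width, height),
            IfWhite = Build(candidates.Where(c => !c.Pixels[bestY, bestX]).ToList(), width, height),
        };
    }

    public static char Classify(PixelTestNode node, bool[,] glyph)
    {
        while (node.Letter == null)
            node = glyph[node.Y, node.X] ? node.IfBlack : node.IfWhite;
        return node.Letter.Value;
    }
}
```

    With 26 letters this gives on the order of five pixel tests per character instead of 110, though it does not guarantee the absolute minimum number of tests; finding that minimum is essentially a set-cover/optimal-decision-tree problem, which is hard to solve exactly in general.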

    Read the article

  • What is the typical method to separate connected letters in a word using OCR

    - by Maysam
    I am very new to OCR and know almost nothing about the algorithms used to recognize words; I am just getting familiar with the field. Could anybody please advise on the typical method used to recognize and separate individual characters in connected form (I mean in a word where all the letters are linked together)? Forget about handwriting: supposing the letters are connected together using a known font, what is the best method to determine each individual character in a word? When characters are written separately there is no problem, but when they are joined together, we need to know where every single character starts and ends in order to go to the next step and match them individually with a letter. Is there any known algorithm for that?
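    A common first approach for printed (non-handwritten) text is a vertical projection profile: count the ink pixels in each column and cut the word at the valleys where little or no ink appears. The sketch below only illustrates that idea on a binarized image; the threshold and minimum slice width are made-up values, and glyphs that genuinely touch (heavy kerning, italics) usually need a more sophisticated, recognition-guided segmentation.

```csharp
// Illustrative sketch: segment a binarized word image into character slices by
// cutting at columns whose ink count falls below a threshold (projection profile).
// cutThreshold and minWidth are arbitrary values to tune per font and scale.
using System.Collections.Generic;
using System.Drawing;

static class ProjectionSegmenter
{
    public static List<Rectangle> SplitIntoCharacters(bool[,] ink, int cutThreshold = 1, int minWidth = 3)
    {
        int height = ink.GetLength(0), width = ink.GetLength(1);

        // 1. Vertical projection: number of ink pixels in each column.
        var profile = new int[width];
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                if (ink[y, x]) profile[x]++;

        // 2. Walk the profile and emit a slice whenever an "inked" run ends.
        var slices = new List<Rectangle>();
        int start = -1;
        for (int x = 0; x <= width; x++)
        {
            bool inked = x < width && profile[x] > cutThreshold;
            if (inked && start < 0) start = x;                    // run begins
            else if (!inked && start >= 0)                        // run ends
            {
                if (x - start >= minWidth)
                    slices.Add(new Rectangle(start, 0, x - start, height));
                start = -1;
            }
        }
        return slices;
    }
}
```

    When two letters truly touch and no low-ink column exists between them, a frequent refinement is to propose candidate cuts at local minima of the profile and keep the segmentation whose slices the recognizer scores highest.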

    Read the article

  • Looking for OCR for VB.NET (VS 2008 PRO)

    - by Avraham
    Looking for a component, dll, etc, for OCR for a VB.NET program. Using VS PRO 2008. The source is a bunch of small png images, and I am just getting a price out of them. Very simple. Tried tessnet2, but could not get png to work. Don't mind commercial, but not too expensive - maybe about $100. Want something simple to use and preferably with support if needed. This is for a commercial application. Thank you!

    Read the article

  • Can Acrobat 11 be made to do OCR using multiple CPU cores?

    - by tarcman.
    OCR processing takes time, and using multiple CPU cores would speed it up. Acrobat 10 was not a multithreaded application. How about Acrobat 11? Does 11 by default do OCR using multiple CPU cores (if available)? If not, are there any workarounds, e.g. scripting, to make Acrobat 11 do OCR using multiple CPU cores? Either through Acrobat's built-in scripting language, or using external scripts that launch and direct multiple single-threaded instances of Acrobat to work in parallel on parts of the processing job. Note: this question is not too localized (not limited to a specific moment in time) because (1) Adobe does not release new major Acrobat versions very often (Acrobat 10 was released two years ago) and (2) Adobe Acrobat is a widely used application.

    Read the article

  • Batch OCR for many PDF files (not already OCRed)?

    - by David
    Hello, I use Google Desktop Search (I am on Vista) and not all the PDF files in my archive folder are recognized. That is normal, as "PDF files that contain scanned images" are not indexed (http://desktop.google.com/support/bin/answer.py?hl=en&answer=90651). So I would like to OCR many of my PDF files that are not already OCRed. My goal: I give the program a folder and it searches the subfolders on its own for the PDF files that need to be converted into OCRed PDF files. Note: in the past, if a PDF file was password protected, I removed the password with another batch (paid) tool: verypdf.com "pwdremover". Any (not too expensive) ideas? I have already tried: FineReader 6 Pro on XP at the time, but there was no batch processor included... Paperfile (paperfile.net), which uses Tesseract (code.google.com/p/tesseract-ocr/), but the OCR is only PDF to text, not PDF to PDF! There is also another project, code.google.com/p/ocropus. Thanks in advance ;)

    Read the article

  • How to know if a PDF contains only images or has been OCR scanned for searching?

    - by Bratch
    I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one large image, even where the whole page is entirely text. Others were scanned with OCR and contain images and searchable text where text is present. In many cases even words in the images were made searchable. I want to make an automated process to recognize the text in all of the scanned documents using OCR, with Acrobat 8 Pro, but I don't want to re-OCR the files that have already been through the OCR process in the past. Does anyone know if there is a way to tell which ones contain only images, and which ones already contain searchable text? I'm planning on doing this in C# or VB.NET but I don't think being able to tell the two kinds of files apart is language dependent.
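    One practical way to make that distinction from C# is to try extracting text from a few pages and treat the file as image-only if essentially nothing comes back. The sketch below assumes the open-source iTextSharp library (PdfReader / PdfTextExtractor); the number of sampled pages and the "almost no text" threshold are arbitrary choices to tune, and a page of pure line art could still be misclassified.

```csharp
// Hedged sketch: decide whether a PDF already has a searchable text layer by
// sampling a few pages with iTextSharp's text extractor. Thresholds are arbitrary.
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static class PdfTextProbe
{
    public static bool HasSearchableText(string path, int pagesToSample = 3, int minCharsPerPage = 20)
    {
        var reader = new PdfReader(path);
        try
        {
            int pages = Math.Min(pagesToSample, reader.NumberOfPages);
            for (int page = 1; page <= pages; page++)
            {
                string text = PdfTextExtractor.GetTextFromPage(reader, page);
                if (text != null && text.Trim().Length >= minCharsPerPage)
                    return true;    // found a real text layer; skip re-OCR
            }
            return false;           // looks like image-only scans
        }
        finally
        {
            reader.Close();
        }
    }
}
```

    Files for which this returns false would then be queued for the Acrobat OCR pass; the rest can be left untouched.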

    Read the article

  • OCR with Neural network: data extraction

    - by Sebastian Hoitz
    I'm using the AForge library framework and its neural network. At the moment, when I train my network I create lots of images (one image per letter per font) at a big size (30 pt), cut out the actual letter, scale this down to a smaller size (10x10 px) and then save it to my hard disk. I can then go and read all those images, creating my double[] arrays with data. At the moment I do this on a per-pixel basis. Once I have successfully trained my network, I test it by letting it run on a sample image with the alphabet at different sizes (uppercase and lowercase), but the result is not really promising. I trained the network so that RunEpoch had an error of about 1.5 (so almost no error), but there are still some letters that do not get identified correctly in my test image. Now my question is: is this caused by a faulty learning method (pixel-based vs. the suggested use of receptors in this article: http://www.codeproject.com/KB/cs/neural_network_ocr.aspx - are there other methods I can use to extract the data for the network?), or can this happen because my segmentation algorithm for extracting the letters from the image is bad? Does anyone have ideas on how to improve it?
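    For comparison with the raw per-pixel input vector, the receptor idea from the linked CodeProject article can be sketched roughly as follows: scatter a fixed set of line segments ("receptors") over the normalized glyph and feed the network one input per receptor, set to 1 when the segment crosses any black pixel. This is only an outline of that feature-extraction step; the receptor layout here is random and made up, and it is not the article's actual code.

```csharp
// Rough sketch of receptor-based feature extraction for a 10x10 binarized glyph:
// each receptor is a line segment whose feature is 1.0 if the segment touches ink.
// Receptor positions are random placeholders; a real layout would be tuned/fixed.
using System;

class ReceptorFeatures
{
    readonly (int x0, int y0, int x1, int y1)[] receptors;

    public ReceptorFeatures(int count, int size, int seed = 42)
    {
        var rng = new Random(seed);   // fixed seed so training and testing share one layout
        receptors = new (int, int, int, int)[count];
        for (int i = 0; i < count; i++)
            receptors[i] = (rng.Next(size), rng.Next(size), rng.Next(size), rng.Next(size));
    }

    // glyph[y, x] == true means a black pixel in the normalized letter image
    public double[] Extract(bool[,] glyph)
    {
        var features = new double[receptors.Length];
        for (int i = 0; i < receptors.Length; i++)
        {
            var (x0, y0, x1, y1) = receptors[i];
            int steps = Math.Max(Math.Abs(x1 - x0), Math.Abs(y1 - y0)) + 1;
            for (int s = 0; s < steps; s++)                        // walk along the segment
            {
                int x = x0 + (x1 - x0) * s / Math.Max(steps - 1, 1);
                int y = y0 + (y1 - y0) * s / Math.Max(steps - 1, 1);
                if (glyph[y, x]) { features[i] = 1.0; break; }     // receptor activated by ink
            }
        }
        return features;   // feed this vector to the AForge network instead of raw pixels
    }
}
```

    Whether this helps more than fixing the segmentation is an open question for the poster's setup; a low RunEpoch error on the training set with poor test results can just as easily point to badly cropped or badly scaled test glyphs as to the feature representation.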

    Read the article

  • Create a bitonal image for OCR using Image Magick

    - by Greg
    nice /usr/local/bin/convert -colors 2 -colorspace gray -compress group4 /var/www/html/uploads/pokemon.jpg /var/www/html/uploads/pokemontest.jpg This command worked with a really OLD version of Image Magick. With the newest version this method produces a completely black image. nice /usr/local/bin/convert -colorspace gray -compress group4 /var/www/html/uploads/pokemon.jpg /var/www/html/uploads/pokemontest.jpg nice /usr/local/bin/convert -colors 2 /var/www/html/uploads/pokemontest.jpg /var/www/html/uploads/pokemontestfinal.jpg This results in a bitonal gray and black image, but it's really rough. NOT clean at all.

    Read the article

  • Java OCR Help Needed

    - by maSnun
    Hello, how do I detect all the characters in an image? The image is in PNG format and the font is constant. For simplicity, let's assume that the image contains only numeric digits and there are only 4 digits in the image. I need to read all of them and output the text. Can you help? Thanks in advance.

    Read the article

  • Good free OCR with GUI for correcting mistakes? (for Windows)

    - by Hugh Allen
    I've used SimpleOCR, which has a nice GUI for correcting mistakes. Unfortunately it makes a lot of mistakes (and suffers from other bugs and limitations)! On the other hand, Tesseract is more accurate but has no GUI at all. Is there a free OCR program for Windows which has a nice GUI and a low error rate? I want it to highlight suspect words (by OCR uncertainty, not just spell checking) and show the original (bitmap) word while I'm editing the OCRed word, similar to what SimpleOCR does. Open source would be best, followed by freeware, then trial/demo/crippleware a long way behind.

    Read the article
