Search Results

Search found 221 results on 9 pages for 'ocr'.

Page 3/9 | < Previous Page | 1 2 3 4 5 6 7 8 9  | Next Page >

  • Is there a font that can't be recognised by an OCR?

    - by user1820564
    I am trying to write a document that can only be read by humans, so that its content can't be copied. To that end, I am converting its pages to pictures and adding them back to a PDF file. The main issue is that any OCR program can recover the whole text, especially since the pages will be clean (unlike a scanned book), which increases OCR accuracy. So, is there a font that can't be recognised by OCR? Failing that, is there a technique that keeps my document readable by humans yet unrecognisable to OCR (for instance, adding a specific background)? Thank you in advance.

  • Fun with RadCaptcha for ASP.NET AJAX and OCR software

    A friend of mine was evaluating OCR software and finally decided to go with FineReader. I was curious what would happen if we fed it the RadCaptcha control: would the advanced OCR manage to decode it or not? At first he showed me a test run with the RadCaptcha demo description, to get an idea of the basic output. Naturally, the captured description text was no problem: only a few characters were misread, and those were corrected by the spellcheck. Next, the real test was performed. These were only a couple of the results, but there is no need to post the rest of the tests: none of the RadCaptcha images were recognized by the OCR software. Here are the CaptchaImage settings used in the tests:

      Background Noise Level: Low (default value)
      Line Noise Level: Low (default value)
      Font Warp Factor: Low (Medium is the default value)...

  • How to OCR a specific region of a MODI.Document?

    - by Mark Kadlec
    I need to OCR a specific region of a scanned document and I am using MODI (Microsoft's Document Imaging COM object). My code currently OCRs the entire page (quite accurately!), but I would like to target a specific region of the page where the text is always static (the order number). How can I do this? Here is my code for the page:

        MODI.Document md = new MODI.Document();
        md.Create("c:\\temp\\mpk.tiff");
        md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
        MODI.Image image = (MODI.Image)md.Images[0];
        FileStream createFile = new FileStream("c:\\temp\\mpk.txt", FileMode.CreateNew);
        StreamWriter writeFile = new StreamWriter(createFile);
        writeFile.Write(image.Layout.Text);
        writeFile.Close();
        md.Close();

    Can I somehow specify the region of the image? Any help would be greatly appreciated!

  • When running Adobe Acrobat's OCR on a PDF document, which downsampling produces a higher quality: 600 dpi or 72 dpi?

    - by Ricardo Altamirano
    I have a large PDF document that consists of scanned pages of a textbook. I want to run Adobe Acrobat 9's text recognition function on it, but I'm presented with this menu when I do. I'm confused by the options in the highlighted menu. What option will produce the highest quality/most readable text? I thought 600 dpi implies a higher quality image than 72 dpi, so I'm confused by "High (72 dpi)" and "Lowest (600 dpi)."

  • Is it possible to make an OCR in Python to check words...

    - by Shady
    in opened applications? I want to automate Firefox on some web page, and I have no way to know whether the page has finished loading or is still loading... I was thinking about using OCR to check the status bar... Is that difficult? For example, when the word DONE appears in the status bar, the program continues to the next command...
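
    The "wait until DONE appears, then continue" idea can be sketched as a polling loop with a timeout. This is only an illustration: `wait_for_status` is a hypothetical helper, and `read_status` stands for whatever function returns the current status-bar text (for example, the output of an OCR pass over a screenshot). Here it is replaced by a fake reader so the sketch runs on its own.

```python
import time
from typing import Callable


def wait_for_status(read_status: Callable[[], str],
                    expected: str = "DONE",
                    timeout: float = 30.0,
                    interval: float = 0.5) -> bool:
    """Poll read_status() until its text contains `expected` (case-insensitive).

    Returns True if the text appeared, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if expected.lower() in read_status().lower():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real status reader: reports "Loading..." twice, then "Done".
_fake_states = iter(["Loading...", "Loading...", "Done"])


def fake_read_status() -> str:
    return next(_fake_states, "Done")


page_loaded = wait_for_status(fake_read_status, timeout=5.0, interval=0.01)
print(page_loaded)
```

    The timeout matters because OCR can misread the status text; without it, a single persistent misread would block the automation forever.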

  • How to turn a pdf into a text searchable pdf?

    - by don.joey
    I have a number of scanned documents in PDF format and I want to be able to search them. How can I do that? Essentially I have to OCR the PDF and then blend the extracted text back into a new PDF. I have unsuccessfully tried:

      - pdfocr (which gives me this issue: https://github.com/gkovacs/pdfocr/issues/7)
      - pdfsandwich (of which the software center says it is a poor package and I should not install it)

    Is there a software package I am unaware of? Or a script that does this?

  • How can I install and launch tesseract-ocr using PHP?

    - by paulrajj
    I am looking for an OCR component that converts images of text into characters using PHP. I found the tesseract-ocr project on Google Code. How can I install and launch tesseract-ocr through PHP? As I am a beginner in PHP, I can't make sense of the documentation they provide. I need some simple steps to install and launch it. Thanks in advance.

  • Is there an optimal config/format for a TIFF when using Tesseract or other OCR?

    - by Zando
    I'm having a bizarre problem with Tesseract. I have a name, "Janice", in a 200x40 pixel TIFF that Tesseract interprets as blank. I'm running hundreds of names through Tesseract and they are processed fine. What I'm actually doing, though, is breaking up a larger TIFF into smaller TIFFs of one word each. In the larger TIFF, Tesseract recognizes "Janice". What could cause it to hiccup on a TIFF that contains only that word (with enough space around the word that none of its pixels are truncated)? I'm using ImageMagick to split the big TIFF; are there options I should set when writing the new TIFF files?

  • MODI leaking memory

    - by Khragg
    I have an app where I'm using MODI 2007 to OCR several multi-page tiff files. I have found that when I kick it off on a directory that contains several good tiffs but also some tiffs that cannot be opened in Windows Picture and Fax Viewer, then MODI also fails to OCR those "bad" tiffs. When this happens, the app is unable to reclaim any of the memory that was used by MODI to OCR those tiffs. After the tool tries to OCR too many of these "bad" tiffs, the machine runs out of memory and the app crashes. I have tried several code fixes from the web that supposedly fix any MODI memory leaks, but so far none have worked for me. I am pasting in the part of the code below that does the OCRing:

        StringBuilder strRecText = new StringBuilder(10000);
        MODI.Document doc1 = new MODI.Document();
        doc1.Create(name);
        try
        {
            doc1.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true); // this will ocr all pages of a multi-page tiff file
        }
        catch (Exception e)
        {
            doc1.Close(false); // clean up
            if (doc1 != null)
            {
                GC.Collect(); GC.WaitForPendingFinalizers();
                GC.Collect(); GC.WaitForPendingFinalizers();
                System.Runtime.InteropServices.Marshal.FinalReleaseComObject(doc1);
                doc1 = null;
            }
        }
        MODI.Images images = doc1.Images;
        for (int imageCounter = 0; imageCounter < images.Count; imageCounter++)
        {
            if (imageCounter > 0)
            {
                if (!noPageBreakFlag)
                {
                    strRecText.Append((char)pageBreakChar);
                }
            }
            MODI.Image image = (MODI.Image)images[imageCounter];
            MODI.Layout layout = image.Layout;
            strRecText.Append(layout.Text);
            GC.Collect(); GC.WaitForPendingFinalizers();
            GC.Collect(); GC.WaitForPendingFinalizers();
            if (layout != null)
            {
                System.Runtime.InteropServices.Marshal.FinalReleaseComObject(layout);
                layout = null;
            }
            if (image != null)
            {
                System.Runtime.InteropServices.Marshal.FinalReleaseComObject(image);
                image = null;
            }
        }
        File.AppendAllText(ocrFile, strRecText.ToString()); // write the OCR file out to disk
        GC.Collect(); GC.WaitForPendingFinalizers();
        GC.Collect(); GC.WaitForPendingFinalizers();
        if (images != null)
        {
            System.Runtime.InteropServices.Marshal.FinalReleaseComObject(images);
            images = null;
        }
        GC.Collect(); GC.WaitForPendingFinalizers();
        GC.Collect(); GC.WaitForPendingFinalizers();
        doc1.Close(false); // clean up
        if (doc1 != null)
        {
            System.Runtime.InteropServices.Marshal.FinalReleaseComObject(doc1);
            doc1 = null;
        }
        GC.Collect(); GC.WaitForPendingFinalizers();
        GC.Collect(); GC.WaitForPendingFinalizers();

  • How to setup RAM disk drive using python or WMI?

    - by Ming Xie
    Hi, the background of my question involves Tesseract, the free OCR engine (developed 1985-1995 by HP, now hosted by Google). It requires an input file and an output file; its arguments take only filenames (not a stream or binary string), so to use a wrapper API such as pytesser and/or python-tesser.py, temporary OCR files must be created. I have a lot of images to OCR, however, so frequent disk writes and deletes are inevitable (with the accompanying performance hit). The only option I can think of is changing the wrapper class to point the temp files at a RAM disk, which brings up this question. If you have a better solution, please let me know. Thanks a lot. -M
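
    The "point the temp file at a RAM disk" part of the idea can be sketched with Python's standard `tempfile` module. This does not create a RAM disk; it only assumes one is already mounted somewhere. The helper names `pick_temp_dir` and `write_temp_image` are hypothetical, and the default `/dev/shm` is a Linux assumption (on Windows one would pass the RAM disk's own path instead). If the directory is missing, the sketch falls back to the normal temp directory.

```python
import os
import tempfile
from typing import Optional


def pick_temp_dir(ram_dir: Optional[str] = "/dev/shm") -> str:
    """Return ram_dir if it exists and is writable, else the default temp dir."""
    if ram_dir and os.path.isdir(ram_dir) and os.access(ram_dir, os.W_OK):
        return ram_dir
    return tempfile.gettempdir()


def write_temp_image(data: bytes, suffix: str = ".tif") -> str:
    """Write image bytes to a temp file in the chosen directory; return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix, dir=pick_temp_dir())
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


path = write_temp_image(b"placeholder image bytes")
try:
    # A real pipeline would pass `path` to the OCR wrapper at this point.
    size = os.path.getsize(path)
finally:
    os.remove(path)  # always clean up the temp file
print(size)
```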

  • Automate Reading Lotto Numbers

    - by neiling
    When we buy a large quantity of Lotto tickets, is there a way to read all those numbers into a spreadsheet so that they can be checked against the winning numbers through formulas/macros? I am looking for an OCR application that can read a scanned PDF/JPG file and dump the numbers into a file. (This might apply not only to Lotto, but also to other scanned documents.) As for checking for winning numbers, I know how to do it once I have them in a CSV/XLS file.
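
    The step between the OCR output and the spreadsheet can be sketched as follows, assuming the OCR application has already produced plain text with one ticket line per text line. The sample text, the six-numbers-per-line rule, and the helper names `parse_ticket_lines` and `rows_to_csv` are all illustrative assumptions, not features of any particular OCR product.

```python
import csv
import io
import re
from typing import List


def parse_ticket_lines(ocr_text: str, numbers_per_line: int = 6) -> List[List[int]]:
    """Keep only the lines that contain exactly `numbers_per_line` numbers."""
    rows = []
    for line in ocr_text.splitlines():
        numbers = [int(token) for token in re.findall(r"\d+", line)]
        if len(numbers) == numbers_per_line:
            rows.append(numbers)
    return rows


def rows_to_csv(rows: List[List[int]]) -> str:
    """Serialise the rows as CSV text, one ticket line per row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up OCR output: two ticket lines and one unrelated line.
sample_text = """\
A. 03 11 19 27 34 42
B. 05 11 22 27 38 45
Receipt no 12345
"""

rows = parse_ticket_lines(sample_text)
csv_text = rows_to_csv(rows)
print(csv_text)
```

    Filtering on the expected count of numbers is a crude guard against misread or unrelated lines; real tickets would likely need a stricter check, such as a valid number range.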

  • PDF Text Extraction Approach Using OCR

    - by Jon
    Has anybody attempted to extract text from a PDF using an OCR library and Java? What did you find to be the most reliable library for text extraction? Most of the approaches I've seen (tesseract, GOCR) are C libraries that would require some JNI code to be written. I'm familiar with pdfbox, which is now an Apache incubator project at version 0.8.x, but its text extraction isn't always accurate. I'm looking for an alternative approach that is somewhat more reliable. I haven't tried Asprise JavaPDF yet (I'm in the process of doing so), but I wanted to know more about the OCR approach (if it's possible). Any help would be appreciated.

  • .NET OCRing an Image

    - by Kirschstein
    I'm trying to use MODI to OCR a Windows program. It works fine for screenshots I grab programmatically using win32 interop like this:

        public string SaveScreenShotToFile()
        {
            RECT rc;
            GetWindowRect(_hWnd, out rc);
            int width = rc.right - rc.left;
            int height = rc.bottom - rc.top;
            Bitmap bmp = new Bitmap(width, height);
            Graphics gfxBmp = Graphics.FromImage(bmp);
            IntPtr hdcBitmap = gfxBmp.GetHdc();
            PrintWindow(_hWnd, hdcBitmap, 0);
            gfxBmp.ReleaseHdc(hdcBitmap);
            gfxBmp.Dispose();
            string fileName = @"c:\temp\screenshots\" + Guid.NewGuid().ToString() + ".bmp";
            bmp.Save(fileName);
            return fileName;
        }

    This image is then saved to a file and run through MODI like this:

        private string GetTextFromImage(string fileName)
        {
            MODI.Document doc = new MODI.DocumentClass();
            doc.Create(fileName);
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
            MODI.Image img = (MODI.Image)doc.Images[0];
            MODI.Layout layout = img.Layout;
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < layout.Words.Count; i++)
            {
                MODI.Word word = (MODI.Word)layout.Words[i];
                sb.Append(word.Text);
                sb.Append(" ");
            }
            if (sb.Length > 1)
                sb.Length--;
            return sb.ToString();
        }

    This part works fine; however, I don't want to OCR the entire screenshot, just portions of it. I try cropping the image programmatically like this:

        private string SaveToCroppedImage(Bitmap original)
        {
            Bitmap result = original.Clone(new Rectangle(0, 0, 250, 250), original.PixelFormat);
            var fileName = "c:\\" + Guid.NewGuid().ToString() + ".bmp";
            result.Save(fileName, original.RawFormat);
            return fileName;
        }

    and then OCRing this smaller image; however, MODI throws an exception, 'OCR running error', with error code -959967087. Why can MODI handle the original bitmap but not the smaller version taken from it?

  • Stripping Non-Text from a Scanned, OCRd PDF

    - by Daniel S.
    I have a PDF created from a scanned document. OCR was used to recognize the text. In Acrobat, if I select text and click 'copy with formatting', I can paste the formatted text into Word, so it seems that fonts and colors, and possibly the size, are embedded in the document in addition to the plain text. Is there any way to use this information to create a PDF that contains just the formatted OCRd text, without the scanned image? Currently, my document only shows the scanned image, and the text is on an invisible layer. I would like to create a PDF document that removes the scanned image and displays the formatted text that is currently hidden. The post "PDF has an extra blank in all words after running through Ghostscript" has a section on "How can we make the invisible text visible?" However, doing this does not show the correct text formatting (which is retained when pasting into Word), and I would also like to remove the scanned image so that the final PDF contains just formatted (color, font, size) vector fonts and no images.
