Search Results

Search found 1 result on 1 page for 'eaglefarm'.


  • How to preprocess text to do OCR error correction

    - by eaglefarm
    Here is what I'm trying to accomplish: I need to get several large text files off a computer that is not networked and has no output device other than a printer. I tried printing the text and then scanning the printout with OCR to recover the text on another computer, but the OCR makes lots of errors (1 vs l, o vs 0, O vs D, etc.).

    To solve this, I am thinking of writing a program to process (annotate?) the text file before printing it, so that the errors can be corrected from the text output of the OCR program. For example, for 1 (number one) vs l (letter L), I could change the text like this:

        sample

    inserting \nnn after characters that are frequently wrong in the OCR results:

        sampl\108e

    Then I can write another program to examine the OCR output, look for each \nnn (where nnn is the ASCII code in decimal), check the character before it, and fix that character if necessary. Of course, the program will have to recognize that the \nnn may contain errors too, but at least it knows that the nnn are digits and can easily correct them. I think I would also add a CRC to each line so that any line that isn't corrected perfectly can be flagged as having a problem.

    Has anyone done anything like this? If there is an existing way of doing this, I'd rather not reinvent the wheel. Any suggestions for an annotation format that would help solve this problem would be welcome too.
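The scheme described in the post can be sketched roughly as follows. This is only an illustration under several assumptions that are not in the original question: the input is ASCII, the set of confusable characters (`CONFUSABLE`) and the letter-to-digit repairs (`DIGIT_FIXES`) are guesses, the backslash itself is always tagged so that tags stay unambiguous, and the CRC is written as ten decimal digits at the end of each line. The function names `annotate_line` and `restore_line` are hypothetical.

```python
import re
import zlib

# Assumed set of characters the OCR tends to confuse (illustrative only).
# The backslash is included so every literal backslash gets tagged as well.
CONFUSABLE = set("1lI|0OoDQ5S8B2Z\\")

# Letters the OCR might produce where a digit is expected (assumed mapping).
DIGIT_FIXES = str.maketrans("lI|OoDQSBZ", "1110000582")

# Any character followed by a backslash and three digit-like characters.
TAG = re.compile(r"(.)\\([0-9lI|OoDQSBZ]{3})")


def annotate_line(line):
    """Tag confusable characters with \\nnn and append a decimal CRC-32."""
    body = "".join(
        f"{ch}\\{ord(ch):03d}" if ch in CONFUSABLE else ch for ch in line
    )
    crc = zlib.crc32(line.encode("utf-8"))
    return f"{body} {crc:010d}"


def restore_line(scanned):
    """Undo annotate_line on OCR output; return (text, crc_matches)."""
    scanned = scanned.rstrip()
    # The last 10 characters are the CRC, preceded by one separator space.
    body = scanned[:-11]
    crc_text = scanned[-10:].translate(DIGIT_FIXES)

    def fix(match):
        # Trust the repaired decimal code over the OCR'd character itself.
        return chr(int(match.group(2).translate(DIGIT_FIXES)))

    text = TAG.sub(fix, body)
    ok = crc_text.isdigit() and int(crc_text) == zlib.crc32(text.encode("utf-8"))
    return text, ok


if __name__ == "__main__":
    printed = annotate_line("sample")
    # Simulate the OCR misreading both the letter and a digit of its tag.
    scanned = printed.replace("l\\108", "1\\1O8")
    print(restore_line(scanned))
```

A line whose CRC does not match after restoration (for example, because an untagged character was misread or a character was dropped) comes back with `False` and can be queued for manual review.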
