Is there an optimal config/format for a TIFF when using Tesseract or other OCR?
Posted by Zando on Stack Overflow
        
        
        
        Published on 2010-04-19T22:29:02Z
        
        
        
I'm having a bizarre problem with Tesseract. I have a name, "Janice", in a 200x40 pixel TIFF, and Tesseract returns blank output for it. I'm running hundreds of other names through Tesseract the same way, and they are processed fine.
What I'm actually doing, though, is breaking up a larger TIFF into smaller TIFFs of one word each. In the larger TIFF, Tesseract recognizes "Janice" correctly.
What could cause it to hiccup on a TIFF that contains only that word? There is enough space around the word that none of its pixels are truncated. I'm using ImageMagick to split the big TIFF. Are there options I should set when writing out the new TIFF files?
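
To make the split step concrete, here is a minimal sketch of how the crop and the OCR call could be scripted. It is not a confirmed fix. It assumes ImageMagick's `convert` and the `tesseract` binary are on the PATH. The file names, crop offsets, border size and DPI are hypothetical placeholders. The guess behind it is that a single-word crop is more likely to be read if the leftover page offset is cleared (`+repage`), the image is flattened to uncompressed 8-bit grayscale, a white margin is added, and a resolution is recorded in the file. The `--psm 8` option (treat the image as a single word) exists only in newer Tesseract releases, some of which spell it `-psm`; older releases have no such option.

```python
import subprocess


def crop_word_command(src, dst, width, height, x, y, border=10, dpi=300):
    """Build an ImageMagick command that cuts one word out of a larger TIFF.

    All geometry values are placeholders supplied by the caller.
    """
    return [
        "convert", src,
        "-crop", f"{width}x{height}+{x}+{y}",  # region containing the word
        "+repage",                              # drop the virtual-canvas offset left by -crop
        "-alpha", "off",                        # no transparency channel
        "-colorspace", "Gray",
        "-depth", "8",
        "-bordercolor", "white",
        "-border", f"{border}x{border}",        # white margin around the word
        "-units", "PixelsPerInch",
        "-density", str(dpi),                   # record a resolution in the file
        "-compress", "none",                    # write an uncompressed TIFF
        dst,
    ]


def ocr_word_command(image, out_base):
    """Build a Tesseract command that treats the image as a single word.

    Assumption: a Tesseract version that accepts --psm.
    """
    return ["tesseract", image, out_base, "--psm", "8"]


def run(cmd):
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)
```

With these helpers, `run(crop_word_command("page.tif", "janice.tif", 200, 40, 15, 30))` followed by `run(ocr_word_command("janice.tif", "janice"))` would write the recognized text to `janice.txt`. Comparing that output with and without each option is one way to narrow down which setting, if any, matters for the failing word.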
© Stack Overflow or respective owner