Search Results

Search found 14 results on 1 pages for 'pdfbox'.

Page 1/1 | 1 

  • Preserve "long" spaces in PDFBox text extraction

    - by Thilo
    I am using PDFBox to extract text from PDF. The PDF has a tabular structure, which is quite simple and columns are also very widely spaced from each-other This works really well, except that all kinds of horizontal space gets converted into a single space character, so that I cannot tell columns apart anymore (space within words in a column looks

    Read the article

  • PDFBox: Problem with converting pdf page into image

    - by user552910
    My mission is pretty simple: converting every single page of a pdf file into images. I tried using icepdf open source version to generate the images but they don't generate the image with the correct font. So I start using PDFBox instead. The code is the following: PDDocument document = PDDocument.load(new File("testing.pdf"));

    Read the article

  • PDFBox Pagebreak strange Nullpointer exception

    - by schneiti
    I currently try printing out text on multiple pages. For this, I count the number of rows and when they reach a fixed amount a method called pagebreakis executed. After the first pagebreak, when I try setting a font using contentstream.setFont(PDType1Font.HELVETICA, 12); it yields the following errormessage occuring at the described

    Read the article

  • How to use PDFBox 1.0 in .net / C# environment using IKVM

    - by Evan
    Id like to use PDFBox to generate PDF highlight files in my .net project. PDFBox states that it can be used in .net via IKVM http://www.pdfbox.org/userguide/dot_net.html BUT running ikvmc (latest version) to generate the DLLs on PDFBOX.1.0.0.jar generates a whole lot of NoClassDefFound warnings. How should I fix this, and what

    Read the article

  • Using Java PDFBox library to write Russian PDF

    - by Brad
    I am using a Java library called PDFBox trying to write text to a PDF. It works perfect for English text, but when i tried to write Russian text inside the PDF the letters appeared so strange. It seems the problem is in the font used, but i am not so sure about that, so i hope if anyone could guide me through this. Here

    Read the article

  • Using Java PDFBox library to write Russian PDF

    - by Brad
    Hello , I am using a Java library called PDFBox trying to write text to a PDF. It works perfect for English text, but when i tried to write Russian text inside the PDF the letters appeared so strange. It seems the problem is in the font used, but i am not so sure about that, so i hope if anyone could guide me through

    Read the article

  • not readable PDF files

    - by Michal_R
    Hello, I am writing Master's thesis - NLP system. I have one component - extractor. It is extracting a plain text from PDF files. There are a few PDF files that can not be extracted correctly. Extractor (PDFBox library) returns a string like this: "¦xDn¦if|d+gDF"Ti&cD+lh d FÁhis~n +xd f«"d¦ffih »h" or

    Read the article

  • PDF to Image Conversion in Java

    - by Geertjan
    In the past, I created a NetBeans plugin for loading images as slides into NetBeans IDE. That means you had to manually create an image from each slide first. So, this time, I took it a step further. You can choose a PDF file, which is then automatically converted to an image for each page, each of which is

    Read the article

  • Export PDF pages to a series of images in Java

    - by dasp
    I need to export the pages of an arbitrary PDF document into a series of individual images in jpeg/png/etc format. I need to do this in in Java. Although I do know about iText, PDFBox and various other java pdf libraries, I am hoping for a pointer to some working example, or some how-to. Thanks.

    Read the article

  • PDF Text Extraction Approach Using OCR

    - by Jon
    Has anybody attempted to extract text from a PDF using an OCR library and Java? What did you find to be the most reliable library for text extraction. Most of the approaches I've seen (tesseract, GOCR) are C libraries that would require some JNI code to be written. I'm familiar with pdfbox, which

    Read the article

1