PDF text search and split library

Posted by Horace Ho on Stack Overflow
Published on 2010-04-21T07:23:53Z

I am looking for a server-side PDF library (or command-line tool) that can:

  • split a multi-page PDF file into individual PDF files, based on the results of a search of the PDF's text content

Examples:

  • Search for a "Page ???" pattern in the text and split the big PDF into 001.pdf, 002.pdf, ... ???.pdf
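One way to read the "Page ???" example: every page whose text contains "Page" followed by three digits starts a new output file named after those digits, and the pages that follow belong to it until the next marker appears. A minimal Python sketch of that grouping step follows. It assumes the text of each page has already been extracted; the regex, the three-digit width, the function name, and the choice to skip pages before the first marker are all assumptions, not part of the original request.

```python
import re
from typing import Dict, List, Optional

# Assumption: a section starts on a page whose text contains "Page"
# followed by exactly three digits, e.g. "Page 001".
PAGE_MARKER = re.compile(r"\bPage\s+(\d{3})\b")


def group_pages_by_marker(page_texts: List[str]) -> Dict[str, List[int]]:
    """Map output file names like '001.pdf' to 0-based page indices.

    A page containing a marker starts (or resumes) the group named after
    the captured digits; pages without a marker join the current group.
    Pages before the first marker are skipped (an assumption).
    """
    groups: Dict[str, List[int]] = {}
    current: Optional[str] = None
    for index, text in enumerate(page_texts):
        match = PAGE_MARKER.search(text)
        if match:
            current = f"{match.group(1)}.pdf"
            groups.setdefault(current, [])
        if current is not None:
            groups[current].append(index)
    return groups
```

Writing each group out to disk is left to whichever PDF library ends up being used. If the same marker shows up on two separate pages, both land in the same group.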

A server program will scan the PDF, look for the search pattern, extract the page(s) that match the pattern, and save the resulting files to disk.
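The scan-and-save step described above could look roughly like the sketch below. It uses the third-party pypdf package (`PdfReader`, `PdfWriter`, `extract_text`, `add_page`) as one possible Python option; the question does not name a library, and the function names and the 001.pdf numbering of matched pages are hypothetical. Each matching page is written to its own file.

```python
import re
from pathlib import Path
from typing import Iterable, List


def find_matching_pages(page_texts: Iterable[str], pattern: str) -> List[int]:
    """Return 0-based indices of pages whose text matches the regex."""
    regex = re.compile(pattern)
    # Treat a missing text layer (None) as an empty string.
    return [i for i, text in enumerate(page_texts) if regex.search(text or "")]


def save_matching_pages(pdf_path: str, pattern: str, out_dir: str) -> List[Path]:
    """Write each matching page of pdf_path to its own file in out_dir.

    Requires the third-party pypdf package (pip install pypdf). It is
    imported inside the function so find_matching_pages works without it.
    """
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    # extract_text() can return an empty string for image-only (scanned) pages.
    texts = [page.extract_text() for page in reader.pages]

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for n, index in enumerate(find_matching_pages(texts, pattern), start=1):
        writer = PdfWriter()
        writer.add_page(reader.pages[index])
        target = out / f"{n:03d}.pdf"  # hypothetical naming: 001.pdf, 002.pdf, ...
        with open(target, "wb") as fh:
            writer.write(fh)
        written.append(target)
    return written
```

A call such as `save_matching_pages("big.pdf", r"Page \d{3}", "out")` would write one file per matching page into `out/`. How well the search works depends on how the PDF was produced: scanned pages with no text layer will not match without OCR.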

Integration with PHP or Ruby would be nice, but a command-line tool is also acceptable. It will run as a server-side (Linux or Win32) batch-processing job, so no GUI or interactive login is available. i18n support would be nice but is not required. Thanks~

© Stack Overflow or respective owner