PHP explode() with a Unicode character as separator

Posted by Young Roger on Stack Overflow
Published on 2012-09-02T09:36:06Z

Xpdf's pdftotext converts a PDF to text and can print the result on the command line. If needed, it inserts a page break (the form-feed character 0x0c) between pages, as specified in TextOutputDev.cc:

eopLen = uMap->mapUnicode(0x0c, eop, sizeof(eop));
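The 0x0c passed to mapUnicode() is U+000C, the ASCII form-feed control character. A quick PHP check, a sketch assuming PHP 7.0 or later for the \u{...} escape, shows that it is the same single byte whether written as chr(0x0c), "\f" or the UTF-8 escape \u{000C}:

```php
<?php
// U+000C (form feed) written three ways in PHP source.
$fromChr     = chr(0x0c);   // byte value 12
$fromEscape  = "\f";        // form-feed escape, available since PHP 5.2.5
$fromUnicode = "\u{000C}";  // UTF-8 encoding of code point U+000C, PHP >= 7.0

// All three are the same one-byte string: U+000C is in the ASCII range,
// and UTF-8 encodes ASCII code points as that same single byte.
var_dump($fromChr === $fromEscape);     // bool(true)
var_dump($fromEscape === $fromUnicode); // bool(true)
var_dump(strlen($fromUnicode));         // int(1)
var_dump(bin2hex($fromUnicode));        // string(2) "0c"
```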

The page-break character 0x0c (U+000C, form feed) is encoding-independent, so -enc ASCII7 wouldn't change it. I'd like to use PHP to convert the PDF and split it into separate TXT pages for database storage. The following loop works, but it takes twice as long as converting the whole book in one go:

// One pdftotext run per page: -f/-l select the first/last page, "-" writes to stdout
for ($i = 1; $i <= $pages[0]; $i++) {
    $page[$i] = shell_exec('/usr/bin/pdftotext sample.pdf -f ' . $i . ' -l ' . $i . ' -');
}
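Since pdftotext separates pages with a form feed, one possible alternative to running it once per page is to convert the whole book once and split the output in PHP. This is a sketch, not a tested solution: it assumes PHP 7.0+ (for the type declarations) and that the output may end with a form feed after the last page; the function names splitPages() and pdfToPages() are hypothetical.

```php
<?php
/**
 * Split pdftotext output into pages on the form-feed (0x0c) page break.
 * Assumption: a form feed may follow the last page, so a trailing empty
 * element is dropped. Blank pages in the middle stay as empty strings.
 */
function splitPages(string $text): array
{
    $pages = explode("\f", $text);
    // Drop the empty string left after a trailing form feed, if any.
    if (end($pages) === '') {
        array_pop($pages);
    }
    return $pages;
}

/**
 * Hypothetical helper: run pdftotext once on the whole PDF, writing to
 * stdout ("-"), and split the result in PHP.
 */
function pdfToPages(string $pdfPath): array
{
    $cmd  = '/usr/bin/pdftotext ' . escapeshellarg($pdfPath) . ' -';
    $text = shell_exec($cmd);
    // shell_exec() returns null (or false on PHP 8) when there is no output.
    if (!is_string($text)) {
        throw new RuntimeException("pdftotext produced no output for $pdfPath");
    }
    return splitPages($text);
}
```

Usage: $page = pdfToPages('sample.pdf'); returns a zero-based array, so $page[0] is page 1, unlike the 1-based $page[$i] in the loop above. Keeping blank pages as empty strings keeps the page numbering aligned.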

How am I supposed to call explode(0x0c, $wholePDF) with a Unicode character as the separator? Currently, $page[$i] doesn't seem to contain those page-break characters from shell_exec(). I tried several encoding headers (UTF-8 especially), but nothing has worked so far.
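One likely reason explode(0x0c, $wholePDF) fails is that 0x0c is an integer literal, not a character: PHP converts it to the string "12" before splitting. Encoding headers cannot change this either, because header() only sets HTTP response headers for the browser and has no effect on the bytes shell_exec() returns. A minimal sketch, with a made-up $wholePDF string standing in for real pdftotext output:

```php
<?php
$wholePDF = "first page\fsecond page\f";   // stand-in for pdftotext output

// 0x0c is the integer 12; explode() converts it to the string "12",
// which never occurs here, so the text is not split at all.
$wrong = explode(0x0c, $wholePDF);
var_dump(count($wrong));   // int(1)

// The separator must be the one-byte string containing 0x0c.
$right = explode("\f", $wholePDF);
var_dump(count($right));   // int(3): two pages plus "" after the final form feed

// Form feeds are invisible when printed, so count them explicitly instead.
var_dump(substr_count($wholePDF, "\f"));   // int(2)
```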

© Stack Overflow or respective owner