I'm trying to run some PHP scripts as CLI instead of over HTTP. How do I make them play nice?
- by gnfti
Hi everyone. I'm using some PHP scripts from FeedForAll to join together RSS feeds (RSSmesh) and display them as HTML (RSS2HTML).
Because I intend to run these scripts fairly intensively and don't want the resulting HTTP requests and bandwidth to count towards my hosting quota, I am in the process of moving to running them on the web host's server in an umbrella PHP "batch" script, and call this script via cron (this is a Linux server, by the way).
Here's a (working) sample request over HTTP:
http://www.mydomain.com/a/rss2htmlcore/rss2html2.php?XMLFILE=http://www.mydomain.com/a/myapp/xmlcache/feed.xml&TEMPLATE=template.html
This will produce the desired HTML output. An example of how I want this to work on the command line:
/srv/customers/mycustomer#/mydomain.com/www/a/rss2htmlcore/rss2html2-cli.php /srv/customers/mycustomer#/mydomain.com/www/a/myapp/xmlcache/feed.xml /srv/customers/mycustomer#/mydomain.com/www/a/template.html
This is with the correct shebang line added to "rss2html2-cli.php". I could just as well specify the executable ("/usr/local/bin/php") in the request, I doubt it makes a difference because I am able to run another script (that I wrote myself) either way without problems.
Now, RSS2HTML and RSSmesh are different in that, for starters, they include secondary files -- for example, both include an XML parser script -- and I suspect that this is where I am getting a bit in over my head.
Right now I'm calling exec() from the "umbrella" batch script, like so:
exec("/srv/customers/mycustomer#/mydomain.com/www/a/rss2htmlcore/rss2html2-cli.php /srv/customers/mycustomer#/mydomain.com/www/a/myapp/xmlcache/feed.xml /srv/customers/mycustomer#/mydomain.com/www/a/template.html", $output)
But no output is being produced. What's the best way to go about this and what "gotchas" should I keep in mind? Is exec() the right way to approach this? It works fine for the other (simple) script but that writes its own output. For this I want to get the output and write it to a file from within the umbrella script if possible. I've also tried output buffering but to no avail.
Do I need to pay attention to anything specific with regard to the includes? Right now they're specified in the scripts as include_once("FeedForAll_XMLParser.inc.php"); and the specified files are indeed in the same folder.
Further info:
-This is a Linux server.
-I have no direct access to the shell, so I can't test things directly on a command line, everything is via crontab.
-I will admit that support for the FeedForAll scripts leaves a lot to be desired, but I'd like to keep using their scripts if at all possible, if only because I know them and have been using them for a while. I have looked into Simplepie, but the FFA scripts do some things that I've seen no obvious solutions for with Simplepie, like limiting the number of items per individual feed (RSSmesh) or limiting the description length (RSS2HTML).
-Yahoo! Pipes is out, they cache their data for too long for my application.
Should you want to take a look at the code, here are the scripts as txt files. RSS2HTML2 and RSSmesh are the FeedForAll scripts, FeedForAll_XMLParser... is the included parser. Note that I have not yet amended these to handle $argv etc. I have however in "scraper-universal-rss-cli", which works fine with CLI. 
If anyone has any thoughts to share on this it would be very much appreciated. Thank you in advance.