Hadoop streaming with Python and the Python subprocess module

Posted by Ganesh on Stack Overflow
Published on 2012-03-07T07:04:03Z Indexed on 2014/08/25 16:20 UTC

I have set up a basic Hadoop master/slave cluster and am able to run MapReduce programs (including Python ones) on it.

Now I am trying to run Python code that calls a C binary, so I am using the subprocess module. Hadoop streaming works for ordinary Python code, but as soon as I use subprocess to call a binary, the job fails.

As the logs below show, the hello executable is picked up for packaging, but the code still does not run.

. . packageJobJar: [/tmp/hello/hello, /app/hadoop/tmp/hadoop-unjar5030080067721998885/] [] /tmp/streamjob7446402517274720868.jar tmpDir=null

JarBuilder.addNamedStream hello
.
.
12/03/07 22:31:32 INFO mapred.FileInputFormat: Total input paths to process : 1
12/03/07 22:31:32 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
12/03/07 22:31:32 INFO streaming.StreamJob: Running job: job_201203062329_0057
12/03/07 22:31:32 INFO streaming.StreamJob: To kill this job, run:
12/03/07 22:31:32 INFO streaming.StreamJob: /usr/local/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=master:54311 -kill job_201203062329_0057
12/03/07 22:31:32 INFO streaming.StreamJob: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201203062329_0057
12/03/07 22:31:33 INFO streaming.StreamJob:  map 0%  reduce 0%
12/03/07 22:32:05 INFO streaming.StreamJob:  map 100%  reduce 100%
12/03/07 22:32:05 INFO streaming.StreamJob: To kill this job, run:
12/03/07 22:32:05 INFO streaming.StreamJob: /usr/local/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=master:54311 -kill job_201203062329_0057

12/03/07 22:32:05 INFO streaming.StreamJob: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201203062329_0057
12/03/07 22:32:05 ERROR streaming.StreamJob: Job not Successful!

12/03/07 22:32:05 INFO streaming.StreamJob: killJob...
Streaming Job Failed!

The command I am running is:

hadoop jar contrib/streaming/hadoop-*streaming*.jar -mapper /home/hduser/MARS.py -reducer /home/hduser/MARS_red.py -input /user/hduser/mars_inputt -output /user/hduser/mars-output -file /tmp/hello/hello -verbose

where hello is the C executable. It is a simple hello-world program that I am using to check basic functionality.

My Python code is:

#!/usr/bin/env python
import subprocess
subprocess.call(["./hello"])

Any help with getting the executable to run from Python under Hadoop streaming, or with debugging this, would be much appreciated.
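As a starting point for debugging, here is a minimal sketch of a more talkative mapper. It rests on a few assumptions that the logs above do not confirm: that the failure happens when ./hello is launched, that the shipped copy of hello may be missing from the task's working directory or may have lost its execute bit, and that whatever the mapper writes to stderr shows up in the task attempt logs reachable from the Tracking URL. The names run_binary and mapper are made up for illustration and are not part of Hadoop or of the original script.

```python
#!/usr/bin/env python
"""Debugging sketch for a streaming mapper that shells out to a shipped binary."""
import os
import stat
import subprocess
import sys


def run_binary(path):
    """Run `path` and return (exit_code, combined_output) instead of raising."""
    if not os.path.isfile(path):
        # Report what the task's working directory actually contains.
        listing = ", ".join(sorted(os.listdir(".")))
        return 127, "missing %s in %s (found: %s)" % (path, os.getcwd(), listing)
    try:
        mode = os.stat(path).st_mode
        if not mode & stat.S_IXUSR:
            # Assumption: the shipped copy may have lost its execute bit.
            os.chmod(path, mode | stat.S_IXUSR)
        proc = subprocess.Popen([path], stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        output, _ = proc.communicate()
    except OSError as exc:
        return 126, "could not execute %s: %s" % (path, exc)
    return proc.returncode, output.decode("utf-8", "replace")


def mapper(stdin, stdout, stderr, binary="./hello"):
    """Consume the input split, run the binary once, and return an exit code."""
    for _line in stdin:
        pass  # input is ignored here; it is read so the stream is fully drained
    code, output = run_binary(binary)
    if code != 0:
        # Diagnostics go to stderr so they do not mix with the map output.
        stderr.write("%s failed with code %d: %s\n" % (binary, code, output))
        return code
    # Emit one key<TAB>value record, the default streaming record format.
    stdout.write("hello\t%s\n" % output.strip())
    return 0
```

To use it as the mapper, the script would end with sys.exit(mapper(sys.stdin, sys.stdout, sys.stderr)). Reading stdin to the end is deliberate: a mapper that exits without consuming its input may itself make a streaming task fail, which would be a separate cause from the binary not running.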

Thanks,

Ganesh

© Stack Overflow or respective owner
