Process not Listed by PS or in /proc/
- by Hammer Bro.
I'm trying to figure out how to operate a rather large Java program, 'prog'. I went to its /bin/ dir and configured its setenv.sh and prog.sh to use local directories and my current user account, then tried to run it via "./prog.sh start". Here are all the relevant bits of prog.sh:
USER=(my current account)
_CMD="/opt/jdk/bin/java -server -Xmx768m -classpath "${CLASSPATH}" -jar "${DIR}/prog.jar""
case "${ACTION}" in
    start)
        nohup su ${USER} -c "exec ${_CMD} >>${_LOGFILE} 2>&1" >/dev/null &
        echo $! >${_PID}
        echo "Prog running. PID="`cat ${_PID}`
        ;;
    stop)
        PID=`cat ${_PID} 2>/dev/null`
        echo "Shutting down prog: ${PID}"
        kill -QUIT ${PID} 2>/dev/null
        kill ${PID} 2>/dev/null
        kill -KILL ${PID} 2>/dev/null
        rm -f ${_PID}
        echo "STOPPED `date`" >>${_LOGFILE}
        ;;
When I actually do ./prog.sh start, it starts. But I can't find it at all on the process list. Nor can I kill it manually, using the same command the shell script uses. But I can tell it's running, because if I do ./prog.sh stop, it stops (and some temporary files elsewhere clean themselves out).
./prog.sh start
Prog running. PID=1234
ps eaux | grep 1234
ps eaux | grep -i prog.jar
ps eaux >> pslist.txt
(It's not there either by PID or any clear name I can find: prog, java or jar.)
cd /proc/1234/
-bash: cd: /proc/1234/: No such file or directory
kill -QUIT 1234
kill 1234
kill -KILL 1234
-bash: kill: (1234) - No such process
./prog.sh stop
Shutting down prog: 1234
As far as I can tell, the process is running yet not in any way listed by the system. I can't find it in ps or /proc/, nor can I kill it. But the shell script can still stop it properly. So my question is, how can something like this happen? Is the process supremely hidden, actually unlisted, or am I just missing it in some fashion? I'm trying to figure out what makes this program tick, and I can barely prove that it's ticking!
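To rule out a quirk of ps or grep, the same search can be done against /proc directly. This is only a minimal sketch, assuming a Linux-style /proc with one numeric directory per process; find_in_proc is a hypothetical helper name, not something that exists in prog.sh:

```shell
# Hypothetical helper (not part of prog.sh): print "PID command-line" for
# every process whose command line contains the given fixed string.
# Reads /proc directly (Linux only) instead of going through ps.
find_in_proc() {
    needle=$1
    found=1                                   # exit status: 1 until something matches
    for f in /proc/[0-9]*/cmdline; do
        [ -r "$f" ] || continue               # process may already have exited
        # cmdline is NUL-separated; turn the NULs into spaces for matching.
        cmd=$(tr '\0' ' ' 2>/dev/null < "$f") || continue
        case $cmd in
            *"$needle"*)
                pid=${f#/proc/}               # strip the leading "/proc/"
                pid=${pid%/cmdline}           # strip the trailing "/cmdline"
                printf '%s %s\n' "$pid" "$cmd"
                found=0
                ;;
        esac
    done
    return "$found"
}
```

Something like "find_in_proc prog.jar" would then list any process started with that jar, whatever PID it ended up with, and return non-zero if none is found.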
Edit:
ps eu | grep prog.sh (after having restarted; so random PID)
50038    19381  0.0  0.0  4412  632 pts/3    S+   16:09   0:00 grep prog.sh HOSTNAME=machine.server.com TERM=vt100 SHELL=/bin/bash HISTSIZE=1000 SSH_CLIENT=::[STUFF] 1754 22 CVSROOT=:[DIR] SSH_TTY=/dev/pts/3 ANT_HOME=/opt/apache-ant-1.7.1 USER=[USER] LS_COLORS=[COLORS] SSH_AUTH_SOCK=[DIR] KDEDIR=/usr MAIL=[DIR] PATH=[DIRS] INPUTRC=/etc/inputrc PWD=[PWD] JAVA_HOME=/opt/jdk1.6.0_21 LANG=en_US.UTF-8 SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass M2_HOME=/opt/apache-maven-2.2.1 SHLVL=1 HOME=[~] LOGNAME=[USER] SSH_CONNECTION=::[STUFF] LESSOPEN=|/usr/bin/lesspipe.sh %s G_BROKEN_FILENAMES=1 _=/bin/grep OLDPWD=[DIR]
I just realized that the stop) part of prog.sh doesn't actually guarantee that the process it claims to be stopping is running: it just tries to kill the PID while suppressing all output, then deletes the PID file and manually appends STOPPED to the log file. So I'm no longer so certain that the process is always running when I ps for it, although the code sample above indicates that it at least runs erratically. I'll continue looking into this undocumented behemoth when I return to work tomorrow.
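The check that stop) skips could look roughly like this. It is a minimal sketch, assuming the PID file holds a single PID; is_running is a hypothetical helper name and not part of prog.sh:

```shell
# Hypothetical helper (not part of prog.sh): succeeds only if the PID
# stored in the given PID file belongs to a process we can signal.
is_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1        # no readable PID file: treat as not running
    pid=$(cat "$pidfile" 2>/dev/null)
    [ -n "$pid" ] || return 1            # empty PID file
    # kill -0 delivers no signal; it only reports whether the PID exists
    # and whether we are permitted to signal it.
    kill -0 "$pid" 2>/dev/null
}
```

Running 'is_running "${_PID}" && echo alive || echo "not running"' right after "./prog.sh start" would show whether the recorded PID ever pointed at a live process.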