How to debug silent hang on shutdown of Solaris 10?

Posted by jblaine on Server Fault See other posts from Server Fault or by jblaine
Published on 2011-02-21T14:57:25Z Indexed on 2011/02/21 15:26 UTC
Read the original article Hit count: 280

Filed under:
|
|
|

We're experiencing a mysterious hang on shutdown of a newly-imaged Oracle/Sun Solaris 10 SPARC box. It is repeatable (in the same spot ... from what we can tell). We let it try to work itself out multiple times for 5-10 minutes and it never progressed.

I've never seen this happen before.

The last thing displayed on the console is that syslogd was sent signal 15. Prior to us disabling snmpdx on the box, the last thing on the console was that snmpdx was sent signal 15 (after syslogd was sent signal 15).

While very rare to find, in Solaris days past, I'd have a better idea from experience where the problem might be, and could then narrow things down further with silly (but effective) debugging echo statments in /etc/*.d scripts. With SMF in the picture, I'm not really quite sure where to start.

We forced a crash dump via sync at the {ok} prompt for later analysis, and then let the box come up because it's a production server and our scheduled outage window was closing.

/var/adm/messages shows nothing of use.

How would you debug this situation?

mdb ps of the savecore shows the following processes were running at hang time (afsd is the OpenAFS client and that many are expected):

> > ::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R      0      0      0      0      0 0x00000001 00000000018387c0 sched
R    108      0      0      0      0 0x00020001 00000600110fe010 zpool-silmaril-p
R      3      0      0      0      0 0x00020001 0000060010b29848 fsflush
R      2      0      0      0      0 0x00020001 0000060010b2a468 pageout
R      1      0      0      0      0 0x4a024000 0000060010b2b088 init
R   1327      1   1327    329      0 0x4a024002 00000600176ab0c0 reboot
R    747      1      7      7      0 0x42020001 0000060017f9d0e0 afsd
R    749      1      7      7      0 0x42020001 00000600180104d0 afsd
R    752      1      7      7      0 0x42020001 0000060017cb44b8 afsd
R    754      1      7      7      0 0x42020001 0000060017fc8068 afsd
R    756      1      7      7      0 0x42020001 0000060017fcb0e8 afsd
R    760      1      7      7      0 0x42020001 00000600177f4048 afsd
R    762      1      7      7      0 0x42020001 000006001800f8b0 afsd
R    764      1      7      7      0 0x42020001 000006001800ec90 afsd
R    378      1    378    378      0 0x42020000 0000060013aee480 inetd
R      7      1      7      7      0 0x42020000 0000060010b28008 svc.startd
R    329      7    329    329      0 0x4a024000 00000600110ff850 sh
Z    317      7    317    317      0 0x4a014002 0000060013b3a490 sac 

© Server Fault or respective owner

Related posts about unix

Related posts about solaris