Websites down, EC2 inaccessible via SSH, CPU utilisation at 100% for the last few hours - what should I do?

Posted by fuzzybee on Server Fault
Published on 2012-11-13T11:55:05Z Indexed on 2012/11/14 5:02 UTC

I have multiple websites hosted on a single EC2 instance.

  • One website, "abc", was down for a few hours; it sometimes threw a database connection error and sometimes just took too long to respond.
  • One website, "def", was incredibly slow but still up and running.
  • The rest of the websites had the same symptoms as "abc".

I can afford 15 minutes or less of downtime for "def".

Should I then (in the AWS console):

  • reboot my instance, or
  • create an AMI from my instance, launch it, and associate my Elastic IP with the new instance, or
  • use "launch more like this"?
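
To make the second option concrete, here is a minimal sketch of its three steps (image the instance, launch a copy, move the Elastic IP). It assumes the boto3 library and a VPC-style Elastic IP identified by an allocation ID; the function name and all IDs are placeholders, and a real launch would also need the right key pair, security groups and subnet. The client is passed in as a parameter so the sequence can be read and tried independently of any AWS account.

```python
def replace_instance_from_ami(ec2, instance_id, allocation_id,
                              instance_type, image_name):
    """Clone instance_id via an AMI and move the Elastic IP to the clone.

    `ec2` is expected to behave like boto3.client("ec2"). All identifiers
    passed in are placeholders chosen by the caller.
    """
    # NoReboot=True leaves the source instance running, at the cost of a
    # possibly inconsistent filesystem in the image.
    image_id = ec2.create_image(
        InstanceId=instance_id, Name=image_name, NoReboot=True
    )["ImageId"]
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # Launch exactly one instance from the new image. Key pair, security
    # groups and subnet are omitted here and fall back to account defaults.
    new_id = ec2.run_instances(
        ImageId=image_id, InstanceType=instance_type, MinCount=1, MaxCount=1
    )["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[new_id])

    # Point the Elastic IP at the new instance, taking it away from the old
    # one if it is still associated there.
    ec2.associate_address(
        InstanceId=new_id, AllocationId=allocation_id, AllowReassociation=True
    )
    return new_id
```

With boto3 installed and credentials configured, `ec2` would be `boto3.client("ec2")`. The old instance keeps running until the last step, so the visible downtime should be roughly the time the address takes to move, although creating the image from an instance pinned at 100% CPU may itself be slow.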

Background on what may have happened to my EC2 instance:

  • The last time I made changes was 21 hours ago.
  • A cron job that creates snapshots started around 19 hours ago and has been running for a long time.
  • Google Analytics shows that traffic to my websites, such as kidlander.sg, has been nothing exceptional.
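
Once the instance accepts logins again, one way to check whether a long-running job such as the snapshot cron job is what is eating the CPU is to list processes that have been alive for hours and have a high CPU share. The sketch below assumes a Linux `ps` from procps that supports `-eo pid,pcpu,etime,args`; the function names and thresholds are illustrative only.

```python
import subprocess


def parse_etime(etime):
    """Convert a ps elapsed time of the form [[dd-]hh:]mm:ss to seconds."""
    days, _, clock = etime.rpartition("-")
    parts = [int(p) for p in clock.split(":")]
    while len(parts) < 3:          # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return int(days or 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def long_running(ps_output, min_seconds, min_cpu):
    """Return (pid, pcpu, etime, args) rows that are both old and CPU-heavy.

    Note that ps reports pcpu as CPU time divided by elapsed time over the
    whole life of the process, not as an instantaneous reading.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:   # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, pcpu, etime, args = fields
        if parse_etime(etime) >= min_seconds and float(pcpu) >= min_cpu:
            rows.append((int(pid), float(pcpu), etime, args))
    return sorted(rows, key=lambda r: r[1], reverse=True)


def snapshot_ps():
    """Run ps once and return its raw text output."""
    return subprocess.run(
        ["ps", "-eo", "pid,pcpu,etime,args"],
        capture_output=True, text=True, check=True,
    ).stdout
```

For example, `long_running(snapshot_ps(), 3600, 10.0)` would list processes older than an hour whose lifetime CPU share is at least 10%, heaviest first.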

Are there any other actions I should take, or better options I could have?
(I have already contacted AWS support, but their turnaround is 12 hours, so I appreciate all the help I can get.)

Update: I got everything back up and running, and CPU utilisation is back to normal, around 30%.

There is one difference between "def" and "abc" (as well as my other websites):

  • "def"'s database is hosted on RDS
  • "abc"'s database is hosted on an EC2 instance (different from my web server instance) that I configured myself

Nevertheless, yesterday I checked the EC2 instance I'm using as a MySQL server, and it was absolutely fine during the incident:

  • low CPU utilisation
  • I could log in from the Linux command line

© Server Fault or respective owner
