How can I create multiple identical AWS EC2 server instances with large amounts of persistent data?

Posted by mojones on Server Fault, 2012-09-19.

I have a CPU-intensive data-processing application that I want to run across many (~100,000) input files. The application needs a large (~20GB) data file in order to run. What I would like to do is:

  • create an EC2 machine image that has my application and associated data files installed
  • boot up a large number (e.g. 100) of instances of this image
  • split my input files up into 100 batches and send one batch to be processed on each instance (a rough launch sketch follows this list)

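Something like the following is what I have in mind for the launch step: a rough sketch using boto3 (the AWS SDK for Python), where the AMI ID, instance type, and the user-data convention for telling each instance which batch to process are all placeholders, not anything I have working:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch one instance per batch, passing the batch number in as user data
# so each instance can fetch its own slice of the input files at boot.
# The AMI ID and instance type below are placeholders.
instance_ids = []
for batch in range(100):
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="c5.xlarge",
        MinCount=1,
        MaxCount=1,
        UserData=f"BATCH_INDEX={batch}",  # readable on-instance via the metadata service
    )
    instance_ids.append(resp["Instances"][0]["InstanceId"])
```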
I am having trouble figuring out the best way to ensure that each instance has access to the large data file. The data file is too big to fit on the root filesystem of an AMI. I could use Elastic Block Store (EBS), but a given EBS volume can only be attached to a single instance, so I would need 100 cloned volumes.
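To make that cloning route concrete, I suppose I could snapshot the data volume once and then create one volume per instance from the snapshot; a rough boto3 sketch, where the snapshot ID and availability zone are placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# One EBS volume per instance, each restored from a snapshot of the
# volume holding the ~20GB data file (IDs and AZ are placeholders).
volume_ids = []
for _ in range(100):
    vol = ec2.create_volume(
        SnapshotId="snap-0123456789abcdef0",
        AvailabilityZone="us-east-1a",
    )
    volume_ids.append(vol["VolumeId"])

# Each volume would then be attached to its own instance, e.g.
# ec2.attach_volume(VolumeId=..., InstanceId=..., Device="/dev/sdf")
```

Managing 100 volumes this way for what is effectively read-only data seems clunky, which is why I am wondering about the alternatives below.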

Is there some way to create a custom image that has more space on the root filesystem, so that I can include my large data file? Or is there a better way to tackle this problem?
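On the first option: I assume the root volume size can be overridden at launch with a block device mapping, so the data file could be baked into the image itself; a rough sketch, where the device name and size are guesses that depend on the AMI:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch with the root EBS volume enlarged to make room for the ~20GB
# data file (device name and size are guesses; they depend on the AMI).
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="c5.xlarge",
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[
        {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 40}},
    ],
)
```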
