Distributed storage and computing

Posted by Tim van Elteren on Server Fault
Published on 2013-10-28T14:22:10Z

Filed under: distributed-filesystems, distributed-computing

After researching a number of distributed file systems for deployment in a production environment, with the main purpose of performing both batch and real-time distributed computing, I've identified the following list of potential candidates, selected mainly on maturity, license and support:

Ceph
Lustre
GlusterFS
HDFS
FhGFS
MooseFS
XtreemFS

The key properties that our system should exhibit:

open source and liberally licensed, yet production-ready, i.e. a mature, reliable solution with both community and commercial support;
the ability to run on commodity hardware, preferably being designed for it;
high availability of the data, with the main focus on reads;
high scalability, i.e. operation across multiple data centres, possibly on a global scale;
no single points of failure, achieved through replication and distribution of (meta)data, i.e. fault tolerance.
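
To illustrate why I see replication as the route to read availability, here is a back-of-the-envelope sketch in Python. It assumes that nodes fail independently and that a read succeeds as long as one replica is reachable; the function name and the 0.99 per-node figure are hypothetical, not measurements from any of the systems listed above.

```python
def read_availability(node_availability, replicas):
    """Probability that at least one replica is reachable.

    Assumes independent node failures, which real clusters only
    approximate (shared racks, switches and power are ignored here).
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # A read fails only if every replica is unavailable at the same time.
    return 1.0 - (1.0 - node_availability) ** replicas


# Hypothetical per-node availability of 99%.
for r in (1, 2, 3):
    print(r, f"{read_availability(0.99, r):.6f}")
```

The same reasoning would have to hold for the metadata service as well: if that remains a single node, it caps the availability of the whole system regardless of how many data replicas exist.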

The sensitivity points that were identified, and that led to the following questions, are:

Transparency to the processing layer / application with respect to data locality, i.e. knowing where data is physically located at the server level, mainly for resource allocation and fast, high-performance processing. How can this be accomplished? Do you know from experience which solutions provide this transparency, and to what extent?
POSIX compliance, or conformance, is mentioned on the wiki pages of most of the solutions listed above. The main question here is: how relevant is support for the POSIX standard? Hadoop, for example, isn't POSIX compliant by design; what are the pros and cons?
What about the difference between synchronous and asynchronous operation of a distributed file system? Although a synchronous distributed file system is preferred because of its reliability, it also imposes certain limitations with respect to scalability. Based on your expertise, what would be the way to go here?
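
To make the first point more concrete, this is roughly the kind of locality-aware task assignment I have in mind, as a minimal Python sketch. It assumes the file system can tell the processing layer which hosts hold a replica of each block; the `assign_tasks` function, the block ids and the host names are all hypothetical and not taken from any of the systems above.

```python
from collections import defaultdict


def assign_tasks(block_replicas, workers):
    """Assign each block to a worker, preferring one that holds a replica.

    block_replicas: dict mapping block id -> list of hosts storing a replica
    workers: list of hosts that can run tasks
    Returns a dict mapping block id -> (host, is_local).
    """
    load = defaultdict(int)  # tasks assigned to each worker so far
    assignment = {}
    for block, replicas in sorted(block_replicas.items()):
        # Workers that already store this block can read it from local disk.
        local = [host for host in replicas if host in workers]
        # Without a local candidate, fall back to a remote read on any worker.
        candidates = local or workers
        # Pick the least-loaded candidate; break ties by host name.
        host = min(candidates, key=lambda h: (load[h], h))
        load[host] += 1
        assignment[block] = (host, bool(local))
    return assignment


# Hypothetical layout: three blocks with two replicas each.
replicas = {
    "blk_1": ["node-a", "node-b"],
    "blk_2": ["node-b", "node-c"],
    "blk_3": ["node-d", "node-e"],  # no replica on any worker
}
workers = ["node-a", "node-b", "node-c"]
plan = assign_tasks(replicas, workers)
for block, (host, is_local) in plan.items():
    print(block, host, "local" if is_local else "remote")
```

The sketch only works if the storage layer exposes replica locations at the host level, which is exactly the transparency I am asking about.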

I'm looking forward to your replies. Thanks in advance! :)

With kind regards,

Tim van Elteren
