Search Results

Search found 56 results on 3 pages for 'glusterfs'.

Page 2/3 | < Previous Page | 1 2 3  | Next Page >

  • Mounting Replicated Gluster Multi-AZ Storage

    - by Roman Newaza
    I have Replicated Gluster Storage which is used by Auto scaling Servers. Both, Auto scaling and Storage are allocated in two Availability zones. Gluster: Number of Bricks: 4 x 2 = 8 Transport-type: tcp Bricks: Brick1: gluster01:/storage/1a # Zone A Brick2: gluster02:/storage/1b # Zone B Brick3: gluster03:/storage/2a # Zone A Brick4: gluster04:/storage/2b # Zone B Brick5: gluster01:/storage/3a # Zone A Brick6: gluster02:/storage/3b # Zone B Brick7: gluster03:/storage/4a # Zone A Brick8: gluster04:/storage/4b # Zone B I used Round Robin DNS for Gluster entry point, so DNS name resolves to all of the storage server addresses which are returned in different order all the time: # host storage.domain.com storage.domain.com has address xx.xx.xx.x1 storage.domain.com has address xx.xx.xx.x2 storage.domain.com has address xx.xx.xx.x3 storage.domain.com has address xx.xx.xx.x4 The Storage is mounted with Native Gluster Client: # grep storage /etc/fstab storage.domain.com:/storage /storage glusterfs defaults,log-level=WARNING,log-file=/var/log/gluster.log 0 0 I have heard Gluster might be mounted with the first Server IP and after that it will fetch its configuration with the rest of Servers. Personally, I never tested single Server mount setup and I don't know how Gluster handles this. On EC2, traffic among single Availability zone is free and between different zones is not. When Client in zone A writes to storage and IP of Storage in zone B is returned, it will cost me twice more for data transfer: Client (Zone A) - Storage Server (Zone B) - Replication to Storage Server (Zone A). Question: Would it be better to mount Storage Server of the same zone, so that data transfer charges apply only for replication (A - A - B)?

    Read the article

  • How to organise storage for media content such as video and music?

    - by thor
    Currently, we have a single server hosting all content: music, video and software. This content is downloaded by users through HTTP. Now free space is coming to an end and we are exploring different ways of extending our storage capacity. We want to do it cheap, simple and reliable (protected from disk/ server faults). Currenly, we see two ways: Add a couple of cheap servers with 4 disks (RAID1 ?), run some distributed file-system on top, like GlusterFS. Pros: hopefully, we will see all our disks as single flat file system, just dump content into it and be done. Cons: could be tricky in configuration and handling of faults. Add a couple of cheap servers, all running HTTP servers. Each piece of content (be it a music file or video) is placed on randomly selected two servers. Pros: don't have to deal with RAID, as content is duplicated; single server failure does not bring down any part of content; doubled distribution capacity (as any signle file could be downloaded from any of two servers hosting it). Cons: requires some scripting on part of distribution of content, adding/ removing servers. Do we miss any other ways? Which of the aforementioned options seems to be the best?

    Read the article

  • Gluster strange issue with shared mount point like seprate mount.

    - by Satish
    I have two nodes and for experiment i have install glusterfs and create volume and successfully mounted on own node, but if i create file in node1 it is not showing in node2, look like both behaving like they are separate. node1 10.101.140.10:/nova-gluster-vol 2.0G 820M 1.2G 41% /mnt node2 10.101.140.10:/nova-gluster-vol 2.0G 33M 2.0G 2% /mnt volume info split brian $ sudo gluster volume heal nova-gluster-vol info split-brain Gathering Heal info on volume nova-gluster-vol has been successful Brick 10.101.140.10:/brick1/sdb Number of entries: 0 Brick 10.101.140.20:/brick1/sdb Number of entries: 0 test node1 $ echo "TEST" > /mnt/node1 $ ls -l /mnt/node1 -rw-r--r-- 1 root root 5 Oct 27 17:47 /mnt/node1 node2 (file isn't there, while they are shared mount) $ ls -l /mnt/node1 ls: cannot access /mnt/node1: No such file or directory What i am missing??

    Read the article

  • Optimal directory structure for filesystem

    - by Pankaj
    We have large scale web application which has millions of customer. Each customer can have document based on document type. We may have 20-30 types of documents. We are planning to use GlusterFS for storing these documents. I'm trying to find out what are the limitations of Gluster as far as number of files/directories ? Do we need to have hierarchical directory structure ? What would be the optimal directory structure ? Does this make sense - CustmerId Documenttype File1 File2

    Read the article

  • Experience with MooseFS?

    - by brown.2179
    Anyone have any experience using MooseFS? I want an easy distributed storage platform to store static data archive of about 10 TB and serve it to 20-40 nodes. Also I want to be able to add storage as the archive grows without having to rebuild the filesystem. I don't care if it's a bit slow. I just want it to be simple and stable. Basically from what I can see for OS X it's between MooseFS and Gluster. Any other suggestions?

    Read the article

  • Gluster bricks are offline and errors in logs

    - by Roman Newaza
    I have substituted all the IP addresses with hostnames and renamed configs (IP to hostname) in /var/lib/glusterd by my shell script. After that I restarted Gluster Daemon and the volume. Then I checked if all the peers are connected: root@GlusterNode1a:~# gluster peer status Number of Peers: 3 Hostname: gluster-1b Uuid: 47f469e2-907a-4518-b6a4-f44878761fd2 State: Peer in Cluster (Connected) Hostname: gluster-2b Uuid: dc3a3ff7-9e30-44ac-9d15-00f9dab4d8b9 State: Peer in Cluster (Connected) Hostname: gluster-2a Uuid: 72405811-15a0-456b-86bb-1589058ff89b State: Peer in Cluster (Connected) I could see mounted volumes size change on all the nodes when I execute df command, so new data is coming. But recently I noticed error messages in app log: copy(/storage/152627/dat): failed to open stream: Structure needs cleaning readfile(/storage/1438227/dat): failed to open stream: Input/output error unlink(/storage/189457/23/dat): No such file or directory Finally, I have found out some bricks are offline: root@GlusterNode1a:~# gluster volume status Status of volume: storage Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick gluster-1a:/storage/1a 24009 Y 1326 Brick gluster-1b:/storage/1b 24009 N N/A Brick gluster-2a:/storage/2a 24009 N N/A Brick gluster-2b:/storage/2b 24009 N N/A Brick gluster-1a:/storage/3a 24011 Y 1332 Brick gluster-1b:/storage/3b 24011 N N/A Brick gluster-2a:/storage/4a 24011 N N/A Brick gluster-2b:/storage/4b 24011 N N/A NFS Server on localhost 38467 Y 24670 Self-heal Daemon on localhost N/A Y 24676 NFS Server on gluster-2b 38467 Y 4339 Self-heal Daemon on gluster-2b N/A Y 4345 NFS Server on gluster-2a 38467 Y 1392 Self-heal Daemon on gluster-2a N/A Y 1402 NFS Server on gluster-1b 38467 Y 2435 Self-heal Daemon on gluster-1b N/A Y 2441 What can I do about that? I need to fix it. Note: CPU and Network usage of all the four nodes are about the same.

    Read the article

  • Pitfalls to using Gluster as a home/profile directory server?

    - by Bart Silverstrim
    I was asking recently about options for divvying up access to file servers, as we have a NAS solution that gets fairly bogged down when our users (with giant profiles, especially) all log in nearly simultaneously. I ran across Gluster and it looks like it can cluster different physical storage media into a single virtual volume and share it out like a virtual NAS from the client perspective and it support CIFS. My question is whether something like this would be feasible to use for home and profile directories in an active directory environment. I was worried about ACL's, primarily, as I didn't think CIFS was fine-grained enough to support NTFS permissions and it didn't look like Gluster exports those permission levels, just the base permissions for basic file sharing. I got the impression that using Gluster would allow for data to be redundant across multiple servers and would speed up access to the files under heavy load, while allowing us to dynamically boost storage capacity by just adding another server and telling Gluster's master node to add that server. Maybe I'm wrong with my understanding of it though. Anyone else use it or care to share how feasible this is?

    Read the article

  • gluster IOPS performance

    - by Gotys
    Can "gluster" serve FLV files without any front-end server layers just using the built-in HTTP protocol ? How would the IOPS compare to standard apache serving? Would gluster help me with I/O limits of harddrives, if I chained a lot of machines together? Thank you!

    Read the article

  • How exactly are Distributed File Systems used in cloud environment?

    - by vaab
    How exactly are Distributed File Systems used in cloud environment ? More precisely: Are live VMs images (or their filesystem) usually located in the DFS ? Are VMs usually used to run the backbone (actual code) of DFS structure ? Precise example citing DFS (ceph, Gluster, GFS, GPFS, Lustre) or cloud environment (Openstack , CloudStack, ...) would be appreciated, even if I'm more interessted by ceph on OpenStack for now.

    Read the article

  • Experience with MooseFS?

    - by brown.2179
    Anyone have any experience using MooseFS? I want an easy distributed storage platform to store static data archive of about 10 TB and serve it to 20-40 nodes. Also I want to be able to add storage as the archive grows without having to rebuild the filesystem. I don't care if it's a bit slow. I just want it to be simple and stable. Basically from what I can see for OS X it's between MooseFS and Gluster. Any other suggestions?

    Read the article

  • Unusually high dentry cache usage

    - by Wolfgang Stengel
    Problem A CentOS machine with kernel 2.6.32 and 128 GB physical RAM ran into trouble a few days ago. The responsible system administrator tells me that the PHP-FPM application was not responding to requests in a timely manner anymore due to swapping, and having seen in free that almost no memory was left, he chose to reboot the machine. I know that free memory can be a confusing concept on Linux and a reboot perhaps was the wrong thing to do. However, the mentioned administrator blames the PHP application (which I am responsible for) and refuses to investigate further. What I could find out on my own is this: Before the restart, the free memory (incl. buffers and cache) was only a couple of hundred MB. Before the restart, /proc/meminfo reported a Slab memory usage of around 90 GB (yes, GB). After the restart, the free memory was 119 GB, going down to around 100 GB within an hour, as the PHP-FPM workers (about 600 of them) were coming back to life, each of them showing between 30 and 40 MB in the RES column in top (which has been this way for months and is perfectly reasonable given the nature of the PHP application). There is nothing else in the process list that consumes an unusual or noteworthy amount of RAM. After the restart, Slab memory was around 300 MB If have been monitoring the system ever since, and most notably the Slab memory is increasing in a straight line with a rate of about 5 GB per day. Free memory as reported by free and /proc/meminfo decreases at the same rate. Slab is currently at 46 GB. According to slabtop most of it is used for dentry entries: Free memory: free -m total used free shared buffers cached Mem: 129048 76435 52612 0 144 7675 -/+ buffers/cache: 68615 60432 Swap: 8191 0 8191 Meminfo: cat /proc/meminfo MemTotal: 132145324 kB MemFree: 53620068 kB Buffers: 147760 kB Cached: 8239072 kB SwapCached: 0 kB Active: 20300940 kB Inactive: 6512716 kB Active(anon): 18408460 kB Inactive(anon): 24736 kB Active(file): 1892480 kB Inactive(file): 6487980 kB Unevictable: 8608 kB Mlocked: 8608 kB SwapTotal: 8388600 kB SwapFree: 8388600 kB Dirty: 11416 kB Writeback: 0 kB AnonPages: 18436224 kB Mapped: 94536 kB Shmem: 6364 kB Slab: 46240380 kB SReclaimable: 44561644 kB SUnreclaim: 1678736 kB KernelStack: 9336 kB PageTables: 457516 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 72364108 kB Committed_AS: 22305444 kB VmallocTotal: 34359738367 kB VmallocUsed: 480164 kB VmallocChunk: 34290830848 kB HardwareCorrupted: 0 kB AnonHugePages: 12216320 kB HugePages_Total: 2048 HugePages_Free: 2048 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 5604 kB DirectMap2M: 2078720 kB DirectMap1G: 132120576 kB Slabtop: slabtop --once Active / Total Objects (% used) : 225920064 / 226193412 (99.9%) Active / Total Slabs (% used) : 11556364 / 11556415 (100.0%) Active / Total Caches (% used) : 110 / 194 (56.7%) Active / Total Size (% used) : 43278793.73K / 43315465.42K (99.9%) Minimum / Average / Maximum Object : 0.02K / 0.19K / 4096.00K OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 221416340 221416039 3% 0.19K 11070817 20 44283268K dentry 1123443 1122739 99% 0.41K 124827 9 499308K fuse_request 1122320 1122180 99% 0.75K 224464 5 897856K fuse_inode 761539 754272 99% 0.20K 40081 19 160324K vm_area_struct 437858 223259 50% 0.10K 11834 37 47336K buffer_head 353353 347519 98% 0.05K 4589 77 18356K anon_vma_chain 325090 324190 99% 0.06K 5510 59 22040K size-64 146272 145422 99% 0.03K 1306 112 5224K size-32 137625 137614 99% 1.02K 45875 3 183500K nfs_inode_cache 128800 118407 91% 0.04K 1400 92 5600K anon_vma 59101 46853 79% 0.55K 8443 7 33772K radix_tree_node 52620 52009 98% 0.12K 1754 30 7016K size-128 19359 19253 99% 0.14K 717 27 2868K sysfs_dir_cache 10240 7746 75% 0.19K 512 20 2048K filp VFS cache pressure: cat /proc/sys/vm/vfs_cache_pressure 125 Swappiness: cat /proc/sys/vm/swappiness 0 I know that unused memory is wasted memory, so this should not necessarily be a bad thing (especially given that 44 GB are shown as SReclaimable). However, apparently the machine experienced problems nonetheless, and I'm afraid the same will happen again in a few days when Slab surpasses 90 GB. Questions I have these questions: Am I correct in thinking that the Slab memory is always physical RAM, and the number is already subtracted from the MemFree value? Is such a high number of dentry entries normal? The PHP application has access to around 1.5 M files, however most of them are archives and not being accessed at all for regular web traffic. What could be an explanation for the fact that the number of cached inodes is much lower than the number of cached dentries, should they not be related somehow? If the system runs into memory trouble, should the kernel not free some of the dentries automatically? What could be a reason that this does not happen? Is there any way to "look into" the dentry cache to see what all this memory is (i.e. what are the paths that are being cached)? Perhaps this points to some kind of memory leak, symlink loop, or indeed to something the PHP application is doing wrong. The PHP application code as well as all asset files are mounted via GlusterFS network file system, could that have something to do with it? Please keep in mind that I can not investigate as root, only as a regular user, and that the administrator refuses to help. He won't even run the typical echo 2 > /proc/sys/vm/drop_caches test to see if the Slab memory is indeed reclaimable. Any insights into what could be going on and how I can investigate any further would be greatly appreciated. Updates Some further diagnostic information: Mounts: cat /proc/self/mounts rootfs / rootfs rw 0 0 proc /proc proc rw,relatime 0 0 sysfs /sys sysfs rw,relatime 0 0 devtmpfs /dev devtmpfs rw,relatime,size=66063000k,nr_inodes=16515750,mode=755 0 0 devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0 tmpfs /dev/shm tmpfs rw,relatime 0 0 /dev/mapper/sysvg-lv_root / ext4 rw,relatime,barrier=1,data=ordered 0 0 /proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0 /dev/sda1 /boot ext4 rw,relatime,barrier=1,data=ordered 0 0 tmpfs /phptmp tmpfs rw,noatime,size=1048576k,nr_inodes=15728640,mode=777 0 0 tmpfs /wsdltmp tmpfs rw,noatime,size=1048576k,nr_inodes=15728640,mode=777 0 0 none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0 cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0 cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0 cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0 cgroup /cgroup/memory cgroup rw,relatime,memory 0 0 cgroup /cgroup/devices cgroup rw,relatime,devices 0 0 cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0 cgroup /cgroup/net_cls cgroup rw,relatime,net_cls 0 0 cgroup /cgroup/blkio cgroup rw,relatime,blkio 0 0 /etc/glusterfs/glusterfs-www.vol /var/www fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0 /etc/glusterfs/glusterfs-upload.vol /var/upload fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0 sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0 172.17.39.78:/www /data/www nfs rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,port=38467,timeo=600,retrans=2,sec=sys,mountaddr=172.17.39.78,mountvers=3,mountport=38465,mountproto=tcp,local_lock=none,addr=172.17.39.78 0 0 Mount info: cat /proc/self/mountinfo 16 21 0:3 / /proc rw,relatime - proc proc rw 17 21 0:0 / /sys rw,relatime - sysfs sysfs rw 18 21 0:5 / /dev rw,relatime - devtmpfs devtmpfs rw,size=66063000k,nr_inodes=16515750,mode=755 19 18 0:11 / /dev/pts rw,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000 20 18 0:16 / /dev/shm rw,relatime - tmpfs tmpfs rw 21 1 253:1 / / rw,relatime - ext4 /dev/mapper/sysvg-lv_root rw,barrier=1,data=ordered 22 16 0:15 / /proc/bus/usb rw,relatime - usbfs /proc/bus/usb rw 23 21 8:1 / /boot rw,relatime - ext4 /dev/sda1 rw,barrier=1,data=ordered 24 21 0:17 / /phptmp rw,noatime - tmpfs tmpfs rw,size=1048576k,nr_inodes=15728640,mode=777 25 21 0:18 / /wsdltmp rw,noatime - tmpfs tmpfs rw,size=1048576k,nr_inodes=15728640,mode=777 26 16 0:19 / /proc/sys/fs/binfmt_misc rw,relatime - binfmt_misc none rw 27 21 0:20 / /cgroup/cpuset rw,relatime - cgroup cgroup rw,cpuset 28 21 0:21 / /cgroup/cpu rw,relatime - cgroup cgroup rw,cpu 29 21 0:22 / /cgroup/cpuacct rw,relatime - cgroup cgroup rw,cpuacct 30 21 0:23 / /cgroup/memory rw,relatime - cgroup cgroup rw,memory 31 21 0:24 / /cgroup/devices rw,relatime - cgroup cgroup rw,devices 32 21 0:25 / /cgroup/freezer rw,relatime - cgroup cgroup rw,freezer 33 21 0:26 / /cgroup/net_cls rw,relatime - cgroup cgroup rw,net_cls 34 21 0:27 / /cgroup/blkio rw,relatime - cgroup cgroup rw,blkio 35 21 0:28 / /var/www rw,relatime - fuse.glusterfs /etc/glusterfs/glusterfs-www.vol rw,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 36 21 0:29 / /var/upload rw,relatime - fuse.glusterfs /etc/glusterfs/glusterfs-upload.vol rw,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 37 21 0:30 / /var/lib/nfs/rpc_pipefs rw,relatime - rpc_pipefs sunrpc rw 39 21 0:31 / /data/www rw,relatime - nfs 172.17.39.78:/www rw,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,port=38467,timeo=600,retrans=2,sec=sys,mountaddr=172.17.39.78,mountvers=3,mountport=38465,mountproto=tcp,local_lock=none,addr=172.17.39.78 GlusterFS config: cat /etc/glusterfs/glusterfs-www.vol volume remote1 type protocol/client option transport-type tcp option remote-host 172.17.39.71 option ping-timeout 10 option transport.socket.nodelay on # undocumented option for speed # http://gluster.org/pipermail/gluster-users/2009-September/003158.html option remote-subvolume /data/www end-volume volume remote2 type protocol/client option transport-type tcp option remote-host 172.17.39.72 option ping-timeout 10 option transport.socket.nodelay on # undocumented option for speed # http://gluster.org/pipermail/gluster-users/2009-September/003158.html option remote-subvolume /data/www end-volume volume remote3 type protocol/client option transport-type tcp option remote-host 172.17.39.73 option ping-timeout 10 option transport.socket.nodelay on # undocumented option for speed # http://gluster.org/pipermail/gluster-users/2009-September/003158.html option remote-subvolume /data/www end-volume volume remote4 type protocol/client option transport-type tcp option remote-host 172.17.39.74 option ping-timeout 10 option transport.socket.nodelay on # undocumented option for speed # http://gluster.org/pipermail/gluster-users/2009-September/003158.html option remote-subvolume /data/www end-volume volume replicate1 type cluster/replicate option lookup-unhashed off # off will reduce cpu usage, and network option local-volume-name 'hostname' subvolumes remote1 remote2 end-volume volume replicate2 type cluster/replicate option lookup-unhashed off # off will reduce cpu usage, and network option local-volume-name 'hostname' subvolumes remote3 remote4 end-volume volume distribute type cluster/distribute subvolumes replicate1 replicate2 end-volume volume iocache type performance/io-cache option cache-size 8192MB # default is 32MB subvolumes distribute end-volume volume writeback type performance/write-behind option cache-size 1024MB option window-size 1MB subvolumes iocache end-volume ### Add io-threads for parallel requisitions volume iothreads type performance/io-threads option thread-count 64 # default is 16 subvolumes writeback end-volume volume ra type performance/read-ahead option page-size 2MB option page-count 16 option force-atime-update no subvolumes iothreads end-volume

    Read the article

  • GlusterFS vs Ceph, which is better for production use for the moment?

    - by Mickey Shine
    I am evaluating GlusterFS and Ceph, seems Gluster is FUSE based which means it may be not as fast as Ceph. But looks like Gluster got a very friendly control panel and is ease to use. Ceph was merged into linux kernel a few days ago and this indicates that it has much more potential energy and may be a good choice in the future. I am wondering which(even out of the two?) is a better choice for production use? It would be nice if you could share your practical experiences

    Read the article

  • Please help to find a solution for two way, real-time synchronization on Centos 5.5 64Bit

    - by Vipul Limbachiya
    I am in need of a real time, two way synchronization software for Centos 5.5 / 64Bit. Here's little explanation: It needs to be able to perform: Two way synchronization. It must be realtime. By realtime means it can be almost realtime, i.e. a delay of 1 second for example is fine. And the folders are on the same server. I am currently using GlusterFS across two webservers. However, it has extremely poor small file read performance and it's slowing down my website. There's nothing more that can be done to improve this, I have already tested many configurations. As a solution, I was going to mount a RAM drive (tmpfs) that mirrors the GlusterFS web files but get the webserver to use the RAM drive. The issue is that I need two way realtime mirroring or replication between glusterfs and the RAM drive. I need this is as Apache writes files as wells. As I said, realtime two way synchronization across two folders. Which are in fact 2 different mounts points. The RAM (tmpfs) mount poing and the GlusterFS mount point. I already know about: Rsync - Which is one way Unison - Which is not realtime Please suggest me any solution free or paid. Thanks in advance

    Read the article

  • How should we serve files in a small bioinformatics cluster?

    - by cespinoza
    We have a small cluster of six ubuntu servers. We run bioinformatics analyses on these clusters. Each analysis takes about 24 hours to complete, each core i7 server can handle 2 at a time, takes as input about 5GB data and outputs about 10-25GB of data. We run dozens of these a week. The software is a hodgepodge of custom perl scripts and 3rd party sequence alignment software written in C/C++. Currently, files are served from two of the compute nodes (yes, we're using compute nodes as file servers)-- each node has 5 1TB sata drives mounted separately (no raid) and is pooled via glusterfs 2.0.1. They each have as 3 bonded intel ethernet pci gigabit ethernet cards, attached to a d-link DGS-1224T switch ($300 24 port consumer-level). We are not currently using jumbo frames (not sure why, actually). The two file-serving compute nodes are then mirrored via glusterfs. Each of the four other nodes mounts the files via glusterfs. The files are all large (4gb+), and are stored as bare files (no database/etc) if that matters. As you can imagine, this is a bit of a mess that grew organically without forethought and we want to improve it now that we're running out of space. Our analyses are I/O intensive and it is a bottle neck-- we're only getting 140mB/sec between the two fileservers, maybe 50mb/sec from the clients (which only have single NICs). We have a flexible budget which I can probably get up $5k or so. How should we spend our budget? We need at least 10TB of storage fast enough to serve all nodes. How fast/big does the cpu/memory of such a file server have to be? Should we use NFS, ATA over Ethernet, iSCSI, Glusterfs, or something else? Should we buy two or more servers and create some sort of storage cluster, or is 1 server enough for such a small number of nodes? Should we invest in faster NICs (say, PCI-express cards with multiple connectors)? The switch? Should we use raid, if so, hardware or software? and which raid (5, 6, 10, etc)? Any ideas appreciated. We're biologists, not IT gurus.

    Read the article

  • What are the typical methods used to scale up/out email storage servers?

    - by nareshov
    Hi, What I've tried: I have two email storage architectures. Old and new. Old: courier-imapds on several (18+) 1TB-storage servers. If one of them show signs of running out of disk space, we migrate a few email accounts to another server. the servers don't have replicas. no backups either. New: dovecot2 on a single huge server with 16TB (SATA) storage and a few SSDs we store fresh mails on the SSDs and run a doveadm purge to move mails older than a day to the SATA disks there is an identical server which has a max-15min-old rsync backup from the primary server higher-ups/management wanted to pack in as much storage as possible per server in order to minimise the cost of SSDs per server the rsync'ing is done because GlusterFS wasn't replicating well under that high small/random-IO. scaling out was expected to be done with provisioning another pair of such huge servers on facing disk-crunch issues like in the old architecture, manual moving of email accounts would be done. Concerns/doubts: I'm not convinced with the synchronously-replicated filesystem idea works well for heavy random/small-IO. GlusterFS isn't working for us yet, I'm not sure if there's another filesystem out there for this use case. The idea was to keep identical pairs and use DNS round-robin for email delivery and IMAP/POP3 access. And if one the servers went down for whatever reasons (planned/unplanned), we'd move the IP to the other server in the pair. In filesystems like Lustre, I get the advantage of a single namespace whereby I do not have to worry about manually migrating accounts around and updating MAILHOME paths and other metadata/data. Questions: What are the typical methods used to scale up/out with the traditional software (courier-imapd / dovecot)? Do traditional software that store on a locally mounted filesystem pose a roadblock to scale out with minimal "problems"? Does one have to re-write (parts of) these to work with an object-storage of some sort - such as OpenStack object storage?

    Read the article

  • Reverse proxy setup for distributed storage

    - by vise
    I have 4 file servers that I want to access under a single mount point from another server. This server has a web application that should serve content from the mounted point. I think I can achieve this with glusterfs. Considering that the file servers have fairly powerful hardware, I want to install a webserver on each of them and serve those files via a reverse proxy. Any thoughts on how I may be able to do so?

    Read the article

  • Recommendations for distributed processing/distributed storage systems

    - by Eddie
    At my organization we have a processing and storage system spread across two dozen linux machines that handles over a petabyte of data. The system right now is very ad-hoc; processing automation and data management is handled by a collection of large perl programs on independent machines. I am looking at distributed processing and storage systems to make it easier to maintain, evenly distribute load and data with replication, and grow in disk space and compute power. The system needs to be able to handle millions of files, varying in size between 50 megabytes to 50 gigabytes. Once created, the files will not be appended to, only replaced completely if need be. The files need to be accessible via HTTP for customer download. Right now, processing is automated by perl scripts (that I have complete control over) which call a series of other programs (that I don't have control over because they are closed source) that essentially transforms one data set into another. No data mining happening here. Here is a quick list of things I am looking for: Reliability: These data must be accessible over HTTP about 99% of the time so I need something that does data replication across the cluster. Scalability: I want to be able to add more processing power and storage easily and rebalance the data on across the cluster. Distributed processing: Easy and automatic job scheduling and load balancing that fits with processing workflow I briefly described above. Data location awareness: Not strictly required but desirable. Since data and processing will be on the same set of nodes I would like the job scheduler to schedule jobs on or close to the node that the data is actually on to cut down on network traffic. Here is what I've looked at so far: Storage Management: GlusterFS: Looks really nice and easy to use but doesn't seem to have a way to figure out what node(s) a file actually resides on to supply as a hint to the job scheduler. GPFS: Seems like the gold standard of clustered filesystems. Meets most of my requirements except, like glusterfs, data location awareness. Ceph: Seems way to immature right now. Distributed processing: Sun Grid Engine: I have a lot of experience with this and it's relatively easy to use (once it is configured properly that is). But Oracle got its icy grip around it and it no longer seems very desirable. Both: Hadoop/HDFS: At first glance it looked like hadoop was perfect for my situation. Distributed storage and job scheduling and it was the only thing I found that would give me the data location awareness that I wanted. But I don't like the namename being a single point of failure. Also, I'm not really sure if the MapReduce paradigm fits the type of processing workflow that I have. It seems like you need to write all your software specifically for MapReduce instead of just using Hadoop as a generic job scheduler. OpenStack: I've done some reading on this but I'm having trouble deciding if it fits well with my problem or not. Does anyone have opinions or recommendations for technologies that would fit my problem well? Any suggestions or advise would be greatly appreciated. Thanks!

    Read the article

  • How do you create large, growable, shared filesystems on Linux at AWS?

    - by Reece
    What are acceptable/reasonable/best ways to provide large, growable, shared storage at AWS, exposed as a single filesystem? We're currently making 1TB EBS volumes ~biweekly and NFS exporting with no_subtree_check and nohide. In this setup, distinct exports appear under a single mount on the client. This arrangement does not scale well. The options we've considered: LVM2 with ext4. resize2fs is too slow. Btrfs on Linux. not obviously ready for prime time yet. ZFS on Linux. not obviously ready for prime time yet (although LLNL uses it) ZFS on Solaris. future of this combo is uncertain (to me), and new OS in the mix glusterfs. heard mostly good but two scary (and maybe old?) stories. The ideal solution would provide sharing, a single fs view, easy expandability, snapshots, and replication. Thanks for sharing ideas and experience.

    Read the article

  • best filesystem for an aws s3 like service

    - by gucki
    Hi! I need to build a fault tolerant, highly available key/value storage (no posix, only same functionaluty as S3) using cheap existing hardware. The storage should be able to handle several billions of items. The maximum size of items is around 1GB, most are only several KB. What's the best software/ filesystem for this task? I already had a brief look at mogilefs, mongodb (grid-fs) & glusterfs but I'm not really sure which is stable & fault tolerant enough. The simpler the setup and later expansion the better :). Corin

    Read the article

  • Distributed, Parallel, Fault-tolerant File System

    - by Eddified
    There are so many choices that it's hard to know where to start. My requirements are these: Runs on Linux Most of the files will be between 5-9 MB in size. There will also be a significant number of small-ish jpgs (100px x 100px). All of the files need to be available over http. Redundancy -- ideally it would provide the space efficiency similar to RAID 5 of 75% (in RAID 5 this would be calculated thus: with 4 identical disks, 25% of the space is used for parity = 75% efficent) Must support several petabytes of data scalable runs on commodity hardware In addition, I look for these qualities, though they are not "requirements": Stable, mature file system Lots of momentum and support etc I would like some input as to which file system works best for the given requirements. Some people at my organization are leaning towards MogileFS, but I'm not convinced of the stability and momentum of that project. GlusterFS and Lustre, based on my limited research, appear to be better supported... Thoughts?

    Read the article

  • fstab mount after network initialization

    - by Philip
    I'm automatically mounting a NFS with fstab. Sometimes the mount fails because the hostname of the NFS mount cannot be resolved (getaddrinfo failed). I'm assuming that this happens because the network initialization is slower than the mounts. Is there any way to initialize the network before mounting any devices? I'm already using _netdev as a mount option but this does not help. This is my current fstab file: # <file system> <mount point> <type> <options> <dump> <pass> /dev/md2 / ext3 errors=remount-ro 0 1 /dev/md1 /boot ext3 errors=remount-ro 0 1 /dev/sda3 swap swap defaults 0 0 /dev/sdb3 swap swap defaults 0 0 connect.mygluster.net:/data /var/gluster/data glusterfs ro,_netdev 0

    Read the article

  • Is it necessary to have firewalls rules between trusted nodes communicating on their backend interfaces?

    - by Tom
    I have 6 nodes that have internet access on eth1 and private access to one another on eth0. Currently I have firewall rules for eth0, for things like memcached and NFS. Is this necessary? It's a real headache as NFS for example communicates on loads of different ports, and I recently introduced glusterfs which needs more still. Is the headache of figuring out what backend ports to unblock worth the security enhancement? I should mention that I will of course still have a firewall rule on eth0 to block servers owned by others in the same datacenter. Thanks

    Read the article

  • In-House DropBox

    - by beardedlinuxgeek
    Dropbox is perfect, but as a company, no one can host anything worthwhile on servers that we don't control. So I've been tasked with coming up with a Dropbox alternative, something in house. GlusterFS is nice, but no offline access. SparkleShare uses Git which isn't great for large files. It also doesn't have windows ports. Any other options? If I were to roll out my own from scratch, what do you think the based way to go about doing this would be?

    Read the article

  • In-House DropBox

    - by beardedlinuxgeek
    Dropbox is perfect, but as a company, no one can host anything worthwhile on servers that we don't control. So I've been tasked with coming up with a Dropbox alternative, something in house. GlusterFS is nice, but no offline access. SparkleShare uses Git which isn't great for large files. It also doesn't have windows ports. Any other options? If I were to roll out my own from scratch, what do you think the based way to go about doing this would be?

    Read the article

  • updatedb & locate command problem - Files from external hard drive are no longer indexed after rebooting

    - by user784637
    Files from my external hard drive are no longer indexed after rebooting. I have to remount and then run # updatedb after each reboot. The problem is updatedb takes a few minutes for my external hard drives. Is there any way I can retain indexing for my externals after I reboot so that the locate command can search through my externals? EDIT: Per Request here are my specs: $ cat /etc/updatedb.conf PRUNE_BIND_MOUNTS="yes" # PRUNENAMES=".git .bzr .hg .svn" PRUNEPATHS="/tmp /var/spool /media" PRUNEFS="NFS nfs nfs4 rpc_pipefs afs binfmt_misc proc smbfs autofs iso9660 ncpfs coda devpts ftpfs devfs mfs shfs sysfs cifs lustre_lite tmpfs usbfs udf fuse.glusterfs fuse.sshfs ecryptfs fusesmb devtmpfs" # mount /dev/sda5 on / type ext4 (rw,errors=remount-ro) proc on /proc type proc (rw,noexec,nosuid,nodev) none on /sys type sysfs (rw,noexec,nosuid,nodev) none on /sys/fs/fuse/connections type fusectl (rw) none on /sys/kernel/debug type debugfs (rw) none on /sys/kernel/security type securityfs (rw) none on /dev type devtmpfs (rw,mode=0755) none on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620) none on /dev/shm type tmpfs (rw,nosuid,nodev) none on /var/run type tmpfs (rw,nosuid,mode=0755) none on /var/lock type tmpfs (rw,noexec,nosuid,nodev) none on /lib/init/rw type tmpfs (rw,nosuid,mode=0755) binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev) gvfs-fuse-daemon on /home/me/.gvfs type fuse.gvfs-fuse-daemon (rw,nosuid,nodev,user=me) /dev/sdb1 on /media/me type fuseblk (rw,nosuid,nodev,allow_other,blksize=4096,default_permissions) /dev/sdd1 on /media/Little Boy type fuseblk (rw,nosuid,nodev,allow_other,blksize=4096,default_permissions) /dev/sde1 on /media/Fat Man type fuseblk (rw,nosuid,nodev,allow_other,blksize=4096,default_permissions) # on_ac_power; echo $? 255

    Read the article

< Previous Page | 1 2 3  | Next Page >