Search Results

Search found 78 results on 4 pages for 'drbd'.


  • Turn computer into DAS (Direct Attached Storage)

    - by Damon
    Can we build direct attached storage by taking a computer/server, adding an HBA, and installing some appropriate software? We would use Debian as the host OS for both the DAS and the server. If so, what software do we use? And do we simply need an HBA for the DAS and the server, or do we need more hardware? The goal is to use an older server that does not have enough room for drives but does have ECC memory, server processors, redundant power supplies, dual NICs, etc. We would then find any boxes, server or not, with enough room for 8-12 drives, fans, etc. and turn them into DAS units; build two of these and connect them to the server. Eventually we want to have two servers using DRBD and associated services like Heartbeat and Pacemaker to create an HA setup for our server(s), but that will take a long time to configure since I have no experience with anything related to DRBD (yet) and have a learning curve to get past, not to mention the additional cost of more hardware (two servers vs. one).
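
    If the drive boxes end up attached over the network rather than over SAS, one common Debian approach is to export their disks as iSCSI LUNs; a minimal sketch using the tgt target (the IQN, device names and server IP below are made up):
        # /etc/tgt/targets.conf on the drive box (sketch)
        <target iqn.2013-01.local.storage:das1>
            backing-store /dev/sdb        # exported as one LUN
            backing-store /dev/sdc        # exported as another LUN
            initiator-address 10.0.0.2    # only the server may connect
        </target>

        # restart the target daemon (service name varies: tgt or tgtd), then on the server:
        #   iscsiadm -m discovery -t sendtargets -p <das ip>
        #   iscsiadm -m node --login
    The exported LUNs then show up as ordinary block devices on the server and can later serve as DRBD backing disks.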

    Read the article

  • Slow NFS and GFS2 performance

    - by Tiago
    Recently I've designed and configured a 4 node cluster for a webapp that does lots of file handling. The cluster have been broken down into 2 main roles, webserver and storage. Each role is replicated to a second server using drbd in active/passive mode. The webserver does a NFS mount of the data directory of the storage server and the latter also has a webserver running to serve files to browser clients. In the storage servers I've created a GFS2 FS to hold the data which is wired to drbd. I've chose GFS2 mainly because the announced performance and also because the volume size which has to be pretty high. Since we entered production I've been facing two problems that I think are deeply connected. First of all, the NFS mount on the webservers keeps hanging for a minute or so and then resumes normal operations. By analyzing the logs I've found out that NFS stops answering for a while and outputs the following log lines: Oct 15 18:15:42 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:44 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:46 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:47 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:47 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:47 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:48 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:48 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:51 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:52 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:52 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:55 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:55 <server hostname> kernel: nfs: server active.storage.vlan not responding, still trying Oct 15 18:15:58 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK Oct 15 18:15:59 <server hostname> kernel: nfs: server active.storage.vlan OK In this case, the hang lasted for 16 seconds but sometimes it takes 1 or 2 minutes to resume normal operations. My first guess was this was happening due to heavy load of the NFS mount and that by increasing RPCNFSDCOUNT to a higher value, this would become stable. 
    I've increased it several times and apparently, after a while, the log messages started appearing less often. The value is now 32. After further investigation I've come across a different kind of hang, although the NFS messages still appear in the logs. Sometimes the GFS2 FS simply hangs, which causes both the NFS export and the storage webserver to stop serving files. Both stay hung for a while and then resume normal operations. This hang leaves no trace on the client side (not even the NFS "not responding" messages) and, on the storage side, the logs appear to be empty even though rsyslogd is running. The nodes are connected through a 10Gbps non-dedicated link, but I don't think this is the issue because the GFS2 hang is confirmed even when connecting directly to the active storage server. I've been trying to solve this for a while now and had tried different NFS configuration options before I found out that the GFS2 FS is also hanging. The NFS share is exported as:
        /srv/data/ <ip_address>(rw,async,no_root_squash,no_all_squash,fsid=25)
    and the NFS client mounts it with:
        mount -o "async,hard,intr,wsize=8192,rsize=8192" active.storage.vlan:/srv/data /srv/data
    After some tests, these were the settings that yielded the best performance for the cluster. I am desperate to find a solution, as the cluster is already in production; I need to make sure these hangs won't happen again, and I don't really know what or how I should be benchmarking. What I can tell is that this happens under heavy load, as when I tested the cluster earlier these problems weren't happening at all. Please tell me if you need me to provide further configuration details of the cluster, and which ones you want me to post. As a last resort I can migrate the files to a different FS, but I need some solid pointers on whether this will solve the problem, as the volume size is extremely large at this point. The servers are hosted by a third-party company and I don't have physical access to them. Best regards. EDIT 1: The servers are physical servers and their specs are:
        Webservers: Intel Bi Xeon E5606 2x4 2.13GHz, 24GB DDR3, Intel SSD 320 2 x 120GB RAID 1
        Storage: Intel i5 3550 3.3GHz, 16GB DDR3, 12 x 2TB SATA
    Initially there was a VRack setup between the servers, but we upgraded one of the storage servers to have more RAM and it wasn't inside the VRack. They connect through a shared 10Gbps connection between them; please note that it is the same connection that is used for public access. They use a single IP (using IP failover) to connect to each other and to allow for a graceful failover, so NFS runs over a public connection and not on any private network (it was private before the upgrade, when the problem already existed). The firewall was configured and tested thoroughly, but I disabled it for a while to see if the problem still occurred, and it did. As far as I know the hosting provider isn't blocking or limiting the connection between the servers or to the public domain (at least under a given bandwidth threshold that hasn't been reached yet). Hope this helps in figuring out the problem.
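
    As a point of comparison only, here is a hedged sketch of an export/mount pair sometimes used to make NFS clients ride out short server stalls more gracefully (TCP plus a longer timeo); the values are illustrative, not a known fix for this cluster:
        # /etc/exports on the storage node (sketch)
        /srv/data <webserver_ip>(rw,async,no_root_squash,no_all_squash,fsid=25)

        # client side: tcp and a longer timeout so brief stalls retry instead of erroring
        mount -t nfs -o tcp,hard,intr,timeo=600,retrans=2,rsize=32768,wsize=32768,noatime \
            active.storage.vlan:/srv/data /srv/data
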
EDIT 2: Relevant software versions: CentOS 2.6.32-279.9.1.el6.x86_64 nfs-utils-1.2.3-26.el6.x86_64 nfs-utils-lib-1.1.5-4.el6.x86_64 gfs2-utils-3.0.12.1-32.el6_3.1.x86_64 kmod-drbd84-8.4.2-1.el6_3.elrepo.x86_64 drbd84-utils-8.4.2-1.el6.elrepo.x86_64 DRBD configuration on storage servers: #/etc/drbd.d/storage.res resource storage { protocol C; on <server1 fqdn> { device /dev/drbd0; disk /dev/vg_storage/LV_replicated; address <server1 ip>:7788; meta-disk internal; } on <server2 fqdn> { device /dev/drbd0; disk /dev/vg_storage/LV_replicated; address <server2 ip>:7788; meta-disk internal; } } NFS Configuration in storage servers: #/etc/sysconfig/nfs RPCNFSDCOUNT=32 STATD_PORT=10002 STATD_OUTGOING_PORT=10003 MOUNTD_PORT=10004 RQUOTAD_PORT=10005 LOCKD_UDPPORT=30001 LOCKD_TCPPORT=30001 (can there be any conflict in using the same port for both LOCKD_UDPPORT and LOCKD_TCPPORT?) GFS2 configuration: # gfs2_tool gettune <mountpoint> incore_log_blocks = 1024 log_flush_secs = 60 quota_warn_period = 10 quota_quantum = 60 max_readahead = 262144 complain_secs = 10 statfs_slow = 0 quota_simul_sync = 64 statfs_quantum = 30 quota_scale = 1.0000 (1, 1) new_files_jdata = 0 Storage network environment: eth0 Link encap:Ethernet HWaddr <mac address> inet addr:<ip address> Bcast:<bcast address> Mask:<ip mask> inet6 addr: <ip address> Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:957025127 errors:0 dropped:0 overruns:0 frame:0 TX packets:1473338731 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:2630984979622 (2.3 TiB) TX bytes:1648430431523 (1.4 TiB) eth0:0 Link encap:Ethernet HWaddr <mac address> inet addr:<ip failover address> Bcast:<bcast address> Mask:<ip mask> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 The IP addresses are statically assigned with the given network configurations: DEVICE="eth0" BOOTPROTO="static" HWADDR=<mac address> ONBOOT="yes" TYPE="Ethernet" IPADDR=<ip address> NETMASK=<net mask> and DEVICE="eth0:0" BOOTPROTO="static" HWADDR=<mac address> IPADDR=<ip failover> NETMASK=<net mask> ONBOOT="yes" BROADCAST=<bcast address> Hosts file to allow for a graceful NFS failover in conjunction with NFS option fsid=25 set on both storage servers: #/etc/hosts <storage ip failover address> active.storage.vlan <webserver ip failover address> active.service.vlan As you can see, packet errors are down to 0. I've also ran ping for a long time without any packet loss. MTU size is the normal 1500. As there is no VLan by now, this is the MTU used to communicate between servers. The webservers' network environment is similar. One thing I forgot to mention is that the storage servers handle ~200GB of new files each day through the NFS connection, which is a key point for me to think this is some kind of heavy load problem with either NFS or GFS2. If you need further configuration details please tell me. EDIT 3: Earlier today we had a major filesystem crash on the storage server. I couldn't get the details of the crash right away because the server stop responding. After the reboot, I noticed the filesystem was extremely slow, and I was not being able to serve a single file through either NFS or httpd, perhaps due to cache warming or so. Nevertheless, I've been monitoring the server closely and the following error came up in dmesg. The source of the problem is clearly GFS, which is waiting for a lock and ends up starving after a while. INFO: task nfsd:3029 blocked for more than 120 seconds. 
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D 0000000000000000 0 3029 2 0x00000080 ffff8803814f79e0 0000000000000046 0000000000000000 ffffffff8109213f ffff880434c5e148 ffff880624508d88 ffff8803814f7960 ffffffffa037253f ffff8803815c1098 ffff8803814f7fd8 000000000000fb88 ffff8803815c1098 Call Trace: [<ffffffff8109213f>] ? wake_up_bit+0x2f/0x40 [<ffffffffa037253f>] ? gfs2_holder_wake+0x1f/0x30 [gfs2] [<ffffffff814ff42e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff814ff2cb>] mutex_lock+0x2b/0x50 [<ffffffffa0379f21>] gfs2_log_reserve+0x51/0x190 [gfs2] [<ffffffffa0390da2>] gfs2_trans_begin+0x112/0x1d0 [gfs2] [<ffffffffa0369b05>] ? gfs2_dir_check+0x35/0xe0 [gfs2] [<ffffffffa0377943>] gfs2_createi+0x1a3/0xaa0 [gfs2] [<ffffffff8121aab1>] ? avc_has_perm+0x71/0x90 [<ffffffffa0383d1e>] gfs2_create+0x7e/0x1a0 [gfs2] [<ffffffffa037783f>] ? gfs2_createi+0x9f/0xaa0 [gfs2] [<ffffffff81188cf4>] vfs_create+0xb4/0xe0 [<ffffffffa04217d6>] nfsd_create_v3+0x366/0x4c0 [nfsd] [<ffffffffa0429703>] nfsd3_proc_create+0x123/0x1b0 [nfsd] [<ffffffffa041a43e>] nfsd_dispatch+0xfe/0x240 [nfsd] [<ffffffffa025a5d4>] svc_process_common+0x344/0x640 [sunrpc] [<ffffffff810602a0>] ? default_wake_function+0x0/0x20 [<ffffffffa025ac10>] svc_process+0x110/0x160 [sunrpc] [<ffffffffa041ab62>] nfsd+0xc2/0x160 [nfsd] [<ffffffffa041aaa0>] ? nfsd+0x0/0x160 [nfsd] [<ffffffff81091de6>] kthread+0x96/0xa0 [<ffffffff8100c14a>] child_rip+0xa/0x20 [<ffffffff81091d50>] ? kthread+0x0/0xa0 [<ffffffff8100c140>] ? child_rip+0x0/0x20

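    When GFS2 backs up inside gfs2_log_reserve like the trace above, glock contention is usually the first thing to inspect; a hedged sketch, assuming debugfs is available and the filesystem's lock table is named cluster:storage (that name is an assumption), plus the commonly recommended noatime mount to cut lock traffic:
        # expose GFS2 lock state; holders whose f: flags contain W are waiting
        mount -t debugfs none /sys/kernel/debug
        less /sys/kernel/debug/gfs2/cluster:storage/glocks

        # remount without atime updates to reduce exclusive-lock churn (sketch)
        mount -o remount,noatime,nodiratime /dev/drbd0 /srv/data
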
    Read the article

  • where to put core services in two-node cluster

    - by Veniamin
    I'm currently configuring a two-node HA cluster based on CentOS with DRBD. Most services are packed into virtual machines with migration available. I have not yet decided where to put some core services such as DHCP, LDAP and DNS, which are critical for the whole network infrastructure. There are two possibilities: configure them as redundant HA services on the cluster hosts, or pack them all into a dedicated virtual machine. What is the best practice?
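
    If the core services end up packed into one VM that the cluster itself manages, a minimal Pacemaker sketch with the VirtualDomain agent could look like this (the resource names, the libvirt XML path and the DRBD master/slave resource are all hypothetical, and this assumes libvirt/KVM):
        # define the VM holding dhcp/ldap/dns as a cluster resource (sketch)
        pcs resource create coresvc-vm ocf:heartbeat:VirtualDomain \
            config=/etc/libvirt/qemu/coresvc.xml migration_transport=ssh \
            op monitor interval=30s meta allow-migrate=true

        # run it only where the DRBD-backed VM storage is Primary (names hypothetical)
        pcs constraint colocation add coresvc-vm with master drbd-vmstore-master INFINITY
        pcs constraint order promote drbd-vmstore-master then start coresvc-vm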

    Read the article

  • Is it safe to run two instances of svnserve on one repository, or only one?

    - by fredden
    We've two nodes running heartbeat/drbd, and one of the services we're using is subversion. What I want to know is: is it safe to run svnserve on both nodes all the time, or should it only run on the active node? Does svnserve use file-level locking, or is it all in memory? What are the implications of running svnserve without its repositories accessible? Please let me know if this isn't clear, and I'll try my best to rephrase/clarify. :)

    Read the article

  • Ensuring ethernet is configured before continuing init scripts.

    - by Pete Ashdown
    Is there a better way to ensure that an ethernet port is configured before continuing through the startup init scripts? When 802.3ad bonded ethernet is configured on Ubuntu, it takes some time before it finishes protocol negotiation and starts passing packets, because the networking script only configures the interface and does not verify that traffic is being passed. As a result, this can throw off some of the other network-dependent scripts, like the init script for drbd. Right now I just have a loop that pings the gateway in a startup script, but this seems less than optimal:
        GATEWAYIP=10.0.0.1
        while ! ping -c 1 $GATEWAYIP; do
            echo gateway not up
        done
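
    An alternative is to wait on the bond itself rather than on a remote host; a sketch, assuming the bond is named bond0 (adjust the name and the timeout):
        #!/bin/sh
        # wait until the 802.3ad bond reports its link as up, give up after 60 seconds
        BOND=bond0
        TRIES=60
        while [ "$TRIES" -gt 0 ]; do
            if grep -q "MII Status: up" /proc/net/bonding/$BOND; then
                break
            fi
            sleep 1
            TRIES=$((TRIES - 1))
        done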

    Read the article

  • NFS caching on Ubuntu

    - by stream
    We run a bunch of Ubuntu servers (mostly 8.04 LTS) which all mount an NFS share at /nfs. We use the NFS share primarily for two purposes: symlinking config files (such as Apache vhosts), and reading and writing uploaded files. This all works great, except it makes us fully dependent on the central NFS server (which is a DRBD cluster with heartbeat failover from primary to secondary, but we've still seen issues). What we'd like is to mount the NFS share through some local caching layer which would keep any file that had previously been read available even if /nfs isn't. Writes could be disabled for this period. Searching around, it looks like cachefilesd may be an option. Unfortunately, it appears to be packaged only for Ubuntu 9.10 and 10.04. I was also looking for a FUSE-based solution that might fit the bill, but haven't found anything yet. Any suggestions would be greatly appreciated!
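
    For reference, the cachefilesd/FS-Cache route boils down to roughly the following sketch (paths and thresholds are the usual defaults, not values tested on 8.04); note that FS-Cache only accelerates reads while the server is reachable, it does not by itself keep /nfs usable during an outage:
        # /etc/cachefilesd.conf (sketch)
        dir /var/cache/fscache
        tag nfscache
        brun 10%
        bcull 7%
        bstop 3%

        # start the cache daemon, then mount NFS with the fsc option
        service cachefilesd start
        mount -t nfs -o fsc,hard,intr nfsserver:/export /nfs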

    Read the article

  • Best Solution for Load Balancing geographically distributed NFS File Access?

    - by DairyKnight
    I'm trying to find an optimal solution for accessing the NFS file share in my company. We have a central file server in North America which gets 30-50GB of updated data every day, and it's very slow for our Europe and Asia branches to access it directly. Therefore, I'm trying to set up two replica servers on those continents. I'm currently using rsync, but I wonder if there is a better solution that acts more like a distributed RAID, allowing users to transparently access a file whether it is synced or not, with the request dispatched to a remote server if the file is not yet synced. I'm now looking into DRBD, but it doesn't seem to have that auto-dispatching functionality. Does anyone know of a better solution?
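
    If the replicas stay rsync-based, one incremental improvement is near-real-time pushing with lsyncd instead of periodic rsync runs; a sketch (host names and paths invented), though this still does not provide the transparent read-through dispatching asked about:
        -- /etc/lsyncd.lua (sketch)
        settings {
            logfile    = "/var/log/lsyncd.log",
            statusFile = "/var/log/lsyncd.status",
        }
        sync {
            default.rsyncssh,
            source    = "/srv/share",
            host      = "eu-replica.example.com",
            targetdir = "/srv/share",
        }
        sync {
            default.rsyncssh,
            source    = "/srv/share",
            host      = "asia-replica.example.com",
            targetdir = "/srv/share",
        }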

    Read the article

  • (simple) linux HA with vmware vsphere?

    - by derhelge
    I hope my question is specific enough, and that you are able and willing to help :-) We have several openSUSE VMs in an ESX cluster (three ESX servers) with an attached iSCSI SAN. All of those Linux VMs are configured as single points of failure, which means, in the case of a web server: LAMP, storage, etc. all on one machine. This was very simple, and in case of a failure (in recent years: kernel panics or Apache crashes) a simple reboot triggered by a script did the job. But the problem is: how do we upgrade/maintain the web application or the underlying OS without downtime? That wasn't really manageable, so I did it in the early morning ;) How can I achieve a "simple" high-availability cluster now? I thought of DRBD with heartbeat between 2 VMs, and for the storage an RDM (raw device mapped) LUN with read-write permissions changed for both VMs. Is this a good idea? Does anyone have a better solution?

    Read the article

  • Can I run mysqld on top of glusterfs?

    - by Richard Holloway
    I have been playing with GlusterFS recently. What I want to try is to run mysqld on top of GlusterFS, in a similar way to how it is possible to run MySQL on top of DRBD. I am familiar with MySQL replication and the advantages of using that instead of this approach, and I am also aware of MongoDB and other NoSQL solutions. However, it would be an easy solution for a few specific projects I have coming up if I could leave MySQL as it is and replicate the underlying file system. Is this possible, and if it is, where can I find out how?
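
    Mechanically this is just a replicated Gluster volume mounted at MySQL's datadir; a sketch (brick paths and host names are made up, and whether mysqld behaves well on a FUSE-mounted Gluster volume is exactly the open question here):
        # create and start a two-way replicated volume (sketch)
        gluster volume create mysqlvol replica 2 \
            node1.example.com:/bricks/mysql node2.example.com:/bricks/mysql
        gluster volume start mysqlvol

        # mount it where mysqld expects its data; only one mysqld instance
        # should have this datadir open at any time
        mount -t glusterfs node1.example.com:/mysqlvol /var/lib/mysql
        # in my.cnf: datadir=/var/lib/mysql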

    Read the article

  • RAID5 on SmartArray P410i online resize

    - by datacompboy
    I have a P410i with 256MB cache and no battery backup. My RAID5 was built over 3 x 136GB disks; now all disks have been replaced with 3 x 300GB drives. How can I extend the array to use the whole space? hpacucli doesn't allow that; I think this might be because no battery is present. I do have a redundant power supply. All data is mirrored over DRBD to a secondary server, so I could risk some data loss in case of a power failure during the resize, but I would prefer an online resize.
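
    Assuming the logical drive can be grown at all (some firmware refuses array transformations without a cache battery), the usual sequence is a controller-side extend followed by an online LVM/filesystem grow; a sketch, with device and volume names assumed and the hpacucli syntax to be checked against your version:
        # controller side (sketch; may be refused without a battery-backed cache)
        hpacucli ctrl slot=0 logicaldrive 1 modify size=max

        # OS side, online, once the block device reports the new size
        pvresize /dev/cciss/c0d0p2           # or /dev/sda2 with the hpsa driver
        lvextend -l +100%FREE /dev/vg0/data  # volume group/LV names assumed
        resize2fs /dev/vg0/data              # ext3/ext4 can grow while mounted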

    Read the article

  • Solaris to Linux conversion: Use VxFS or GFS?

    - by w00t
    We're a Solaris shop looking at RedHat Enterprise Linux and one of the things we're wondering is if we should keep Veritas Volume Manager + FileSystem or go with LVM+ext3 or RedHat's preferred cluster filesystem solution, GFS. One of the things we like about Veritas is that it can use Veritas Volume Replicator to have a remote copy of important filesystems. This functionality seems to be missing from RedHat, DRBD doesn't seem to be packaged in RHEL... So my questions are: Does anybody use VxFS/VxVM/VVR on Linux? Thoughts, experiences? Comparison with LVM+ext3? Anybody using GFS? Thoughts, experiences? Do you do remote replication for disaster recovery, and if so, how? Is there a standard RedHat way?

    Read the article

  • distributed, fault-tolerant network block device

    - by gucki
    I'm looking for a distributed, fault-tolerant network storage system which exposes block devices (not filesystems) on the clients. A client's block device should write simultaneously to several storage nodes A client's block device should not fail as long as not all storage nodes backing it went down The master should automatically redistribute storages' data when a storage node fails or gets added/ removed A single master (which is for metadata only) is fine So ideally the architecture would be very similar to moosefs (http://www.moosefs.org/) but instead of exposing a real filesystem mounted using a fuse client it'd expose block devices on the clients. I know of iscsi and drbd but both don't seem to offer what I'm looking for. Or am I missing something?

    Read the article

  • When NOT to use virtualisation? [closed]

    - by Nils
    When virtualisation was new, we tried to virtualise everything. Then came the cases where the virtual machine was very much slower than a physical one. For us it boils down to the following rules on when not to virtualise: network-IO-intensive applications (i.e. with many interrupts/packets), disk-IO-intensive applications (if not on SAN storage), and RAM-intensive applications (this is the most precious resource). Now, this holds for a combination of Xen using local DRBD storage. The same seems to be true for Hyper-V using DAS. I wonder: is it true for all combinations, and what are your limits on these combinations?

    Read the article

  • What's New in 5.6 RC and more from MySQL Connect conference

    - by Rob Young
    Keeping with the tradition of great MySQL Community events, the first annual MySQL Connect conference is now in the books. It was great to see so many familiar faces in the crowd and at the podium sharing their ideas and thoughts on the evolution of MySQL under Oracle. The headliner of the conference was Tomas' keynote announcement of the fully featured and fully enabled MySQL 5.6 Release Candidate. This new article on the MySQL DevZone summarizes all of the great new features ready for Community adoption, all MySQL Engineering blogs, and where and how to download all of the bits. As always, early adoption and feedback on the 5.6 RC is appreciated, and the sooner we get your feedback, the sooner we can release the "ready for production" sanctioned GA product. Also available now, Cluster 7.3 provides support for foreign keys, node.js NoSQL access to underlying data, and a new Auto Installer that helps you quickly and easily get up and running with Cluster 7.2 and 7.3. The 7.3 downloads are provided in the first 7.3 Development Milestone Release (under the "Development Releases" tab) and via MySQL Labs. Oracle also announced key new additions to MySQL Enterprise Edition: new policy-based compliance auditing. MySQL Enterprise Edition Audit adds policy-based auditing compliance to existing MySQL applications without the need to change any code. This new plugin is available for MySQL 5.5.28 and higher; existing MySQL Enterprise Edition customers can download the upgrade from the My Oracle Support portal, and everyone can download it for evaluation from Oracle's Software Delivery Cloud. New MySQL Enterprise High Availability additions, Oracle Linux + DRBD and Oracle Solaris Clustering for MySQL, provide even more options for ensuring MySQL applications remain available and running at their peak. All in all, the first MySQL Connect conference was a great success, and with refinements planned in response to attendee, sponsor and speaker feedback we expect it to grow and improve going forward. As always, thanks for your continued support of MySQL!
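
    The audit plugin mentioned above is loaded like any other server plugin; a minimal sketch (the plugin name follows the documented MySQL Enterprise Audit naming, but verify the details against the 5.5.28+ manual for your release):
        # load MySQL Enterprise Audit (sketch)
        mysql -u root -p -e "INSTALL PLUGIN audit_log SONAME 'audit_log.so';"
        # then enable full logging in my.cnf:
        #   [mysqld]
        #   audit_log_policy = ALL
        mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'audit_log%';"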

    Read the article

  • Unexpected start of already-primary server processes when heartbeat on secondary is stopped.

    - by vorik
    Hi, I've got an active-passive Heartbeat cluster with Apache, MySQL, ActiveMQ and DRBD. Today, I wanted to perform hardware-maintenance on the secondary node (node04), so I stopped the heartbeat service before shutting it down. Then, the primary node (node03) received a shutdown notice from the secondary node (node04). This logging comes from the primary node: node03 heartbeat[4458]: 2010/03/08_08:52:56 info: Received shutdown notice from 'node04.companydomain.nl'. heartbeat[4458]: 2010/03/08_08:52:56 info: Resources being acquired from node04.companydomain.nl. harc[27522]: 2010/03/08_08:52:56 info: Running /etc/ha.d/rc.d/status status heartbeat[27523]: 2010/03/08_08:52:56 info: Local Resource acquisition completed. mach_down[27567]: 2010/03/08_08:52:56 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired mach_down[27567]: 2010/03/08_08:52:56 info: mach_down takeover complete for node node04.companydomain.nl. heartbeat[4458]: 2010/03/08_08:52:56 info: mach_down takeover complete. harc[27620]: 2010/03/08_08:52:56 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp ip-request-resp[27620]: 2010/03/08_08:52:56 received ip-request-resp drbddisk OK yes ResourceManager[27645]: 2010/03/08_08:52:56 info: Acquiring resource group: node03.companydomain.nl drbddisk Filesystem::/dev/drbd0::/data::ext3 mysql apache::/etc/httpd/conf/httpd.conf LVSSyncDaemonSwap::master monitor activemq tivoli-cluster MailTo::[email protected]::DRBDFailureDrisAcc MailTo::[email protected]::DRBDFailureDrisAcc 1.2.3.212 ResourceManager[27645]: 2010/03/08_08:52:56 info: Running /etc/ha.d/resource.d/drbddisk start Filesystem[27700]: 2010/03/08_08:52:57 INFO: Running OK ResourceManager[27645]: 2010/03/08_08:52:57 info: Running /etc/ha.d/resource.d/mysql start mysql[27783]: 2010/03/08_08:52:57 Starting MySQL[ OK ] apache[27853]: 2010/03/08_08:52:57 INFO: Running OK ResourceManager[27645]: 2010/03/08_08:52:57 info: Running /etc/ha.d/resource.d/monitor start monitor[28160]: 2010/03/08_08:52:58 ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/activemq start activemq[28210]: 2010/03/08_08:52:58 Starting ActiveMQ Broker... ActiveMQ Broker is already running. 
ResourceManager[27645]: 2010/03/08_08:52:58 ERROR: Return code 1 from /etc/ha.d/resource.d/activemq ResourceManager[27645]: 2010/03/08_08:52:58 CRIT: Giving up resources due to failure of activemq ResourceManager[27645]: 2010/03/08_08:52:58 info: Releasing resource group: node03.companydomain.nl drbddisk Filesystem::/dev/drbd0::/data::ext3 mysql apache::/etc/httpd/conf/httpd.conf LVSSyncDaemonSwap::master monitor activemq tivoli-cluster MailTo::[email protected]::DRBDFailureDrisAcc MailTo::[email protected]::DRBDFailureDrisAcc 1.2.3.212 ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/IPaddr 1.2.3.212 stop IPaddr[28329]: 2010/03/08_08:52:58 INFO: ifconfig eth0:0 down IPaddr[28312]: 2010/03/08_08:52:58 INFO: Success ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/MailTo [email protected] DRBDFailureDrisAcc stop MailTo[28378]: 2010/03/08_08:52:58 INFO: Success ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/MailTo [email protected] DRBDFailureDrisAcc stop MailTo[28433]: 2010/03/08_08:52:58 INFO: Success ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/tivoli-cluster stop ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/activemq stop activemq[28503]: 2010/03/08_08:53:01 Stopping ActiveMQ Broker... Stopped ActiveMQ Broker. ResourceManager[27645]: 2010/03/08_08:53:01 info: Running /etc/ha.d/resource.d/monitor stop monitor[28681]: 2010/03/08_08:53:01 ResourceManager[27645]: 2010/03/08_08:53:01 info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master stop LVSSyncDaemonSwap[28714]: 2010/03/08_08:53:02 info: ipvs_syncmaster down LVSSyncDaemonSwap[28714]: 2010/03/08_08:53:02 info: ipvs_syncbackup up LVSSyncDaemonSwap[28714]: 2010/03/08_08:53:02 info: ipvs_syncmaster released ResourceManager[27645]: 2010/03/08_08:53:02 info: Running /etc/ha.d/resource.d/apache /etc/httpd/conf/httpd.conf stop apache[28782]: 2010/03/08_08:53:03 INFO: Killing apache PID 18390 apache[28782]: 2010/03/08_08:53:03 INFO: apache stopped. apache[28771]: 2010/03/08_08:53:03 INFO: Success ResourceManager[27645]: 2010/03/08_08:53:03 info: Running /etc/ha.d/resource.d/mysql stop mysql[28851]: 2010/03/08_08:53:24 Shutting down MySQL.....................[ OK ] ResourceManager[27645]: 2010/03/08_08:53:24 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 stop Filesystem[29010]: 2010/03/08_08:53:25 INFO: Running stop for /dev/drbd0 on /data Filesystem[29010]: 2010/03/08_08:53:25 INFO: Trying to unmount /data Filesystem[29010]: 2010/03/08_08:53:25 ERROR: Couldn't unmount /data; trying cleanup with SIGTERM Filesystem[29010]: 2010/03/08_08:53:25 INFO: Some processes on /data were signalled Filesystem[29010]: 2010/03/08_08:53:27 INFO: unmounted /data successfully Filesystem[28999]: 2010/03/08_08:53:27 INFO: Success ResourceManager[27645]: 2010/03/08_08:53:27 info: Running /etc/ha.d/resource.d/drbddisk stop heartbeat[4458]: 2010/03/08_08:53:29 WARN: node node04.companydomain.nl: is dead heartbeat[4458]: 2010/03/08_08:53:29 info: Dead node node04.companydomain.nl gave up resources. heartbeat[4458]: 2010/03/08_08:53:29 info: Link node04.companydomain.nl:eth0 dead. heartbeat[4458]: 2010/03/08_08:53:29 info: Link node04.companydomain.nl:eth1 dead. hb_standby[29193]: 2010/03/08_08:53:57 Going standby [foreign]. heartbeat[4458]: 2010/03/08_08:53:57 info: node03.companydomain.nl wants to go standby [foreign] Soo... What just happened here??? 
Heartbeat on node04 stopped and told node03, which was the active node at the time. Somehow, node03 decided to start the cluster processes that were already running. (For the processes that are not critical, I always return a 0 from the startupscript so it does not stops the entire cluster when a non-essential part fails.) When starting ActiveMQ, it returns status 1 because it is already running. This fails the node and shuts everything down. As heartbeat is not running on the secondary node, it cannot failover to there. When I tried to run ha_takeover to restart the resources, absolutely nothing happened. Only after I restarted heartbeat on the primary node the resources could be started (after a delay of 2 minutes). These are my questions: Why does heartbeat on the primary node try to start the cluster processes again? Why did ha_takeover not work? What can I do to prevent this from happening? Server configuration: DRBD: version: 8.3.7 (api:88/proto:86-91) GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by [email protected], 2010-01-20 09:14:48 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate B r---- ns:0 nr:6459432 dw:6459432 dr:0 al:0 bm:301 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0 uname -a Linux node04 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:04 EST 2010 x86_64 x86_64 x86_64 GNU/Linux haresources node03.companydomain.nl \ drbddisk \ Filesystem::/dev/drbd0::/data::ext3 \ mysql \ apache::/etc/httpd/conf/httpd.conf \ LVSSyncDaemonSwap::master \ monitor \ activemq \ tivoli-cluster \ MailTo::[email protected]::DRBDFailureDrisAcc \ MailTo::[email protected]::DRBDFailureDrisAcc \ 1.2.3.212 ha.cf debugfile /var/log/ha-debug logfile /var/log/ha-log keepalive 500ms deadtime 30 warntime 10 initdead 120 udpport 694 mcast eth0 225.0.0.3 694 1 0 mcast eth1 225.0.0.4 694 1 0 auto_failback off node node03.companydomain.nl node node04.companydomain.nl respawn hacluster /usr/lib64/heartbeat/dopd apiauth dopd gid=haclient uid=hacluster Thank you very much in advance, Ger Apeldoorn

    Read the article

  • Heartbeat won't successfully start up resources from a cold boot when a failed node is present

    - by Matthew
    I currently have two ubuntu servers running Heartbeat and DRBD. The servers are directory connected with a 1000Mbps crossover cable on eth1 and have access to an IP camera LAN on eth0. Now, let's say that one node is down and the remaining functional node is booting after having been shut down. The node that is still functioning won't start up heartbeat and provide access to the drbd resource from a cold boot. I have to manually restart heartbeat by sudo service heartbeat restart to get everything up and running. How can I get it to start fine from a cold start, when only one server is present? Here is the ha.cf: debug /var/log/ha-debug logfile /var/log/ha-log logfacility none keepalive 2 deadtime 10 warntime 7 initdead 60 ucast eth1 192.168.2.2 ucast eth0 10.1.10.201 node EMserver1 node EMserver2 respawn hacluster /usr/lib/heartbeat/ipfail ping 10.1.10.22 10.1.10.21 10.1.10.11 auto_failback off Some material from the syslog: harc[4604]: 2012/11/27_13:54:49 info: Running /etc/ha.d//rc.d/status status mach_down[4632]: 2012/11/27_13:54:49 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired mach_down[4632]: 2012/11/27_13:54:49 info: mach_down takeover complete for node emserver2. Nov 27 13:54:49 EMserver1 heartbeat: [4586]: info: Initial resource acquisition complete (T_RESOURCES(us)) Nov 27 13:54:49 EMserver1 heartbeat: [4586]: info: mach_down takeover complete. IPaddr[4679]: 2012/11/27_13:54:49 INFO: Resource is stopped Nov 27 13:54:49 EMserver1 heartbeat: [4605]: info: Local Resource acquisition completed. harc[4713]: 2012/11/27_13:54:49 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp ip-request-resp[4713]: 2012/11/27_13:54:49 received ip-request-resp IPaddr::10.1.10.254 OK yes ResourceManager[4732]: 2012/11/27_13:54:50 info: Acquiring resource group: emserver1 IPaddr::10.1.10.254 drbddisk::r0 Filesystem::/dev/drbd1::/shr::ext4 nfs-kernel-server IPaddr[4759]: 2012/11/27_13:54:50 INFO: Resource is stopped ResourceManager[4732]: 2012/11/27_13:54:50 info: Running /etc/ha.d/resource.d/IPaddr 10.1.10.254 start IPaddr[4816]: 2012/11/27_13:54:50 INFO: Using calculated nic for 10.1.10.254: eth0 IPaddr[4816]: 2012/11/27_13:54:50 INFO: Using calculated netmask for 10.1.10.254: 255.255.255.0 IPaddr[4816]: 2012/11/27_13:54:50 INFO: eval ifconfig eth0:0 10.1.10.254 netmask 255.255.255.0 broadcast 10.1.10.255 IPaddr[4804]: 2012/11/27_13:54:50 INFO: Success ResourceManager[4732]: 2012/11/27_13:54:50 info: Running /etc/ha.d/resource.d/drbddisk r0 start Filesystem[4965]: 2012/11/27_13:54:50 INFO: Resource is stopped ResourceManager[4732]: 2012/11/27_13:54:50 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /shr ext4 start Filesystem[5039]: 2012/11/27_13:54:50 INFO: Running start for /dev/drbd1 on /shr Filesystem[5033]: 2012/11/27_13:54:51 INFO: Success ResourceManager[4732]: 2012/11/27_13:54:51 info: Running /etc/init.d/nfs-kernel-server start Nov 27 13:55:00 EMserver1 heartbeat: [4586]: info: Local Resource acquisition completed. (none) Nov 27 13:55:00 EMserver1 heartbeat: [4586]: info: local resource transition completed. Nov 27 13:57:46 EMserver1 heartbeat: [4586]: info: Heartbeat shutdown in progress. (4586) Nov 27 13:57:46 EMserver1 heartbeat: [5286]: info: Giving up all HA resources. 
ResourceManager[5301]: 2012/11/27_13:57:46 info: Releasing resource group: emserver1 IPaddr::10.1.10.254 drbddisk::r0 Filesystem::/dev/drbd1::/shr::ext4 nfs-kernel-server ResourceManager[5301]: 2012/11/27_13:57:46 info: Running /etc/init.d/nfs-kernel-server stop ResourceManager[5301]: 2012/11/27_13:57:46 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /shr ext4 stop Filesystem[5372]: 2012/11/27_13:57:46 INFO: Running stop for /dev/drbd1 on /shr Filesystem[5372]: 2012/11/27_13:57:47 INFO: Trying to unmount /shr Filesystem[5372]: 2012/11/27_13:57:47 INFO: unmounted /shr successfully Filesystem[5366]: 2012/11/27_13:57:47 INFO: Success ResourceManager[5301]: 2012/11/27_13:57:47 info: Running /etc/ha.d/resource.d/drbddisk r0 stop ResourceManager[5301]: 2012/11/27_13:57:47 info: Running /etc/ha.d/resource.d/IPaddr 10.1.10.254 stop IPaddr[5509]: 2012/11/27_13:57:47 INFO: ifconfig eth0:0 down IPaddr[5497]: 2012/11/27_13:57:47 INFO: Success Nov 27 13:57:47 EMserver1 heartbeat: [5286]: info: All HA resources relinquished. Nov 27 13:57:48 EMserver1 heartbeat: [4586]: info: killing /usr/lib/heartbeat/ipfail process group 4603 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBFIFO process 4589 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBWRITE process 4590 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBREAD process 4591 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBWRITE process 4592 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBREAD process 4593 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBWRITE process 4594 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBREAD process 4595 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBWRITE process 4596 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBREAD process 4597 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBWRITE process 4598 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBREAD process 4599 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4589 exited. 11 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4596 exited. 10 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4598 exited. 9 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4590 exited. 8 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4595 exited. 7 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4591 exited. 6 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4592 exited. 5 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4593 exited. 4 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4597 exited. 3 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4594 exited. 2 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4599 exited. 1 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: emserver1 Heartbeat shutdown complete. 
Here is some more from the log ResourceManager[2576]: 2012/11/28_16:32:42 info: Acquiring resource group: emserver1 IPaddr::10.1.10.254 drbddisk::r0 Filesystem::/dev/drbd1::/shr::ext4 nfs-kernel-server IPaddr[2602]: 2012/11/28_16:32:42 INFO: Running OK Filesystem[2653]: 2012/11/28_16:32:43 INFO: Running OK Nov 28 16:32:52 EMserver1 heartbeat: [1695]: WARN: node emserver2: is dead Nov 28 16:32:52 EMserver1 heartbeat: [1695]: info: Dead node emserver2 gave up resources. Nov 28 16:32:52 EMserver1 ipfail: [1807]: info: Status update: Node emserver2 now has status dead Nov 28 16:32:52 EMserver1 heartbeat: [1695]: info: Link emserver2:eth1 dead. Nov 28 16:32:53 EMserver1 ipfail: [1807]: info: NS: We are still alive! Nov 28 16:32:53 EMserver1 ipfail: [1807]: info: Link Status update: Link emserver2/eth1 now has status dead Nov 28 16:32:55 EMserver1 ipfail: [1807]: info: Asking other side for ping node count. Nov 28 16:32:55 EMserver1 ipfail: [1807]: info: Checking remote count of ping nodes. Nov 28 16:32:57 EMserver1 heartbeat: [1695]: info: Heartbeat shutdown in progress. (1695) Nov 28 16:32:57 EMserver1 heartbeat: [2734]: info: Giving up all HA resources. ResourceManager[2751]: 2012/11/28_16:32:57 info: Releasing resource group: emserver1 IPaddr::10.1.10.254 drbddisk::r0 Filesystem::/dev/drbd1::/shr::ext4 nfs-kernel-server ResourceManager[2751]: 2012/11/28_16:32:57 info: Running /etc/init.d/nfs-kernel-server stop ResourceManager[2751]: 2012/11/28_16:32:57 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /shr ext4 stop Filesystem[2829]: 2012/11/28_16:32:57 INFO: Running stop for /dev/drbd1 on /shr Filesystem[2829]: 2012/11/28_16:32:57 INFO: Trying to unmount /shr Filesystem[2829]: 2012/11/28_16:32:58 INFO: unmounted /shr successfully Filesystem[2823]: 2012/11/28_16:32:58 INFO: Success ResourceManager[2751]: 2012/11/28_16:32:58 info: Running /etc/ha.d/resource.d/drbddisk r0 stop ResourceManager[2751]: 2012/11/28_16:32:58 info: Running /etc/ha.d/resource.d/IPaddr 10.1.10.254 stop IPaddr[2971]: 2012/11/28_16:32:58 INFO: ifconfig eth0:0 down IPaddr[2958]: 2012/11/28_16:32:58 INFO: Success Nov 28 16:32:58 EMserver1 heartbeat: [2734]: info: All HA resources relinquished. Nov 28 16:32:59 EMserver1 heartbeat: [1695]: info: killing /usr/lib/heartbeat/ipfail process group 1807 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBFIFO process 1777 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBWRITE process 1778 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBREAD process 1779 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBWRITE process 1780 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBREAD process 1781 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBWRITE process 1782 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBREAD process 1783 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBWRITE process 1784 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBREAD process 1785 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBWRITE process 1786 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBREAD process 1787 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1778 exited. 11 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1779 exited. 
10 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1780 exited. 9 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1781 exited. 8 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1782 exited. 7 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1783 exited. 6 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1784 exited. 5 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1785 exited. 4 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1786 exited. 3 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1787 exited. 2 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1777 exited. 1 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: emserver1 Heartbeat shutdown complete. If I restarted heartbeat at this point... the resources heartbeat controls would start up fine.... please help!

    Read the article

  • Cheapest iSCSI SAN for Windows 2008/SQL Server clustering?

    - by MichaelGG
    Are there any production-quality iSCSI SANs suitable for use with Windows Server 2008/SQL Server for failover clustering? So far, I've only seen Dell's MD3000i, and HP's MSA 2000 (2012i), which both are around $6K with a minimal disk configuration. Buffalo (yea, I know), has a $1000 device with iSCSI support, but they say it will not work for 2008 failover clustering. I'm interested in seeing something suitable for failover in a production environment, but with very low IO requirements. (Clustering, say, a 30GB DB.) As for using software: On Windows, StarWind seems to have a great solution. But it's actually more money than buying a hardware SAN. (As I understand, only the enterprise edition supports having replicas, and that's $3000 a license.) I was thinking I could use Linux, something like DRBD + an iSCSI target would be fine. However, I haven't seen any free or low-cost iSCSI software that supports SCSI-3 persistent reservations, which Windows 2008 needs for failover clustering. I know $6K isn't much at all, just curious to see if there are practical cheaper solutions out there. And finally, yes, the software is expensive, but many small business get MS BizSpark, so the Windows 2008 Enterprise / SQL 2008 licenses are completely free.
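
    On the Linux side, the LIO target (managed with targetcli) does advertise SCSI-3 persistent reservations, which older targets such as IET did not; a rough sketch of exporting an LV to two Windows cluster nodes (the IQNs and names are hypothetical, and the PR behaviour should be validated with the cluster validation wizard before trusting it):
        # export an LVM volume over iSCSI with the LIO target (sketch)
        targetcli /backstores/block create name=sqlquorum dev=/dev/vg0/sqlquorum
        targetcli /iscsi create iqn.2013-01.local.san:sqlcluster
        targetcli /iscsi/iqn.2013-01.local.san:sqlcluster/tpg1/luns \
            create /backstores/block/sqlquorum
        targetcli /iscsi/iqn.2013-01.local.san:sqlcluster/tpg1/acls \
            create iqn.1991-05.com.microsoft:sqlnode1
        targetcli /iscsi/iqn.2013-01.local.san:sqlcluster/tpg1/acls \
            create iqn.1991-05.com.microsoft:sqlnode2
        targetcli saveconfig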

    Read the article

  • samba "username map" stopped working after upgrade to 3.6

    - by Kris_R
    It was time to upgrade our group server (new HDs, problems with the old installation of DRBD, etc.). Going as usual with CentOS, I upgraded the whole system from 6.3 to 6.4. The latter came with Samba 3.6, while the old one had 3.5. I transferred most of the users by copying /etc/passwd and /etc/shadow, and the Samba accounts with pdbedit. Homes are on an NFS drive. The translation of Unix accounts to Samba accounts is located in /etc/samba/smbusers. Strangely enough, on some Windows clients there was a problem connecting to the Samba shares. In one case the only thing that worked was to use the Unix account name instead of the Windows name. In another, it was possible to mount the network drive and open it in Windows Explorer, but other applications like Total Commander gave the message "Cannot connect to z:" when trying to open the drive (sometimes a user/password prompt appeared at that moment). The smb.conf has the following entries:
        [global]
        security = user
        passdb backend = tdbsam
        username map = /etc/samba/smbusers
        ...
        [Kris]
        comment = Kris's Private
        path = /SMB/Users/Kris
        writeable = yes
        read only = no
        browseable = yes
        users = krisr
        printable = no
        security mask = 0777
        force security mode = 0
        directory security mask = 0777
        force directory security mode = 0
        force create mode = 0775
        force directory mode = 6775
    The smbusers file contains:
        # Unix_name = SMB_name1 SMB_name2 ...
        krisr = Kris
    Of course testparm runs without any errors. With Samba 3.5 I was used to seeing output of the form "Mapped user kris to krisr". Nothing like this happens now; there is just the message "check_sam_security: Couldn't find user Kris in passdb". I read on the web that some people had problems with 3.6 and security = ADS, but those threads were not helpful for me. I'm seriously considering downgrading back to Samba 3.5, but before taking that step I wanted to ask whether somebody knows the solution to these problems.
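
    Two quick checks that usually narrow this down (a sketch; the share and user names follow the smb.conf above): confirm what is actually in the tdbsam passdb, and watch the mapping happen with a verbose client connection:
        # list the accounts Samba really knows about (the mapped-to name must exist here)
        pdbedit -L -v | grep -i kris

        # connect as the Windows-side name with extra debugging to see the username map applied
        smbclient //localhost/Kris -U Kris -d 3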

    Read the article

  • Virtual machines with failover setup

    - by kimmmo
    We have three servers, and our plan is to run a number of virtual machines on them in such a manner that if one of the nodes blows up, we can quickly or even seamlessly get a spare running on another node. In addition to the normal networking, they're interconnected via dual 10Gbit NICs, so networked RAID/mirroring shouldn't be a problem. The guest VMs are mostly going to be running text-mode Linux, but of course it wouldn't hurt to be able to spin up a non-mission-critical Windows guest for running Visual Studio or checking IE compatibility of a web app. We've spent some time trying to get some magical cloud setup running using StackOps and Crowbar, but it started to look like they were offering way too much and were too complicated for our needs. The next candidate, I think, is Ubuntu 11.04 server + KVM + Ganeti + DRBD, unless you can come up with a suggestion for a better solution that we have missed. Requirements: installation should be simple, or at least understandable without being on the dev team; a browser interface for creating and managing VMs is a nice bonus; a single node's hardware failure should cause minimal downtime for the VMs that were running on that node; and adding more nodes should be possible without shutting down the VMs.
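
    For the Ganeti+DRBD route specifically, creating an instance whose disk is mirrored between two nodes looks roughly like this (a sketch; node, OS-variant and instance names are made up, and exact flags differ between Ganeti releases):
        # initialise the cluster and add the other nodes (sketch)
        gnt-cluster init --enabled-hypervisors=kvm cluster.example.com
        gnt-node add node2.example.com
        gnt-node add node3.example.com

        # create a VM with a DRBD-backed disk, primary on node1, secondary on node2
        gnt-instance add -t drbd --disk 0:size=20g -B memory=2g \
            -o debootstrap+default -n node1.example.com:node2.example.com \
            vm1.example.com

        # live-migrate it to its secondary, or fail it over after losing a node
        gnt-instance migrate vm1.example.com
        gnt-instance failover vm1.example.com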

    Read the article

  • DAS vs SAN storage for serving 2 to 4 nodes

    - by Luke404
    We currently have 4 Linux nodes with local storage, arranged in two active/passive pairs with storage mirrored using DRBD, running virtual machines (actually using Xen Hypervisor) for typical hosting workloads (mail, web, a couple VPS, etc.). We're approaching the (presumed) maximum IOPS of those servers, and we're planning to migrate to an external storage solution with two active nodes, with capacity for up to four active nodes. Since we're an all-Dell shop I've done some research and found the MD3200 / MD3200i products should be the ones we're looking for. We are pretty sure we won't be attaching more than 4 hosts on a single storage and I'm wondering if there is any clear advantage for one or the other. In theory I should be able to attach 4 SAS hosts to a single MD3200 (single links on a single controller MD3200, or dual redundant SAS links from each host to a dual-controller MD3200), or 4 iSCSI hosts to a single MD3200i (directly on its 4 GigE ports without any switch, again with dual links for the dual controller option). Both setups should let us implement live VM migration since all hosts can access all the LUNs at the same time, and also some shared filesystem like GFS2 or OCFS2. Also, both setups should allow full redundancy of the whole system (assuming dual controllers in the storage). One difference I can see is that the DAS solution is actually limited to 4 hosts while the iSCSI one should be able to grow to more hosts (adding two GigE switches to the mix). One point for the iSCSI solution is that it would allow us to start out with our current nodes and upgrade them at a later time (we can't add other SAS controllers, but they already have 4 GigE ports each). With the right (iSCSI|SAS) controllers I should be able to connect diskless nodes and boot them off the external storage which I think is a good thing (get rid of any local storage). On the other hand, I would have thought the SAS one to be cheaper but it seems like an MD3200 actually costs a little less than an MD3200i (?) (please note: I've used Dell gear in my examples since that's what we're looking for but I assume the same goes with other vendors) I would like to know if my assumptions above are correct, and if I'm missing any important difference between the two setups.

    Read the article

  • How can I automatically synchronize a directory tree on multiple machines?

    - by Blacklight Shining
    I have two Mac laptops and a Debian server, each with a directory that I would like to keep in sync between the three. The solution should meet the following criteria (in rough order of importance): It must not use any third-party service (e.g. Dropbox, SugarSync, Google whatever). This does not include installing additional software (as long as it's free). It must not require me to use specific directories or change my way of storing things. (Dropbox does this IIRC) It must work in all directions (changes made on /any/ machine should be pushed to the others) All data sent must be encrypted (I have ssh keypairs set up already) It must work even when not all machines are available (changes should be pushed to a machine when it comes back online) It must work even when the /directories/ on some machines are not available (they may be stored on disk images which will not always be mounted) This can be solved for Macs by using launchd to automatically launch and kill (or in some way change the behavior of) whatever daemon is used for syncing when the images are mounted and unmounted. It must be immediate (using an event-based system, not a periodic one like cron) It must be flexible (if more machines are added, I should be able to incorporate them easily) I also have some preferences that I would like to be fulfilled, but do not have to be: It should notify me somehow if there are conflicts or other errors. It should recognize symbolic and hard links and create corresponding ones. It should allow me to create a list of exceptions (subdirectories which will not be synced at all). It should not require me to set up port forwarding or otherwise reconfigure a network. This can be solved by using an ssh tunnel with reverse port forwarding. If you have a solution that meets some, but not all of the criteria, please contribute it in the comments as it might be useful in some way, and it might be possible to meet some of the criteria separately. What I tried, and why it didn't work: rsync and lsyncd do not support bidirectional synchronization csync2 is designed for server clusters and does not appear to work with machines with dynamic IPs DRBD (suggested by amotzg) involves installing a kernel module and does not appear to work on systems running OS X
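
    One tool that covers most of these points (bidirectional, runs over ssh, no third-party service) is Unison, paired with a file watcher for the event-driven part; a sketch with invented host and path names, noting that the watch mode needs a recent Unison with unison-fsmonitor on both ends:
        # ~/.unison/debserver.prf on each Mac (sketch)
        root = /Users/me/synced
        root = ssh://debserver.example.com//home/me/synced
        batch = true
        prefer = newer
        ignore = Path .DS_Store

        # one-shot run, or keep watching for changes:
        unison debserver
        unison debserver -repeat watch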

    Read the article

  • Replicated filesystem and EC2 MySQL

    - by El Yobo
    I'm currently investigating migrating our infrastructure over to run on Amazon's EC2 and am trying to figure out the best way to set up a MySQL service. I'm leaning towards running our own MySQL instances, rather than going with Amazon's RDS, but am still considering the best approach for performance and cost on the instance itself. In order to have persistent data, the MySQL data needs to be on an EBS volume (with some form of striped RAID, e.g. RAID0 or RAID10) to improve persistence. However, EBS IO is limited by the network interface (gigabit, so a theoretical maximum of 128 MB/s), while the ephemeral volumes have no such problem. I did see a suggestion for running two MySQL servers on an instance, with a master running on the ephemeral disk (which we would also RAID) and a slave storing changes to an EBS volume, but this has some additional overhead and complexity (two servers). What I was imagining is using some form of replicated file system such that I could have a filesystem on top of a RAID0 of ephemeral volumes to maximise performance all changes from the above immediately replicated to another RAID1 volume backed by multiple EBS volumes to ensure no data loss The advantages of this would be best possible IO performance for the DB server; no network delay in IO decreased IO on EBS volumes (as all read IO will be done on the ephemeral volumes) so decreased cost good data security, as it's backed onto redundant EBS volumes However, I haven't seen an appropriate system to replicate all changes from one volume to the other; is there a filesystem, or any other approach, which will do this? The distributed file systems, e.g. GlusterFS, DRBD etc seem to focus on replicating disks between servers, can they be set up to do what I'm interested in here? I also haven't seen anything about other's taking this approach. Do I have a solution in need of a problem here (i.e. is performance good enough, so this whole idea is redundant)? Is there some flaw in the plan?
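
    One way to get close to this without a replicated filesystem at all is an md RAID1 whose fast leg is the ephemeral RAID0 and whose slow leg is the EBS RAID, marked write-mostly with write-behind; a sketch (device names are assumptions, and the write-behind depth and the crash-consistency window would need careful testing before trusting a database to it):
        # stripe the ephemeral disks and, separately, the EBS volumes (sketch)
        mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
        mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/xvdf /dev/xvdg

        # mirror them: reads prefer the fast leg, writes to the EBS leg may lag
        mdadm --create /dev/md2 --level=1 --raid-devices=2 \
            --bitmap=internal --write-behind=1024 \
            /dev/md0 --write-mostly /dev/md1

        mkfs.ext4 /dev/md2
        mount /dev/md2 /var/lib/mysql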

    Read the article

  • recommendations for efficient offsite remote backup solution of vm's

    - by senorsmile
    I am looking for recommendations for backing up my current 6 vm's(and soon to grow to up to 20). Currently I am running a two node proxmox cluster(which is a debian base using kvm for virtualization with a custom web front end to administer). I have two nearly identical boxes with amd phenom II x4's and asus motherboards. Each has 4 500 GB sata2 hdd's, 1 for the os and other data for the proxmox install, and 3 using mdadm+drbd+lvm to share the 1.5 TB's of storage between the two machines. I mount lvm images to kvm for all of the virtual machines. I currently have the ability to do live transfer from one machine to the other, typically within seconds(it takes about 2 minutes on the largest vm running win2008 with m$ sql server). I am using proxmox's built-in vzdump utility to take snapshots of the vm's and store those on an external harddrive on the network. I then have jungledisk service (using rackspace) to sync the vzdump folder for remote offsite backup. This is all fine and dandy, but it's not very scalable. For one, the backups themselves can take up to a few hours every night. With jungledisk's block level incremental transfers, the sync only transfers a small portion of the data offsite, but that still takes at least a half an hour. The much better solution would of course be something that allows me to instantly take the difference of two time points (say what was written from 6am to 7am), zip it, then send that difference file to the backup server which would instantly transfer to the remote storage on rackspace. I have looked a little into zfs and it's ability to do send/receive. That coupled with a pipe of the data in bzip or something would seem perfect. However, it seems that implementing a nexenta server with zfs would essentially require at least one or two more dedicated storage servers to serve iSCSI block volumes (via zvol's???) to the proxmox servers. I would prefer to keep the setup as minimal as possible (i.e. NOT having separate storage servers) if at all possible. I have also briefly read about zumastor. It looks like it could also do what I want, but it appears to have halted development in 2008. So, zfs, zumastor or other?
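
    For the incremental "send only what changed between 6am and 7am" idea, the ZFS snapshot/send mechanism mentioned above looks roughly like this (a sketch; pool, dataset and host names are invented, and it presumes the VM images already live on a ZFS dataset rather than on LVM):
        # on the source: snapshot, then send only the delta since the previous snapshot
        zfs snapshot tank/vmimages@0700
        zfs send -i tank/vmimages@0600 tank/vmimages@0700 \
            | gzip | ssh backup.example.com "gunzip | zfs receive backup/vmimages"

        # once received, the older snapshot can be dropped on the source
        zfs destroy tank/vmimages@0600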

    Read the article
