Search Results

Search found 6 results on 1 pages for 'openais'.

Page 1/1 | 1

RHEL Cluster FAIL after changing time on system

- by Eugene S

I've encountered a strange issue. I had to change the time on my Linux RHEL cluster system. I've done it using the following command from the root user: date +%T -s "10:13:13" After doing this, some message appeared relating to <emerg> #1: Quorum Dissolved however I didn't capture it completely. In order to investigate the issue I looked at /var/log/messages and I've discovered the following: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] entering GATHER state from 0. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] Creating commit token because I am the rep. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] Storing new sequence id for ring 354 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] entering COMMIT state. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] entering RECOVERY state. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] position [0] member 192.168.1.49: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] previous ring seq 848 rep 192.168.1.49 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] aru 61 high delivered 61 received flag 1 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] Did not need to originate any messages in recovery. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] Sending initial ORF token Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] CLM CONFIGURATION CHANGE Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] New Configuration: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] #011r(0) ip(192.168.1.49) Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] Members Left: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] #011r(0) ip(192.168.1.51) Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] Members Joined: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CMAN ] quorum lost, blocking activity Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] CLM CONFIGURATION CHANGE Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] New Configuration: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] #011r(0) ip(192.168.1.49) Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] Members Left: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] Members Joined: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [SYNC ] This node is within the primary component and will provide service. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] entering OPERATIONAL state. Mar 22 16:40:42 hsmsc50sfe1a kernel: dlm: closing connection to node 2 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] got nodejoin message 192.168.1.49 Mar 22 16:40:42 hsmsc50sfe1a clurgmgrd[25809]: <emerg> #1: Quorum Dissolved Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CPG ] got joinlist message from node 1 Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Cluster is not quorate. Refusing connection. Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Error while processing connect: Connection refused Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Invalid descriptor specified (-21). Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Someone may be attempting something evil. Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Error while processing disconnect: Invalid request descriptor Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] entering GATHER state from 9. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] Creating commit token because I am the rep. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] Storing new sequence id for ring 358 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] entering COMMIT state. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] entering RECOVERY state. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] position [0] member 192.168.1.49: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] previous ring seq 852 rep 192.168.1.49 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] aru f high delivered f received flag 1 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] position [1] member 192.168.1.51: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] previous ring seq 852 rep 192.168.1.51 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] aru f high delivered f received flag 1 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] Did not need to originate any messages in recovery. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] Sending initial ORF token Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] CLM CONFIGURATION CHANGE Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] New Configuration: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] #011r(0) ip(192.168.1.49) Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] Members Left: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] Members Joined: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] CLM CONFIGURATION CHANGE Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] New Configuration: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] #011r(0) ip(192.168.1.49) Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] #011r(0) ip(192.168.1.51) Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] Members Left: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] Members Joined: Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] #011r(0) ip(192.168.1.51) Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [SYNC ] This node is within the primary component and will provide service. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [TOTEM] entering OPERATIONAL state. Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [MAIN ] Node chb_sfe2a not joined to cman because it has existing state Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] got nodejoin message 192.168.1.49 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CLM ] got nodejoin message 192.168.1.51 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CPG ] got joinlist message from node 1 Mar 22 16:40:42 hsmsc50sfe1a openais[25715]: [CPG ] got joinlist message from node 2 Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Cluster is not quorate. Refusing connection. Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Error while processing connect: Connection refused Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Invalid descriptor specified (-111). Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Someone may be attempting something evil. Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Error while processing get: Invalid request descriptor Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Invalid descriptor specified (-21). Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Someone may be attempting something evil. Mar 22 16:40:42 hsmsc50sfe1a ccsd[25705]: Error while processing disconnect: Invalid request descriptor How could this be related to the time change procedure I performed?

Read the article
SBD killing both cluster nodes when there are even small SAN network problems

- by Wieslaw Herr

I am having problems with stonith SBD in a openais-based cluster. Some background: The active/passive cluster has two nodes, node1 and node2. They are configured to provide an NFS service to users. To avoid problems with split-brain, they are both configured to use SBD. SBD is using two 1MB disks available to the hosts via an multipath fibre-channel network. The problems start if something happens with the SAN network. For example, today one of the brocade switches got rebooted and both nodes lost 2 out of 4 paths to each disks, which resulted in both nodes committing suicide and rebooting. This, of course, was highly undesirable because a) there were paths left b) even if the switch would be out for 10-20 seconds a reboot cycle of both nodes would take 5-10 minutes and all NFS-locks would be lost. I tried increasing the SBD timeout values (to 10sec+ values, dump attached at the end), however a "WARN: Latency: No liveness for 4 s exceeds threshold of 3 s" hints that something isn't working as I would it expect to. Here is what I would like to know: a) Is SBD working as it should killing nodes when 2 paths are available? b) If not, is the multipath.conf file attached correct? The storage controller we use is an IBM SVC (IBM 2145), should there be any specific configuration for it? (as in multipath.conf.defaults) c) How should I go about increasing the timeouts in SBD attachements: Multipath.conf and sbd dump (http://hpaste.org/69537)

Read the article
MySQL: Pacemaker cannot start the failed master as a new slave?

- by quanta

I'm going to setup failover for MySQL replication (1 master and 1 slave) follow this guide: https://github.com/jayjanssen/Percona-Pacemaker-Resource-Agents/blob/master/doc/PRM-setup-guide.rst Here're the output of crm configure show: node serving-6192 \ attributes p_mysql_mysql_master_IP="192.168.6.192" node svr184R-638.localdomain \ attributes p_mysql_mysql_master_IP="192.168.6.38" primitive p_mysql ocf:percona:mysql \ params config="/etc/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" replication_user="repl" replication_passwd="x" test_user="test_user" test_passwd="x" \ op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \ op monitor interval="2s" role="Slave" timeout="30s" OCF_CHECK_LEVEL="1" \ op start interval="0" timeout="120s" \ op stop interval="0" timeout="120s" primitive writer_vip ocf:heartbeat:IPaddr2 \ params ip="192.168.6.8" cidr_netmask="32" \ op monitor interval="10s" \ meta is-managed="true" ms ms_MySQL p_mysql \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" target-role="Master" is-managed="true" colocation writer_vip_on_master inf: writer_vip ms_MySQL:Master order ms_MySQL_promote_before_vip inf: ms_MySQL:promote writer_vip:start property $id="cib-bootstrap-options" \ dc-version="1.0.12-unknown" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ no-quorum-policy="ignore" \ stonith-enabled="false" \ last-lrm-refresh="1341801689" property $id="mysql_replication" \ p_mysql_REPL_INFO="192.168.6.192|mysql-bin.000006|338" crm_mon: Last updated: Mon Jul 9 10:30:01 2012 Stack: openais Current DC: serving-6192 - partition with quorum Version: 1.0.12-unknown 2 Nodes configured, 2 expected votes 2 Resources configured. ============ Online: [ serving-6192 svr184R-638.localdomain ] Master/Slave Set: ms_MySQL Masters: [ serving-6192 ] Slaves: [ svr184R-638.localdomain ] writer_vip (ocf::heartbeat:IPaddr2): Started serving-6192 Editing /etc/my.cnf on the serving-6192 of wrong syntax to test failover and it's working fine: svr184R-638.localdomain being promoted to become the master writer_vip switch to svr184R-638.localdomain Current state: Last updated: Mon Jul 9 10:35:57 2012 Stack: openais Current DC: serving-6192 - partition with quorum Version: 1.0.12-unknown 2 Nodes configured, 2 expected votes 2 Resources configured. ============ Online: [ serving-6192 svr184R-638.localdomain ] Master/Slave Set: ms_MySQL Masters: [ svr184R-638.localdomain ] Stopped: [ p_mysql:0 ] writer_vip (ocf::heartbeat:IPaddr2): Started svr184R-638.localdomain Failed actions: p_mysql:0_monitor_5000 (node=serving-6192, call=15, rc=7, status=complete): not running p_mysql:0_demote_0 (node=serving-6192, call=22, rc=7, status=complete): not running p_mysql:0_start_0 (node=serving-6192, call=26, rc=-2, status=Timed Out): unknown exec error Remove the wrong syntax from /etc/my.cnf on serving-6192, and restart corosync, what I would like to see is serving-6192 was started as a new slave but it doesn't: Failed actions: p_mysql:0_start_0 (node=serving-6192, call=4, rc=1, status=complete): unknown error Here're snippet of the logs which I'm suspecting: Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: rsc:p_mysql:0:4: start Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: RA output: (p_mysql:0:start:stderr) Error performing operation: The object/attribute does not exist Jul 09 10:46:32 serving-6192 crm_attribute: [7420]: info: Invoked: /usr/sbin/crm_attribute -N serving-6192 -l reboot --name readable -v 0 The full logs: http://fpaste.org/AyOZ/ The strange thing is I can starting it manually: export OCF_ROOT=/usr/lib/ocf export OCF_RESKEY_config="/etc/my.cnf" export OCF_RESKEY_pid="/var/run/mysqld/mysqld.pid" export OCF_RESKEY_socket="/var/lib/mysql/mysql.sock" export OCF_RESKEY_replication_user="repl" export OCF_RESKEY_replication_passwd="x" export OCF_RESKEY_test_user="test_user" export OCF_RESKEY_test_passwd="x" sh -x /usr/lib/ocf/resource.d/percona/mysql start: http://fpaste.org/RVGh/ Did I make something wrong?

Read the article
ldirectord ipvsadm not show reals ip and not work wtih pacemaker and corosync

- by miguer27

first thanks for your time. I'm having a problem with ldirectord that I can not solve, I comment my situation: I have two nodes with pace maker and corosync and configure somes resources: root@ldap1:/home/mamartin# crm status Last updated: Tue Jun 3 12:58:30 2014 Last change: Tue Jun 3 12:23:47 2014 via cibadmin on ldap1 Stack: openais Current DC: ldap2 - partition with quorum Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff 2 Nodes configured, 2 expected votes 7 Resources configured. Online: [ ldap1 ldap2 ] Resource Group: IPV_LVS IPV_4 (ocf::heartbeat:IPaddr2): Started ldap1 IPV_6 (ocf::heartbeat:IPv6addr): Started ldap1 lvs (ocf::heartbeat:ldirectord): Started ldap1 Clone Set: clon_IPV_lo [IPV_lo] Started: [ ldap2 ] Stopped: [ IPV_lo:1 ] root@ldap1:/home/mamartin# crm configure show node ldap2 \ attributes standby="off" node ldap1 \ attributes standby="off" primitive IPV-lo_4 ocf:heartbeat:IPaddr \ params ip="192.168.1.10" cidr_netmask="32" nic="lo" \ op monitor interval="5s" primitive IPV-lo_6 ocf:heartbeat:IPv6addrLO \ params ipv6addr="[fc00:1::3]" cidr_netmask="64" \ op monitor interval="5s" primitive IPV_4 ocf:heartbeat:IPaddr2 \ params ip="192.168.1.10" nic="eth0" cidr_netmask="25" lvs_support="true" \ op monitor interval="5s" primitive IPV_6 ocf:heartbeat:IPv6addr \ params ipv6addr="[fc00:1::3]" nic="eth0" cidr_netmask="64" \ op monitor interval="5s" primitive lvs ocf:heartbeat:ldirectord \ params configfile="/etc/ldirectord.cf" \ op monitor interval="20" timeout="10" \ meta target-role="Started" group IPV_LVS IPV_4 IPV_6 lvs group IPV_lo IPV-lo_6 IPV-lo_4 clone clon_IPV_lo IPV_lo \ meta interleave="true" target-role="Started" location cli-prefer-IPV_LVS IPV_LVS \ rule $id="cli-prefer-rule-IPV_LVS" inf: #uname eq ldap1 colocation LVS_no_IPV_lo -inf: clon_IPV_lo IPV_LVS property $id="cib-bootstrap-options" \ dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ no-quorum-policy="ignore" \ stonith-enabled="false" \ last-lrm-refresh="1401264327" rsc_defaults $id="rsc-options" \ resource-stickiness="1000" The problem is in the ipvsadm only show a one real IP, when i configured two now, show the ldirector.cf: root@ldap1:/home/mamartin# ipvsadm IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags - RemoteAddress:Port Forward Weight ActiveConn InActConn TCP ldap-maqueta.cica.es:ldap wrr - ldap2.cica.es:ldap Route 4 0 0 TCP [[fc00:1::3]]:ldap wrr - [[fc00:1::2]]:ldap Route 4 0 0 root@ldap1:/home/mamartin# cat /etc/ldirectord.cf checktimeout=10 checkinterval=2 autoreload=yes logfile="/var/log/ldirectord.log" quiescent=yes #ipv4 virtual=192.168.1.10:389 real=192.168.1.11:389 gate 4 real=192.168.1.12:389 gate 4 scheduler=wrr protocol=tcp checktype=on #ipv6 virtual6=[[fc00:1::3]]:389 real6=[[fc00:1::1]]:389 gate 4 real6=[[fc00:1::2]]:389 gate 4 scheduler=wrr protocol=tcp checkport=389 checktype=on and in the logs I see nothing clear: root@ldap1:/home/mamartin# ldirectord -d /etc/ldirectord.cf start DEBUG2: Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.11:389 -g -w 0) Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.11:389 -g -w 0) DEBUG2: Quiescent real server: 192.168.1.11:389 (192.168.1.10:389) (Weight set to 0) Quiescent real server: 192.168.1.11:389 (192.168.1.10:389) (Weight set to 0) DEBUG2: Disabled real server=on:tcp:192.168.1.11:389:::4:gate:\/: (virtual=tcp:192.168.1.10:389) DEBUG2: Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.12:389 -g -w 0) Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.12:389 -g -w 0) DEBUG2: Quiescent real server: 192.168.1.12:389 (192.168.1.10:389) (Weight set to 0) Quiescent real server: 192.168.1.12:389 (192.168.1.10:389) (Weight set to 0) DEBUG2: Disabled real server=on:tcp:192.168.1.12:389:::4:gate:\/: (virtual=tcp:192.168.1.10:389) DEBUG2: Checking on: Real servers are added without any checks DEBUG2: Resetting soft failure count: 192.168.1.12:389 (tcp:192.168.1.10:389) Resetting soft failure count: 192.168.1.12:389 (tcp:192.168.1.10:389) DEBUG2: Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.12:389 -g -w 4) Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.12:389 -g -w 4) Destination already exists root@ldap1:/home/mamartin# cat /var/log/ldirectord.log [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Quiescent real server: 192.168.1.11:389 (192.168.1.10:389) (Weight set to 0) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Quiescent real server: 192.168.1.12:389 (192.168.1.10:389) (Weight set to 0) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Resetting soft failure count: 192.168.1.12:389 (tcp:192.168.1.10:389) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.12:389 -g -w 4) failed: [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Added real server: 192.168.1.12:389 (192.168.1.10:389) (Weight set to 4) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Resetting soft failure count: 192.168.1.11:389 (tcp:192.168.1.10:389) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Restored real server: 192.168.1.11:389 (192.168.1.10:389) (Weight set to 4) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Resetting soft failure count: [[fc00:1::2]]:389 (tcp:[[fc00:1::3]]:389) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] system(/sbin/ipvsadm -a -t [[fc00:1::3]]:389 -r [[fc00:1::2]]:389 -g -w 4) failed: [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Added real server: [[fc00:1::2]]:389 ([[fc00:1::3]]:389) (Weight set to 4) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Resetting soft failure count: [[fc00:1::1]]:389 (tcp:[[fc00:1::3]]:389) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Restored real server: [[fc00:1::1]]:389 ([[fc00:1::3]]:389) (Weight set to 4) do not know if this is a bug or a configuration error, can anyone help? Regards.

Read the article
NFS v4, HA Migration, and stale handles on clients

- by Karl Katzke

I'm managing a server running NFS v4 with Pacemaker/OpenAIS. NFS is configured to use TCP. When I migrate the NFS server to another node in the Pacemaker cluster, even though the metadata is persisted, connections from the clients 'hang' and eventually time out after 90 seconds. After that 90 seconds, the old mountpoint becomes 'stale' and the mounted files can no longer be accessed. The 90 second grace period seems to be part of the server configuration and not the client configuration. I see this message on the server: kernel: NFSD: starting 90-second grace period If I restart the NFS client on the client nodes after I migrate (unmounting and then remounting the share), then I don't experience the problem, but connections and file transfers still interrupted. Three questions: What is the 90 second grace period? What's it there for? How can I keep the files from going stale on the clients without restarting them after I migrate the NFS server to another node? Is it actually possible to migrate the NFS server without having large file uploads drop?

Read the article
Corosync :: Restarting some resources after Lan connectivity issue

- by moebius_eye

I am currently looking into corosync to build a two-node cluster. So, I've got it working fine, and it does what I want to do, which is: Lost connectivity between the two nodes gives the first node '10node' both Failover Wan IPs. (aka resources WanCluster100 and WanCluster101 ) '11node' does nothing. He "thinks" he still has his Failover Wan IP. (aka WanCluster101) But it doesn't do this: '11node' should restart the WanCluster101 resource when the connectivity with the other node is back. This is to prevent a condition where node10 simply dies (and thus does not get 11node's Failover Wan IP), resulting in a situation where none of the nodes have 10node's failover IP because 10node is down 11node has "given back" his failover Wan IP. Here's the current configuration I'm working on. node 10sch \ attributes standby="off" node 11sch \ attributes standby="off" primitive LanCluster100 ocf:heartbeat:IPaddr2 \ params ip="172.25.0.100" cidr_netmask="32" nic="eth3" \ op monitor interval="10s" \ meta is-managed="true" target-role="Started" primitive LanCluster101 ocf:heartbeat:IPaddr2 \ params ip="172.25.0.101" cidr_netmask="32" nic="eth3" \ op monitor interval="10s" \ meta is-managed="true" target-role="Started" primitive Ping100 ocf:pacemaker:ping \ params host_list="192.0.2.1" multiplier="500" dampen="15s" \ op monitor interval="5s" \ meta target-role="Started" primitive Ping101 ocf:pacemaker:ping \ params host_list="192.0.2.1" multiplier="500" dampen="15s" \ op monitor interval="5s" \ meta target-role="Started" primitive WanCluster100 ocf:heartbeat:IPaddr2 \ params ip="192.0.2.100" cidr_netmask="32" nic="eth2" \ op monitor interval="10s" \ meta target-role="Started" primitive WanCluster101 ocf:heartbeat:IPaddr2 \ params ip="192.0.2.101" cidr_netmask="32" nic="eth2" \ op monitor interval="10s" \ meta target-role="Started" primitive Website0 ocf:heartbeat:apache \ params configfile="/etc/apache2/apache2.conf" options="-DSSL" \ operations $id="Website-one" \ op start interval="0" timeout="40" \ op stop interval="0" timeout="60" \ op monitor interval="10" timeout="120" start-delay="0" statusurl="http://127.0.0.1/server-status/" \ meta target-role="Started" primitive Website1 ocf:heartbeat:apache \ params configfile="/etc/apache2/apache2.conf.1" options="-DSSL" \ operations $id="Website-two" \ op start interval="0" timeout="40" \ op stop interval="0" timeout="60" \ op monitor interval="10" timeout="120" start-delay="0" statusurl="http://127.0.0.1/server-status/" \ meta target-role="Started" group All100 WanCluster100 LanCluster100 group All101 WanCluster101 LanCluster101 location AlwaysPing100WithNode10 Ping100 \ rule $id="AlWaysPing100WithNode10-rule" inf: #uname eq 10sch location AlwaysPing101WithNode11 Ping101 \ rule $id="AlWaysPing101WithNode11-rule" inf: #uname eq 11sch location NeverLan100WithNode11 LanCluster100 \ rule $id="RAND1083308" -inf: #uname eq 11sch location NeverPing100WithNode11 Ping100 \ rule $id="NeverPing100WithNode11-rule" -inf: #uname eq 11sch location NeverPing101WithNode10 Ping101 \ rule $id="NeverPing101WithNode10-rule" -inf: #uname eq 10sch location Website0NeedsConnectivity Website0 \ rule $id="Website0NeedsConnectivity-rule" -inf: not_defined pingd or pingd lte 0 location Website1NeedsConnectivity Website1 \ rule $id="Website1NeedsConnectivity-rule" -inf: not_defined pingd or pingd lte 0 colocation Never -inf: LanCluster101 LanCluster100 colocation Never2 -inf: WanCluster100 LanCluster101 colocation NeverBothWebsitesTogether -inf: Website0 Website1 property $id="cib-bootstrap-options" \ dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ no-quorum-policy="ignore" \ stonith-enabled="false" \ last-lrm-refresh="1408954702" \ maintenance-mode="false" rsc_defaults $id="rsc-options" \ resource-stickiness="100" \ migration-threshold="3" I also have a less important question concerning this line: colocation NeverBothLans -inf: LanCluster101 LanCluster100 How do I tell it that this collocation only applies to '11node'.

Read the article

1