Search Results

Search found 12 results on 1 pages for 'corosync'.

Page 1/1 | 1

Changing Corosync/Heartbeat pair's active node based on MySQL/Galera cluster state

- by Hace

Background I'm planning on building a High Availability "cluster" for our Zabbix instance by placing two physical servers in one server room and two in another server room. In each server room one of the physical servers will run Zabbix on RHEL and the other will run Zabbix's MySQL database, also on RHEL. I'd prefer synchronous replication for the MySQL nodes so I'm planning on using Galera in a master-slave configuration. The Zabbix instances on the two Zabbix servers would be controlled by Heartbeat/Corosync (although Red Hat Cluster Suite is also an option...) If the Zabbix server in Server Room A goes down, the one in Server Room B becomes active (and vice versa). Ditto for the MySQL servers/instances. If either of those cases happen, however, the connection between the Zabbix server and the MySQL server becomes significantly slower as ti has to travel over WAN. Question Is it possible to configure the Heartbeat/CoroSync pair to instruct the MySQL/Galera cluster to change the master node to switch to (if available) the one that's in the server room as the active Heartbeat/Corosync -node and (more challengingly) is it possible to do the same in the other direction, i.e have the Galera cluster change the active Heartbeat/CoroSync server to be in the same room as the active MySQL master server in case of a failover in over to avoid unnecessary WAN transfers between the application and its DB? Theories Most likely I can get CoroSync to run something that'd log in to one of the DB nodes to change the MySQL/Galera master but I don't know if it's really possible to do anything similar in the other direction in Galera. Is it possible to define a "service" in CoroSync/Heartbeat so that both the service and its MySQL service would migrate as one if possible. Using the DB server that's behind WAN should still be a better option to DB downtime. Am I just using too many tools to solve a problem that'd be far simpler with something else?

Read the article
could corosync can support unicast heartbeat mode?

- by Emre He

could corosync can support unicast heartbeat mode? from another thread in serverfault, some guy raised below corosync conf: totem { version: 2 secauth: off interface { member { memberaddr: 10.xxx.xxx.xxx } member { memberaddr: 10.xxx.xxx.xxx } ringnumber: 0 bindnetaddr: 10.xxx.xxx.xxx mcastport: 694 } transport: udpu } is this conf type means unicast mode? thanks, Emre

Read the article
Corosync :: Restarting some resources after Lan connectivity issue

- by moebius_eye

I am currently looking into corosync to build a two-node cluster. So, I've got it working fine, and it does what I want to do, which is: Lost connectivity between the two nodes gives the first node '10node' both Failover Wan IPs. (aka resources WanCluster100 and WanCluster101 ) '11node' does nothing. He "thinks" he still has his Failover Wan IP. (aka WanCluster101) But it doesn't do this: '11node' should restart the WanCluster101 resource when the connectivity with the other node is back. This is to prevent a condition where node10 simply dies (and thus does not get 11node's Failover Wan IP), resulting in a situation where none of the nodes have 10node's failover IP because 10node is down 11node has "given back" his failover Wan IP. Here's the current configuration I'm working on. node 10sch \ attributes standby="off" node 11sch \ attributes standby="off" primitive LanCluster100 ocf:heartbeat:IPaddr2 \ params ip="172.25.0.100" cidr_netmask="32" nic="eth3" \ op monitor interval="10s" \ meta is-managed="true" target-role="Started" primitive LanCluster101 ocf:heartbeat:IPaddr2 \ params ip="172.25.0.101" cidr_netmask="32" nic="eth3" \ op monitor interval="10s" \ meta is-managed="true" target-role="Started" primitive Ping100 ocf:pacemaker:ping \ params host_list="192.0.2.1" multiplier="500" dampen="15s" \ op monitor interval="5s" \ meta target-role="Started" primitive Ping101 ocf:pacemaker:ping \ params host_list="192.0.2.1" multiplier="500" dampen="15s" \ op monitor interval="5s" \ meta target-role="Started" primitive WanCluster100 ocf:heartbeat:IPaddr2 \ params ip="192.0.2.100" cidr_netmask="32" nic="eth2" \ op monitor interval="10s" \ meta target-role="Started" primitive WanCluster101 ocf:heartbeat:IPaddr2 \ params ip="192.0.2.101" cidr_netmask="32" nic="eth2" \ op monitor interval="10s" \ meta target-role="Started" primitive Website0 ocf:heartbeat:apache \ params configfile="/etc/apache2/apache2.conf" options="-DSSL" \ operations $id="Website-one" \ op start interval="0" timeout="40" \ op stop interval="0" timeout="60" \ op monitor interval="10" timeout="120" start-delay="0" statusurl="http://127.0.0.1/server-status/" \ meta target-role="Started" primitive Website1 ocf:heartbeat:apache \ params configfile="/etc/apache2/apache2.conf.1" options="-DSSL" \ operations $id="Website-two" \ op start interval="0" timeout="40" \ op stop interval="0" timeout="60" \ op monitor interval="10" timeout="120" start-delay="0" statusurl="http://127.0.0.1/server-status/" \ meta target-role="Started" group All100 WanCluster100 LanCluster100 group All101 WanCluster101 LanCluster101 location AlwaysPing100WithNode10 Ping100 \ rule $id="AlWaysPing100WithNode10-rule" inf: #uname eq 10sch location AlwaysPing101WithNode11 Ping101 \ rule $id="AlWaysPing101WithNode11-rule" inf: #uname eq 11sch location NeverLan100WithNode11 LanCluster100 \ rule $id="RAND1083308" -inf: #uname eq 11sch location NeverPing100WithNode11 Ping100 \ rule $id="NeverPing100WithNode11-rule" -inf: #uname eq 11sch location NeverPing101WithNode10 Ping101 \ rule $id="NeverPing101WithNode10-rule" -inf: #uname eq 10sch location Website0NeedsConnectivity Website0 \ rule $id="Website0NeedsConnectivity-rule" -inf: not_defined pingd or pingd lte 0 location Website1NeedsConnectivity Website1 \ rule $id="Website1NeedsConnectivity-rule" -inf: not_defined pingd or pingd lte 0 colocation Never -inf: LanCluster101 LanCluster100 colocation Never2 -inf: WanCluster100 LanCluster101 colocation NeverBothWebsitesTogether -inf: Website0 Website1 property $id="cib-bootstrap-options" \ dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ no-quorum-policy="ignore" \ stonith-enabled="false" \ last-lrm-refresh="1408954702" \ maintenance-mode="false" rsc_defaults $id="rsc-options" \ resource-stickiness="100" \ migration-threshold="3" I also have a less important question concerning this line: colocation NeverBothLans -inf: LanCluster101 LanCluster100 How do I tell it that this collocation only applies to '11node'.

Read the article
ldirectord ipvsadm not show reals ip and not work wtih pacemaker and corosync

- by miguer27

first thanks for your time. I'm having a problem with ldirectord that I can not solve, I comment my situation: I have two nodes with pace maker and corosync and configure somes resources: root@ldap1:/home/mamartin# crm status Last updated: Tue Jun 3 12:58:30 2014 Last change: Tue Jun 3 12:23:47 2014 via cibadmin on ldap1 Stack: openais Current DC: ldap2 - partition with quorum Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff 2 Nodes configured, 2 expected votes 7 Resources configured. Online: [ ldap1 ldap2 ] Resource Group: IPV_LVS IPV_4 (ocf::heartbeat:IPaddr2): Started ldap1 IPV_6 (ocf::heartbeat:IPv6addr): Started ldap1 lvs (ocf::heartbeat:ldirectord): Started ldap1 Clone Set: clon_IPV_lo [IPV_lo] Started: [ ldap2 ] Stopped: [ IPV_lo:1 ] root@ldap1:/home/mamartin# crm configure show node ldap2 \ attributes standby="off" node ldap1 \ attributes standby="off" primitive IPV-lo_4 ocf:heartbeat:IPaddr \ params ip="192.168.1.10" cidr_netmask="32" nic="lo" \ op monitor interval="5s" primitive IPV-lo_6 ocf:heartbeat:IPv6addrLO \ params ipv6addr="[fc00:1::3]" cidr_netmask="64" \ op monitor interval="5s" primitive IPV_4 ocf:heartbeat:IPaddr2 \ params ip="192.168.1.10" nic="eth0" cidr_netmask="25" lvs_support="true" \ op monitor interval="5s" primitive IPV_6 ocf:heartbeat:IPv6addr \ params ipv6addr="[fc00:1::3]" nic="eth0" cidr_netmask="64" \ op monitor interval="5s" primitive lvs ocf:heartbeat:ldirectord \ params configfile="/etc/ldirectord.cf" \ op monitor interval="20" timeout="10" \ meta target-role="Started" group IPV_LVS IPV_4 IPV_6 lvs group IPV_lo IPV-lo_6 IPV-lo_4 clone clon_IPV_lo IPV_lo \ meta interleave="true" target-role="Started" location cli-prefer-IPV_LVS IPV_LVS \ rule $id="cli-prefer-rule-IPV_LVS" inf: #uname eq ldap1 colocation LVS_no_IPV_lo -inf: clon_IPV_lo IPV_LVS property $id="cib-bootstrap-options" \ dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ no-quorum-policy="ignore" \ stonith-enabled="false" \ last-lrm-refresh="1401264327" rsc_defaults $id="rsc-options" \ resource-stickiness="1000" The problem is in the ipvsadm only show a one real IP, when i configured two now, show the ldirector.cf: root@ldap1:/home/mamartin# ipvsadm IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags - RemoteAddress:Port Forward Weight ActiveConn InActConn TCP ldap-maqueta.cica.es:ldap wrr - ldap2.cica.es:ldap Route 4 0 0 TCP [[fc00:1::3]]:ldap wrr - [[fc00:1::2]]:ldap Route 4 0 0 root@ldap1:/home/mamartin# cat /etc/ldirectord.cf checktimeout=10 checkinterval=2 autoreload=yes logfile="/var/log/ldirectord.log" quiescent=yes #ipv4 virtual=192.168.1.10:389 real=192.168.1.11:389 gate 4 real=192.168.1.12:389 gate 4 scheduler=wrr protocol=tcp checktype=on #ipv6 virtual6=[[fc00:1::3]]:389 real6=[[fc00:1::1]]:389 gate 4 real6=[[fc00:1::2]]:389 gate 4 scheduler=wrr protocol=tcp checkport=389 checktype=on and in the logs I see nothing clear: root@ldap1:/home/mamartin# ldirectord -d /etc/ldirectord.cf start DEBUG2: Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.11:389 -g -w 0) Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.11:389 -g -w 0) DEBUG2: Quiescent real server: 192.168.1.11:389 (192.168.1.10:389) (Weight set to 0) Quiescent real server: 192.168.1.11:389 (192.168.1.10:389) (Weight set to 0) DEBUG2: Disabled real server=on:tcp:192.168.1.11:389:::4:gate:\/: (virtual=tcp:192.168.1.10:389) DEBUG2: Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.12:389 -g -w 0) Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.12:389 -g -w 0) DEBUG2: Quiescent real server: 192.168.1.12:389 (192.168.1.10:389) (Weight set to 0) Quiescent real server: 192.168.1.12:389 (192.168.1.10:389) (Weight set to 0) DEBUG2: Disabled real server=on:tcp:192.168.1.12:389:::4:gate:\/: (virtual=tcp:192.168.1.10:389) DEBUG2: Checking on: Real servers are added without any checks DEBUG2: Resetting soft failure count: 192.168.1.12:389 (tcp:192.168.1.10:389) Resetting soft failure count: 192.168.1.12:389 (tcp:192.168.1.10:389) DEBUG2: Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.12:389 -g -w 4) Running system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.12:389 -g -w 4) Destination already exists root@ldap1:/home/mamartin# cat /var/log/ldirectord.log [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Quiescent real server: 192.168.1.11:389 (192.168.1.10:389) (Weight set to 0) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Quiescent real server: 192.168.1.12:389 (192.168.1.10:389) (Weight set to 0) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Resetting soft failure count: 192.168.1.12:389 (tcp:192.168.1.10:389) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] system(/sbin/ipvsadm -a -t 192.168.1.10:389 -r 192.168.1.12:389 -g -w 4) failed: [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Added real server: 192.168.1.12:389 (192.168.1.10:389) (Weight set to 4) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Resetting soft failure count: 192.168.1.11:389 (tcp:192.168.1.10:389) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Restored real server: 192.168.1.11:389 (192.168.1.10:389) (Weight set to 4) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Resetting soft failure count: [[fc00:1::2]]:389 (tcp:[[fc00:1::3]]:389) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] system(/sbin/ipvsadm -a -t [[fc00:1::3]]:389 -r [[fc00:1::2]]:389 -g -w 4) failed: [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Added real server: [[fc00:1::2]]:389 ([[fc00:1::3]]:389) (Weight set to 4) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Resetting soft failure count: [[fc00:1::1]]:389 (tcp:[[fc00:1::3]]:389) [Tue Jun 3 09:39:29 2014|ldirectord.cf|19266] Restored real server: [[fc00:1::1]]:389 ([[fc00:1::3]]:389) (Weight set to 4) do not know if this is a bug or a configuration error, can anyone help? Regards.

Read the article
What's the difference between keepalived and corosync, others?

- by Matt

I'm building a failover firewall for a server cluster and started looking at the various options. I'm more familiar with carp on freebsd, but need to use linux for this project. Searching google has produced several different projects, but no clear information about features they provide . CARP gave virtual interfaces that failover, I am not really clear on whether that's what corosync does, or is that what pacemaker does? On the other hand I did get manage to get keepalived working. However, I noted that corosync provides native support for infiniband. This would be useful for me. Perhaps someone could shed some light on the differences between: corosync keepalive pacemaker heartbeat Which product would be the best fit for router failover?

Read the article
Corosync - stopping the service crashes the server

- by Antipop

I am trying to set up a test cluster on a Xen Server with 2 paravirtualized CentOS 5.4 machines. I am using Pacemaker+Corosync, and following the instructions found at http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf and other sites. Anyway, when I try to manually stop the corosync service, about 80% of the times the whole VM locks up with the message "Waiting for corosync services to unload" and I am forced to shut the machine down manually. For the remaining 20%, the VM keeps responding and adds dots to the above message, but it won't actually stop the service. There aren't many resources on the internet about this particular error. Any ideas about this? Thanks in advance.

Read the article
MySQL: Pacemaker cannot start the failed master as a new slave?

- by quanta

I'm going to setup failover for MySQL replication (1 master and 1 slave) follow this guide: https://github.com/jayjanssen/Percona-Pacemaker-Resource-Agents/blob/master/doc/PRM-setup-guide.rst Here're the output of crm configure show: node serving-6192 \ attributes p_mysql_mysql_master_IP="192.168.6.192" node svr184R-638.localdomain \ attributes p_mysql_mysql_master_IP="192.168.6.38" primitive p_mysql ocf:percona:mysql \ params config="/etc/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" replication_user="repl" replication_passwd="x" test_user="test_user" test_passwd="x" \ op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \ op monitor interval="2s" role="Slave" timeout="30s" OCF_CHECK_LEVEL="1" \ op start interval="0" timeout="120s" \ op stop interval="0" timeout="120s" primitive writer_vip ocf:heartbeat:IPaddr2 \ params ip="192.168.6.8" cidr_netmask="32" \ op monitor interval="10s" \ meta is-managed="true" ms ms_MySQL p_mysql \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" target-role="Master" is-managed="true" colocation writer_vip_on_master inf: writer_vip ms_MySQL:Master order ms_MySQL_promote_before_vip inf: ms_MySQL:promote writer_vip:start property $id="cib-bootstrap-options" \ dc-version="1.0.12-unknown" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ no-quorum-policy="ignore" \ stonith-enabled="false" \ last-lrm-refresh="1341801689" property $id="mysql_replication" \ p_mysql_REPL_INFO="192.168.6.192|mysql-bin.000006|338" crm_mon: Last updated: Mon Jul 9 10:30:01 2012 Stack: openais Current DC: serving-6192 - partition with quorum Version: 1.0.12-unknown 2 Nodes configured, 2 expected votes 2 Resources configured. ============ Online: [ serving-6192 svr184R-638.localdomain ] Master/Slave Set: ms_MySQL Masters: [ serving-6192 ] Slaves: [ svr184R-638.localdomain ] writer_vip (ocf::heartbeat:IPaddr2): Started serving-6192 Editing /etc/my.cnf on the serving-6192 of wrong syntax to test failover and it's working fine: svr184R-638.localdomain being promoted to become the master writer_vip switch to svr184R-638.localdomain Current state: Last updated: Mon Jul 9 10:35:57 2012 Stack: openais Current DC: serving-6192 - partition with quorum Version: 1.0.12-unknown 2 Nodes configured, 2 expected votes 2 Resources configured. ============ Online: [ serving-6192 svr184R-638.localdomain ] Master/Slave Set: ms_MySQL Masters: [ svr184R-638.localdomain ] Stopped: [ p_mysql:0 ] writer_vip (ocf::heartbeat:IPaddr2): Started svr184R-638.localdomain Failed actions: p_mysql:0_monitor_5000 (node=serving-6192, call=15, rc=7, status=complete): not running p_mysql:0_demote_0 (node=serving-6192, call=22, rc=7, status=complete): not running p_mysql:0_start_0 (node=serving-6192, call=26, rc=-2, status=Timed Out): unknown exec error Remove the wrong syntax from /etc/my.cnf on serving-6192, and restart corosync, what I would like to see is serving-6192 was started as a new slave but it doesn't: Failed actions: p_mysql:0_start_0 (node=serving-6192, call=4, rc=1, status=complete): unknown error Here're snippet of the logs which I'm suspecting: Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: rsc:p_mysql:0:4: start Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: RA output: (p_mysql:0:start:stderr) Error performing operation: The object/attribute does not exist Jul 09 10:46:32 serving-6192 crm_attribute: [7420]: info: Invoked: /usr/sbin/crm_attribute -N serving-6192 -l reboot --name readable -v 0 The full logs: http://fpaste.org/AyOZ/ The strange thing is I can starting it manually: export OCF_ROOT=/usr/lib/ocf export OCF_RESKEY_config="/etc/my.cnf" export OCF_RESKEY_pid="/var/run/mysqld/mysqld.pid" export OCF_RESKEY_socket="/var/lib/mysql/mysql.sock" export OCF_RESKEY_replication_user="repl" export OCF_RESKEY_replication_passwd="x" export OCF_RESKEY_test_user="test_user" export OCF_RESKEY_test_passwd="x" sh -x /usr/lib/ocf/resource.d/percona/mysql start: http://fpaste.org/RVGh/ Did I make something wrong?

Read the article
Linux HA / cluster: what are the differences between Pacemaker, Heartbeat, Corosync, wackamole?

- by Continuation

Can you help me understand Linux HA? Pacemaker, Heartbeat, Corosync seem to be part of a whole HA stack, but how do they fit together? How does wackamole differ from Pacemaker/Heartbeat/Corosync? I've seen opinions that wackamole is better than Heartbeat because it's peer-based. Is that valid? The last release of wackamole was 2.5 years ago. Is it still being maintained or active? What would you recommend for HA setup for web/application/database servers?

Read the article
Active node stops resources when pasive node is shutdown

- by Wakaru44

2 nodes, active/pasive. 2 resources, a virtual ip, openLdap, and the nfs mount where openldap saves the data. When both nodes are up, things worked fine. You could move resources away and put the active in stanby. But when i rebooted the passive node, ( with the resources in the active node), and the passive node loses conectivity, all the resources in the active where stopped by pacemaker. I'm reading the documentation right now, but I just need a little quick tip to figure what could be hapenning here. Im using: corosync pacemaker RHEL 6

Read the article
Handling database failover for Rails applications on FreeBSD

- by bianster

I'm working on implementing database (Postgresql) failover for a Rails app that runs with Passenger/FreeBSD. Due to certain constraints regarding the server OS, it's necessary to continue using FreeBSD (as opposed to say, Ubuntu). I'm finding it to be quite a challenge to have failover handled within the Rails application, by way of a customised database adapter due to the fact that this application will be load-balanced between several webservers, and the multiple Rails processes that Passenger spawned in each webserver. I previously looked at setting up Pacemaker/Corosync to manage database server failover on a common IP but unfortunately I wasn't able to get past building the packages on FreeBSD. It does work rather well on Ubuntu 10.04 but I'm not likely to be able to use Ubuntu due to the OS constraints. I'm considering a custom witness daemon that simply pings the primary DB server, and this witness daemon switches all the webservers to the standby DB server when the primary becomes uncontactable (permanently/temporarily), to avoid split-brain. Though I would really like to know if there is a way to get Pacemaker(or something similar) to do the switch on FreeBSD.

Read the article
New Options for MySQL High Availability

- by Mat Keep

Data is the currency of today’s web, mobile, social, enterprise and cloud applications. Ensuring data is always available is a top priority for any organization – minutes of downtime will result in significant loss of revenue and reputation. There is not a “one size fits all” approach to delivering High Availability (HA). Unique application attributes, business requirements, operational capabilities and legacy infrastructure can all influence HA technology selection. And then technology is only one element in delivering HA – “People and Processes” are just as critical as the technology itself. For this reason, MySQL Enterprise Edition is available supporting a range of HA solutions, fully certified and supported by Oracle. MySQL Enterprise HA is not some expensive add-on, but included within the core Enterprise Edition offering, along with the management tools, consulting and 24x7 support needed to deliver true HA. At the recent MySQL Connect conference, we announced new HA options for MySQL users running on both Linux and Solaris: - DRBD for MySQL - Oracle Solaris Clustering for MySQL DRBD (Distributed Replicated Block Device) is an open source Linux kernel module which leverages synchronous replication to deliver high availability database applications across local storage. DRBD synchronizes database changes by mirroring data from an active node to a standby node and supports automatic failover and recovery. Linux, DRBD, Corosync and Pacemaker, provide an integrated stack of mature and proven open source technologies. DRBD Stack: Providing Synchronous Replication for the MySQL Database with InnoDB Download the DRBD for MySQL whitepaper to learn more, including step-by-step instructions to install, configure and provision DRBD with MySQL Oracle Solaris Cluster provides high availability and load balancing to mission-critical applications and services in physical or virtualized environments. With Oracle Solaris Cluster, organizations have a scalable and flexible solution that is suited equally to small clusters in local datacenters or larger multi-site, multi-cluster deployments that are part of enterprise disaster recovery implementations. The Oracle Solaris Cluster MySQL agent integrates seamlessly with MySQL offering a selection of configuration options in the various Oracle Solaris Cluster topologies. Putting it All Together When you add MySQL Replication and MySQL Cluster into the HA mix, along with 3rd party solutions, users have extensive choice (and decisions to make) to deliver HA services built on MySQL To make the decision process simpler, we have also published a new MySQL HA Solutions Guide. Exploring beyond just the technology, the guide presents a methodology to select the best HA solution for your new web, cloud and mobile services, while also discussing the importance of people and process in ensuring service continuity. This is subject recently presented at Oracle Open World, and the slides are available here. Whatever your uptime requirements, you can be sure MySQL has an HA solution for your needs Please don't hesitate to let us know of your HA requirements in the comments section of this blog. You can also contact MySQL consulting to learn more about their HA Jumpstart offering which will help you scope out your scaling and HA requirements.

Read the article
Ubuntu Server mdadm drbd ocfs2 kvm hangs under heavy file reading

- by Stefano Annese

I have deployed four ubuntu 10.04 server. They are coupled two by two in a cluster scenario. on both sides we have software raid1 disks, drbd8 and OCFS2 and on top of it some kvm machines run with qcow2 disks. I followed this: Link corosync is just used for DRBD and OCFS, the kvm machines are run "manually" When it works is fine: good performances, good I/O, but at a given time one of the two cluster started hanging. Then we tried with just one server turned on and it hangs the same. It seems to happen when an heavy READ in one of the virtual machines occurs, that is during rsyn backup. When the fact occurs the virtual machines are not reachable any more and the real server responds with good delay to the ping but no screen and no ssh is available. All we can do is force shutdown (hold the button) and restart and when it turns on again the raid on which relay drbd is resyncing. All the time it hangs we see such fact. After a couple of week of pain on one side this morning also the other cluster hung, but it has different moteherboard, ram, kvm instances. What is similar is reading for rsync scenario and Western Digital RAID Edistion disks on both side. Can anybody give me some input to solve such issue?

Read the article

Search Results

Search found 12 results on 1 pages for 'corosync'.

Page 1/1 | 1

- by Hace

- by Emre He

- by moebius_eye

- by miguer27

- by Matt

- by Antipop

- by quanta

- by Continuation

- by Wakaru44

- by bianster

- by Mat Keep

- by Stefano Annese