Search Results

Search found 1815 results on 73 pages for 'percona xtradb cluster'.

Page 2/73 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Oracle Coherence, Split-Brain and Recovery Protocols In Detail

    - by Ricardo Ferreira
    This article provides a high level conceptual overview of Split-Brain scenarios in distributed systems. It will focus on a specific example of cluster communication failure and recovery in Oracle Coherence. This includes a discussion on the witness protocol (used to remove failed cluster members) and the panic protocol (used to resolve Split-Brain scenarios). Note that the removal of cluster members does not necessarily indicate a Split-Brain condition. Oracle Coherence does not (and cannot) detect a Split-Brain as it occurs, the condition is only detected when cluster members that previously lost contact with each other regain contact. Cluster Topology and Configuration In order to create an good didactic for the article, let's assume a cluster topology and configuration. In this example we have a six member cluster, consisting of one JVM on each physical machine. The member IDs are as follows: Member ID  IP Address  1  10.149.155.76  2  10.149.155.77  3  10.149.155.236  4  10.149.155.75  5  10.149.155.79  6  10.149.155.78 Members 1, 2, and 3 are connected to a switch, and members 4, 5, and 6 are connected to a second switch. There is a link between the two switches, which provides network connectivity between all of the machines. Member 1 is the first member to join this cluster, thus making it the senior member. Member 6 is the last member to join this cluster. Here is a log snippet from Member 6 showing the complete member set: 2010-02-26 15:27:57.390/3.062 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=main, member=6): Started DefaultCacheServer... SafeCluster: Name=cluster:0xDDEB Group{Address=224.3.5.3, Port=35465, TTL=4} MasterMemberSet ( ThisMember=Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) OldestMember=Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) ActualMemberSet=MemberSet(Size=6, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer) Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer) Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) Member(Id=5, Timestamp=2010-02-26 15:27:49.095, Address=10.149.155.79:8088, MachineId=1103, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:3229, Role=CoherenceServer) Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) ) RecycleMillis=120000 RecycleSet=MemberSet(Size=0, BitSetCount=0 ) ) At approximately 15:30, the connection between the two switches is severed: Thirty seconds later (the default packet timeout in development mode) the logs indicate communication failures across the cluster. In this example, the communication failure was caused by a network failure. In a production setting, this type of communication failure can have many root causes, including (but not limited to) network failures, excessive GC, high CPU utilization, swapping/virtual memory, and exceeding maximum network bandwidth. In addition, this type of failure is not necessarily indicative of a split brain. Any communication failure will be logged in this fashion. Member 2 logs a communication failure with Member 5: 2010-02-26 15:30:32.638/196.928 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=PacketPublisher, member=2): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=5, Timestamp=2010-02-26 15:27:49.095, Address=10.149.155.79:8088, MachineId=1103, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:3229, Role=CoherenceServer) by MemberSet(Size=2, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) ) The Coherence clustering protocol (TCMP) is a reliable transport mechanism built on UDP. In order for the protocol to be reliable, it requires an acknowledgement (ACK) for each packet delivered. If a packet fails to be acknowledged within the configured timeout period, the Coherence cluster member will log a packet timeout (as seen in the log message above). When this occurs, the cluster member will consult with other members to determine who is at fault for the communication failure. If the witness members agree that the suspect member is at fault, the suspect is removed from the cluster. If the witnesses unanimously disagree, the accuser is removed. This process is known as the witness protocol. Since Member 2 cannot communicate with Member 5, it selects two witnesses (Members 1 and 4) to determine if the communication issue is with Member 5 or with itself (Member 2). However, Member 4 is on the switch that is no longer accessible by Members 1, 2 and 3; thus a packet timeout for member 4 is recorded as well: 2010-02-26 15:30:35.648/199.938 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=PacketPublisher, member=2): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) by MemberSet(Size=2, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) ) Member 1 has the ability to confirm the departure of member 4, however Member 6 cannot as it is also inaccessible. At the same time, Member 3 sends a request to remove Member 6, which is followed by a report from Member 3 indicating that Member 6 has departed the cluster: 2010-02-26 15:30:35.706/199.996 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=2): MemberLeft request for Member 6 received from Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer) 2010-02-26 15:30:35.709/199.999 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=2): MemberLeft notification for Member 6 received from Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer) The log for Member 3 determines how Member 6 departed the cluster: 2010-02-26 15:30:35.161/191.694 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=PacketPublisher, member=3): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) by MemberSet(Size=2, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer) ) 2010-02-26 15:30:35.165/191.698 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=Cluster, member=3): Member departure confirmed by MemberSet(Size=2, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) Member(Id=2, Timestamp=2010-02-26 15:27:17.847, Address=10.149.155.77:8088, MachineId=1101, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:296, Role=CoherenceServer) ); removing Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) In this case, Member 3 happened to select two witnesses that it still had connectivity with (Members 1 and 2) thus resulting in a simple decision to remove Member 6. Given the departure of Member 6, Member 2 is left with a single witness to confirm the departure of Member 4: 2010-02-26 15:30:35.713/200.003 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=Cluster, member=2): Member departure confirmed by MemberSet(Size=1, BitSetCount=2 Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) ); removing Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) In the meantime, Member 4 logs a missing heartbeat from the senior member. This message is also logged on Members 5 and 6. 2010-02-26 15:30:07.906/150.453 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=PacketListenerN, member=4): Scheduled senior member heartbeat is overdue; rejoining multicast group. Next, Member 4 logs a TcpRing failure with Member 2, thus resulting in the termination of Member 2: 2010-02-26 15:30:21.421/163.968 Oracle Coherence GE 3.5.3/465p2 <D4> (thread=Cluster, member=4): TcpRing: Number of socket exceptions exceeded maximum; last was "java.net.SocketTimeoutException: connect timed out"; removing the member: 2 For quick process termination detection, Oracle Coherence utilizes a feature called TcpRing which is a sparse collection of TCP/IP-based connections between different members in the cluster. Each member in the cluster is connected to at least one other member, which (if at all possible) is running on a different physical box. This connection is not used for any data transfer, only heartbeat communications are sent once a second per each link. If a certain number of exceptions are thrown while trying to re-establish a connection, the member throwing the exceptions is removed from the cluster. Member 5 logs a packet timeout with Member 3 and cites witnesses Members 4 and 6: 2010-02-26 15:30:29.791/165.037 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=PacketPublisher, member=5): Timeout while delivering a packet; requesting the departure confirmation for Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer) by MemberSet(Size=2, BitSetCount=2 Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) ) 2010-02-26 15:30:29.798/165.044 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=Cluster, member=5): Member departure confirmed by MemberSet(Size=2, BitSetCount=2 Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) Member(Id=6, Timestamp=2010-02-26 15:27:58.635, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) ); removing Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer) Eventually we are left with two distinct clusters consisting of Members 1, 2, 3 and Members 4, 5, 6, respectively. In the latter cluster, Member 4 is promoted to senior member. The connection between the two switches is restored at 15:33. Upon the restoration of the connection, the cluster members immediately receive cluster heartbeats from the two senior members. In the case of Members 1, 2, and 3, the following is logged: 2010-02-26 15:33:14.970/369.066 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=Cluster, member=1): The member formerly known as Member(Id=4, Timestamp=2010-02-26 15:30:35.341, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) has been forcefully evicted from the cluster, but continues to emit a cluster heartbeat; henceforth, the member will be shunned and its messages will be ignored. Likewise for Members 4, 5, and 6: 2010-02-26 15:33:14.343/336.890 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=Cluster, member=4): The member formerly known as Member(Id=1, Timestamp=2010-02-26 15:30:31.64, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) has been forcefully evicted from the cluster, but continues to emit a cluster heartbeat; henceforth, the member will be shunned and its messages will be ignored. This message indicates that a senior heartbeat is being received from members that were previously removed from the cluster, in other words, something that should not be possible. For this reason, the recipients of these messages will initially ignore them. After several iterations of these messages, the existence of multiple clusters is acknowledged, thus triggering the panic protocol to reconcile this situation. When the presence of more than one cluster (i.e. Split-Brain) is detected by a Coherence member, the panic protocol is invoked in order to resolve the conflicting clusters and consolidate into a single cluster. The protocol consists of the removal of smaller clusters until there is one cluster remaining. In the case of equal size clusters, the one with the older Senior Member will survive. Member 1, being the oldest member, initiates the protocol: 2010-02-26 15:33:45.970/400.066 Oracle Coherence GE 3.5.3/465p2 <Warning> (thread=Cluster, member=1): An existence of a cluster island with senior Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) containing 3 nodes have been detected. Since this Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) is the senior of an older cluster island, the panic protocol is being activated to stop the other island's senior and all junior nodes that belong to it. Member 3 receives the panic: 2010-02-26 15:33:45.803/382.336 Oracle Coherence GE 3.5.3/465p2 <Error> (thread=Cluster, member=3): Received panic from senior Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer) caused by Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer) Member 4, the senior member of the younger cluster, receives the kill message from Member 3: 2010-02-26 15:33:44.921/367.468 Oracle Coherence GE 3.5.3/465p2 <Error> (thread=Cluster, member=4): Received a Kill message from a valid Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer); stopping cluster service. In turn, Member 4 requests the departure of its junior members 5 and 6: 2010-02-26 15:33:44.921/367.468 Oracle Coherence GE 3.5.3/465p2 <Error> (thread=Cluster, member=4): Received a Kill message from a valid Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer); stopping cluster service. 2010-02-26 15:33:43.343/349.015 Oracle Coherence GE 3.5.3/465p2 <Error> (thread=Cluster, member=6): Received a Kill message from a valid Member(Id=4, Timestamp=2010-02-26 15:27:39.574, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer); stopping cluster service. Once Members 4, 5, and 6 restart, they rejoin the original cluster with senior member 1. The log below is from Member 4. Note that it receives a different member id when it rejoins the cluster. 2010-02-26 15:33:44.921/367.468 Oracle Coherence GE 3.5.3/465p2 <Error> (thread=Cluster, member=4): Received a Kill message from a valid Member(Id=3, Timestamp=2010-02-26 15:27:24.892, Address=10.149.155.236:8088, MachineId=1260, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:32459, Role=CoherenceServer); stopping cluster service. 2010-02-26 15:33:46.921/369.468 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Service Cluster left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Invocation:InvocationService, member=4): Service InvocationService left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=OptimisticCache, member=4): Service OptimisticCache left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=ReplicatedCache, member=4): Service ReplicatedCache left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=DistributedCache, member=4): Service DistributedCache left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Invocation:Management, member=4): Service Management left the cluster 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member 6 left service Management with senior member 5 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member 6 left service DistributedCache with senior member 5 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member 6 left service ReplicatedCache with senior member 5 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member 6 left service OptimisticCache with senior member 5 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member 6 left service InvocationService with senior member 5 2010-02-26 15:33:47.046/369.593 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=4): Member(Id=6, Timestamp=2010-02-26 15:33:47.046, Address=10.149.155.78:8088, MachineId=1102, Location=process:228, Role=CoherenceServer) left Cluster with senior member 4 2010-02-26 15:33:49.218/371.765 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=main, member=n/a): Restarting cluster 2010-02-26 15:33:49.421/371.968 Oracle Coherence GE 3.5.3/465p2 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a 2010-02-26 15:33:49.625/372.172 Oracle Coherence GE 3.5.3/465p2 <Info> (thread=Cluster, member=n/a): This Member(Id=5, Timestamp=2010-02-26 15:33:50.499, Address=10.149.155.75:8088, MachineId=1099, Location=process:800, Role=CoherenceServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1) joined cluster "cluster:0xDDEB" with senior Member(Id=1, Timestamp=2010-02-26 15:27:06.931, Address=10.149.155.76:8088, MachineId=1100, Location=site:usdhcp.oraclecorp.com,machine:dhcp-burlington6-4fl-east-10-149,process:511, Role=CoherenceServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2) Cool isn't it?

    Read the article

  • MySQL Cluster 7.3 - Join This Week's Webinar to Learn What's New

    - by Mat Keep
    The first Development Milestone and Early Access releases of MySQL Cluster 7.3 were announced just several weeks ago. To provide more detail and demonstrate the new features, Andrew Morgan and I will be hosting a live webinar this coming Thursday 25th October at 0900 Pacific Time / 16.00 UTC Even if you can't make the live webinar, it is still worth registering for the event as you will receive a notification when the replay will be available, to view on-demand at your convenience In the webinar, we will discuss the enhancements being previewed as part of MySQL Cluster 7.3, including: - Foreign Key Constraints: Yes, we've looked into the future and decided Foreign Keys are it ;-) You can read more about the implementation of Foreign Keys in MySQL Cluster 7.3 here - Node.js NoSQL API: Allowing web, mobile and cloud services to query and receive results sets from MySQL Cluster, natively in JavaScript, enables developers to seamlessly couple high performance, distributed applications with a high performance, distributed, persistence layer delivering 99.999% availability. You can study the Node.js / MySQL Cluster tutorial here - Auto-Installer: This new web-based GUI makes it simple for DevOps teams to quickly configure and provision highly optimized MySQL Cluster deployments on-premise or in the cloud You can view a YouTube tutorial on the MySQL Cluster Auto-Installer here  So we have a lot to cover in our 45 minute session. It will be time well spent if you want to know more about the future direction of MySQL Cluster and how it can help you innovate faster, with greater simplicity. Registration is open 

    Read the article

  • Linux stretch cluster: MD replication, DRBD or Veritas?

    - by PieterB
    For the moment there's a lot of choices for setting up a Linux cluster. For cluster manager: you can use Red Hat Cluster manager, Pacemaker or Veritas Cluster Server. The first one has the most momentum, the second one comes by default with RH subscriptions and the last one is very expensive and has a very good reputation ;-) For storage: - You can replicate LUN's using software raid / md device - You can use the network using DRBD replication, which offers a bit more flexibility - You can use Veritas Storage Foundation technology to talk to your SANs replication technology. Anyone has any recommandations or experience with these technologies?

    Read the article

  • Installed Percona mySQL on CPanel but getting an error

    - by user1227914
    I installed Percona mySQL on my fresh CPanel server (no databases yet) according to: http://www.ecommy.com/linux/install-...el-environment Everything seemed to be OK and the server also starts fine, except some commands return this error: root@server [/var/lib/mysql]# mysql -A -sN information_schema -e "select * from user_statistics;" mysql: unknown variable 'innodb_file_per_table=1' root@server [/var/lib/mysql]# mysql -A mysql: unknown variable 'innodb_file_per_table=1' In my /etc/my.cnf I have: [mysql] innodb_file_per_table=1 userstat_running=1 I am planning on using InnoDB for the databases. Anyone know what the problem is? Or even better, how to fix it? I have installed Percona 5.5 with yum on CentOS.

    Read the article

  • Product Update Bulletin: Oracle Solaris Cluster October 2013

    - by uwes
    Announcing new qualifications and general news for the Oracle Solaris Cluster product. Hardware Qualifications Sun Server X4-2 and X4-2L servers, Sun Blade X4-2B server module with Oracle Solaris Cluster 3.3 Sun Storage 16 Gb Fibre Channel ExpressModule Universal HBA, Emulex Oracle Dual Port QDR InfiniBand Adapter M3 Software Qualifications Oracle Database 12c Real Application Cluster with Oracle Solaris Cluster 4.1 Oracle Database 11.2.0.4 single instance and RAC with Oracle Solaris Cluster 4.1 Oracle VM server for SPARC 3.1 SAP Netweaver with new kernel versions ZFS Storage Appliance Kit version 2011.1.7.0 and 2013.1.0.0 Application monitoring in Oracle VM for SPARC failover guest domain Storage Partner Update Oracle Solaris Cluster 3.3 3/13 with the HDS Enterprise Storage arrays EMC SRDF for Oracle database 12c RAC in Oracle Solaris Cluster 4.1 geo cluster configuration Oracle Solaris Cluster References Korea Enterprise Data, HDFC Securities, Dealis Fund Operations Web Updates New blog entry: Oracle Solaris 10 Brand Zone cluster Solaris Application Engineering website now includes Oracle Solaris Cluster application support information Please read the Oracle Solaris Cluster Product Update Bulletin on Oracle HW TRC for more details. (If you are not registered on Oracle HW TRC, click here ... and follow the instructions..) _____________________________________________________________________ For More Information Go To:Oracle.com Oracle Solaris Cluster page Oracle Technology Network Oracle Solaris Cluster pageOracle Solaris Cluster mos communityPartner web Oracle Solaris Cluster pageOracle Solaris Cluster Blog Solaris.us.oracle.com page

    Read the article

  • Announcing Release of Oracle Solaris Cluster 4.1!

    - by user9159196
    Oct 26, 2012We are very happy to announce the release of  Oracle Solaris Cluster 4.1, providing High Availability (HA) and  Disaster Recovery (DR) capabilities for Oracle Solaris 11.1.  This is yet another proof of Oracle's continued investment in Oracle Solaris technologies such as Oracle Solaris Cluster. For this new release we have improved the Solaris Cluster integration within the Oracle environment. For example  we've created new agents such as PeopleSoft JobScheduler or added the support of the Oracle ZFS Storage Appliance replication in the Geo Edition module (to facilitate disaster recovery in multi-site configuration equipped with those types of storage.) We have also extended the Oracle Solaris Zone Cluster feature with support of Oracle Solaris 10 zone clusters and exclusive-IP to facilitate deployment of virtualized or cloud architecture.And there are many more new features to discover in this release. Stay tuned for more specific articles. In the mean time check out the What's new document or even better, download the latest version from  here.Also, join the Oracle Solaris 11 Online Event on November 7 where an entire session will be devoted to discussing Oracle Solaris Cluster 4.1. Our Oracle Solaris Cluster engineers will be on hand to respond to your questions. We look forward to your feedback and inputs! -Nancy Chow and Eve Kleinknecht 

    Read the article

  • MySQL cluster: Error after 1 data node is shutdown and started again

    - by nitins
    MySQL cluster: Error after 1 data node is shutdown and started again. We have configured MySQL cluster(version 7.1) with 2 sql/data nodes. We are using table space instead of in-memory clustering. The setup was working fine. So to test the setup I shutdown one data node, updated a table and and again stated the stopped node. Its giving this error and not starting. Any ideas ? Forced node shutdown completed. Occured during startphase 5. Caused by error 2306: 'Pointer too large(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

    Read the article

  • Windows failover cluster - virtual node with multiple Client Access Names

    - by mclaassen
    I recently encountered a customer environment in which they had failover cluster where one of the virtual nodes had more than one 'Client Access Name' (i.e. more than one IP address and DNS name for the single virtual node). Long story short we had to modify our software to deal with this situation, but we want to recreate the situation in house to test it before releasing. I have been unable to locate any information about how or why you would end up with a virtual node that has more than one access name. Does anyone know how I can set up a Windows failover cluster where a virtual node has more than one access name/IP?

    Read the article

  • Clustering Basics and Challenges

    - by Karoly Vegh
    For upcoming posts it seemed to be a good idea to dedicate some time for cluster basic concepts and theory. This post misses a lot of details that would explode the articlesize, should you have questions, do not hesitate to ask them in the comments.  The goal here is to get some concepts straight. I can't promise to give you an overall complete definitions of cluster, cluster agent, quorum, voting, fencing, split brain condition, so the following is more of an explanation. Here we go. -------- Cluster, HA, failover, switchover, scalability -------- An attempted definition of a Cluster: A cluster is a set (2+) server nodes dedicated to keep application services alive, communicating through the cluster software/framework with eachother, test and probe health status of servernodes/services and with quorum based decisions and with switchover/failover techniques keep the application services running on them available. That is, should a node that runs a service unexpectedly lose functionality/connection, the other ones would take over the and run the services, so that availability is guaranteed. To provide availability while strictly sticking to a consistent clusterconfiguration is the main goal of a cluster.  At this point we have to add that this defines a HA-cluster, a High-Availability cluster, where the clusternodes are planned to run the services in an active-standby, or failover fashion. An example could be a single instance database. Some applications can be run in a distributed or scalable fashion. In the latter case instances of the application run actively on separate clusternodes serving servicerequests simultaneously. An example for this version could be a webserver that forwards connection requests to many backend servers in a round-robin way. Or a database running in active-active RAC setup.  -------- Cluster arhitecture, interconnect, topologies -------- Now, what is a cluster made of? Servers, right. These servers (the clusternodes) need to communicate. This of course happens over the network, usually over dedicated network interfaces interconnecting all the clusternodes. These connection are called interconnects.How many clusternodes are in a cluster? There are different cluster topologies. The most simple one is a clustered pair topology, involving only two clusternodes:  There are several more topologies, clicking the image above will take you to the relevant documentation. Also, to answer the question Solaris Cluster allows you to run up to 16 servers in a cluster. Where shall these clusternodes be placed? A very important question. The right answer is: It depends on what you plan to achieve with the cluster. Do you plan to avoid only a server outage? Then you can place them right next to eachother in the datacenter. Do you need to avoid DataCenter outage? In that case of course you should place them at least in different fire zones. Or in two geographically distant DataCenters to avoid disasters like floods, large-scale fires or power outages. We call this a stretched- or campus cluster, the clusternodes being several kilometers away from eachother. To cover really large distances, you probably need to move to a GeoCluster, which is a different kind of animal.  What is a geocluster? A Geographic Cluster in Solaris Cluster terms is actually a metacluster between two, separate (locally-HA) clusters.  -------- Cluster resource types, agents, resources, resource groups -------- So how does the cluster manage my applications? The cluster needs to start, stop and probe your applications. If you application runs, the cluster needs to check regularly if the application state is healthy, does it respond over the network, does it have all the processes running, etc. This is called probing. If the cluster deems the application is in a faulty state, then it can try to restart it locally or decide to switch (stop on node A, start on node B) the service. Starting, stopping and probing are the three actions that a cluster agent does. There are many different kinds of agents included in Solaris Cluster, but you can build your own too. Examples are an agent that manages (mounts, moves) ZFS filesystems, or the Oracle DB HA agent that cares about the database, or an agent that moves a floating IP address between nodes. There are lots of other agents included for Apache, Tomcat, MySQL, Oracle DB, Oracle Weblogic, Zones, LDoms, NFS, DNS, etc.We also need to clarify the difference between a cluster resource and the cluster resource group.A cluster resource is something that is managed by a cluster agent. Cluster resource types are included in Solaris cluster (see above, e.g. HAStoragePlus, HA-Oracle, LogicalHost). You can group cluster resources into cluster resourcegroups, and switch these groups together from one node to another. To stick to the example above, to move an Oracle DB service from one node to another, you have to switch the group between nodes, and the agents of the cluster resources in the group will do the following:  On node A Shut down the DB Unconfigure the LogicalHost IP the DB Listener listens on unmount the filesystem   Then, on node B: mount the FS configure the IP  startup the DB -------- Voting, Quorum, Split Brain Condition, Fencing, Amnesia -------- How do the clusternodes agree upon their action? How do they decide which node runs what services? Another important question. Running a cluster is a strictly democratic thing.Every node has votes, and you need the majority of votes to have the deciding power. Now, this is usually no problem, clusternodes think very much all alike. Still, every action needs to be governed upon in a productive system, and has to be agreed upon. Agreeing is easy as long as the clusternodes all behave and talk to eachother over the interconnect. But if the interconnect is gone/down, this all gets tricky and confusing. Clusternodes think like this: "My job is to run these services. The other node does not answer my interconnect communication, it must be down. I'd better take control and run the services!". The problem is, as I have already mentioned, clusternodes very much think alike. If the interconnect is gone, they all assume the other node is down, and they all want to mount the data backend, enable the IP and run the database. Double IPs, double mounts, double DB instances - now that is trouble. Also, in a 2-node cluster they both have only 50% of the votes, that is, they themselves alone are not allowed to run a cluster.  This is where you need a quorum device. According to Wikipedia, the "requirement for a quorum is protection against totally unrepresentative action in the name of the body by an unduly small number of persons.". They need additional votes to run the cluster. For this requirement a 2-node cluster needs a quorum device or a quorum server. If the interconnect is gone, (this is what we call a split brain condition) both nodes start to race and try to reserve the quorum device to themselves. They do this, because the quorum device bears an additional vote, that could ensure majority (50% +1). The one that manages to lock the quorum device (e.g. if it's an FC LUN, it SCSI reserves it) wins the right to build/run a cluster, the other one - realizing he was late - panics/reboots to ensure the cluster config stays consistent.  Losing the interconnect isn't only endangering the availability of services, but it also endangers the cluster configuration consistence. Just imagine node A being down and during that the cluster configuration changes. Now node B goes down, and node A comes up. It isn't uptodate about the cluster configuration's changes so it will refuse to start a cluster, since that would lead to cluster amnesia, that is the cluster had some changes, but now runs with an older cluster configuration repository state, that is it's like it forgot about the changes.  Also, to ensure application data consistence, the clusternode that wins the race makes sure that a server that isn't part of or can't currently join the cluster can access the devices. This procedure is called fencing. This usually happens to storage LUNs via SCSI reservation.  Now, another important question: Where do I place the quorum disk?  Imagine having two sites, two separate datacenters, one in the north of the city and the other one in the south part of it. You run a stretched cluster in the clustered pair topology. Where do you place the quorum disk/server? If you put it into the north DC, and that gets hit by a meteor, you lose one clusternode, which isn't a problem, but you also lose your quorum, and the south clusternode can't keep the cluster running lacking the votes. This problem can't be solved with two sites and a campus cluster. You will need a third site to either place the quorum server to, or a third clusternode. Otherwise, lacking majority, if you lose the site that had your quorum, you lose the cluster. Okay, we covered the very basics. We haven't talked about virtualization support, CCR, ClusterFilesystems, DID devices, affinities, storage-replication, management tools, upgrade procedures - should those be interesting for you, let me know in the comments, along with any other questions. Given enough demand I'd be glad to write a followup post too. Now I really want to move on to the second part in the series: ClusterInstallation.  Oh, as for additional source of information, I recommend the documentation: http://docs.oracle.com/cd/E23623_01/index.html, and the OTN Oracle Solaris Cluster site: http://www.oracle.com/technetwork/server-storage/solaris-cluster/index.html

    Read the article

  • ??????????·????????????Oracle Solaris Cluster 4.0?

    - by kazun
    ??????OS?Oracle Solaris 11???????????????·????????????Oracle Solaris Cluster 4.0???2011?12?6????????????Oracle Solaris 11 ???????????????????????????Oracle Solaris Cluster 4.0 ???????????????????????? Oracle Solaris Cluster 4.0 ?3?????? - Oracle Solaris Cluster ???????????????????????????????????????????????? - Oracle Solaris 11 ?????? - Oracle Solaris Cluster 4.0?????????????????????????????????????? Oracle Solaris Cluster 4.0?? Oracle Solaris Cluster? Oracle Solaris ??????????????????????????????????????(HA)??????·????(DR)????????? Oracle Solaris 11 ??????????????? Oracle Solaris 11 Image Packaging System (IPS)???????????????????????????????????????????????????????? Oracle Solaris 11 ?????????(Automated Installer)????Oracle Solaris???Oracle Solaris Cluster????????????????????????? Oracle Solaris 11??????? Oracle Solaris 11 Zone Cluster ?????????????????????????????Solaris 11 native ??????????????????????????????????????????????????? ?????????????????????????????????????????Solaris 10 ???? Solaris 11 native ????????? Oracle VM Server for SPARC 2.1????????? ?:Zone Cluster (?)?Failover Zone(?)???? ?????·?????? Oracle Solaris Cluster Geographic Edition ?????????????????????·????????????? ?????????????????????????????????/?????????????????????? Oracle Data Guard(Oracle ?????? 11.2.0.3 ?????)?Availability Suite Feature of Oracle Solaris(Oracle Solaris 11 SRU1 ?????)?Oracle Solaris Cluster Geographic Edition script-based plug-ins?????????·??????????????????? ?:?????·???? ?????????????? Oracle Solaris Cluster 4.0??Apache?Apache Tomcat?DHCP?DNS?NFS???Oracle Solaris 11?????????Oracle Database 11g(?????????????Oracle Real Application Clusters)?Oracle WebLogic Server???????·???????????????·????????????????????????????????????????????????API??????????????????? Oracle Solaris Cluster 4.0??????????????????????????????????????????????????? ??????????? IPS ?????(???????? 30 ??????) Oracle Software Delivery Cloud (IPS?????????) (???? (??????????) ???? 30 ??????) OTN (IPS?????????)(?????????? ) ?????? ??????????????????? ???? Oracle Solaris Cluster Oracle Solaris Oracle's Sun Server and Storage Systems ???? Oracle Solaris Cluste Oracle Solaris ?Oracle Solaris Cluster Oasis?Blog

    Read the article

  • Does the mysql Client API Library version have to match the installed MySQL/Percona server version?

    - by William Jamieson
    I'm running Scientific Linux 6.3 (binary compaible with Redhat/CentOS/etc..) as a LAMP stack. I've installed Percona server and client v5.5 from the Percona yum repository. However when I run phpinfo() I notice that under the MySQL and mysqli sections, it lists the Client API Library version as 5.1.66, and not 5.5x. I'm guessing these need to match, at least to major versions, and I have no idea what the possible consequences of such a mismatch could be. Do I need to revert to Percona server and client v5.1? This is for a production environment so it needs to be right. I'd appreciate any input or experience people could offer. I'm running Scientific Linux 6.3 (binary compaible with Redhat/CentOS/etc..) as a LAMP stack. I've installed Percona server and client v5.5 from the Percona yum repository. However when I run phpinfo() I notice that under the MySQL and mysqli sections, it lists the Client API Library version as 5.1.66, and not 5.5x. I'm guessing these need to match, at least to major versions, and I have no idea what the possible consequences of such a mismatch could be. Do I need to revert to Percona server and client v5.1? This is for a production environment so it needs to be right. I'd appreciate any input or experience people could offer. (Note I will also be cross posting this on the Percona forums)

    Read the article

  • Mysql cluster strange behaviour

    - by Champion
    Hi Guys, I have 2 mysql clusters on two different servers with management node on each of them. It went down someway. I ran following commands to start the cluster: Start the management node on srv1: srv1: mysqlc/bin/ndb_mgmd --initial -f my_cluster/conf/config.ini --configdir=/home/mysql_cluster/my_cluster/conf Start the management node on srv2: srv2: mysqlc/bin/ndb_mgmd --initial -f my_cluster/conf/config.ini --configdir=/home/mysql_cluster/my_cluster/conf Start the ndbd nodes on srv1: srv1: mysqlc/bin/ndbd --initial -c localhost:1186 Start the ndbd nodes on srv2: srv2: mysqlc/bin/ndbd --initial -c localhost:1186 Start mysqld server on srv1: srv1: mysqlc/bin/mysqld --defaults-file=my_cluster/conf/my.cnf --user=root & and here is the problem. mysql server not loading the data. Only database names are present. All the tables which are ENGINE=ndbcluster are not being loaded. Tables with ENGINE=myisam are being loaded. Backup scripts helped me load the data. But this way I can't use cluster setup. Similar issue appeared when i started srv2. How can I resolve this issue ?

    Read the article

  • Ubuntu upgrade process failed

    - by Spin0us
    I tried to dist-upgrade my ubuntu server on my percona cluster but it failed with this message The following packages have unmet dependencies: libmysqlclient18 : Depends: libmariadbclient18 (= 5.5.33a+maria-1~precise) but it is not installable And here is the package listing # dpkg --list | grep -E 'percona|mysql' ii libdbd-mysql-perl 4.020-1build2 Perl5 database interface to the MySQL database iU libmysqlclient18 5.5.33a+maria-1~precise Virtual package to satisfy external depends ii mariadb-common 5.5.33a+maria-1~precise MariaDB database common files (e.g. /etc/mysql/conf.d/mariadb.cnf) ii percona-xtrabackup 2.1.5-680-1.precise Open source backup tool for InnoDB and XtraDB ii percona-xtradb-cluster-client-5.5 5.5.31-23.7.5-438.precise Percona Server database client binaries ii percona-xtradb-cluster-common-5.5 5.5.33-23.7.6-496.precise Percona Server database common files (e.g. /etc/mysql/my.cnf) ii percona-xtradb-cluster-galera-2.x 157.precise Galera components of Percona XtraDB Cluster ii percona-xtradb-cluster-server-5.5 5.5.31-23.7.5-438.precise Percona Server database server binaries ii php5-mysql 5.3.10-1ubuntu3.8 MySQL module for php5 During the install of the server, mariadb and galera cluster have first been installed. Then removed to be replaced by percona XtraDBCluster. So i think this is the source of the problem. But how can i resolve this without reinstalling all ? UPDATE 1 # apt-cache policy libmariadbclient18 libmariadbclient18: Installed: (none) Candidate: (none) Version table: 5.5.32+maria-1~precise 0 100 /var/lib/dpkg/status

    Read the article

  • ORA-28001 the password has expired error in Solaris Cluster

    - by Onur Bingul
    Solaris Cluster start or stop Oracle database using credentials of a specified user in Oracle Database. If you have issues with starting of Oracle Database resource and see ORA-28001 error message in /var/adm/messages it means that database user's who is used by Solaris Cluster to start Oracle database, password has expired. To resolve the issue reset the password of the Oracle database user SQL> alter user user_name identified by password  and change connection string in Solaris Cluster using following command -bash-3.2 # /usr/cluster/bin/clresource set -p Connect_string="user/password" oracle_resource

    Read the article

  • Configuring MySQL Cluster Data Nodes

    - by Mat Keep
    0 0 1 692 3948 Homework 32 9 4631 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-ansi-language:EN-US;} In my previous blog post, I discussed the enhanced performance and scalability delivered by extensions to the multi-threaded data nodes in MySQL Cluster 7.2. In this post, I’ll share best practices on the configuration of data nodes to achieve optimum performance on the latest generations of multi-core, multi-thread CPU designs. Configuring the Data Nodes The configuration of data node threads can be managed in two ways via the config.ini file: - Simply set MaxNoOfExecutionThreads to the appropriate number of threads to be run in the data node, based on the number of threads presented by the processors used in the host or VM. - Use the new ThreadConfig variable that enables users to configure both the number of each thread type to use and also which CPUs to bind them too. The flexible configuration afforded by the multi-threaded data node enhancements means that it is possible to optimise data nodes to use anything from a single CPU/thread up to a 48 CPU/thread server. Co-locating the MySQL Server with a single data node can fully utilize servers with 64 – 80 CPU/threads. It is also possible to co-locate multiple data nodes per server, but this is now only required for very large servers with 4+ CPU sockets dense multi-core processors. 24 Threads and Beyond! An example of how to make best use of a 24 CPU/thread server box is to configure the following: - 8 ldm threads - 4 tc threads - 3 recv threads - 3 send threads - 1 rep thread for asynchronous replication. Each of those threads should be bound to a CPU. It is possible to bind the main thread (schema management domain) and the IO threads to the same CPU in most installations. In the configuration above, we have bound threads to 20 different CPUs. We should also protect these 20 CPUs from interrupts by using the IRQBALANCE_BANNED_CPUS configuration variable in /etc/sysconfig/irqbalance and setting it to 0x0FFFFF. The reason for doing this is that MySQL Cluster generates a lot of interrupt and OS kernel processing, and so it is recommended to separate activity across CPUs to ensure conflicts with the MySQL Cluster threads are eliminated. When booting a Linux kernel it is also possible to provide an option isolcpus=0-19 in grub.conf. The result is that the Linux scheduler won't use these CPUs for any task. Only by using CPU affinity syscalls can a process be made to run on those CPUs. By using this approach, together with binding MySQL Cluster threads to specific CPUs and banning CPUs IRQ processing on these tasks, a very stable performance environment is created for a MySQL Cluster data node. On a 32 CPU/Thread server: - Increase the number of ldm threads to 12 - Increase tc threads to 6 - Provide 2 more CPUs for the OS and interrupts. - The number of send and receive threads should, in most cases, still be sufficient. On a 40 CPU/Thread server, increase ldm threads to 16, tc threads to 8 and increment send and receive threads to 4. On a 48 CPU/Thread server it is possible to optimize further by using: - 12 tc threads - 2 more CPUs for the OS and interrupts - Avoid using IO threads and main thread on same CPU - Add 1 more receive thread. Summary As both this and the previous post seek to demonstrate, the multi-threaded data node extensions not only serve to increase performance of MySQL Cluster, they also enable users to achieve significantly improved levels of utilization from current and future generations of massively multi-core, multi-thread processor designs. A big thanks to Mikael Ronstrom, Senior MySQL Architect at Oracle, for his work in developing these enhancements and best practices. You can download MySQL Cluster 7.2 today and try out all of these enhancements. The Getting Started guides are an invaluable aid to quickly building a Proof of Concept Don’t forget to check out the MySQL Cluster 7.2 New Features whitepaper to discover everything that is new in the latest GA release

    Read the article

  • Cluster failover and strange gratuitous arp behavior

    - by lazerpld
    I am experiencing a strange Windows 2008R2 cluster related issue that is bothering me. I feel that I have come close as to what the issue is, but still don't fully understand what is happening. I have a two node exchange 2007 cluster running on two 2008R2 servers. The exchange cluster application works fine when running on the "primary" cluster node. The problem occurs when failing over the cluster ressource to the secondary node. When failing over the cluster to the "secondary" node, which for instance is on the same subnet as the "primary", the failover initially works ok and the cluster ressource continues to work for a couple of minutes on the new node. Which means that the recieving node does send out a gratuitous arp reply packet that updated the arp tables on the network. But after x amount of time (typically within 5 minutes time) something updates the arp-tables again because all of a sudden the cluster service does not answer to pings. So basically I start a ping to the exchange cluster address when its running on the "primary node". It works just great. I failover the cluster ressource group to the "secondary node" and I only have loss of one ping which is acceptable. The cluster ressource still answers for some time after being failed over and all of a sudden the ping starts timing out. This is telling me that the arp table initially is updated by the secondary node, but then something (which I haven't found out yet) wrongfully updates it again, probably with the primary node's MAC. Why does this happen - has anyone experienced the same problem? The cluster is NOT running NLB and the problem stops immidiately after failing over back to the primary node where there are no problems. Each node is using NIC teaming (intel) with ALB. Each node is on the same subnet and has gateway and so on entered correctly as far as I am concerned. Edit: I was wondering if it could be related to network binding order maybe? Because I have noticed that the only difference I can see from node to node is when showing the local arp table. On the "primary" node the arp table is generated on the cluster address as the source. While on the "secondary" its generated from the nodes own network card. Any input on this? Edit: Ok here is the connection layout. Cluster address: A.B.6.208/25 Exchange application address: A.B.6.212/25 Node A: 3 physical nics. Two teamed using intels teaming with the address A.B.6.210/25 called public The last one used for cluster traffic called private with 10.0.0.138/24 Node B: 3 physical nics. Two teamed using intels teaming with the address A.B.6.211/25 called public The last one used for cluster traffic called private with 10.0.0.139/24 Each node sits in a seperate datacenter connected together. End switches being cisco in DC1 and NEXUS 5000/2000 in DC2. Edit: I have been testing a little more. I have now created an empty application on the same cluster, and given it another ip address on the same subnet as the exchange application. After failing this empty application over, I see the exact same problem occuring. After one or two minutes clients on other subnets cannot ping the virtual ip of the application. But while clients on other subnets cannot, another server from another cluster on the same subnet has no trouble pinging. But if i then make another failover to the original state, then the situation is the opposite. So now clients on same subnet cannot, and on other they can. We have another cluster set up the same way and on the same subnet, with the same intel network cards, the same drivers and same teaming settings. Here we are not seeing this. So its somewhat confusing. Edit: OK done some more research. Removed the NIC teaming of the secondary node, since it didnt work anyway. After some standard problems following that, I finally managed to get it up and running again with the old NIC teaming settings on one single physical network card. Now I am not able to reproduce the problem described above. So it is somehow related to the teaming - maybe some kind of bug? Edit: Did some more failing over without being able to make it fail. So removing the NIC team looks like it was a workaround. Now I tried to reestablish the intel NIC teaming with ALB (as it was before) and i still cannot make it fail. This is annoying due to the fact that now i actually cannot pinpoint the root of the problem. Now it just seems to be some kind of MS/intel hick-up - which is hard to accept because what if the problem reoccurs in 14 days? There is a strange thing that happened though. After recreating the NIC team I was not able to rename the team to "PUBLIC" which the old team was called. So something has not been cleaned up in windows - although the server HAS been restarted! Edit: OK after restablishing the ALB teaming the error came back. So I am now going to do some thorough testing and i will get back with my observations. One thing is for sure. It is related to Intel 82575EB NICS, ALB and Gratuitous Arp.

    Read the article

  • Oracle Solaris Cluster at Oracle OpenWorld 2012

    - by evek
    Once again Oracle OpenWorld is taking over San Francisco's Moscone Center.  Once Again Oracle Solaris Cluster will be present at the event. Please come and visit us in the Oracle DEMOgrounds in Moscone South.  Take the time to stop by at the Oracle Solaris Cluster demo pod (S-116): you will meet some of our architects, tech leads and product managers... And if you are interested in sessions showing the use of Oracle Solaris Cluster check our Focus On document. Have a great show and hope to meet you there.

    Read the article

  • Percona MySQL 5.5 fails to start

    - by keymone
    trying to setup new server here but keep getting this in error log: mysqld_safe Starting mysqld daemon with databases from /data/mysql/myisam [Warning] Can't create test file /data/mysql/myisam/hostname.lower-test [Warning] Can't create test file /data/mysql/myisam/hostname.lower-test [Note] Flashcache bypass: disabled [Note] Flashcache setup error is : setmntent failed /usr/sbin/mysqld: File '/var/mysql/bin/bin-log.index' not found (Errcode: 13) [ERROR] Aborting [Note] /usr/sbin/mysqld: Shutdown complete mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended everything under /data/mysql (it's ibdata and myisam folders) is owned my mysql:mysql and has proper permissions same goes for folders with bin and relay logs under /var/mysql apparmor is purged from server any ideas? PS it seems like something else apart from apparmor is affecting permissions to access mysql files after i changed data directory to more default one - /var/lib/mysql and "Can't create test file" error is gone, but "'/var/mysql/bin/bin-log.index' not found (Errcode: 13)" is still there PPS so i installed apparmor back and added all folders to mysqld's profile and errors mentioned above are now gone(or mysql doesn't even get to that point now) what i have now is this: /usr/sbin/mysqld: error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file or directory banging my head against the wall.

    Read the article

  • Tutorial: Getting Started with the NoSQL JavaScript / Node.js API for MySQL Cluster

    - by Mat Keep
    Tutorial authored by Craig Russell and JD Duncan  The MySQL Cluster team are working on a new NoSQL JavaScript connector for MySQL. The objectives are simplicity and high performance for JavaScript users: - allows end-to-end JavaScript development, from the browser to the server and now to the world's most popular open source database - native "NoSQL" access to the storage layer without going first through SQL transformations and parsing. Node.js is a complete web platform built around JavaScript designed to deliver millions of client connections on commodity hardware. With the MySQL NoSQL Connector for JavaScript, Node.js users can easily add data access and persistence to their web, cloud, social and mobile applications. While the initial implementation is designed to plug and play with Node.js, the actual implementation doesn't depend heavily on Node, potentially enabling wider platform support in the future. Implementation The architecture and user interface of this connector are very different from other MySQL connectors in a major way: it is an asynchronous interface that follows the event model built into Node.js. To make it as easy as possible, we decided to use a domain object model to store the data. This allows for users to query data from the database and have a fully-instantiated object to work with, instead of having to deal with rows and columns of the database. The domain object model can have any user behavior that is desired, with the NoSQL connector providing the data from the database. To make it as fast as possible, we use a direct connection from the user's address space to the database. This approach means that no SQL (pun intended) is needed to get to the data, and no SQL server is between the user and the data. The connector is being developed to be extensible to multiple underlying database technologies, including direct, native access to both the MySQL Cluster "ndb" and InnoDB storage engines. The connector integrates the MySQL Cluster native API library directly within the Node.js platform itself, enabling developers to seamlessly couple their high performance, distributed applications with a high performance, distributed, persistence layer delivering 99.999% availability. The following sections take you through how to connect to MySQL, query the data and how to get started. Connecting to the database A Session is the main user access path to the database. You can get a Session object directly from the connector using the openSession function: var nosql = require("mysql-js"); var dbProperties = {     "implementation" : "ndb",     "database" : "test" }; nosql.openSession(dbProperties, null, onSession); The openSession function calls back into the application upon creating a Session. The Session is then used to create, delete, update, and read objects. Reading data The Session can read data from the database in a number of ways. If you simply want the data from the database, you provide a table name and the key of the row that you want. For example, consider this schema: create table employee (   id int not null primary key,   name varchar(32),   salary float ) ENGINE=ndbcluster; Since the primary key is a number, you can provide the key as a number to the find function. function onSession = function(err, session) {   if (err) {     console.log(err);     ... error handling   }   session.find('employee', 0, onData); }; function onData = function(err, data) {   if (err) {     console.log(err);     ... error handling   }   console.log('Found: ', JSON.stringify(data));   ... use data in application }; If you want to have the data stored in your own domain model, you tell the connector which table your domain model uses, by specifying an annotation, and pass your domain model to the find function. var annotations = new nosql.Annotations(); function Employee = function(id, name, salary) {   this.id = id;   this.name = name;   this.salary = salary;   this.giveRaise = function(percent) {     this.salary *= percent;   } }; annotations.mapClass(Employee, {'table' : 'employee'}); function onSession = function(err, session) {   if (err) {     console.log(err);     ... error handling   }   session.find(Employee, 0, onData); }; Updating data You can update the emp instance in memory, but to make the raise persistent, you need to write it back to the database, using the update function. function onData = function(err, emp) {   if (err) {     console.log(err);     ... error handling   }   console.log('Found: ', JSON.stringify(emp));   emp.giveRaise(0.12); // gee, thanks!   session.update(emp); // oops, session is out of scope here }; Using JavaScript can be tricky because it does not have the concept of block scope for variables. You can create a closure to handle these variables, or use a feature of the connector to remember your variables. The connector api takes a fixed number of parameters and returns a fixed number of result parameters to the callback function. But the connector will keep track of variables for you and return them to the callback. So in the above example, change the onSession function to remember the session variable, and you can refer to it in the onData function: function onSession = function(err, session) {   if (err) {     console.log(err);     ... error handling   }   session.find(Employee, 0, onData, session); }; function onData = function(err, emp, session) {   if (err) {     console.log(err);     ... error handling   }   console.log('Found: ', JSON.stringify(emp));   emp.giveRaise(0.12); // gee, thanks!   session.update(emp, onUpdate); // session is now in scope }; function onUpdate = function(err, emp) {   if (err) {     console.log(err);     ... error handling   } Inserting data Inserting data requires a mapped JavaScript user function (constructor) and a session. Create a variable and persist it: function onSession = function(err, session) {   var data = new Employee(999, 'Mat Keep', 20000000);   session.persist(data, onInsert);   } }; Deleting data To remove data from the database, use the session remove function. You use an instance of the domain object to identify the row you want to remove. Only the key field is relevant. function onSession = function(err, session) {   var key = new Employee(999);   session.remove(Employee, onDelete);   } }; More extensive queries We are working on the implementation of more extensive queries along the lines of the criteria query api. Stay tuned. How to evaluate The MySQL Connector for JavaScript is available for download from labs.mysql.com. Select the build: MySQL-Cluster-NoSQL-Connector-for-Node-js You can also clone the project on GitHub Since it is still early in development, feedback is especially valuable (so don't hesitate to leave comments on this blog, or head to the MySQL Cluster forum). Try it out and see how easy (and fast) it is to integrate MySQL Cluster into your Node.js platforms. You can learn more about other previewed functionality of MySQL Cluster 7.3 here

    Read the article

  • MySQL Cluster ndb_mgmd error

    - by Patryk
    I set up MySQL Cluster on Ubuntu. My ndb_mgmd.cnf file looked: [NDBD DEFAULT] NoOfReplicas=2 DataDir= /var/lib/mysql-cluster # Management Node [NDB_MGMD] NodeId=1 HostName=192.168.204.20 DataDir=/var/lib/mysql-cluster # Storage Nodes (one for each node) [NDBD] NodeId=2 HostName=192.168.204.25 DataDir=/var/lib/mysql-cluster [NDBD] NodeId=3 HostName=192.168.204.26 DataDir=/var/lib/mysql-cluster # SQL Nodes (one for each node) [MYSQLD] NodeId=4 HostName=192.168.204.30 Now I want to edit this configuration, so I changed this file: [NDBD DEFAULT] NoOfReplicas=2 DataDir= /var/lib/mysql-cluster # Management Node [NDB_MGMD] NodeId=1 HostName=192.168.204.20 DataDir=/var/lib/mysql-cluster # Storage Nodes (one for each node) [NDBD] NodeId=2 HostName=192.168.204.25 DataDir=/var/lib/mysql-cluster [NDBD] NodeId=3 HostName=192.168.204.26 DataDir=/var/lib/mysql-cluster # SQL Nodes (one for each node) [MYSQLD] NodeId=4 HostName=192.168.204.25 [MYSQLD] NodeId=5 HostName=192.168.204.26 But ndb_mgm > show; still shows: Connected to Management Server at: localhost:1186 Cluster Configuration --------------------- [ndbd(NDB)] 2 node(s) id=2 (not connected, accepting connect from 192.168.204.25) id=3 (not connected, accepting connect from 192.168.204.26) [ndb_mgmd(MGM)] 1 node(s) id=1 @192.168.204.20 (mysql-5.1.51 ndb-7.1.9) [mysqld(API)] 1 node(s) id=4 (not connected, accepting connect from 192.168.204.30) I tried: sudo /etc/init.d/mysql-ndb-mgm restart sudo ndb_mgmd --initial sudo ndb_mgmd -f /etc/mysql/ndb_mgmd.cnf And nothing works. Any help?

    Read the article

  • Announcing: Oracle Solaris Cluster Product Bulletin, May 2014

    - by uwes
    New qualifications announcements and general news for Oracle Solaris Cluster products can be found in the new Product Bulletin Hardware Qualifications New Oracle ZFS Storage Appliance Kit versions with Oracle Solaris Cluster 4.1 geographic cluster Pillar AXIOM 600 with Oracle Solaris Cluster 4.1 on x86 Software Qualifications SAP Livecache 7.9 and MAXDB 7.9 Oracle Weblogic Server 12.1.2 Latest Support Information Oracle Solaris Cluster 4.1 SRU 7 (4.1.7) Oracle Solaris Cluster 3.3 3/13 patch train #5 Resources Configuration guides and documentation Product Update Bulletin Archives Contacts Please read the Product Bulletin on Oracle HW TRC for more details. (If you are not registered on Oracle HW TRC, click here ... and follow the instructions..) _____________________________________________________________________ For More Information Go To:Oracle.com Oracle Solaris Cluster page Oracle Technology Network Oracle Solaris Cluster pageOracle Solaris Cluster MOS communityPartner web Oracle Solaris Cluster pageOracle Solaris Cluster Blog

    Read the article

  • unable to destroy windows 2008 r2 failover cluster after SAN rebuild

    - by Zack
    I created a windows 2008 r2 failover cluster for a sql 2008 active/passive cluster. This two node cluster was using a SAN device for a quorum disk resource as well as MSDTC resource. Well....I decided to reconfigure the SAN device, but I didn't destroy the cluster first. Now that the quorum disk and mstdc disk are completely gone, the cluster is obviously not working. But, I can't even destroy the cluster and start again. I've tried from the Windows Clustering tool, as well as the command line. I was able to get the cluster service to start using the "/fixquorum" parameter. After doing this I was able to remove the passive node from the cluster, but it wouldn't let me destroy the cluster because the default resource group and msdtc are still attached as resources. I tried to delete these resources from both the GUI tool, as well as command line. It will either freeze for several minutes and crash the program, or once it even BSOD'd the server. Can someone advise on how to destroy this cluster so I can start over?

    Read the article

  • What are the pros & cons of these MySQL engines for OLTP -- XtraDB, PBXT, or TokuDB?

    - by Continuation
    I'm working on a social website with an approximate read/write split of 90/10. Trying to decide on a MySQL engine. The ones I'm interested in are: XtraDB PBXT TokuDB What are the pros and cons of them for my use case? A few specific questions: PBXT uses log-based structure that avoids double-writes. It sounds very elegant, but the benchmark I've seen doesn't show any/much advantages over XtraDB. Do you have any experience with PBXT/XtraDB you can share? TokuDB sounds VERY interesting. But all the benchmarks I've seen are about single-threaded bulk inserts - inserting 100M rows for example. that's not very relevant for OLTP. What about its performance with large number of concurrent threads writing and reading at the same time? Anyone has tried that?

    Read the article

  • RPC Server Unavailable on Hyper-V cluster when moving resources after the host adapter has failed

    - by Doug Luxem
    On a Windows 2008 R2 SP1 cluster running Hyper-V, a lost network connectivity on the primary host interface. The interface was rapidly flapping up and down, and this was later determined to be caused by a faulty switch port. As this was a clustered server, the host interface was not fault tolerant (seeing as how the whole server was fault tolerant), so connectivity to the host was going up and down. The Hyper-V guests were completely unaffected by the network outage as they used a dedicated trunk on the server separate from the host interface. Additionally, dedicated interfaces for the cluster and live migration networks were fine. In order to diagnose the server, I tried to move all resources (Hyper-V Guests) to other nodes through Failover Cluster Manager. These moves failed with an error RPC Server Unavailable. The only way to move resources was by shutting down the guests, stopping the cluster service on the Node A, allowing other nodes to take ownership of the resources, and restarting the guests. A few other notes: All nodes have Client for MS Networks and File & Printer Sharing enabled on the Cluster and LM networks. Node A was accessible over cluster and LM networks from other nodes (these are private, cluster-only networks); pingable, CIFs, etc. Accessing \\NODEA is done over the Host adapters, as you would expect in this case and is the reason for the RPC Server Unavailable error with that adapter being down. My questions here are - Is there a way to still use Live Migration in a failure scenario such as this to prevent shutting down the Hyper-V guests? How can the network be reconfigured in the future so that the cluster service attempts to use the cluster and/or live migration networks to issue the RPC requests?

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >