Search Results

Search found 504 results on 21 pages for 'failover'.

Page 1/21 | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Force failover a Cisco ASA

- by user974896

I have two ASA in a lan state primary\secondary configuration. None of them have "failover active" or "no failover active" in their configuration. Would it be proper to failover in a manner such as: Log into console of primary unit and issue "failover lan state secondary", log into the console of the original secondary unit and issue "failover lan state primary". To fail back simply reverse the process or Log into the console of the primary unit and issue "no failover active", log into the console of the original secondary unit and issue "failover active". To fail back issue "failover active" on the original primary (now secondary) unit, and "no failover active" on the now primary unit. I do not like the second method because it adds configuration directives that were not in place before. Will the first method work?

Read the article
SQL Server 2012 - AlwaysOn

- by Claus Jandausch

Ich war nicht nur irritiert, ich war sogar regelrecht schockiert - und für einen kurzen Moment sprachlos (was nur selten der Fall ist). Gerade eben hatte mich jemand gefragt "Wann Oracle denn etwas Vergleichbares wie AlwaysOn bieten würde - und ob überhaupt?" War ich hier im falschen Film gelandet? Ich konnte nicht anders, als meinen Unmut kundzutun und zu erklären, dass die Fragestellung normalerweise anders herum läuft. Zugegeben - es mag vielleicht strittige Punkte geben im Vergleich zwischen Oracle und SQL Server - bei denen nicht unbedingt immer Oracle die Nase vorn haben muss - aber das Thema Clustering für Hochverfügbarkeit (HA), Disaster Recovery (DR) und Skalierbarkeit gehört mit Sicherheit nicht dazu. Dieses Erlebnis hakte ich am Nachgang als Einzelfall ab, der so nie wieder vorkommen würde. Bis ich kurz darauf eines Besseren belehrt wurde und genau die selbe Frage erneut zu hören bekam. Diesmal sogar im Exadata-Umfeld und einem Oracle Stretch Cluster. Einmal ist keinmal, doch zweimal ist einmal zu viel... Getreu diesem alten Motto war mir klar, dass man das so nicht länger stehen lassen konnte. Ich habe keine Ahnung, wie die Microsoft Marketing Abteilung es geschafft hat, unter dem AlwaysOn Brading eine innovative Technologie vermuten zu lassen - aber sie hat ihren Job scheinbar gut gemacht. Doch abgesehen von einem guten Marketing, stellt sich natürlich die Frage, was wirklich dahinter steckt und wie sich das Ganze mit Oracle vergleichen lässt - und ob überhaupt? Damit wären wir wieder bei der ursprünglichen Frage angelangt. So viel zum Hintergrund dieses Blogbeitrags - von meiner Antwort handelt der restliche Blog. "Windows was the God ..." Um den wahren Unterschied zwischen Oracle und Microsoft verstehen zu können, muss man zunächst das bedeutendste Microsoft Dogma kennen. Es lässt sich schlicht und einfach auf den Punkt bringen: "Alles muss auf Windows basieren." Die Überschrift dieses Absatzes ist kein von mir erfundener Ausspruch, sondern ein Zitat. Konkret stammt es aus einem längeren Artikel von Kurt Eichenwald in der Vanity Fair aus dem August 2012. Er lautet Microsoft's Lost Decade und sei jedem ans Herz gelegt, der die "Microsoft-Maschinerie" unter Steve Ballmer und einige ihrer Kuriositäten besser verstehen möchte. "YOU TALKING TO ME?" Microsoft C.E.O. Steve Ballmer bei seiner Keynote auf der 2012 International Consumer Electronics Show in Las Vegas am 9. Januar Manche Dinge in diesem Artikel mögen überspitzt dargestellt erscheinen - sind sie aber nicht. Vieles davon kannte ich bereits aus eigener Erfahrung und kann es nur bestätigen. Anderes hat sich mir erst so richtig erschlossen. Insbesondere die folgenden Passagen führten zum Aha-Erlebnis: “Windows was the god—everything had to work with Windows,” said Stone... “Every little thing you want to write has to build off of Windows (or other existing roducts),” one software engineer said. “It can be very confusing, …” Ich habe immer schon darauf hingewiesen, dass in einem SQL Server Failover Cluster die Microsoft Datenbank eigentlich nichts Nenneswertes zum Geschehen beiträgt, sondern sich voll und ganz auf das Windows Betriebssystem verlässt. Deshalb muss man auch die Windows Server Enterprise Edition installieren, soll ein Failover Cluster für den SQL Server eingerichtet werden. Denn hier werden die Cluster Services geliefert - nicht mit dem SQL Server. Er ist nur lediglich ein weiteres Server Produkt, für das Windows in Ausfallszenarien genutzt werden kann - so wie Microsoft Exchange beispielsweise, oder Microsoft SharePoint, oder irgendein anderes Server Produkt das auf Windows gehostet wird. Auch Oracle kann damit genutzt werden. Das Stichwort lautet hier: Oracle Failsafe. Nur - warum sollte man das tun, wenn gleichzeitig eine überlegene Technologie wie die Oracle Real Application Clusters (RAC) zur Verfügung steht, die dann auch keine Windows Enterprise Edition voraussetzen, da Oracle die eigene Clusterware liefert. Welche darüber hinaus für kürzere Failover-Zeiten sorgt, da diese Cluster-Technologie Datenbank-integriert ist und sich nicht auf "Dritte" verlässt. Wenn man sich also schon keine technischen Vorteile mit einem SQL Server Failover Cluster erkauft, sondern zusätzlich noch versteckte Lizenzkosten durch die Lizenzierung der Windows Server Enterprise Edition einhandelt, warum hat Microsoft dann in den vergangenen Jahren seit SQL Server 2000 nicht ebenfalls an einer neuen und innovativen Lösung gearbeitet, die mit Oracle RAC mithalten kann? Entwickler hat Microsoft genügend? Am Geld kann es auch nicht liegen? Lesen Sie einfach noch einmal die beiden obenstehenden Zitate und sie werden den Grund verstehen. Anders lässt es sich ja auch gar nicht mehr erklären, dass AlwaysOn aus zwei unterschiedlichen Technologien besteht, die beide jedoch wiederum auf dem Windows Server Failover Clustering (WSFC) basieren. Denn daraus ergeben sich klare Nachteile - aber dazu später mehr. Um AlwaysOn zu verstehen, sollte man sich zunächst kurz in Erinnerung rufen, was Microsoft bisher an HA/DR (High Availability/Desaster Recovery) Lösungen für SQL Server zur Verfügung gestellt hat. Replikation Basiert auf logischer Replikation und Pubisher/Subscriber Architektur Transactional Replication Merge Replication Snapshot Replication Microsoft's Replikation ist vergleichbar mit Oracle GoldenGate. Oracle GoldenGate stellt jedoch die umfassendere Technologie dar und bietet High Performance. Log Shipping Microsoft's Log Shipping stellt eine einfache Technologie dar, die vergleichbar ist mit Oracle Managed Recovery in Oracle Version 7. Das Log Shipping besitzt folgende Merkmale: Transaction Log Backups werden von Primary nach Secondary/ies geschickt Einarbeitung (z.B. Restore) auf jedem Secondary individuell Optionale dritte Server Instanz (Monitor Server) für Überwachung und Alarm Log Restore Unterbrechung möglich für Read-Only Modus (Secondary) Keine Unterstützung von Automatic Failover Database Mirroring Microsoft's Database Mirroring wurde verfügbar mit SQL Server 2005, sah aus wie Oracle Data Guard in Oracle 9i, war funktional jedoch nicht so umfassend. Für ein HA/DR Paar besteht eine 1:1 Beziehung, um die produktive Datenbank (Principle DB) abzusichern. Auf der Standby Datenbank (Mirrored DB) werden alle Insert-, Update- und Delete-Operationen nachgezogen. Modi Synchron (High-Safety Modus) Asynchron (High-Performance Modus) Automatic Failover Unterstützt im High-Safety Modus (synchron) Witness Server vorausgesetzt Zur Frage der Kontinuität Es stellt sich die Frage, wie es um diesen Technologien nun im Zusammenhang mit SQL Server 2012 bestellt ist. Unter Fanfaren seinerzeit eingeführt, war Database Mirroring das erklärte Mittel der Wahl. Ich bin kein Produkt Manager bei Microsoft und kann hierzu nur meine Meinung äußern, aber zieht man den SQL AlwaysOn Team Blog heran, so sieht es nicht gut aus für das Database Mirroring - zumindest nicht langfristig. "Does AlwaysOn Availability Group replace Database Mirroring going forward?” “The short answer is we recommend that you migrate from the mirroring configuration or even mirroring and log shipping configuration to using Availability Group. Database Mirroring will still be available in the Denali release but will be phased out over subsequent releases. Log Shipping will continue to be available in future releases.” Damit wären wir endlich beim eigentlichen Thema angelangt. Was ist eine sogenannte Availability Group und was genau hat es mit der vielversprechend klingenden Bezeichnung AlwaysOn auf sich? SQL Server 2012 - AlwaysOn Zwei HA-Features verstekcne sich hinter dem “AlwaysOn”-Branding. Einmal das AlwaysOn Failover Clustering aka SQL Server Failover Cluster Instances (FCI) - zum Anderen die AlwaysOn Availability Groups. Failover Cluster Instances (FCI) Entspricht ungefähr dem Stretch Cluster Konzept von Oracle Setzt auf Windows Server Failover Clustering (WSFC) auf Bietet HA auf Instanz-Ebene AlwaysOn Availability Groups (Verfügbarkeitsgruppen) Ähnlich der Idee von Consistency Groups, wie in Storage-Level Replikations-Software von z.B. EMC SRDF Abhängigkeiten zu Windows Server Failover Clustering (WSFC) Bietet HA auf Datenbank-Ebene Hinweis: Verwechseln Sie nicht eine SQL Server Datenbank mit einer Oracle Datenbank. Und auch nicht eine Oracle Instanz mit einer SQL Server Instanz. Die gleichen Begriffe haben hier eine andere Bedeutung - nicht selten ein Grund, weshalb Oracle- und Microsoft DBAs schnell aneinander vorbei reden. Denken Sie bei einer SQL Server Datenbank eher an ein Oracle Schema, das kommt der Sache näher. So etwas wie die SQL Server Northwind Datenbank ist vergleichbar mit dem Oracle Scott Schema. Wenn Sie die genauen Unterschiede kennen möchten, finden Sie eine detaillierte Beschreibung in meinem Buch "Oracle10g Release 2 für Windows und .NET", erhältich bei Lehmanns, Amazon, etc. Windows Server Failover Clustering (WSFC) Wie man sieht, basieren beide AlwaysOn Technologien wiederum auf dem Windows Server Failover Clustering (WSFC), um einerseits Hochverfügbarkeit auf Ebene der Instanz zu gewährleisten und andererseits auf der Datenbank-Ebene. Deshalb nun eine kurze Beschreibung der WSFC. Die WSFC sind ein mit dem Windows Betriebssystem geliefertes Infrastruktur-Feature, um HA für Server Anwendungen, wie Microsoft Exchange, SharePoint, SQL Server, etc. zu bieten. So wie jeder andere Cluster, besteht ein WSFC Cluster aus einer Gruppe unabhängiger Server, die zusammenarbeiten, um die Verfügbarkeit einer Applikation oder eines Service zu erhöhen. Falls ein Cluster-Knoten oder -Service ausfällt, kann der auf diesem Knoten bisher gehostete Service automatisch oder manuell auf einen anderen im Cluster verfügbaren Knoten transferriert werden - was allgemein als Failover bekannt ist. Unter SQL Server 2012 verwenden sowohl die AlwaysOn Avalability Groups, als auch die AlwaysOn Failover Cluster Instances die WSFC als Plattformtechnologie, um Komponenten als WSFC Cluster-Ressourcen zu registrieren. Verwandte Ressourcen werden in eine Ressource Group zusammengefasst, die in Abhängigkeit zu anderen WSFC Cluster-Ressourcen gebracht werden kann. Der WSFC Cluster Service kann jetzt die Notwendigkeit zum Neustart der SQL Server Instanz erfassen oder einen automatischen Failover zu einem anderen Server-Knoten im WSFC Cluster auslösen. Failover Cluster Instances (FCI) Eine SQL Server Failover Cluster Instanz (FCI) ist eine einzelne SQL Server Instanz, die in einem Failover Cluster betrieben wird, der aus mehreren Windows Server Failover Clustering (WSFC) Knoten besteht und so HA (High Availability) auf Ebene der Instanz bietet. Unter Verwendung von Multi-Subnet FCI kann auch Remote DR (Disaster Recovery) unterstützt werden. Eine weitere Option für Remote DR besteht darin, eine unter FCI gehostete Datenbank in einer Availability Group zu betreiben. Hierzu später mehr. FCI und WSFC Basis FCI, das für lokale Hochverfügbarkeit der Instanzen genutzt wird, ähnelt der veralteten Architektur eines kalten Cluster (Aktiv-Passiv). Unter SQL Server 2008 wurde diese Technologie SQL Server 2008 Failover Clustering genannt. Sie nutzte den Windows Server Failover Cluster. In SQL Server 2012 hat Microsoft diese Basistechnologie unter der Bezeichnung AlwaysOn zusammengefasst. Es handelt sich aber nach wie vor um die klassische Aktiv-Passiv-Konfiguration. Der Ablauf im Failover-Fall ist wie folgt: Solange kein Hardware-oder System-Fehler auftritt, werden alle Dirty Pages im Buffer Cache auf Platte geschrieben Alle entsprechenden SQL Server Services (Dienste) in der Ressource Gruppe werden auf dem aktiven Knoten gestoppt Die Ownership der Ressource Gruppe wird auf einen anderen Knoten der FCI transferriert Der neue Owner (Besitzer) der Ressource Gruppe startet seine SQL Server Services (Dienste) Die Connection-Anforderungen einer Client-Applikation werden automatisch auf den neuen aktiven Knoten mit dem selben Virtuellen Network Namen (VNN) umgeleitet Abhängig vom Zeitpunkt des letzten Checkpoints, kann die Anzahl der Dirty Pages im Buffer Cache, die noch auf Platte geschrieben werden müssen, zu unvorhersehbar langen Failover-Zeiten führen. Um diese Anzahl zu drosseln, besitzt der SQL Server 2012 eine neue Fähigkeit, die Indirect Checkpoints genannt wird. Indirect Checkpoints ähnelt dem Fast-Start MTTR Target Feature der Oracle Datenbank, das bereits mit Oracle9i verfügbar war. SQL Server Multi-Subnet Clustering Ein SQL Server Multi-Subnet Failover Cluster entspricht vom Konzept her einem Oracle RAC Stretch Cluster. Doch dies ist nur auf den ersten Blick der Fall. Im Gegensatz zu RAC ist in einem lokalen SQL Server Failover Cluster jeweils nur ein Knoten aktiv für eine Datenbank. Für die Datenreplikation zwischen geografisch entfernten Sites verlässt sich Microsoft auf 3rd Party Lösungen für das Storage Mirroring. Die Verbesserung dieses Szenario mit einer SQL Server 2012 Implementierung besteht schlicht darin, dass eine VLAN-Konfiguration (Virtual Local Area Network) nun nicht mehr benötigt wird, so wie dies bisher der Fall war. Das folgende Diagramm stellt dar, wie der Ablauf mit SQL Server 2012 gehandhabt wird. In Site A und Site B wird HA jeweils durch einen lokalen Aktiv-Passiv-Cluster sichergestellt. Besondere Aufmerksamkeit muss hier der Konfiguration und dem Tuning geschenkt werden, da ansonsten völlig inakzeptable Failover-Zeiten resultieren. Dies liegt darin begründet, weil die Downtime auf Client-Seite nun nicht mehr nur von der reinen Failover-Zeit abhängt, sondern zusätzlich von der Dauer der DNS Replikation zwischen den DNS Servern. (Rufen Sie sich in Erinnerung, dass wir gerade von Multi-Subnet Clustering sprechen). Außerdem ist zu berücksichtigen, wie schnell die Clients die aktualisierten DNS Informationen abfragen. Spezielle Konfigurationen für Node Heartbeat, HostRecordTTL (Host Record Time-to-Live) und Intersite Replication Frequeny für Active Directory Sites und Services werden notwendig. Default TTL für Windows Server 2008 R2: 20 Minuten Empfohlene Einstellung: 1 Minute DNS Update Replication Frequency in Windows Umgebung: 180 Minuten Empfohlene Einstellung: 15 Minuten (minimaler Wert) Betrachtet man diese Werte, muss man feststellen, dass selbst eine optimale Konfiguration die rigiden SLAs (Service Level Agreements) heutiger geschäftskritischer Anwendungen für HA und DR nicht erfüllen kann. Denn dies impliziert eine auf der Client-Seite erlebte Failover-Zeit von insgesamt 16 Minuten. Hierzu ein Auszug aus der SQL Server 2012 Online Dokumentation: Cons: If a cross-subnet failover occurs, the client recovery time could be 15 minutes or longer, depending on your HostRecordTTL setting and the setting of your cross-site DNS/AD replication schedule. Wir sind hier an einem Punkt unserer Überlegungen angelangt, an dem sich erklärt, weshalb ich zuvor das "Windows was the God ..." Zitat verwendet habe. Die unbedingte Abhängigkeit zu Windows wird zunehmend zum Problem, da sie die Komplexität einer Microsoft-basierenden Lösung erhöht, anstelle sie zu reduzieren. Und Komplexität ist das Letzte, was sich CIOs heutzutage wünschen. Zur Ehrenrettung des SQL Server 2012 und AlwaysOn muss man sagen, dass derart lange Failover-Zeiten kein unbedingtes "Muss" darstellen, sondern ein "Kann". Doch auch ein "Kann" kann im unpassenden Moment unvorhersehbare und kostspielige Folgen haben. Die Unabsehbarkeit ist wiederum Ursache vieler an der Implementierung beteiligten Komponenten und deren Abhängigkeiten, wie beispielsweise drei Cluster-Lösungen (zwei von Microsoft, eine 3rd Party Lösung). Wie man die Sache auch dreht und wendet, kommt man an diesem Fakt also nicht vorbei - ganz unabhängig von der Dauer einer Downtime oder Failover-Zeiten. Im Gegensatz zu AlwaysOn und der hier vorgestellten Version eines Stretch-Clusters, vermeidet eine entsprechende Oracle Implementierung eine derartige Komplexität, hervorgerufen duch multiple Abhängigkeiten. Den Unterschied machen Datenbank-integrierte Mechanismen, wie Fast Application Notification (FAN) und Fast Connection Failover (FCF). Für Oracle MAA Konfigurationen (Maximum Availability Architecture) sind Inter-Site Failover-Zeiten im Bereich von Sekunden keine Seltenheit. Wenn Sie dem Link zur Oracle MAA folgen, finden Sie außerdem eine Reihe an Customer Case Studies. Auch dies ist ein wichtiges Unterscheidungsmerkmal zu AlwaysOn, denn die Oracle Technologie hat sich bereits zigfach in höchst kritischen Umgebungen bewährt. Availability Groups (Verfügbarkeitsgruppen) Die sogenannten Availability Groups (Verfügbarkeitsgruppen) sind - neben FCI - der weitere Baustein von AlwaysOn. Hinweis: Bevor wir uns näher damit beschäftigen, sollten Sie sich noch einmal ins Gedächtnis rufen, dass eine SQL Server Datenbank nicht die gleiche Bedeutung besitzt, wie eine Oracle Datenbank, sondern eher einem Oracle Schema entspricht. So etwas wie die SQL Server Northwind Datenbank ist vergleichbar mit dem Oracle Scott Schema. Eine Verfügbarkeitsgruppe setzt sich zusammen aus einem Set mehrerer Benutzer-Datenbanken, die im Falle eines Failover gemeinsam als Gruppe behandelt werden. Eine Verfügbarkeitsgruppe unterstützt ein Set an primären Datenbanken (primäres Replikat) und einem bis vier Sets von entsprechenden sekundären Datenbanken (sekundäre Replikate). Es können jedoch nicht alle SQL Server Datenbanken einer AlwaysOn Verfügbarkeitsgruppe zugeordnet werden. Der SQL Server Spezialist Michael Otey zählt in seinem SQL Server Pro Artikel folgende Anforderungen auf: Verfügbarkeitsgruppen müssen mit Benutzer-Datenbanken erstellt werden. System-Datenbanken können nicht verwendet werden Die Datenbanken müssen sich im Read-Write Modus befinden. Read-Only Datenbanken werden nicht unterstützt Die Datenbanken in einer Verfügbarkeitsgruppe müssen Multiuser Datenbanken sein Sie dürfen nicht das AUTO_CLOSE Feature verwenden Sie müssen das Full Recovery Modell nutzen und es muss ein vollständiges Backup vorhanden sein Eine gegebene Datenbank kann sich nur in einer einzigen Verfügbarkeitsgruppe befinden und diese Datenbank düerfen nicht für Database Mirroring konfiguriert sein Microsoft empfiehl außerdem, dass der Verzeichnispfad einer Datenbank auf dem primären und sekundären Server identisch sein sollte Wie man sieht, eignen sich Verfügbarkeitsgruppen nicht, um HA und DR vollständig abzubilden. Die Unterscheidung zwischen der Instanzen-Ebene (FCI) und Datenbank-Ebene (Availability Groups) ist von hoher Bedeutung. Vor kurzem wurde mir gesagt, dass man mit den Verfügbarkeitsgruppen auf Shared Storage verzichten könne und dadurch Kosten spart. So weit so gut ... Man kann natürlich eine Installation rein mit Verfügbarkeitsgruppen und ohne FCI durchführen - aber man sollte sich dann darüber bewusst sein, was man dadurch alles nicht abgesichert hat - und dies wiederum für Desaster Recovery (DR) und SLAs (Service Level Agreements) bedeutet. Kurzum, um die Kombination aus beiden AlwaysOn Produkten und der damit verbundene Komplexität kommt man wohl in der Praxis nicht herum. Availability Groups und WSFC AlwaysOn hängt von Windows Server Failover Clustering (WSFC) ab, um die aktuellen Rollen der Verfügbarkeitsreplikate einer Verfügbarkeitsgruppe zu überwachen und zu verwalten, und darüber zu entscheiden, wie ein Failover-Ereignis die Verfügbarkeitsreplikate betrifft. Das folgende Diagramm zeigt de Beziehung zwischen Verfügbarkeitsgruppen und WSFC: Der Verfügbarkeitsmodus ist eine Eigenschaft jedes Verfügbarkeitsreplikats. Synychron und Asynchron können also gemischt werden: Availability Modus (Verfügbarkeitsmodus) Asynchroner Commit-Modus Primäres replikat schließt Transaktionen ohne Warten auf Sekundäres Synchroner Commit-Modus Primäres Replikat wartet auf Commit von sekundärem Replikat Failover Typen Automatic Manual Forced (mit möglichem Datenverlust) Synchroner Commit-Modus Geplanter, manueller Failover ohne Datenverlust Automatischer Failover ohne Datenverlust Asynchroner Commit-Modus Nur Forced, manueller Failover mit möglichem Datenverlust Der SQL Server kennt keinen separaten Switchover Begriff wie in Oracle Data Guard. Für SQL Server werden alle Role Transitions als Failover bezeichnet. Tatsächlich unterstützt der SQL Server keinen Switchover für asynchrone Verbindungen. Es gibt nur die Form des Forced Failover mit möglichem Datenverlust. Eine ähnliche Fähigkeit wie der Switchover unter Oracle Data Guard ist so nicht gegeben. SQL Sever FCI mit Availability Groups (Verfügbarkeitsgruppen) Neben den Verfügbarkeitsgruppen kann eine zweite Failover-Ebene eingerichtet werden, indem SQL Server FCI (auf Shared Storage) mit WSFC implementiert wird. Ein Verfügbarkeitesreplikat kann dann auf einer Standalone Instanz gehostet werden, oder einer FCI Instanz. Zum Verständnis: Die Verfügbarkeitsgruppen selbst benötigen kein Shared Storage. Diese Kombination kann verwendet werden für lokale HA auf Ebene der Instanz und DR auf Datenbank-Ebene durch Verfügbarkeitsgruppen. Das folgende Diagramm zeigt dieses Szenario: Achtung! Hier handelt es sich nicht um ein Pendant zu Oracle RAC plus Data Guard, auch wenn das Bild diesen Eindruck vielleicht vermitteln mag - denn alle sekundären Knoten im FCI sind rein passiv. Es existiert außerdem eine weitere und ernsthafte Einschränkung: SQL Server Failover Cluster Instanzen (FCI) unterstützen nicht das automatische AlwaysOn Failover für Verfügbarkeitsgruppen. Jedes unter FCI gehostete Verfügbarkeitsreplikat kann nur für manuelles Failover konfiguriert werden. Lesbare Sekundäre Replikate Ein oder mehrere Verfügbarkeitsreplikate in einer Verfügbarkeitsgruppe können für den lesenden Zugriff konfiguriert werden, wenn sie als sekundäres Replikat laufen. Dies ähnelt Oracle Active Data Guard, jedoch gibt es Einschränkungen. Alle Abfragen gegen die sekundäre Datenbank werden automatisch auf das Snapshot Isolation Level abgebildet. Es handelt sich dabei um eine Versionierung der Rows. Microsoft versuchte hiermit die Oracle MVRC (Multi Version Read Consistency) nachzustellen. Tatsächlich muss man die SQL Server Snapshot Isolation eher mit Oracle Flashback vergleichen. Bei der Implementierung des Snapshot Isolation Levels handelt sich um ein nachträglich aufgesetztes Feature und nicht um einen inhärenten Teil des Datenbank-Kernels, wie im Falle Oracle. (Ich werde hierzu in Kürze einen weiteren Blogbeitrag verfassen, wenn ich mich mit der neuen SQL Server 2012 Core Lizenzierung beschäftige.) Für die Praxis entstehen aus der Abbildung auf das Snapshot Isolation Level ernsthafte Restriktionen, derer man sich für den Betrieb in der Praxis bereits vorab bewusst sein sollte: Sollte auf der primären Datenbank eine aktive Transaktion zu dem Zeitpunkt existieren, wenn ein lesbares sekundäres Replikat in die Verfügbarkeitsgruppe aufgenommen wird, werden die Row-Versionen auf der korrespondierenden sekundären Datenbank nicht sofort vollständig verfügbar sein. Eine aktive Transaktion auf dem primären Replikat muss zuerst abgeschlossen (Commit oder Rollback) und dieser Transaktions-Record auf dem sekundären Replikat verarbeitet werden. Bis dahin ist das Isolation Level Mapping auf der sekundären Datenbank unvollständig und Abfragen sind temporär geblockt. Microsoft sagt dazu: "This is needed to guarantee that row versions are available on the secondary replica before executing the query under snapshot isolation as all isolation levels are implicitly mapped to snapshot isolation." (SQL Storage Engine Blog: AlwaysOn: I just enabled Readable Secondary but my query is blocked?) Grundlegend bedeutet dies, dass ein aktives lesbares Replikat nicht in die Verfügbarkeitsgruppe aufgenommen werden kann, ohne das primäre Replikat vorübergehend stillzulegen. Da Leseoperationen auf das Snapshot Isolation Transaction Level abgebildet werden, kann die Bereinigung von Ghost Records auf dem primären Replikat durch Transaktionen auf einem oder mehreren sekundären Replikaten geblockt werden - z.B. durch eine lang laufende Abfrage auf dem sekundären Replikat. Diese Bereinigung wird auch blockiert, wenn die Verbindung zum sekundären Replikat abbricht oder der Datenaustausch unterbrochen wird. Auch die Log Truncation wird in diesem Zustant verhindert. Wenn dieser Zustand längere Zeit anhält, empfiehlt Microsoft das sekundäre Replikat aus der Verfügbarkeitsgruppe herauszunehmen - was ein ernsthaftes Downtime-Problem darstellt. Die Read-Only Workload auf den sekundären Replikaten kann eingehende DDL Änderungen blockieren. Obwohl die Leseoperationen aufgrund der Row-Versionierung keine Shared Locks halten, führen diese Operatioen zu Sch-S Locks (Schemastabilitätssperren). DDL-Änderungen durch Redo-Operationen können dadurch blockiert werden. Falls DDL aufgrund konkurrierender Lese-Workload blockiert wird und der Schwellenwert für 'Recovery Interval' (eine SQL Server Konfigurationsoption) überschritten wird, generiert der SQL Server das Ereignis sqlserver.lock_redo_blocked, welches Microsoft zum Kill der blockierenden Leser empfiehlt. Auf die Verfügbarkeit der Anwendung wird hierbei keinerlei Rücksicht genommen. Keine dieser Einschränkungen existiert mit Oracle Active Data Guard. Backups auf sekundären Replikaten Über die sekundären Replikate können Backups (BACKUP DATABASE via Transact-SQL) nur als copy-only Backups einer vollständigen Datenbank, Dateien und Dateigruppen erstellt werden. Das Erstellen inkrementeller Backups ist nicht unterstützt, was ein ernsthafter Rückstand ist gegenüber der Backup-Unterstützung physikalischer Standbys unter Oracle Data Guard. Hinweis: Ein möglicher Workaround via Snapshots, bleibt ein Workaround. Eine weitere Einschränkung dieses Features gegenüber Oracle Data Guard besteht darin, dass das Backup eines sekundären Replikats nicht ausgeführt werden kann, wenn es nicht mit dem primären Replikat kommunizieren kann. Darüber hinaus muss das sekundäre Replikat synchronisiert sein oder sich in der Synchronisation befinden, um das Beackup auf dem sekundären Replikat erstellen zu können. Vergleich von Microsoft AlwaysOn mit der Oracle MAA Ich komme wieder zurück auf die Eingangs erwähnte, mehrfach an mich gestellte Frage "Wann denn - und ob überhaupt - Oracle etwas Vergleichbares wie AlwaysOn bieten würde?" und meine damit verbundene (kurze) Irritation. Wenn Sie diesen Blogbeitrag bis hierher gelesen haben, dann kennen Sie jetzt meine darauf gegebene Antwort. Der eine oder andere Punkt traf dabei nicht immer auf Jeden zu, was auch nicht der tiefere Sinn und Zweck meiner Antwort war. Wenn beispielsweise kein Multi-Subnet mit im Spiel ist, sind alle diesbezüglichen Kritikpunkte zunächst obsolet. Was aber nicht bedeutet, dass sie nicht bereits morgen schon wieder zum Thema werden könnten (Sag niemals "Nie"). In manch anderes Fettnäpfchen tritt man wiederum nicht unbedingt in einer Testumgebung, sondern erst im laufenden Betrieb. Erst recht nicht dann, wenn man sich potenzieller Probleme nicht bewusst ist und keine dedizierten Tests startet. Und wer AlwaysOn erfolgreich positionieren möchte, wird auch gar kein Interesse daran haben, auf mögliche Schwachstellen und den besagten Teufel im Detail aufmerksam zu machen. Das ist keine Unterstellung - es ist nur menschlich. Außerdem ist es verständlich, dass man sich in erster Linie darauf konzentriert "was geht" und "was gut läuft", anstelle auf das "was zu Problemen führen kann" oder "nicht funktioniert". Wer will schon der Miesepeter sein? Für mich selbst gesprochen, kann ich nur sagen, dass ich lieber vorab von allen möglichen Einschränkungen wissen möchte, anstelle sie dann nach einer kurzen Zeit der heilen Welt schmerzhaft am eigenen Leib erfahren zu müssen. Ich bin davon überzeugt, dass es Ihnen nicht anders geht. Nachfolgend deshalb eine Zusammenfassung all jener Punkte, die ich im Vergleich zur Oracle MAA (Maximum Availability Architecture) als unbedingt Erwähnenswert betrachte, falls man eine Evaluierung von Microsoft AlwaysOn in Betracht zieht. 1. AlwaysOn ist eine komplexe Technologie Der SQL Server AlwaysOn Stack ist zusammengesetzt aus drei verschiedenen Technlogien: Windows Server Failover Clustering (WSFC) SQL Server Failover Cluster Instances (FCI) SQL Server Availability Groups (Verfügbarkeitsgruppen) Man kann eine derartige Lösung nicht als nahtlos bezeichnen, wofür auch die vielen von Microsoft dargestellten Einschränkungen sprechen. Während sich frühere SQL Server Versionen in Richtung eigener HA/DR Technologien entwickelten (wie Database Mirroring), empfiehlt Microsoft nun die Migration. Doch weshalb dieser Schwenk? Er führt nicht zu einem konsisten und robusten Angebot an HA/DR Technologie für geschäftskritische Umgebungen. Liegt die Antwort in meiner These begründet, nach der "Windows was the God ..." noch immer gilt und man die Nachteile der allzu engen Kopplung mit Windows nicht sehen möchte? Entscheiden Sie selbst ... 2. Failover Cluster Instanzen - Kein RAC-Pendant Die SQL Server und Windows Server Clustering Technologie basiert noch immer auf dem veralteten Aktiv-Passiv Modell und führt zu einer Verschwendung von Systemressourcen. In einer Betrachtung von lediglich zwei Knoten erschließt sich auf Anhieb noch nicht der volle Mehrwert eines Aktiv-Aktiv Clusters (wie den Real Application Clusters), wie er von Oracle bereits vor zehn Jahren entwickelt wurde. Doch kennt man die Vorzüge der Skalierbarkeit durch einfaches Hinzufügen weiterer Cluster-Knoten, die dann alle gemeinsam als ein einziges logisches System zusammenarbeiten, versteht man was hinter dem Motto "Pay-as-you-Grow" steckt. In einem Aktiv-Aktiv Cluster geht es zwar auch um Hochverfügbarkeit - und ein Failover erfolgt zudem schneller, als in einem Aktiv-Passiv Modell - aber es geht eben nicht nur darum. An dieser Stelle sei darauf hingewiesen, dass die Oracle 11g Standard Edition bereits die Nutzung von Oracle RAC bis zu vier Sockets kostenfrei beinhaltet. Möchten Sie dazu Windows nutzen, benötigen Sie keine Windows Server Enterprise Edition, da Oracle 11g die eigene Clusterware liefert. Sie kommen in den Genuss von Hochverfügbarkeit und Skalierbarkeit und können dazu die günstigere Windows Server Standard Edition nutzen. 3. SQL Server Multi-Subnet Clustering - Abhängigkeit zu 3rd Party Storage Mirroring Die SQL Server Multi-Subnet Clustering Architektur unterstützt den Aufbau eines Stretch Clusters, basiert dabei aber auf dem Aktiv-Passiv Modell. Das eigentlich Problematische ist jedoch, dass man sich zur Absicherung der Datenbank auf 3rd Party Storage Mirroring Technologie verlässt, ohne Integration zwischen dem Windows Server Failover Clustering (WSFC) und der darunterliegenden Mirroring Technologie. Wenn nun im Cluster ein Failover auf Instanzen-Ebene erfolgt, existiert keine Koordination mit einem möglichen Failover auf Ebene des Storage-Array. 4. Availability Groups (Verfügbarkeitsgruppen) - Vier, oder doch nur Zwei? Ein primäres Replikat erlaubt bis zu vier sekundäre Replikate innerhalb einer Verfügbarkeitsgruppe, jedoch nur zwei im Synchronen Commit Modus. Während dies zwar einen Vorteil gegenüber dem stringenten 1:1 Modell unter Database Mirroring darstellt, fällt der SQL Server 2012 damit immer noch weiter zurück hinter Oracle Data Guard mit bis zu 30 direkten Stanbdy Zielen - und vielen weiteren durch kaskadierende Ziele möglichen. Damit eignet sich Oracle Active Data Guard auch für die Bereitstellung einer Reader-Farm Skalierbarkeit für Internet-basierende Unternehmen. Mit AwaysOn Verfügbarkeitsgruppen ist dies nicht möglich. 5. Availability Groups (Verfügbarkeitsgruppen) - kein asynchrones Switchover Die Technologie der Verfügbarkeitsgruppen wird auch als geeignetes Mittel für administrative Aufgaben positioniert - wie Upgrades oder Wartungsarbeiten. Man muss sich jedoch einem gravierendem Defizit bewusst sein: Im asynchronen Verfügbarkeitsmodus besteht die einzige Möglichkeit für Role Transition im Forced Failover mit Datenverlust! Um den Verlust von Daten durch geplante Wartungsarbeiten zu vermeiden, muss man den synchronen Verfügbarkeitsmodus konfigurieren, was jedoch ernstzunehmende Auswirkungen auf WAN Deployments nach sich zieht. Spinnt man diesen Gedanken zu Ende, kommt man zu dem Schluss, dass die Technologie der Verfügbarkeitsgruppen für geplante Wartungsarbeiten in einem derartigen Umfeld nicht effektiv genutzt werden kann. 6. Automatisches Failover - Nicht immer möglich Sowohl die SQL Server FCI, als auch Verfügbarkeitsgruppen unterstützen automatisches Failover. Möchte man diese jedoch kombinieren, wird das Ergebnis kein automatisches Failover sein. Denn ihr Zusammentreffen im Failover-Fall führt zu Race Conditions (Wettlaufsituationen), weshalb diese Konfiguration nicht länger das automatische Failover zu einem Replikat in einer Verfügbarkeitsgruppe erlaubt. Auch hier bestätigt sich wieder die tiefere Problematik von AlwaysOn, mit einer Zusammensetzung aus unterschiedlichen Technologien und der Abhängigkeit zu Windows. 7. Problematische RTO (Recovery Time Objective) Microsoft postioniert die SQL Server Multi-Subnet Clustering Architektur als brauchbare HA/DR Architektur. Bedenkt man jedoch die Problematik im Zusammenhang mit DNS Replikation und den möglichen langen Wartezeiten auf Client-Seite von bis zu 16 Minuten, sind strenge RTO Anforderungen (Recovery Time Objectives) nicht erfüllbar. Im Gegensatz zu Oracle besitzt der SQL Server keine Datenbank-integrierten Technologien, wie Oracle Fast Application Notification (FAN) oder Oracle Fast Connection Failover (FCF). 8. Problematische RPO (Recovery Point Objective) SQL Server ermöglicht Forced Failover (erzwungenes Failover), bietet jedoch keine Möglichkeit zur automatischen Übertragung der letzten Datenbits von einem alten zu einem neuen primären Replikat, wenn der Verfügbarkeitsmodus asynchron war. Oracle Data Guard hingegen bietet diese Unterstützung durch das Flush Redo Feature. Dies sichert "Zero Data Loss" und beste RPO auch in erzwungenen Failover-Situationen. 9. Lesbare Sekundäre Replikate mit Einschränkungen Aufgrund des Snapshot Isolation Transaction Level für lesbare sekundäre Replikate, besitzen diese Einschränkungen mit Auswirkung auf die primäre Datenbank. Die Bereinigung von Ghost Records auf der primären Datenbank, wird beeinflusst von lang laufenden Abfragen auf der lesabaren sekundären Datenbank. Die lesbare sekundäre Datenbank kann nicht in die Verfügbarkeitsgruppe aufgenommen werden, wenn es aktive Transaktionen auf der primären Datenbank gibt. Zusätzlich können DLL Änderungen auf der primären Datenbank durch Abfragen auf der sekundären blockiert werden. Und imkrementelle Backups werden hier nicht unterstützt. Keine dieser Restriktionen existiert unter Oracle Data Guard.

Read the article
Good Enough Failover Strategy for DNS / MySQL / Email

- by IMB

I've asked and read a lot questions regarding DNS failover but the more I read the more complicated it becomes, some people say it's good enough some say it isn't. No clear answers from what I read. I was wondering if we can set it straight once and for all, at least for the requirements of most websites out there. Right now let's assume the following: We don't need really need load-balancing, what we need is a failover solution. We are running a website based on LAMP on a VPS. We need to make sure that the Web Server, MySQL, Email are always accessible if not 99%. Basically here's my idea and questions about it: Web Server: We need at least one failover server (another VPS on a separate data center). Is DNS Failover via Round Robin good, if not, what's the best? And how do you exactly implement it? How do you make the files you upload/delete on Server A is also on Server B? MySQL: I've only read a brief intro to MySQL replication and I assume that I can replicate Server A to Server B and vice versa on the fly right? So just it case Server A fails and Server B is now running, it will continue to work and replicate to Server A when it becomes available. So in essence Server B is now the primary server, and will later on failover to Server A, should a failure happen again. Email: If we are gonna use DNS Failover, using webmail or relying on emails stored on the server is probably not a good idea right? Since some emails might be on Server A while some might be on Server B? I assume a basic email forwarder to a 3rdparty is good enough (like Gmail for example) to ensure all emails are kept in one place. Here's a basic diagram for a better picture: http://i.stack.imgur.com/KWSIi.png

Read the article
Solution 6 : Kill a Non-Clustered Process during Two-Node Cluster Failover

- by StanleyGu

Using Visual Studio 2008 and C#, I developed a windows service A and deployed it to two nodes of a windows server 2008 failover cluster. The service A is part of the failover cluster service, which means, when failover occurs at node1, the cluster service will failover the windows service A from node 1 to node 2. One of the tasks implemented by the windows service A is to start, monitor or kill a process B. The process B is installed to the two nodes but is not part of the failover cluster service. When a failover occurs at node1, the cluster service does not failover the process B from node 1 to node 2, and the process B continues running at node1. The requirement is: When failover occurs at node1, we want the process B running at node1 gets killed, but we do not want the process B be part of the failover cluster service. The first idea that pops up immediately is to put some code in an event handler triggered by the failover in the windows service A. The failover effect to the windows service A is similar to using the task manager to kill the process of the windows service A, but there is no event in windows service that can be triggered by killing the process of the window service. The events related to terminating a windows service are OnStop and OnShutDown, but killing the process of windows service A triggers neither of them. The OnStop event can only be triggered by stopping the windows service using Services Control Manager or Services Management Console. Apparently, the first idea is not feasible. The second idea that emerges is to put code into the OnStart event handler of the windows service A. When failover occurs at node 1, the windows service A is killed at node 1 and started at node 2. During the starting, the windows service A at node 2 kills the process B that is running at node 1. It is a workaround and works very well. The C# code implementation within the OnStart event handler is as following: 1. Capture server names of the two nodes from App.config 2. Determine server name of the remote node. 3. Kill the process B running on the remote node. Check here for sample code.

Read the article
Sonicwall with failover, multiple subnets, and preferred WAN interface per subnet

- by Dan

I am trying to set up my Sonicwall TZ-210 as follows: Two WAN interfaces (different ISPs), set up in failover mode. Two LAN interfaces with different subnets Each LAN subnet will have a preferred outbound WAN interface, but would failover when necessary. In this way, each ISP is being used for a separate subnet of my network, but both subnets could failover to the other ISP if their primary went down. I know how to do 1 and 2, but I don't know how to do 3. I could set up a route for each subnet to go through a specific interface, but what would happen in the event of a failover? Would it automatically update those routes? Thanks!

Read the article
Secondary DHCP server won't start on Centos 6.2

- by Slowjoe

I'm trying to create a backup DHCP server. Server times are in sync. Primary server starts fine. Secondary server won't start. Error from /var/log/messages is: Sep 15 14:47:45 stream dhcpd: Copyright 2004-2010 Internet Systems Consortium. Sep 15 14:47:45 stream dhcpd: All rights reserved. Sep 15 14:47:45 stream dhcpd: For info, please visit https://www.isc.org/software/dhcp/ Sep 15 14:47:45 stream dhcpd: /etc/dhcp/dhcpd.conf line 25: invalid statement in peer declaration Sep 15 14:47:45 stream dhcpd: #011max-response-default Sep 15 14:47:45 stream dhcpd: ^ Sep 15 14:47:45 stream dhcpd: /etc/dhcp/dhcpd.conf line 41: failover peer dhcp-failover: not found Sep 15 14:47:45 stream dhcpd: failover peer "dhcp-failover" Sep 15 14:47:45 stream dhcpd: ^ Sep 15 14:47:45 stream dhcpd: /etc/dhcp/dhcpd.conf line 49: failover peer dhcp-failover: not found Sep 15 14:47:45 stream dhcpd: failover peer "dhcp-failover" Sep 15 14:47:45 stream dhcpd: ^ Sep 15 14:47:45 stream dhcpd: WARNING: Host declarations are global. They are not limited to the scope you declared them in. Sep 15 14:47:45 stream dhcpd: /etc/dhcp/dhcpd.conf line 70: failover peer dhcp-failover: not found Sep 15 14:47:45 stream dhcpd: failover peer "dhcp-failover" Sep 15 14:47:45 stream dhcpd: ^ Sep 15 14:47:45 stream dhcpd: /etc/dhcp/dhcpd.conf line 78: failover peer dhcp-failover: not found Sep 15 14:47:45 stream dhcpd: failover peer "dhcp-failover" Sep 15 14:47:45 stream dhcpd: ^ Sep 15 14:47:45 stream dhcpd: Configuration file errors encountered -- exiting Sep 15 14:47:45 stream dhcpd: Sep 15 14:47:45 stream dhcpd: This version of ISC DHCP is based on the release available Sep 15 14:47:45 stream dhcpd: on ftp.isc.org. Features have been added and other changes Sep 15 14:47:45 stream dhcpd: have been made to the base software release in order to make Sep 15 14:47:45 stream dhcpd: it work better with this distribution. Sep 15 14:47:45 stream dhcpd: Sep 15 14:47:45 stream dhcpd: Please report for this software via the CentOS Bugs Database: Sep 15 14:47:45 stream dhcpd: http://bugs.centos.org/ Sep 15 14:47:45 stream dhcpd: Sep 15 14:47:45 stream dhcpd: exiting. Config file contents: # DHCP Server Configuration file. # see /usr/share/doc/dhcp*/dhcpd.conf.sample # see 'man 5 dhcpd.conf' # option domain-name "eng.foo.com"; option domain-name-servers ns0.eng.foo.com, ns1.eng.foo.com; option ntp-servers ntp.eng.foo.com; #option time-servers ntp.eng.foo.com; default-lease-time 3600; max-lease-time 7200; authoritative; log-facility local7; failover peer "dhcp-failover" { secondary; address 10.0.1.70; port 647; peer address 10.0.1.11; peer port 647; max-response-default 30; max-unacked-updates 10; load balance max seconds 3; } # # Management subnet # subnet 10.0.0.0 netmask 255.255.255.0 { option subnet-mask 255.255.255.0; option broadcast-address 10.0.0.255; option routers 10.0.0.1; option domain-search "eng.foo.com", "foo.com"; # Unknown clients get this pool pool { failover peer "dhcp-failover"; max-lease-time 300; range 10.0.0.240 10.0.0.249; allow unknown-clients; } # Known clients get this pool pool { failover peer "dhcp-failover"; max-lease-time 28800; range 10.0.0.150 10.0.0.199; deny unknown-clients; } include "/etc/dhcp/dhcpd.conf-engmgmt"; } # # Data subnet # subnet 10.0.1.0 netmask 255.255.255.0 { option subnet-mask 255.255.255.0; option broadcast-address 10.0.1.255; option routers 10.0.1.1; option domain-search "eng.foo.com", "foo.com"; # Unknown clients get this pool pool { failover peer "dhcp-failover"; max-lease-time 300; range 10.0.1.240 10.0.1.249; allow unknown-clients; } # Known clients get this pool pool { failover peer "dhcp-failover"; max-lease-time 28800; range 10.0.1.150 10.0.1.199; deny unknown-clients; } # For centos network installs if substring (option vendor-class-identifier, 0, 8) = "anaconda" { filename "/autohome/distro/ks/"; next-server eng-data.eng.foo.com; } # For PXE network installs if substring (option vendor-class-identifier, 0, 9) = "PXEClient" { filename "pxelinux.0"; next-server eng-data.eng.foo.com; } # For KVM PXE network installs if substring (option vendor-class-identifier, 0, 9) = "Etherboot" { filename "pxelinux.0"; next-server eng-data.eng.foo.com; } include "/etc/dhcp/dhcpd.conf-engdata"; }

Read the article
SQL Server Database In Single User Mode after Failover

- by jlichauc

Here is a weird situation we experienced with a SQL Server 2008 Database Mirroring Failover. We have a pair of mirrored databases running in high-availability mode and both the principal and mirror showed as synchronized. As part of some maintenance I triggered a manual failover of the principal to the mirror. However after the failover the principal was now in single-user mode instead of the expected "Principal/Synchronized" state we usually get. The database had been in multi-user mode on the previous principal before this had happened. We ended up stopping all applications, restarting the SQL Server instances, and executing "ALTER DATABASE ... SET MULTI_USER" to bring the database back to the expected "Principal/Synchronized" state in a multi-user mode. Question. Does anyone know where SQL Server stores information about whether a database should be in single-user mode or not? I'm wondering if there is some system database or table that has this setting recorded somewhere. In particular we had an incident once with the database on the original principal (the one I was failing over to) where when trying to detach the database it was put into single-user mode. I'm wondering if that setting is cached somewhere and is the reason that SQL Server put it back into single-user mode after a failover.

Read the article
DNS Round-robin, Load Balancing, Load sharing, and failover in 2012

- by user1089770

I have been reading many posts on serverfault as well as on other sites regarding all these. What I understand is, Multiple A records(round-robin dns) can be used for both : Load sharing (round-robin, but NOT load-balancing). Many people say that “Load Balancing” but I think there will be no load-balancing because “Balance” means (literally) “compare two(or more) and adjust” (and that is what Real s/w or h/w Load balancers do) but Browsers never do this, instead they randomly select and IP and connect to it. It doesn't have any knowledge about the current load of that server (probably, the IP it picked had the highest load!). Automatic failover (latest browsers only). Yes, I think DNS can be used as a simple failover system (at least in 2012, I dont know when it actually "came in effect"). please refer to : http://webmasters.stackexchange.com/questions/10927/using-multiple-a-records-for-my-domain-do-web-browsers-ever-try-more-than-one and Browser-based DNS failover using multiple A records and http://www.nber.org/sys-admin/dns-failover.html I would like to make sure my assumptions/findings are right. So let me know please.....

Read the article
Apache failover for JBoss

- by DaveB

I am running a JBoss web app (AS 6 Final) hosted on linux (Debian). I would like to implement a failover solution so that when JBoss is down, a static web page is served in its place. My current solution is to run Apache as a reverse proxy (described here), which allows me to serve .php files from apache and forward all other requests to JBoss. But I am not sure how make Apache step in when JBoss is down? Note. both apache and jboss will be running on the same box, this is (Application failover rather then server failover) to cover times when JBoss is re-deploying etc. So I am looking for the simplest solution really Many thanks

Read the article
Failover or load balancing configuration on Apple Xserve and Mac OS X Server Snow Leopard (10.6)

- by Alasdair Allan

I'm currently installing a number of Apple Xserve boxes running Mac OS X Server (10.6). After poking and prodding for a while I can't find any obvious way, or documentation telling me how, to set up the 2 ethernet ports in a failover or load-balancing configuration. There seems to be an obvious way to do link aggregation, but not failover or load-balancing? What's the default configuration if I enable both ports with the same (manually configured) IP address?

Read the article
Failover Issuer CAs without Clustering

- by James Santiago

I am attempting to setup a Certificate Authority with some failover capabilities for the issuer CAs. I have an offline root CA and am attempting to setup two subordinate CAs on our domain which will handle issuing certificates. I'm trying to determine the architecture needed for these two CAs to allow one to go down and the other to take over without the use of failover clustering, as the two are in different geographic locales. Are there documents regarding this setup?

Read the article
Simple Webserver failover

- by yummm

I will be running a dynamic web site and if the server ever is to stop responding, I'd like to failover to a static website that displays a "We are down for maintenance" page. I have been reading and I found that switching the DNS dynamically may be an option, but how quick will that change take place? And will everyone see the change immediately? Are there any better ways to failover to another server?

Read the article
Self-healing Cloud vs Failover Boxes

- by IMB

Now that self-healing cloud servers are becoming more and more popular, I am currently torn between the decision if I should setup a HAproxy failover for my VPS or if should save myself the trouble and just put my sites on a self-healing cloud server. Does it still make sense to setup your own failover system (HAproxy + 2 or more servers for example) when self healing cloud seems like a practical solution? They seem to do the same job or am I missing something?

Read the article
Un-failing over a Cisco PIX 515e

- by ABrown

We had a power outage at our data center last week and when our dual PIX 515E running IOS 7.0(8) (configured with a failover cable) came back, they were in a failed over state where the Secondary unit is active and the Primary unit is standby I have tried 'failover reset', 'failover active', and 'failover reload-standby' as well as executing reloads on both units in a variety of orders, and they don't come back Primary/Active Secondary/Standby. The only thing in my arsenal that I haven't tried is driving to the data center and performing a hard reboot, which I hate to do. I have read How Failover Works on the Cisco Secure Firewall and it seems like this should be wicked straight forward. output of show failover on Primary: Failover On Cable status: Normal Failover unit Primary Failover LAN Interface: N/A - Serial-based failover enabled Unit Poll frequency 15 seconds, holdtime 45 seconds Interface Poll frequency 15 seconds Interface Policy 1 Monitored Interfaces 2 of 250 maximum Version: Ours 7.0(8), Mate 7.0(8) Last Failover at: 02:52:05 UTC Mar 10 2010 This host: Primary - Standby Ready Active time: 0 (sec) Interface outside (x.x.x.165): Normal Interface inside (y.y.y.3): Normal Other host: Secondary - Active Active time: 897045 (sec) Interface outside (x.x.x.164): Normal Interface inside (y.y.y.4): Normal Stateful Failover Logical Update Statistics Link : Unconfigured. output of show failover on Secondary: Failover On Cable status: Normal Failover unit Secondary Failover LAN Interface: N/A - Serial-based failover enabled Unit Poll frequency 15 seconds, holdtime 45 seconds Interface Poll frequency 15 seconds Interface Policy 1 Monitored Interfaces 2 of 250 maximum Version: Ours 7.0(8), Mate 7.0(8) Last Failover at: 02:03:04 UTC Feb 28 2010 This host: Secondary - Active Active time: 896925 (sec) Interface outside (x.x.x.164): Normal Interface inside (y.y.y.4): Normal Other host: Primary - Standby Ready Active time: 0 (sec) Interface outside (x.x.x.165): Normal Interface inside (y.y.y.3): Normal Stateful Failover Logical Update Statistics Link : Unconfigured. I'm seeing the following in my syslog: Mar 10 03:05:00 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reset' command. Mar 10 03:05:09 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reload-standby' command. Mar 10 03:05:12 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=20,my=Active,peer=Failed. Mar 10 03:05:12 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Failed. Mar 10 03:06:09 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=0,my=Active,peer=Failed. Mar 10 03:06:09 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is down. Mar 10 03:06:09 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=1,my=Active,peer=Failed. Mar 10 03:06:10 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is up. Mar 10 03:06:10 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=411,op=2,my=Active,peer=Failed. Mar 10 03:06:23 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=80,my=Active,peer=Standby Ready. Mar 10 03:06:23 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Standby Ready. Mar 10 03:06:24 fw2 %PIX-6-720027: (VPN-Primary) HA status callback: My state Standby Ready. Mar 10 03:07:05 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reset' command. Mar 10 03:07:31 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover active' command. Mar 10 03:08:04 fw1 %PIX-5-611103: User logged out: Uname: enable_1 Mar 10 03:08:04 fw1 %PIX-6-315011: SSH session from admin1_int on interface inside for user "pix" terminated normally Mar 10 03:08:39 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=20,my=Active,peer=Failed. Mar 10 03:08:39 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Failed. Mar 10 03:09:10 fw1 %PIX-6-605005: Login permitted from admin1_int/36891 to inside:192.168.4.4/ssh for user "pix" Mar 10 03:09:23 fw1 %PIX-5-111008: User 'enable_15' executed the 'failover reset' command. Mar 10 03:09:38 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=0,my=Active,peer=Failed. Mar 10 03:09:39 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is down. Mar 10 03:09:39 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=401,op=1,my=Active,peer=Failed. Mar 10 03:09:39 fw1 %PIX-6-720024: (VPN-Secondary) HA status callback: Control channel is up. Mar 10 03:09:39 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=411,op=2,my=Active,peer=Failed. Mar 10 03:09:52 fw1 %PIX-6-720032: (VPN-Secondary) HA status callback: id=3,seq=200,grp=0,event=406,op=80,my=Active,peer=Standby Ready. Mar 10 03:09:52 fw1 %PIX-6-720028: (VPN-Secondary) HA status callback: Peer state Standby Ready. Mar 10 03:09:53 fw2 %PIX-6-720027: (VPN-Primary) HA status callback: My state Standby Ready. I'm not exactly sure how to interpret that syslog data. Primary doesn't seem to even try to become Active. When I reload the individual units separately, my connections are retained, so it doesn't seem like I have a real hardware failure. Is there something I can query (IOS or SNMP) to check for hardware issues? Any thoughts? My IOS-fu is weak. Thanks for any help you might provide, Aaron

Read the article
Change OpenVZ route to pass through ip failover

- by Kevin Campion

I have one dedicaced server with its own IP and another IP (failover) who refer to the first. I will wish to change the gateway of a Proxmox virtual machine (openvz) who runs on this dedicaced server to go through the failover IP rather than the ip of host main server. Once connected to a virtual machine, when I do a traceroute VE# traceroute www.google.fr traceroute to www.google.fr (209.85.229.104), 30 hops max, 60 byte packets 1 MY_SERVER_NAME.ovh.net (xxx.xxx.xxx.xxx FIRST_IP_MAIN_SERVER) 0.021 ms 0.010 ms 0.009 ms The first line tells me the ip of host main server. I would like that the traceroute display the second IP failover. VE# route Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 192.0.2.1 * 255.255.255.255 UH 0 0 0 venet0 default 192.0.2.1 0.0.0.0 UG 0 0 0 venet0 With iptables HOST# iptables -t nat -L Chain POSTROUTING (policy ACCEPT) target prot opt source destination MASQUERADE all -- anywhere anywhere MASQUERADE all -- anywhere anywhere SNAT tcp -- anywhere 10.10.101.2 tcp dpt:www state NEW,RELATED,ESTABLISHED,UNTRACKED to:SECOND_IP_FAILOVER SNAT all -- 10.10.101.2 anywhere to:SECOND_IP_FAILOVER 10.10.101.2 is the virtual machine IP (interface venet0) Any ideas ?

Read the article
master-slave datastore replication, automatic failover, and wackamole

- by z8000

I have 2 dedicated servers provisioned for my next project's datastores. The datastores are configured for master-slave replication. There's no inherent automatic failover but I of course want this. That is, I'd love for access to the master datastore to always just work without having to configure a client library to detect when a master is down and failover to the slave. I've seen Wackamole which is based on the Spread Toolkit. You provide Wackamole with a set of IPs and a bunch of nodes, and regardless of the up/down state of any of the nodes, those IPs will stay available/up. Wackamole detects when a node goes down and ARPs the IP(s) that were up on the now-down node. It's pretty neat actually. So, my thought was to use Wackamole to keep the 2 virtual private IPs available/up. Clients would then just always use the same private IP to access the master datastore and the same but distinct IP for the slave datastore, even if those IPs were hosted on the same node. My datastore servers are accessed over a private network. I am unsure if this messes with Wackamole though. Is this lunacy? How do you generally handle automatic failover of private services like a datastore. FWIW, it shouldn't matter but the datastore is Redis. I don't want to hear "use mySQL" please :) Thanks.

Read the article
DNS Round-robin failover and load balancing

- by Tom O'Connor

Having read all of the questions and answers (1 2 3 and so on) on here relating to DNS load balancing, and Round-robin DNS, there's still a number of unanswered questions.. Large companies, and I'm looking at Google, Facebook and Twitter here, do present multiple A records. 1) If DNS loadbalancing/failover is so dodgy, why do large organisations do it? There seems to be very little mention of "DNS Pinning", despite this (PDF) paper about it. 2) Why is DNS Pinning so seldom mentioned? 3) Are there any concrete examples of which ISPs and so on actually do rewrite DNS TTLs? That said, I'm not entirely backing the side for using DNS for failover or any form of load balancing. For most networks, BGP diverse routing still seems to be a better fit. DNS rears it's ugly head again. :(

Read the article
Best solution for Multi-WAN failover (inside & out)?

- by Sean O

Looking for a way to setup 2 ISPs in failover mode, for both incoming & outgoing traffic, for our small (<100 devices) network. The leading contender for now seems to be the Peplink Balance 310. However, a reseller I spoke with said it's great for 100% outgoing connectivity, but didn't seem to be confident in its abilities to handle incoming traffic. This is important as we host our own web site, Exchange e-mail, and virtual desktops (RDP). Do any Peplink owners use this for failover of incoming traffic? Are there other devices I should be considering? We're currently using a Cisco 1800 series router & ASA 5500 series firewall, with Comcast & T-1 lines (the goal being to replace the T with DSL/FiOS {whenever that becomes availble}). Price range: ~$1000 - $2500 USD. Thanks.

Read the article
Software for failover across multiple external hosts

- by Lin

I have multiple webservers with the same content, hosted across different providers. However, I can't seem to find a nice, simple failover solution. Load-balancing software (Pound, HAProxy, etc.) are unnecessary, and I need the flexibility to manage over 100+ domains, so the paid DNS failover solutions I've found are too expensive. So far the simplest solution I've thought of is just to set a very low TTL (30min - 1hr) in each zone entry on my nameservers (running BIND). Then, continuously monitor each server, and temporarily remove failed servers from zone entries. But this seems like something that should be currently available. I only have root access to different VPSes running CentOS. Any suggestions? Thanks!

Read the article
SQLServer 2008 FailOver and Load Balancing

- by Jedi Master Spooky

I have a project with a 2TB database ( 450.000.000 rows). I need to provide to the proyect a solution that gives FailOver and load Balacing, what do you recommend? We are going to use a NetApp Filer for the Data Files and for the File System of the Proyect. I read that SQl Clustering does not provide load balacing. If I cannnot have this feature and I have to go only to the FailOver what Server ( I presume that the key feature here is memory) would you recomend. We are adding 1.000.000 rows a day. Once the rows is inserted we are doing a lot of updates to that row for about 1 week then the row get static. Because of this I am thinking in some kind of history table or database or something like that. I am open to the Os servers implementation, I was thinking of a windows 2008 server with cluster but this depend of the database solution

Read the article
DNS failover across multiple datacenters?

- by Jae Lee

I've got a site that is starting to get a lot of traffic and just the other day, we had a network outage at the datacenter where our loadbalancer (haproxy) is hosted at. This worried me as despite all my efforts of making the system fully redundant, I still could not make our DNS redundant, which I think isn't an easy solution. Only thing I was able to find was to sign up for DNS failover from places like dnsme, etc .... but they cost too much for budding startups. Even their Corporate plan only gives you 50 million queries per month and we use that up in a week. So my question is, are there any self hosted DNS we can do that provides the failover like how dnsme does it?

Read the article
Failover of mirrored webservice

- by eflat

We are using a webservice (over http) which has 2 mirrored servers, accessible as, say www.blah.com and www2.blah.com. Is there a software solution that would help us handle failover? Currently if one server becomes unavailable, I need to manually edit our config to point at the other server. Failover on their side is "in the works", so I don't want to do checking for server availability in code. On our side, we use a mix of linux and windows boxes, so we have both os's in the cage.

Read the article
PostgreSQL failover cluster on Windows Server

- by user36997

We are looking for advice on how to setup a basic failover cluster for our application: We will be using 4 machines running Microsoft Windows Server (most probably 2003). All four will always run our application, which is essentially a web service. Load balancing is "outsourced" - somebody else handles the distribution of the web requests among the servers. Only one of the servers will be running the PostgreSQL server actively at any given time. Another server (of the four) also has the DB installed, but is on standby/passive. The DB data is stored on shared storage. No copying data between servers. Reads are done very frequently by many end-users, and in rather small chunks of data. Writes are done much less frequently, by less users, and in very large bulks of data. Now, how can one configure Microsoft Cluster Service to keep only one instance of the DB server and 4 instances (1 per server) of our application at all times? And does PostgreSQL integrate neatly with MSCS at all? Update: Instead of keeping the data on shared storage, I also consider using log shipping to replicate data on a couple of DB servers. There are two issues with this option: Log shipping only makes sure that I have a second server that gets all of the data and is ready to take over. How do I implement the actual failure detection and failover switch? Switching back: Suppose the master fails and the system automatically fails over to the slave, and later the master comes back online. I understand that with WAL shipping this will require to reconfigure the log shipping once again, and that switching back is far from seamless. Is that so?

Read the article
DNS failover in a two datacenter scenario

- by wanson

I'm trying to implement a low-cost solution for website high availability. I'm looking for the downsides of the following scenario: I have two servers with the same configuration, content, mysql replication (dual-master). They are in different datacenters - let's call them serverA and serverB. Users use serverA - serverB is more like a backup. Now, I want to use DNS failover, to switch users from serverA to serverB when serverA goes down. My idea is that I setup DNS servers (bind/powerdns) on serverA and serverB - let's call them ns1.website.com and ns2.website.com (assuming I own website.com). Then I configure my domain to use them as its nameservers. Both DNS servers will return serverA IP as my website's IP. If serverA goes down I can (either manually or automatically from serverB) change configuration of serverB's DNS, to return IP of serverB as website's IP. Of course the TTL will be low, as it's supposed to be in DNS failovers. I know that it may take some time to switch to serverB (DNS ttl, time to detect serverA failure, serverB DNS reconfiguration etc), and that some small part of users won't use serverB anyway. And I'm OK with that. But what are other downsides of such an approach? An alternative scenario is that ns1.website.com will return serverA IP as website's IP, and ns2.website.com will return serverB IP as website's IP. But AFAIK clients not always use primary nameserver and sometimes would use secondary one. So some small part of users would use serverB instead of serverA which is not quite what I'd like. Can you confirm that DNS clients behave like that and can you tell what percentage of clients would possibly use serverB instead of serverA (statistically)? This one also has the downside that when serverA goes back up, it will be automatically used as website's primary server, which is also a bad situation (cold cache, mysql replication could fail in the meantime etc). So I'm adding it only as a theoretical alternative. I was thinking about using some professional DNS failover companies but they charge for the number of DNS requests and the fees are very high (why?)

Read the article
Failover Cluster Quorum Failing

- by oruchreis

Hi, I have two nodes which boots from iscsi to implement windows 2008 cluster. And I'm using disk majority option as quorum over iscsi. But when the quorum's iscsi connection failed(May be san server reset), the failover cluster is failed too. If I reset one of the nodes, it can open, but its system disk goes offline. I cant change its status as online, because it says that its reserved by failover cluster(disk is on iscsi, beacuse iscsi boot). And this disk works as readonly. Anything on it cant be deleted or written. So, I cant rejoin the node to the cluster again. I have to reinstall windows. So, what I'm asking is, how can I implement more quorum backup? I mean, can I use both disk majority and file share majority at same time? AFAIK, every nodes also keep the quorum's copy too. But I don't know sometimes san servers goes offline. And quorum's iscsi connection and nodes' iscsi connections get lost. So, nor the quorum that is kept in the nodes neither the quorum iscsi disk is not enough to start the cluster again. I want to use both disk majority and file share majority at the same time. Can I do this? Have you any other suggestion? Regards.

Read the article

Search Results

Search found 504 results on 21 pages for 'failover'.

Page 1/21 | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by user974896

- by Claus Jandausch

- by IMB

- by StanleyGu

- by Dan

- by Slowjoe

- by jlichauc

- by user1089770

- by DaveB

- by Alasdair Allan

- by James Santiago

- by yummm

- by IMB

- by ABrown

- by Kevin Campion

- by z8000

- by Tom O'Connor

- by Sean O

- by Lin

- by Jedi Master Spooky

- by Jae Lee

- by eflat

- by user36997

- by wanson

- by oruchreis

1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >