Search Results

Search found 504 results on 21 pages for 'failover'.

Page 19/21 | < Previous Page | 15 16 17 18 19 20 21 | Next Page >

Database Mirroring on SQL Server Express Edition

- by Most Valuable Yak (Rob Volk)

Like most SQL Server users I'm rather frustrated by Microsoft's insistence on making the really cool features only available in Enterprise Edition. And it really doesn't help that they changed the licensing for SQL 2012 to be core-based, so now it's like 4 times as expensive! It almost makes you want to go with Oracle. That, and a desire to have Larry Ellison do things to your orifices. And since they've introduced Availability Groups, and marked database mirroring as deprecated, you'd think they'd make make mirroring available in all editions. Alas…they don't…officially anyway. Thanks to my constant poking around in places I'm not "supposed" to, I've discovered the low-level code that implements database mirroring, and found that it's available in all editions! It turns out that the query processor in all SQL Server editions prepends a simple check before every edition-specific DDL statement: IF CAST(SERVERPROPERTY('Edition') as nvarchar(max)) NOT LIKE '%e%e%e% Edition%' print 'Lame' else print 'Cool' If that statement returns true, it fails. (the print statements are just placeholders) Go ahead and test it on Standard, Workgroup, and Express editions compared to an Enterprise or Developer edition instance (which support everything). Once again thanks to Argenis Fernandez (b | t) and his awesome sessions on using Sysinternals, I was able to watch the exact process SQL Server performs when setting up a mirror. Surprisingly, it's not actually implemented in SQL Server! Some of it is, but that's something of a smokescreen, the real meat of it is simple filesystem primitives. The NTFS filesystem supports links, both hard links and symbolic, so that you can create two entries for the same file in different directories and/or different names. You can create them using the MKLINK command in a command prompt: mklink /D D:\SkyDrive\Data D:\Data mklink /D D:\SkyDrive\Log D:\Log This creates a symbolic link from my data and log folders to my Skydrive folder. Any file saved in either location will instantly appear in the other. And since my Skydrive will be automatically synchronized with the cloud, any changes I make will be copied instantly (depending on my internet bandwidth of course). So what does this have to do with database mirroring? Well, it seems that the mirroring endpoint that you have to create between mirror and principal servers is really nothing more than a Skydrive link. Although it doesn't actually use Skydrive, it performs the same function. So in effect, the following statement: ALTER DATABASE Mir SET PARTNER='TCP://MyOtherServer.domain.com:5022' Is turned into: mklink /D "D:\Data" "\\MyOtherServer.domain.com\5022$" The 5022$ "port" is actually a hidden system directory on the principal and mirror servers. I haven't quite figured out how the log files are included in this, or why you have to SET PARTNER on both principal and mirror servers, except maybe that mklink has to do something special when linking across servers. I couldn't get the above statement to work correctly, but found that doing mklink to a local Skydrive folder gave me similar functionality. To wrap this up, all you have to do is the following: Install Skydrive on both SQL Servers (principal and mirror) and set the local Skydrive folder (D:\SkyDrive in these examples) On the principal server, run mklink /D on the data and log folders to point to SkyDrive: mklink /D D:\SkyDrive\Data D:\Data On the mirror server, run the complementary linking: mklink /D D:\Data D:\SkyDrive\Data Create your database and make sure the files map to the principal data and log folders (D:\Data and D:\Log) Viola! Your databases are kept in sync on multiple servers! One wrinkle you will encounter is that the mirror server will show the data and log files, but you won't be able to attach them to the mirror SQL instance while they are attached to the principal. I think this is a bug in the Skydrive, but as it turns out that's fine: you can't access a mirror while it's hosted on the principal either. So you don't quite get automatic failover, but you can attach the files to the mirror if the principal goes offline. It's also not exactly synchronous, but it's better than nothing, and easier than either replication or log shipping with a lot less latency. I will end this with the obvious "not supported by Microsoft" and "Don't do this in production without an updated resume" spiel that you should by now assume with every one of my blog posts, especially considering the date.

Read the article
Oracle WebLogic Server and Oracle Database: A Robust Infrastructure for your Applications

- by Ruma Sanyal

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 It has been said that a chain is as strong as its weakest link. Well, this is also true for your application infrastructure. Not only are the various components that constitute your infrastructure, like database and application server critical, the integration between these things [whether coming out of the box from your vendor or done in-house] is paramount. Imagine your database being down and your application server not knowing about it and as a result your application waiting indefinitely for a database response – not a great situation if high availability is critical to your application. Or one of your database nodes is very busy, but your application server doesn’t have the intelligence to decipher that – it keeps pinging the busy node when it can in fact get a response from another idle node much faster. This is what Oracle WebLogic and Database integration provides: Intelligent integration out of the box. Tight integration between Oracle WebLogic and Database makes your infrastructure robust enough that not only does each of your infrastructure component provide you with improved RASP [reliability availability, scalability, and performance] but these components work together to offer improved performance & availability, better resource sharing, inherent scalability, ease of configuration and automated management for your entire infrastructure. Oracle WebLogic Server is the only application server with this degree of integration to Oracle Database. With Oracle WebLogic Server 11g, we introduced Active GridLink for Real Application Clusters (RAC). In conjunction with Oracle Database, this powerful software technology simplifies management, increases availability, and ensures fast connection failover with runtime connection, load balancing and affinity capabilities. With the release of Oracle Database 12c this summer, even tighter integration between Oracle WebLogic Server 12c (12.1.2) and Oracle Database 12c has been achieved and this further optimizes the integration for a global cloud environment. Read about these capabilities in detail in the Oracle WebLogic-Database Integration Whitepaper. Get in depth ‘how-to’ details from this YouTube video on the topic from our resident expert, Monica Roccelli. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;}

Read the article
65536% Autogrowth!

- by Tara Kizer

Twice a year, we move our production systems to our disaster recovery site. Last Saturday night was one of those days. There are about 50 SQL Server databases to be moved to the DR site, which is done via database mirroring. It takes only a few seconds to failover, but some databases have a bit more involved work such as setting up replication. Everything went relatively smooth, but we encountered a weird bug on our most mission critical system. After everything was successfully failed over to the DR site, it was noticed that mirroring was in a suspended state on one of the databases. We thought we had run into a SQL Server 2005 bug that we had been encountering and were working with Microsoft on a fix. Microsoft did fix it in both SQL Server 2005 service pack 3 cumulative update package 13 and service pack 4 cumulative update package 2, however SP3 CU13 and SP4 both recently failed on this system so we were not patched yet with the bug fix. As the suspended state was causing us issues with replication, we dropped mirroring. We then noticed we had 10MB of free disk space on the mount point where the principal’s data files are stored. I knew something went amiss as this system should have at least 150GB free on that mount point. I immediately checked the main database’s data file and was shocked to see an autgrowth size of 65536%. The data file autogrew right before mirroring went into the suspended state. 65536%! I didn’t have a lot of time to research if this autgrowth problem was a known SQL Server bug, so I deferred that research to today. A quick Google search yielded no results but emphasis on “quick”. I checked our performance system, which was recently restored with a copy of the affected production database, and found the autogrowth setting to be 512MB. So this autogrowth bug was encountered sometime in the last two weeks. On February 26th, we had attempted to install SQL 2005 SP4 on production, however it had failed (PSS case open with Microsoft). I suspected that the SP4 failure was somehow related to this autgrowth bug although that turned out not to be the case. I then tweeted (@TaraKizer) about this problem to see if the SQL Server community (#sqlhelp) had any insights. It seems several people have either heard of this bug or encountered it. Aaron Bertrand (blog|twitter) referred me to this Connect item. Our affected database originated on SQL Server 2000 and was upgraded to SQL Server 2005 in 2007. Back on SQL Server 2000, we were using the default file growth setting which was a percentage. Sometime after the 2005 upgrade is when we changed it to 512MB. Our situation seemed to fit the bug Aaron referred to me, so now the question was whether Microsoft had fixed it yet. I received a reply to my tweet from Amit Banerjee (twitter) that it had been fixed in SP3 CU1 (KB958004). My affected system is SP3 CU8, so I was initially confused why we had encountered the bug. Because I don’t read things fully, I had missed that there are additional steps you have to follow after applying the bug fix. Amit set me straight. Although you can read this information in the KB article, I will also copy it here in case you are as lazy as me and miss the most important section of it (although if you are as lazy as me, you won’t have read this far down my blog post): This hotfix will prevent only future occurrences of this problem. For example, if you restore a database from SQL Server 2000 to a SQL Server 2005 instance that contains this hotfix, this problem will not occur. However, if you already have a database that is affected by this problem, you must follow these steps to resolve this problem manually: Apply this hotfix. Set the file growth settings for the affected files to percentage settings, and then set the settings back to megabyte settings. Take the database offline, and then bring it back online. Verify that the values of the is_percent_growth column are correct in the sys.database_files system table and in the sys.master_files system table.

Read the article
PASS Summit 2011 – Part II

- by Tara Kizer

I arrived in Seattle last Monday afternoon to attend PASS Summit 2011. I had really wanted to attend Gail Shaw’s (blog|twitter) and Grant Fritchey’s (blog|twitter) pre-conference seminar “All About Execution Plans” on Monday, but that would have meant flying out on Sunday which I couldn’t do. On Tuesday, I attended Allan Hirt’s (blog|twitter) pre-conference seminar entitled “A Deep Dive into AlwaysOn: Failover Clustering and Availability Groups”. Allan is a great speaker, and his seminar was packed with demos and information about AlwaysOn in SQL Server 2012. Unfortunately, I have lost my notes from this seminar and the presentation materials are only available on the pre-con DVD. Hmpf! On Wednesday, I attended Gail Shaw’s “Bad Plan! Sit!”, Andrew Kelly’s (blog|twitter) “SQL 2008 Query Statistics”, Dan Jones’ (blog|twitter) “Improving your PowerShell Productivity”, and Brent Ozar’s (blog|twitter) “BLITZ! The SQL – More One Hour SQL Server Takeovers”. In Gail’s session, she went over how to fix bad plans and bad query patterns. Update your stale statistics! How to fix bad plans Use local variables – optimizer can’t sniff it, so it’ll optimize for “average” value Use RECOMPILE (at the query or stored procedure level) – CPU hit OPTIMIZE FOR hint – most common value you’ll pass How to fix bad query patterns Don’t use them – ha! Catch-all queries Use dynamic SQL OPTION (RECOMPILE) Multiple execution paths Split into multiple stored procedures OPTION (RECOMPILE) Modifying parameter values Use local variables Split into outer and inner procedure OPTION (RECOMPILE) She also went into “last resort” and “very last resort” options, but those are risky unless you know what you are doing. For the average Joe, she wouldn’t recommend these. Examples are query hints and plan guides. While I enjoyed Andrew’s session, I didn’t take any notes as it was familiar material. Andrew is a great speaker though, and I’d highly recommend attending his sessions in the future. Next up was Dan’s PowerShell session. I need to look into profiles, manifests, function modules, and function import scripts more as I just didn’t quite grasp these concepts. I am attending a PowerShell training class at the end of November, so maybe that’ll help clear it up. I really enjoyed the Excel integration demo. It was very cool watching PowerShell build the spreadsheet in real-time. I must look into this more! On a side note, I am jealous of Dan’s hair. Fabulous hair! Brent’s session showed us how to quickly gather information about a server that you will be taking over database administration duties for. He wrote a script to do a fast health check and then later wrapped it into a stored procedure, sp_Blitz. I can’t wait to use this at my work even on systems where I’ve been the primary DBA for years, maybe there’s something I’ve overlooked. We are using EPM to help standardize our environment and uncover problems, but sp_Blitz will definitely still help us out. He even provides a cloud-based update feature, sp_BlitzUpdate, for sp_Blitz so you don’t have to constantly update it when he makes a change. I think I’ll utilize his update code for some other challenges that we face at my work.

Read the article
Oracle EMEA News Digest - May 2014

- by Steve Walker

Systems Oracle introduced a technology preview of an OpenStack® distribution that allows Oracle Linux and Oracle VM users to work with the open source cloud software. This provides customers with additional choices and interoperability while taking advantage of the efficiency, performance, scalability, and security of Oracle Linux and Oracle VM. The distribution is delivered as part of the Oracle Linux and Oracle VM Premier Support offerings, at no additional cost. Oracle plans to work further with the OpenStack community to develop and enhance its enterprise-class capabilities to meet customer demands. Also in the Open Source arena, Oracle announced the general availability of MySQL Fabric. MySQL Fabric provides an integrated system that makes it simpler to manage groups of MySQL databases. It delivers both high availability - via failure detection and failover - and scalability through automated data sharding. Oracle Database, Middleware and Technology The company made two announcements for Oracle Tuxedo, the #1 application server for C, C++, COBOL and Java deployments in private cloud or traditional data center environments. With enhanced management and monitoring features and tighter integration with Oracle technologies, the latest release of Oracle Tuxedo 12c enables organizations to dramatically increase application throughput, while reducing total cost of ownership and time to market for new application development and deployment. Oracle also introduced the latest release of its mainframe application rehosting platform, Oracle Tuxedo ART 12c, to help organizations speed up migration projects and accelerate the adoption of the new environment by current IT staff. It enables organizations to accelerate the rehosting of IBM mainframe applications and greatly enhance management and supportability of the rehosted applications while reducing costs and risk. Applications According to new Oracle studies, B2B and B2C commerce professionals find integrated, omni-channel customer experiences increasingly valuable to their organizations, and are continuing to invest in technologies and digital content strategies to facilitate them. The studies—one for B2B and one for B2C—surveyed e-commerce professionals in business and technology departments from around the world. Although the priorities, success metrics, and technology investments differed between the two groups, customer acquisition and retention emerged as common themes across B2B and B2C. Growing market share and enhancing customer experience are cited as top investment areas for all e-commerce professionals. In product news, Oracle announced the latest release of Oracle Business Intelligence (BI) Applications (version 11.1.1.8.1, in case anyone asks). It includes prebuilt connectors between Oracle Procurement and Spend Analytics and Oracle’s JD Edwards. Additionally, a new Oracle Human Resources Analytics module for developing and maintaining a skilled workforce has been introduced. In use at more than 4,000 companies worldwide, Oracle BI Applications support leading enterprise applications, including Oracle E-Business Suite, Oracle’s PeopleSoft, Oracle's Siebel CRM, Oracle’s JD Edwards EnterpriseOne offering high-performing analytics at a lower cost. Industries For the Communications Industry, Oracle has launched a new release of the Oracle Communications Core Session Manager. This gives CSPs a new way to design, deploy and manage complex networking services and embrace next-generation technology, It provides them with an immediate entry point for network function virtualization (NFV) efforts, allowing them to realize immediate benefits associated with network virtualization – including increased service agility and improved network resource sharing. And for the Utilities Industry, Oracle is releasing solutions with new business features and enhanced technical architecture that help position utilities for success now and into the future. Oracle has provided new releases for its customer information system, meter data management system, customer self-service solution and mobile workforce management solution.

Read the article
Cloud Adoption Challenges

- by Herve Roggero

Originally posted on: http://geekswithblogs.net/hroggero/archive/2013/11/07/cloud-adoption-challenges.aspxWhile cloud computing makes sense for most organizations and countless projects, I have seen customers significantly struggle with cloud adoption challenges. This blog post is not an attempt to provide a generic assessment of cloud adoption; rather it is an account of personal experiences in the field, some of which may or may not apply to your organization. Cloud First, Burst? In the rush to cloud adoption some companies have made the decision to redesign their core system with a cloud first approach. However a cloud first approach means that the system may not work anymore on-premises after it has been redesigned, specifically if the system depends on Platform as a Service (PaaS) components (such as Azure Tables). While PaaS makes sense when your company is in a position to adopt the cloud exclusively, it can be difficult to leverage with systems that need to work in different clouds or on-premises. As a result, some companies are starting to rethink their cloud strategy by designing for on-premises first, and modify only the necessary components to burst when needed in the cloud. This generally means that the components need to work equally well in any environment, which requires leveraging Infrastructure as a Service (IaaS) or additional investments for PaaS applications, or both. What’s the Problem? Although most companies can benefit from cloud computing, not all of them can clearly identify a business reason for doing so other than in very generic terms. I heard many companies claim “it’s cheaper”, or “it allows us to scale”, without any specific metric or clear strategy behind the adoption decision. Other companies have a very clear strategy behind cloud adoption and can precisely articulate business benefits, such as “we have a 500% increase in traffic twice a year, so we need to burst in the cloud to avoid doubling our network and server capacity”. Understanding the problem being solved through by adopting cloud computing can significantly help organizations determine the optimum path and timeline to adoption. Performance or Scalability? I stopped counting the number of times I heard “the cloud doesn’t scale; our database runs faster on a laptop”. While performance and scalability are related concepts, they are nonetheless different in nature. Performance is a measure of response time under a given load (meaning with a specific number of users), while scalability is the performance curve over various loads. For example one system could see great performance with 100 users, but timeout with 1,000 users, in which case the system wouldn’t scale. However another system could have average performance with 100 users, but display the exact same performance with 1,000,000 users, in which case the system would scale. Understanding that cloud computing does not usually provide high performance, but instead provides the tools necessary to build a scalable system (usually using PaaS services such as queuing and data federation), is fundamental to proper cloud adoption. Uptime? Last but not least, you may want to read the Service Level Agreement of your cloud provider in detail if you haven’t done so. If you are expecting 99.99% uptime annually you may be in for a surprise. Depending on the component being used, there may be no associated SLA at all! Other components may be restarted at any time, or services may experience failover conditions weekly ( or more) based on current overall conditions of the cloud service provider, most of which are outside of your control. As a result, for PaaS cloud environments (and to a certain extent some IaaS systems), applications need to assume failure and gracefully retry to be successful in the cloud in order to provide service continuity to end users. About Herve Roggero Herve Roggero, Windows Azure MVP, is the founder of Blue Syntax Consulting (http://www.bluesyntax.net). Herve's experience includes software development, architecture, database administration and senior management with both global corporations and startup companies. Herve holds multiple certifications, including an MCDBA, MCSE, MCSD. He also holds a Master's degree in Business Administration from Indiana University. Herve is the co-author of "PRO SQL Azure" and “PRO SQL Server 2012 Practices” from Apress, a PluralSight author, and runs the Azure Florida Association.

Read the article
System Account Logon Failures ever 30 seconds

- by floyd

We have two Windows 2008 R2 SP1 servers running in a SQL failover cluster. On one of them we are getting the following events in the security log every 30 seconds. The parts that are blank are actually blank. Has anyone seen similar issues, or assist in tracking down the cause of these events? No other event logs show anything relevant that I can tell. Log Name: Security Source: Microsoft-Windows-Security-Auditing Date: 10/17/2012 10:02:04 PM Event ID: 4625 Task Category: Logon Level: Information Keywords: Audit Failure User: N/A Computer: SERVERNAME.domainname.local Description: An account failed to log on. Subject: Security ID: SYSTEM Account Name: SERVERNAME$ Account Domain: DOMAINNAME Logon ID: 0x3e7 Logon Type: 3 Account For Which Logon Failed: Security ID: NULL SID Account Name: Account Domain: Failure Information: Failure Reason: Unknown user name or bad password. Status: 0xc000006d Sub Status: 0xc0000064 Process Information: Caller Process ID: 0x238 Caller Process Name: C:\Windows\System32\lsass.exe Network Information: Workstation Name: SERVERNAME Source Network Address: - Source Port: - Detailed Authentication Information: Logon Process: Schannel Authentication Package: Kerberos Transited Services: - Package Name (NTLM only): - Key Length: 0 Second event which follows every one of the above events Log Name: Security Source: Microsoft-Windows-Security-Auditing Date: 10/17/2012 10:02:04 PM Event ID: 4625 Task Category: Logon Level: Information Keywords: Audit Failure User: N/A Computer: SERVERNAME.domainname.local Description: An account failed to log on. Subject: Security ID: NULL SID Account Name: - Account Domain: - Logon ID: 0x0 Logon Type: 3 Account For Which Logon Failed: Security ID: NULL SID Account Name: Account Domain: Failure Information: Failure Reason: An Error occured during Logon. Status: 0xc000006d Sub Status: 0x80090325 Process Information: Caller Process ID: 0x0 Caller Process Name: - Network Information: Workstation Name: - Source Network Address: - Source Port: - Detailed Authentication Information: Logon Process: Schannel Authentication Package: Microsoft Unified Security Protocol Provider Transited Services: - Package Name (NTLM only): - Key Length: 0 EDIT UPDATE: I have a bit more information to add. I installed Network Monitor on this machine and did a filter for Kerberos traffic and found the following which corresponds to the timestamps in the security audit log. A Kerberos AS_Request Cname: CN=SQLInstanceName Realm:domain.local Sname krbtgt/domain.local Reply from DC: KRB_ERROR: KDC_ERR_C_PRINCIPAL_UNKOWN I then checked the security audit logs of the DC which responded and found the following: A Kerberos authentication ticket (TGT) was requested. Account Information: Account Name: X509N:<S>CN=SQLInstanceName Supplied Realm Name: domain.local User ID: NULL SID Service Information: Service Name: krbtgt/domain.local Service ID: NULL SID Network Information: Client Address: ::ffff:10.240.42.101 Client Port: 58207 Additional Information: Ticket Options: 0x40810010 Result Code: 0x6 Ticket Encryption Type: 0xffffffff Pre-Authentication Type: - Certificate Information: Certificate Issuer Name: Certificate Serial Number: Certificate Thumbprint: So appears to be related to a certificate installed on the SQL machine, still dont have any clue why or whats wrong with said certificate. It's not expired etc.

Read the article
backup and restoration of a freeipa infrastructure

- by Sirex

I'm finding the documentation on ipa server backup and restoration sadly lacking, and being so centrally critical it's not something i'm really happy about shooting in the dark with - could some kind soul more knowledable in the matter please attempt to provide an idiot-proof guide to backing up and restoring of IPA server(s) ? Particularly the main server (the cert signing one). ...We're looking towards rolling out ipa in a two server setup (1 master, 1 replica). I'm using dns srv records to handle failover, hence a loss of the replica isn't a big deal as i could make a new one and force a resync to happen - it's losing the master that bothered me. The thing that i'm really struggling with is locating a step-by-step procedure for backing up and restoring the master server. I'm aware that whole-VM snapshot is the recommended way of doing IPA server backup, but that isn't an option at this time for us. I'm also aware that freeipa 3.2.0 includes some sort of backup command build in, but that isn't in the ipa version of centos, and i don't expect it will be for some time yet. I've been trying many different methods, but none of them seem to restore cleanly, amongst others, i've tried; a command similar to db2ldif.pl -D "cn=directory manager" -w - -n userroot -a /root/userroot.ldif the script from here to produce three ldif files -- one for the domain ({domain}-userroot), and two for the ipa server (ipa-ipaca and ipa-userroot): Most of the restores i've tried have been similar to the form of: ldif2db.pl -D "cn=directory manager" -w - -n userroot -i userroot.ldif which seems to work and reports no errors, but totally borks the ipa install on the machine and i can no longer login with either the admin password on the backed up server, or the one i set it to on installation before attempting the ldif2db command (i'm installing ipa-server and running ipa-server-install, then attempting the restore). I'm not overly bothered about losing the CA, having to rejoin the domain, losing replication etc etc (although it'd be awesome if that could be avoided) but in the instance of the main server dropping i'd really like to avoid having to re-enter all the user/group information. I guess in the instance of losing the main server i could promote the other one and replicate in the other direction, but i've not tried that, either. Has anyone done that ? tl;dr: Can someone provide an idiots guide to backing up and restoring an IPA server (preferably on CentOS 6) in a clear enough way that'd make me feel confident it'll actually work when the dreaded time comes ? Crayons are optional, but appreciated ;-) I can't be the only person struggling with this, seeing how widely used IPA is, surely ?

Read the article
Tuning Linux IP routing parameters -- secret_interval and tcp_mem

- by Jeff Atwood

We had a little failover problem with one of our HAProxy VMs today. When we dug into it, we found this: Jan 26 07:41:45 haproxy2 kernel: [226818.070059] __ratelimit: 10 callbacks suppressed Jan 26 07:41:45 haproxy2 kernel: [226818.070064] Out of socket memory Jan 26 07:41:47 haproxy2 kernel: [226819.560048] Out of socket memory Jan 26 07:41:49 haproxy2 kernel: [226822.030044] Out of socket memory Which, per this link, apparently has to do with low default settings for net.ipv4.tcp_mem. So we increased them by 4x from their defaults (this is Ubuntu Server, not sure if the Linux flavor matters): current values are: 45984 61312 91968 new values are: 183936 245248 367872 After that, we started seeing a bizarre error message: Jan 26 08:18:49 haproxy1 kernel: [ 2291.579726] Route hash chain too long! Jan 26 08:18:49 haproxy1 kernel: [ 2291.579732] Adjust your secret_interval! Shh.. it's a secret!! This apparently has to do with /proc/sys/net/ipv4/route/secret_interval which defaults to 600 and controls periodic flushing of the route cache The secret_interval instructs the kernel how often to blow away ALL route hash entries regardless of how new/old they are. In our environment this is generally bad. The CPU will be busy rebuilding thousands of entries per second every time the cache is cleared. However we set this to run once a day to keep memory leaks at bay (though we've never had one). While we are happy to reduce this, it seems odd to recommend dropping the entire route cache at regular intervals, rather than simply pushing old values out of the route cache faster. After some investigation, we found /proc/sys/net/ipv4/route/gc_elasticity which seems to be a better option for keeping the route table size in check: gc_elasticity can best be described as the average bucket depth the kernel will accept before it starts expiring route hash entries. This will help maintain the upper limit of active routes. We adjusted elasticity from 8 to 4, in the hopes of the route cache pruning itself more aggressively. The secret_interval does not feel correct to us. But there are a bunch of settings and it's unclear which are really the right way to go here. /proc/sys/net/ipv4/route/gc_elasticity (8) /proc/sys/net/ipv4/route/gc_interval (60) /proc/sys/net/ipv4/route/gc_min_interval (0) /proc/sys/net/ipv4/route/gc_timeout (300) /proc/sys/net/ipv4/route/secret_interval (600) /proc/sys/net/ipv4/route/gc_thresh (?) rhash_entries (kernel parameter, default unknown?) We don't want to make the Linux routing worse, so we're kind of afraid to mess with some of these settings. Can anyone advise which routing parameters are best to tune, for a high traffic HAProxy instance?

Read the article
Linux not picking up new partition correctly on emc pseudo device

- by James

Hi We have a database server running oracle rac. We were recently running out of space on the main LUN that it is attached to. I created a new 100GB LUN and concatenated this onto the existing LUN creating a new MetaLUN. After some messing I managed to get linux to recognise the new space. I then created a new partition in on the pseudo device, to use the new space. Previously when I have done this on other system the next step is to create an ASM disk on the new partition and add this disk to the oracle disk group. This however fails. I am aware of various issues with ASM and powerpath, but I don't think this is the issue here. As on while investigating the issue I discovered that one of the underlying logical device is not reflecting the size change. See below; Powermt displays all of the underlying logical units [root@XXXXX~]# powermt display dev=emcpowerd Pseudo name=emcpowerd CLARiiON ID=CKM00091500009 [VFRAC2] Logical device ID=6006016030312200787502866C65DE11 [LUN 30] state=alive; policy=CLAROpt; priority=0; queued-IOs=0 Owner: default=SP A, current=SP A Array failover mode: 1 ============================================================================== ---------------- Host --------------- - Stor - -- I/O Path - -- Stats --- ### HW Path I/O Paths Interf. Mode State Q-IOs Errors ============================================================================== 3 qla2xxx sde SP A0 active alive 0 0 3 qla2xxx sdj SP B0 active alive 0 0 4 qla2xxx sdo SP A1 active alive 0 0 4 qla2xxx sdt SP B1 active alive 0 0 Fdisk on the pseudo device shows correct space. [root@XXXXX ~]# fdisk -l /dev/emcpowerd Disk /dev/emcpowerd: 429.4 GB, 429496729600 bytes 255 heads, 63 sectors/track, 52216 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/emcpowerd1 1 39162 314568733+ 83 Linux /dev/emcpowerd2 39163 52216 104856255 83 Linux fdisk on one of the logical units is wrong [root@XXXXX~]# fdisk -l /dev/sde Disk /dev/sde: 322.1 GB, 322122547200 bytes 255 heads, 63 sectors/track, 39162 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sde1 1 39162 314568733+ 83 Linux /dev/sde2 39163 52216 104856255 83 Linux fdisk on the rest of the units is fine [root@XXXXX ~]# fdisk -l /dev/sdj Disk /dev/sdj: 429.4 GB, 429496729600 bytes 255 heads, 63 sectors/track, 52216 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdj1 1 39162 314568733+ 83 Linux /dev/sdj2 39163 52216 104856255 83 Linux Also when I created the the partition linux did not create the any entries in the /dev directory for the second partition so I created these manually [root@XXXXX dev]# mknod sde2 b 8 66 [root@XXXXX dev]# ls -al sd[ejot]? brw-r----- 1 root disk 8, 65 Dec 29 14:20 sde1 brw-r--r-- 1 root disk 8, 66 Apr 8 20:31 sde2 brw-r----- 1 root disk 8, 145 Dec 29 14:19 sdj1 brw-r--r-- 1 root disk 8, 146 Apr 8 20:33 sdj2 brw-r----- 1 root disk 8, 225 Apr 6 23:12 sdo1 brw-r--r-- 1 root disk 8, 226 Apr 8 20:33 sdo2 brw-r----- 1 root disk 65, 49 Dec 29 14:19 sdt1 brw-r--r-- 1 root disk 65, 50 Apr 8 20:33 sdt2 This is a production server that we cannot easily reboot. Any ideas would be much appreciated. J

Read the article
RFC 1918 address on open internet?

- by longneck

In trying to diagnose a failover problem with my Cisco ASA 5520 firewalls, I ran a traceroute to www.btfl.com and, much to my surprise, some of the hops came back as RFC 1918 addresses. Just to be clear, this host is not behind my firewall and there is no VPN involved. I have to connect across the open internet to get there. How/why is this possible? asa# traceroute www.btfl.com Tracing the route to 157.56.176.94 1 <redacted> 2 <redacted> 3 <redacted> 4 <redacted> 5 nap-edge-04.inet.qwest.net (67.14.29.170) 0 msec 10 msec 10 msec 6 65.122.166.30 0 msec 0 msec 10 msec 7 207.46.34.23 10 msec 0 msec 10 msec 8 * * * 9 207.46.37.235 30 msec 30 msec 50 msec 10 10.22.112.221 30 msec 10.22.112.219 30 msec 10.22.112.223 30 msec 11 10.175.9.193 30 msec 30 msec 10.175.9.67 30 msec 12 100.94.68.79 40 msec 100.94.70.79 30 msec 100.94.71.73 30 msec 13 100.94.80.39 30 msec 100.94.80.205 40 msec 100.94.80.137 40 msec 14 10.215.80.2 30 msec 10.215.68.16 30 msec 10.175.244.2 30 msec 15 * * * 16 * * * 17 * * * and it does the same thing from my FiOS connection at home: C:\>tracert www.btfl.com Tracing route to www.btfl.com [157.56.176.94] over a maximum of 30 hops: 1 1 ms <1 ms <1 ms myrouter.home [192.168.1.1] 2 8 ms 7 ms 8 ms <redacted> 3 10 ms 13 ms 11 ms <redacted> 4 12 ms 10 ms 10 ms ae2-0.TPA01-BB-RTR2.verizon-gni.net [130.81.199.82] 5 16 ms 16 ms 15 ms 0.ae4.XL2.MIA19.ALTER.NET [152.63.8.117] 6 14 ms 16 ms 16 ms 0.xe-11-0-0.GW1.MIA19.ALTER.NET [152.63.85.94] 7 19 ms 16 ms 16 ms microsoft-gw.customer.alter.net [63.65.188.170] 8 27 ms 33 ms * ge-5-3-0-0.ash-64cb-1a.ntwk.msn.net [207.46.46.177] 9 * * * Request timed out. 10 44 ms 43 ms 43 ms 207.46.37.235 11 42 ms 41 ms 40 ms 10.22.112.225 12 42 ms 43 ms 43 ms 10.175.9.1 13 42 ms 41 ms 42 ms 100.94.68.79 14 40 ms 40 ms 41 ms 100.94.80.193 15 * * * Request timed out.

Read the article
LACP : Cisco ASA 5515 & Switch ProCurve 2920

- by user979276

I've two ASAs 5515 connected in failover Active/Stand by (on Gi0/5) My two ASAs are connected to two Switch ProCurve 2920 to have HA if something happens. So I plug something like that (don't pay attention to the arrows) : So one the ASA, I created a Port-Channel like that : interface GigabitEthernet0/0 nameif outside security-level 0 ip address 192.168.1.3 255.255.255.0 standby 192.168.1.4 ! interface GigabitEthernet0/1 speed 1000 duplex full channel-group 1 mode passive no nameif no security-level no ip address ! interface GigabitEthernet0/2 speed 1000 duplex full channel-group 1 mode passive no nameif no security-level no ip address ! interface Port-channel1.1 vlan 1 nameif inside security-level 100 ip address 192.168.8.1 255.255.255.0 standby 192.168.8.2 ! interface Port-channel1.10 vlan 10 nameif guest security-level 50 ip address 172.16.100.2 255.255.255.224 standby 172.16.100.3 ! interface Port-channel1.16 vlan 16 nameif dmz security-level 50 ip address 192.168.16.1 255.255.255.0 standby 192.168.16.2 On the switch, I created a trunk LACP capable with the port 1 and 2 on each switch, force the speed to 1000 and put the port un full duplex mode. BUT this is not working... I tried many things and I can't make it work. In this configuration, I can't ping anything between my ASA and my Switch (or any object connected). Here what I get on my ASA : Channel group 1 LACP port Admin Oper Port Port Port Flags State Priority Key Key Number State ----------------------------------------------------------------------------- Gi0/2 SP not-bndl 32768 0x1 0x1 0x3 0xc Gi0/1 FP not-bndl 32768 0x1 0x1 0x2 0x6 And on the Switchs : PORT LACP TRUNK PORT LACP LACP NUMB ENABLED GROUP STATUS PARTNER STATUS ----- ------- ----- ------ ------- ------ 1 Active trk1 Broken Yes Failure 2 Active trk1 Broken Yes Failure If I change the Cisco interface to LACP mode On, I can ping the switch from the ASA but nothing other objects conneted on the switch. If I look at the statut of LACP on the switch I see this : PORT LACP TRUNK PORT LACP LACP NUMB ENABLED GROUP STATUS PARTNER STATUS ----- ------- ----- ------ ------- ------ 1 Active trk1 Up No Success 2 Active trk1 Up No Success I don't have any clue on what's going on so If someone have any idea and help me on this, it would be great ! Feel free to ask me anything if you need any more information ! Thanks a lot !

Read the article
Generating/managing config files for hosted application

- by mfinni

I asked a question about config management, and haven't seen a reply. It's possible my question was too vague, so let's get down to brass tacks. Here's the process we follow when onboarding a new customer instance into our hosted application : how would you manage this? I'm leaning towards a Perl script to populate templates to generate shell scripts, config files, XML config files, etc. Looking briefly at CFengine and Chef, it seems like they're not going to reduce the amount of work, because I'd still have to manually specify all of the changes/edits within the tool. Doesn't seem to be much of a gain over touching the config files directly. We add a stanza to the main config file for the core (3rd-party) application. This stanza has values that defines the instance (customer) name the TCP listener port for this instance (not one currently used) the DB2 database name (serial numeric identifier, already exists, they get prestaged for us by the DBAs) three sub-config files, by name - they need to be created from 3 templates and be named after the instance The sub-config files define: The filepath for the DB2 volumes The filepath for the storage of objects The filepath for just one of the DB2 volumes (yes, redundant to the first item. We run some application commands, start the instance We do some LDAP thingies (make an OU for the instance, etc.) We add a stanza to the config file for our security listener that acts as a passthrough to LDAP instance name LDAP OU TCP port for instance DB2 database name We restart the security listener (off-hours), change the main config file from item 1, stop and restart the instance. It is now authenticating via LDAP. We add the stop and start commands for this instance to the HA failover scripts. We import an XML config file into the instance that defines things for the actual application for the customer - user names, groups, permissions, and business rules. The XML is supplied by the implementation team. Now, we configure the dataloading application We add a stanza to the existing top-level config file that points to a new customer-level config file. The new customer-level config file includes: the instance (customer) name the DB2 database name arbitrary number of sub-config files, by name Each of the sub-config files defines: filepaths to the directories for ingestion, feedback, backup, and failure those filepaths have a common path to a customer-specific folder, and then one folder for each sub-config file Each of those filepaths needs to be created We need to add this customer instance to our monitoring scripts that confirm the proper processes are running and can be logged into. Of course, those monitoring config files include the instance name, the TCP port, the DB2 database name, etc. There's also a reporting application that needs to be configured for the new instance. You get the idea. There's also XML that is loaded into WAS by the middleware team. We give them the values for them to plug into the XML - they could very easily hand us the template and we could give them back completed XML.

Read the article
UCARP: prevent the original master from taking over the VIP when it comes back after failure?

- by quanta

Keepalived can do this by combining the nopreempt option and the BACKUP state on the both nodes: Prevent VRRP Master from becoming Master once it has failed Prevent master to fall back to master after failure How about the UCARP? Name : ucarp Arch : x86_64 Version : 1.5.2 Release : 1.el5.rf Size : 81 k Repo : installed Summary : Common Address Redundancy Protocol (CARP) for Unix URL : http://www.ucarp.org/ License : BSD Description: UCARP allows a couple of hosts to share common virtual IP addresses in order : to provide automatic failover. It is a portable userland implementation of the : secure and patent-free Common Address Redundancy Protocol (CARP, OpenBSD's : alternative to the patents-bloated VRRP). : Strong points of the CARP protocol are: very low overhead, cryptographically : signed messages, interoperability between different operating systems and no : need for any dedicated extra network link between redundant hosts. If I don't use the --preempt option and set the --advskew to the same value, both nodes become master. /etc/sysconfig/carp/vip-010.conf # Virtual IP configuration file for UCARP # The number (from 001 to 255) in the name of the file is the identifier # $Id: vip-001.conf.example 1527 2004-07-09 15:23:54Z dude $ # Set the same password on all mamchines sharing the same virtual IP PASSWORD="pa$$w0rd" # You are required to have an IPADDR= line in the configuration file for # this interface (so no DHCP allowed) BIND_INTERFACE="eth0" # Do *NOT* use a main interface for the virtual IP, use an ethX:Y alias # with the corresponding /etc/sysconfig/network-scripts/ifcfg-ethX:Y file # already configured and ith ONBOOT=no VIP_INTERFACE="eth0:0" # If you have extra options to add, see "ucarp --help" output # (the lower the "-k <val>" the higher priority and "-P" to become master ASAP) OPTIONS="-z -k 255" /etc/sysconfig/network-scripts/ifcfg-eth0:0 DEVICE=eth0:0 ONBOOT=no BOOTPROTO= IPADDR=192.168.6.8 NETMASK=255.255.255.0 USERCTL=yes IPV6INIT=no node 1: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000 link/ether c6:9b:8e:af:a7:69 brd ff:ff:ff:ff:ff:ff inet 192.168.6.192/24 brd 192.168.6.255 scope global eth0 inet 192.168.6.8/24 brd 192.168.6.255 scope global secondary eth0:0 inet6 fe80::c49b:8eff:feaf:a769/64 scope link valid_lft forever preferred_lft forever node 2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000 link/ether 00:30:48:f7:0f:81 brd ff:ff:ff:ff:ff:ff inet 192.168.6.38/24 brd 192.168.6.255 scope global eth1 inet 192.168.6.8/24 brd 192.168.6.255 scope global secondary eth1:0 inet6 fe80::230:48ff:fef7:f81/64 scope link valid_lft forever preferred_lft forever

Read the article
Server 2008 Likes to restart itself

- by Campo

I have a weird issue here. I notice about once a week the web server restarts itself. This would be only a minor issue if we were not planning on implementing an IP failover. I have checked the event logs. I don't see anything that indicates a reason for the restart. I need some help diagnosing the reason the server restarts. It happened last night at 5:00AM Last even in the log was 1 hour before the unexpected shutdown. Here is the Log for the shutdown event. Any help is much appreciated. I know there isn't much to go on yet. Log Name: System Source: EventLog Date: 5/5/2010 5:01:12 AM Event ID: 6008 Task Category: None Level: Error Keywords: Classic User: N/A Computer: SERVERNAME Description: The previous system shutdown at 4:56:41 AM on 5/5/2010 was unexpected. Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="EventLog" /> <EventID Qualifiers="32768">6008</EventID> <Level>2</Level> <Task>0</Task> <Keywords>0x80000000000000</Keywords> <TimeCreated SystemTime="2010-05-05T09:01:12.000Z" /> <EventRecordID>346094</EventRecordID> <Channel>System</Channel> <Computer>SERVERNAME</Computer> <Security /> </System> <EventData> <Data>4:56:41 AM</Data> <Data>5/5/2010</Data> <Data> </Data> <Data> </Data> <Data>39594</Data> <Data> </Data> <Data> </Data> <Binary>DA070500030005000400380029008E03DA070500030005000800380029008E033C0000003C000000000 000000000000000000000000000000100000000000000</Binary> </EventData> </Event>

Read the article
What the best way to achieve RPO of zero and lowest possible RTO (less than 15 minutes) with SQL 2008 R2?

- by Adrian Hope-Bailie

We are running a payments (EFT transaction processing) application which is processing high volumes of transactions 24/7 and are currently investigating a better way of doing DB replication to our disaster recovery site. Our current and previous strategies have included using both DoubleTake and Redgate to replicate data to a warm stand-by. DoubleTake is the supported solution from the payments software vendor however their (DoubleTake's) support in South Africa is very poor. We had a few issues and simply couldn't ever resolve them so we had to give up on DoubleTake. We have been using Redgate to manually read the data from the primary site (via queries) and write to the DR site but this is: A bad solution Getting the software vendor hot and bothered whenever we have support issues as it has a tendency to interfere with the payment application which is very DB intensive. We recently upgraded the whole system to run on SQL 2008 R2 Enterprise which means we should probably be looking at using some of the built-in replication features. The server has 2 fairly large databases with a mixture of tables containing highly volatile transactional data and pretty static configuration data. Replication would be done over a WAN link to a separate physical site and needs to achieve the following objectives. RPO: Zero loss - This is transactional data with financial impact so we can't lose anything. RTO: Tending to zero - The business depends on our ability to process transactions every minute we are down we are losing money I have looked at a few of the other questions/answers but none meet our case exactly: SQL Server 2008 failover strategy - Log shipping or replication? How to achieve the following RTO & RPO with logshipping only using SQL Server? What is the best of two approaches to achieve DB Replication? My current thinking is that we should use mirroring but I am concerned that for RPO:0 we will need to do delayed commits and this could impact the performance of the primary DB which is not an option. Our current DR process is to: Stop incoming traffic to the primary site and allow all in-flight transaction to complete. Allow the replication to DR to complete. Change network routing to route to DR site. Start all applications and services on the secondary site (Ideally we can change this to a warmer stand-by whereby the applications are already running but not processing any transactions). In other words the DR database needs to, as quickly as possible, catch up with primary and be ready for processing as the new primary. We would then need to be able to reverse this when we are ready to switch back. Is there a better option than mirroring (should we be doing log-shipping too) and can anyone suggest other considerations that we should keep in mind?

Read the article
Interview with Tomas Ulin at the MySQL Innovation Day

- by Monica Kumar

MySQL Innovation Day held on June 5, 2012 was a great event for the MySQL engineers, users and customers to gather, share and network. I was able to get a few minutes with Tomas Ulin, Vice President of MySQL Engineering at Oracle, to ask him some questions. Here are the highlights of my interview with Tomas. Monica: This was the first MySQL Innovation Day, correct? Why now, what was the strategy behind hosting this kind of event? Tomas: In the last year, we have rolled out an incredible number of MySQL events worldwide – some targeted at developers that are new to MySQL and others for the MySQL savvy. At the MySQL Innovation Day, our first event of this kind,, we had a number of our key engineers presenting lightning talks delivering previews of key new features as well as discussing roadmap. Our goal is to keep an open dialogue with the MySQL community. In fact, we are hosting a two-day conference, another first, for the MySQL community called MySQL Connect on Sept. 29-30 in San Francisco. If you attended the MySQL Innovation Day and liked what we did, you are going to love MySQL Connect. We’ll have a lot more of our engineers and many users and community members presenting hour long sessions and hands on labs. Our engineers will be presenting new MySQL features as well offer previews of upcoming enhancements. Monica: What's the big take-away from today's MySQL Innovation Day? Tomas: I hope the most important takeaway for attendees was to see that Oracle has been driving, and continues to drive MySQL innovation with a steady stream of new great GA and Development Milestone releases. Monica: What were attendees most interested in? What feedback did they have? Tomas: Feedback from attendees was incredibly positive and encouraging. In particular, they liked the interaction with the MySQL engineers and were also excited about the new early access features in MySQL 5.6 and MySQL Cluster 7.3. In addition, sessions delivered by MySQL users like Facebook, Pinterest and Twitter were very well received. For example, Pinterest talked about using MySQL to scale from 0 to billions of page views/month, Twitter talked about “Scaling twitter with MySQL” and Facebook discussed the many options to implement MySQL master failover solutions. The presentations are already available for download while some of the session videos will be made available on the MySQL Innovation Day web page shortly. Monica: How would you distinguish the use of MySQL vs. Oracle Database? What key factors should customers consider? Tomas: MySQL and Oracle Database complement each other. They are very different products, best suited to different use cases. Customers can choose world-class solutions from Oracle to fulfill a variety of needs. MySQL is a great choice for enterprise web-based, custom and embedded apps. Oracle Database is the leading choice for enterprise packaged applications such as ERP, CRM as well as high-end data warehousing and business intelligence applications. Monica: What are the highlights of the current MySQL 5.6 Development Milestone Release and early access features for MySQL Cluster 7.3? Tomas: MySQL 5.6 development milestone release builds on MySQL 5.5 by improving: Optimizer for better Performance, Scalability Performance Schema for better instrumentation InnoDB for better transactional throughput Replication for higher availability, data integrity NoSQL options for more flexibility We announced some new early access features in MySQL 5.6, including binary log group commit. We also announced early access features in MySQL Cluster 7.3 including support for foreign key constraints. Monica: How do people get these releases? Tomas: You can access development milestone releases by going to: http://dev.mysql.com/downloads/mysqlThen select the “Development Release” tab. The MySQL Cluster 7.3 and other early access features can be downloaded at: http://labs.mysql.com Monica: What's coming up next for MySQL? Tomas: Our development team is working in overdrive, cranking out new features with community feedback. Don’t miss the MySQL Connect conference being held in San Francisco on Sept. 29 and 30th. My team and I will be there. I hope you can join us! Monica: Thank you for your time, Tomas. I look forward to seeing you at the MySQL Connect conference. To our followers, I hope you found this interview informative. I welcome your comments. Please stay tuned here for more updates on MySQL. Note: Monica Kumar is Senior Director of product marketing for Linux, Virtualization and MySQL at Oracle.

Read the article
SQL SERVER – TechEd India 2012 – Content, Speakers and a Lots of Fun

- by pinaldave

TechEd is one event which every developers and IT professionals are looking forward to attend. It is opportunity of life time and no matter how many time one gets chance to engage with it, it is never enough. I still remember every single moment of every TechEd I have attended so far. We are less than 100 hours away from TechEd India 2012 event.This event is the one must attend event for every Technology Enthusiast. Fourth time in the row I am going to attend this event and I am equally excited as the first time of the event. There are going to be two very solid SQL Server track this time and I will be attending end of the end both the tracks. Here is my view on each of the 10 sessions. Each session is carefully crafted and leading exeprts from industry will present it. Day 1, March 21, 2012 T-SQL Rediscovered with SQL Server 2012 – This session is going to bring some of the lesser known enhancements that were brought with SQL Server 2012. When I learned that Jacob Sebastian is going to do this session my reaction to this is DEMO, DEMO and DEMO! Jacob spends hours and hours of his time preparing his session and this will be one of those session that I am confident will be delivered over and over through out the next many events. Catapult your data with SQL Server 2012 Integration Services – Praveen is expert story teller and one of the wizard when it is about SQL Server and business intelligence. He is surely going to mesmerize you with some interesting insights on SSIS performance too. Processing Big Data with SQL Server 2012 and Hadoop – There are three sessions on Big Data at TechEd India 2012. Stephen is going to deliver one of the session. Watching Stephen present is always joy and quite entertaining. He shares knowledge with his typical humor which captures ones attention. I wrote about what is BIG DATA in a blog post. SQL Server Misconceptions and Resolutions – I will be presenting this Session along with Vinod Kumar. READ MORE HERE. Securing with ContainedDB in SQL Server 2012 – Pranab is expert when it is about SQL Server and Security. I have seen him presenting and he is indeed very pleasant to watch. A dry subject like security, he makes it much lively. A Contained Database is a database which contains all the necessary settings and metadata, making database easily portable to another server. This database will contain all the necessary details and will not have to depend on any server where it is installed for anything. You can take this database and move it to another server without having any worries. Day 3, March 23, 2012 Peeling SQL Server like an Onion: Internals Demystified – Vinod Kumar has been writing about this extensively on his other blog post. In recent conversation he suggested that he will be creating very exclusive content for this presentation. I know Vinod for long time and have worked with him along many community activities. I am going to pay special attention to the details. I know Vinod has few give-away planned now for attending the session now only if he shares with us. Speed Up – Parallel Processes and unparalleled Performance – Performance tuning is my favorite subject. I will be discussing effect of parallelism on performance in this session. Here me out, there will be lots of quiz questions during this session and if you get the answers correct – you can win some really cool goodies – I Promise! READ MORE HERE. Keep your database available – AlwaysOn – Balmukund is like an army man. He is always ready to show and prove that he has coolest toys in terms of SQL Server and he knows how to keep them running AlwaysON. Availability groups, Listener, Clustering, Failover, Read-Only replica etc all will be demo’ed in this session. This is really heavy but very interesting content not to be missed. Lesser known facts about SQL Server Backup and Restore – Amit Banerjee – this name is known internationally for solving SQL Server problems in 140 characters. He has already blogged about this and this topic is going to be interesting. A successful restore strategy for applications is as good as their last good known backup. I have few difficult questions to ask to Amit and I am very sure that his unique style will entertain people. By the way, his one of the slide may give few in audience a funny heart attack. Top 5 reasons why you want SQL Server 2012 BI – Praveen plans to take a tour of some of the BI enhancements introduced in the new version. Business Insights with SQL Server is a critical building block and this version of SQL Server is no exception. For the matter of the fact, when I saw the demos he was going to show during this session, I felt like that I wish I can set up all of this on my machine. If you miss this session – you will miss one of the most informative session of the day. Also TechEd India 2012 has a Live streaming of some content and this can be watched here. The TechEd Team is planning to have some really good exclusive content in this channel as well. If you spot me, just do not hesitate to come by me and introduce yourself, I want to remember you! Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Author Visit, SQLServer, T SQL, Technology Tagged: TechEd, TechEdIn

Read the article
PASS Summit 2011 – Part III

- by Tara Kizer

Well we’re about a month past PASS Summit 2011, and yet I haven’t finished blogging my notes! Between work and home life, I haven’t been able to come up for air in a bit. Now on to my notes… On Thursday of the PASS Summit 2011, I attended Klaus Aschenbrenner’s (blog|twitter) “Advanced SQL Server 2008 Troubleshooting”, Joe Webb’s (blog|twitter) “SQL Server Locking & Blocking Made Simple”, Kalen Delaney’s (blog|twitter) “What Happened? Exploring the Plan Cache”, and Paul Randal’s (blog|twitter) “More DBA Mythbusters”. I think my head grew two times in size from the Thursday sessions. Just WOW! I took a ton of notes in Klaus' session. He took a deep dive into how to troubleshoot performance problems. Here is how he goes about solving a performance problem: Start by checking the wait stats DMV System health Memory issues I/O issues I normally start with blocking and then hit the wait stats. Here’s the wait stat query (Paul Randal’s) that I use when working on a performance problem. He highlighted a few waits to be aware of such as WRITELOG (indicates IO subsystem problem), SOS_SCHEDULER_YIELD (indicates CPU problem), and PAGEIOLATCH_XX (indicates an IO subsystem problem or a buffer pool problem). Regarding memory issues, Klaus recommended that as a bare minimum, one should set the “max server memory (MB)” in sp_configure to 2GB or 10% reserved for the OS (whichever comes first). This is just a starting point though! Regarding I/O issues, Klaus talked about disk partition alignment, which can improve SQL I/O performance by up to 100%. You should use 64kb for NTFS cluster, and it’s automatic in Windows 2008 R2. Joe’s locking and blocking presentation was a good session to really clear up the fog in my mind about locking. One takeaway that I had no idea could be done was that you can set a timeout in T-SQL code view LOCK_TIMEOUT. If you do this via the application, you should trap error 1222. Kalen’s session went into execution plans. The minimum size of a plan is 24k. This adds up fast especially if you have a lot of plans that don’t get reused much. You can use sys.dm_exec_cached_plans to check how often a plan is being reused by checking the usecounts column. She said that we can use DBCC FLUSHPROCINDB to clear out the stored procedure cache for a specific database. I didn’t know we had this available, so this was great to hear. This will be less intrusive when an emergency comes up where I’ve needed to run DBCC FREEPROCCACHE. Kalen said one should enable “optimize for ad hoc workloads” if you have an adhoc loc. This stores only a 300-byte stub of the first plan, and if it gets run again, it’ll store the whole thing. This helps with plan cache bloat. I have a lot of systems that use prepared statements, and Kalen says we simulate those calls by using sp_executesql. Cool! Paul did a series of posts last year to debunk various myths and misconceptions around SQL Server. He continues to debunk things via “DBA Mythbusters”. You can get a PDF of a bunch of these here. One of the myths he went over is the number of tempdb data files that you should have. Back in 2000, the recommendation was to have as many tempdb data files as there are CPU cores on your server. This no longer holds true due to the numerous cores we have on our servers. Paul says you should start out with 1/4 to 1/2 the number of cores and work your way up from there. BUT! Paul likes what Bob Ward (twitter) says on this topic: 8 or less cores –> set number of files equal to the number of cores Greater than 8 cores –> start with 8 files and increase in blocks of 4 One common myth out there is to set your MAXDOP to 1 for an OLTP workload with high CXPACKET waits. Instead of that, dig deeper first. Look for missing indexes, out-of-date statistics, increase the “cost threshold for parallelism” setting, and perhaps set MAXDOP at the query level. Paul stressed that you should not plan a backup strategy but instead plan a restore strategy. What are your recoverability requirements? Once you know that, now plan out your backups. As Paul always does, he talked about DBCC CHECKDB. He said how fabulous it is. I didn’t want to interrupt the presentation, so after his session had ended, I asked Paul about the need to run DBCC CHECKDB on your mirror systems. You could have data corruption occur at the mirror and not at the principal server. If you aren’t checking for data corruption on your mirror systems, you could be failing over to a corrupt database in the case of a disaster or even a planned failover. You can’t run DBCC CHECKDB against the mirrored database, but you can run it against a snapshot off the mirrored database.

Read the article
Part 1 - Load Testing In The Cloud

- by Tarun Arora

Azure is fascinating, but even more fascinating is the marriage of Azure and TFS! Introduction Recently a client I worked for had 2 major business critical applications being delivered, with very little time budgeted for Performance testing, we immediately hit a bottleneck when the performance testing phase started, the in house infrastructure team could not support the hardware requirements in the short notice. It was suggested that the performance testing be performed on one of the QA environments which was a fraction of the production environment. This didn’t seem right, the team decided to turn to the cloud. The team took advantage of the elasticity offered by Azure, starting with a single test agent which was provisioned and ready for use with in 30 minutes the team scaled up to 17 test agents to perform a very comprehensive performance testing cycle. Issues were identified and resolved but the highlight was that the cost of running the ‘test rig’ proved to be less than if hosted on premise by the infrastructure team. Thank you for taking the time out to read this blog post, in the series of posts, I’ll try and cover the start to end of everything you need to know to use Azure to build your Test Rig in the cloud. But Why Azure? I have my own Data Centre… If the environment is provisioned in your own datacentre, - No matter what level of service agreement you may have with your infrastructure team there will be down time when the environment is patched - How fast can you scale up or down the environments (keeping the enterprise processes in mind) Administration, Cost, Flexibility and Scalability are the areas you would want to think around when taking the decision between your own Data Centre and Azure! How is Microsoft's Public Cloud Offering different from Amazon’s Public Cloud Offering? Microsoft's offering of the Cloud is a hybrid of Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) which distinguishes Microsoft's offering from other providers such as Amazon (Amazon only offers IaaS). PaaS – Platform as a Service IaaS – Infrastructure as a Service Fills the needs of those who want to build and run custom applications as services. Similar to traditional hosting, where a business will use the hosted environment as a logical extension of the on-premises datacentre. A service provider offers a pre-configured, virtualized application server environment to which applications can be deployed by the development staff. Since the service providers manage the hardware (patching, upgrades and so forth), as well as application server uptime, the involvement of IT pros is minimized. On-demand scalability combined with hardware and application server management relieves developers from infrastructure concerns and allows them to focus on building applications. The servers (physical and virtual) are rented on an as-needed basis, and the IT professionals who manage the infrastructure have full control of the software configuration. This kind of flexibility increases the complexity of the IT environment, as customer IT professionals need to maintain the servers as though they are on-premises. The maintenance activities may include patching and upgrades of the OS and the application server, load balancing, failover clustering of database servers, backup and restoration, and any other activities that mitigate the risks of hardware and software failures. The biggest advantage with PaaS is that you do not have to worry about maintaining the environment, you can focus all your time in solving the business problems with your solution rather than worrying about maintaining the environment. If you decide to use a VM Role on Azure, you are asking for IaaS, more on this later. A nice blog post here on the difference between Saas, PaaS and IaaS. Now that we are convinced why we should be turning to the cloud and why in specific Azure, let’s discuss about the Test Rig. The Load Test Rig – Topology Now the moment of truth, Of course a big part of getting value from cloud computing is identifying the most adequate workloads to take to the cloud, so I’ve decided to try to make a Load Testing rig where the Agents are running on Windows Azure. I’ll talk you through the above Topology, - User: User kick starts the load test run from the developer workstation on premise. This passes the request to the Test Controller. - Test Controller: The Test Controller is on premise connected to the same domain as the developer workstation. As soon as the Test Controller receives the request it makes use of the Windows Azure Connect service to orchestrate the test responsibilities to all the Test Agents. The Windows Azure Connect endpoint software must be active on all Azure instances and on the Controller machine as well. This allows IP connectivity between them and, given that the firewall is properly configured, allows the Controller to send work loads to the agents. In parallel, the Controller will collect the performance data from the agents, using the traditional WMI mechanisms. - Test Agents: The Test Agents are on the Windows Azure Public Cloud, as soon as the test controller issues instructions to the test agents, the test agents start executing the load tests. The HTTP requests are issued against the web server on premise, the results are captured by the test agents. And finally the results are passed over to the controller. - Servers: The Web Server and DB Server are hosted on premise in the datacentre, this is usually the case with business critical applications, you probably want to manage them your self. Recap and What’s next? So, in the introduction in the series of blog posts on Load Testing in the cloud I highlighted why creating a test rig in the cloud is a good idea, what advantages does Windows Azure offer and the Test Rig topology that I will be using. I would also like to mention that i stumbled upon this [Video] on Azure in a nutshell, great watch if you are new to Windows Azure. In the next post I intend to start setting up the Load Test Environment and discuss pricing with respect to test agent machine types that will be used in the test rig. Hope you enjoyed this post, If you have any recommendations on things that I should consider or any questions or feedback, feel free to add to this blog post. Remember to subscribe to http://feeds.feedburner.com/TarunArora. See you in Part II. Share this post : CodeProject

Read the article
Data management in unexpected places

- by Ashok_Ora

Normal 0 false false false EN-US X-NONE X-NONE Data management in unexpected places When you think of network switches, routers, firewall appliances, etc., it may not be obvious that at the heart of these kinds of solutions is an engine that can manage huge amounts of data at very high throughput with low latencies and high availability. Consider a network router that is processing tens (or hundreds) of thousands of network packets per second. So what really happens inside a router? Packets are streaming in at the rate of tens of thousands per second. Each packet has multiple attributes, for example, a destination, associated SLAs etc. For each packet, the router has to determine the address of the next “hop” to the destination; it has to determine how to prioritize this packet. If it’s a high priority packet, then it has to be sent on its way before lower priority packets. As a consequence of prioritizing high priority packets, lower priority data packets may need to be temporarily stored (held back), but addressed fairly. If there are security or privacy requirements associated with the data packet, those have to be enforced. You probably need to keep track of statistics related to the packets processed (someone’s sure to ask). You have to do all this (and more) while preserving high availability i.e. if one of the processors in the router goes down, you have to have a way to continue processing without interruption (the customer won’t be happy with a “choppy” VoIP conversation, right?). And all this has to be achieved without ANY intervention from a human operator – the router is most likely to be in a remote location – it must JUST CONTINUE TO WORK CORRECTLY, even when bad things happen. How is this implemented? As soon as a packet arrives, it is interpreted by the receiving software. The software decodes the packet headers in order to determine the destination, kind of packet (e.g. voice vs. data), SLAs associated with the “owner” of the packet etc. It looks up the internal database of “rules” of how to process this packet and handles the packet accordingly. The software might choose to hold on to the packet safely for some period of time, if it’s a low priority packet. Ah – this sounds very much like a database problem. For each packet, you have to minimally · Look up the most efficient next “hop” towards the destination. The “most efficient” next hop can change, depending on latency, availability etc. · Look up the SLA and determine the priority of this packet (e.g. voice calls get priority over data ftp) · Look up security information associated with this data packet. It may be necessary to retrieve the context for this network packet since a network packet is a small “slice” of a session. The context for the “header” packet needs to be stored in the router, in order to make this work. · If the priority of the packet is low, then “store” the packet temporarily in the router until it is time to forward the packet to the next hop. · Update various statistics about the packet. In most cases, you have to do all this in the context of a single transaction. For example, you want to look up the forwarding address and perform the “send” in a single transaction so that the forwarding address doesn’t change while you’re sending the packet. So, how do you do all this? Berkeley DB is a proven, reliable, high performance, highly available embeddable database, designed for exactly these kinds of usage scenarios. Berkeley DB is a robust, reliable, proven solution that is currently being used in these scenarios. First and foremost, Berkeley DB (or BDB for short) is very very fast. It can process tens or hundreds of thousands of transactions per second. It can be used as a pure in-memory database, or as a disk-persistent database. BDB provides high availability – if one board in the router fails, the system can automatically failover to another board – no manual intervention required. BDB is self-administering – there’s no need for manual intervention in order to maintain a BDB application. No need to send a technician to a remote site in the middle of nowhere on a freezing winter day to perform maintenance operations. BDB is used in over 200 million deployments worldwide for the past two decades for mission-critical applications such as the one described here. You have a choice of spending valuable resources to implement similar functionality, or, you could simply embed BDB in your application and off you go! I know what I’d do – choose BDB, so I can focus on my business problem. What will you do? /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;}

Read the article
What libraries provide cross-platform 3D and P2P support?

- by uckelman

I'm trying to find a constellation of libraries which, taken together, meet the following requirements: Smooth scaling, rotation, panning (in two dimensions). I'll have a large bitmap (or SVG, in some cases), maybe up to 10000x10000 pixels, which serves as map, with some middling number of small bitmaps (or, again, possibly SVG) that can be dragged around over it. I need to be able to zoom, rotate, and pan this scene; however, the view will always be normal to (i.e., looking head-on at) the large bitmap, so I'm not really using the depth dimension. Peer-to-peer. I'd like for multiple users to be able to connect in order to share one of the scenes mentioned above, preferably peer-to-peer, without much configuration by the user. I'm intending to have a server running for cases where users are unable to connect P2P; I'd like to have the failover happen automatically, or possibly have some way of promoting clients who are capable to be servers themselves. Synchronization. Once a user has started dragging one of the small bitmaps (a piece), no other user should be able to drag that piece until the drag stops. I haven't thought of exactly how to do this---there might be a simple solution, or this kind of synchronization might be something that a library provides. Cross(ish)-platform. I need to be able to run on Linux, Windows, and Mac OS. It would be nice to also be able to run on tablets. Having mostly the same code for all platforms is a plus, but not absolutely necessary. (L)GPL compatible. I'm planning to release under the LGPL or GPL, preferably the latter, so I need libraries which have compatible licenses. I'm not set on any particular language, I'd like to use the library or libraries which make the work easiest, though my preference is to work in at most two languages for the project. (The Model could potentially be in one language and the View in another, so they could talk to each other via some protocol I define, if that would get me a better selection of libraries to use.) Can anyone offer suggestions for what to use?

Read the article
Initial Cisco ASA 5510 Config

- by Brendan ODonnell

Fair warning, I'm a but of a noob so please bear with me. I'm trying to set up a new ASA 5510. I have a pretty simple set up with one /24 on the inside NATed to a DHCP address on the outside. Everything on the inside works and I can ping the outside interface from external devices. No matter what I do I can't get anything internal to route across the border to the outside and back. To try and eliminate ACL issues as a possibility I added permit any any rules to the incoming access lists on the inside and outside interfaces. I'd appreciate any help I can get. Here's the sh run. : Saved : ASA Version 8.4(3) ! hostname gateway domain-name xxx.local enable password xxx encrypted passwd xxx encrypted names ! interface Ethernet0/0 nameif outside security-level 0 ip address dhcp setroute ! interface Ethernet0/1 nameif inside security-level 100 ip address 10.x.x.x 255.255.255.0 ! interface Ethernet0/2 shutdown no nameif no security-level no ip address ! interface Ethernet0/3 shutdown no nameif no security-level no ip address ! interface Management0/0 nameif management security-level 100 ip address 192.168.1.1 255.255.255.0 management-only ! ftp mode passive dns domain-lookup inside dns server-group DefaultDNS name-server 10.x.x.x domain-name xxx.local same-security-traffic permit inter-interface same-security-traffic permit intra-interface object network inside-network subnet 10.x.x.x 255.255.255.0 object-group protocol TCPUDP protocol-object udp protocol-object tcp access-list outside_access_in extended permit ip any any access-list inside_access_in extended permit ip any any pager lines 24 logging enable logging buffered informational logging asdm informational mtu management 1500 mtu inside 1500 mtu outside 1500 no failover icmp unreachable rate-limit 1 burst-size 1 icmp permit any inside icmp permit any outside no asdm history enable arp timeout 14400 ! object network inside-network nat (any,outside) dynamic interface access-group inside_access_in in interface inside access-group outside_access_in in interface outside timeout xlate 3:00:00 timeout pat-xlate 0:00:30 timeout conn 1:00:00 half-closed 0:10:00 udp 0:02:00 icmp 0:00:02 timeout sunrpc 0:10:00 h323 0:05:00 h225 1:00:00 mgcp 0:05:00 mgcp-pat 0:05:00 timeout sip 0:30:00 sip_media 0:02:00 sip-invite 0:03:00 sip-disconnect 0:02:00 timeout sip-provisional-media 0:02:00 uauth 0:05:00 absolute timeout tcp-proxy-reassembly 0:01:00 timeout floating-conn 0:00:00 dynamic-access-policy-record DfltAccessPolicy user-identity default-domain LOCAL aaa authentication ssh console LOCAL aaa authentication http console LOCAL http server enable http 192.168.1.0 255.255.255.0 management http 10.x.x.x 255.255.255.0 inside http authentication-certificate management http authentication-certificate inside no snmp-server location no snmp-server contact snmp-server enable traps snmp authentication linkup linkdown coldstart warmstart telnet timeout 5 ssh 192.168.1.0 255.255.255.0 management ssh 10.x.x.x 255.255.255.0 inside ssh timeout 5 ssh version 2 console timeout 0 dhcp-client client-id interface outside dhcpd address 192.168.1.2-192.168.1.254 management dhcpd enable management ! threat-detection basic-threat threat-detection statistics access-list no threat-detection statistics tcp-intercept webvpn username xxx password xxx encrypted ! class-map inspection_default match default-inspection-traffic ! ! policy-map type inspect dns preset_dns_map parameters message-length maximum client auto message-length maximum 512 policy-map global_policy class inspection_default inspect dns preset_dns_map inspect ftp inspect h323 h225 inspect h323 ras inspect rsh inspect rtsp inspect esmtp inspect sqlnet inspect skinny inspect sunrpc inspect xdmcp inspect sip inspect netbios inspect tftp inspect ip-options inspect icmp ! service-policy global_policy global prompt hostname context no call-home reporting anonymous Cryptochecksum:fe19874e18fe7107948eb0ada6240bc2 : end no asdm history enable

Read the article
simple and reliable centralized logging inside Amazon VPC

- by Nakedible

I need to set up centralized logging for a set of servers (10-20) in an Amazon VPC. The logging should be as to not lose any log messages in case any single server goes offline - or in the case that an entire availability zone goes offline. It should also tolerate packet loss and other normal network conditions without losing or duplicating messages. It should store the messages durably, at the minimum on two different EBS volumes in two availability zones, but S3 is a good place as well. It should also be realtime so that the messages arrive within seconds of their generation to two different availability zones. I also need to sync logfiles not generated via syslog, so a syslog-only centralized logging solution would not fulfill all the needs, although I guess that limitation could be worked around. I have already reviewed a few solutions, and I will list them here: Flume to Flume to S3: I could set up two logservers as Flume hosts which would store log messages either locally or in S3, and configure all the servers with Flume to send all messages to both servers, using the end-to-end reliability options. That way the loss of a single server shouldn't cause lost messages and all messages would arrive in two availability zones in realtime. However, there would need to be some way to join the logs of the two servers, deduplicating all the messages delivered to both. This could be done by adding a unique id on the sending side to each message and then write some manual deduplication runs on the logfiles. I haven't found an easy solution to the duplication problem. Logstash to Logstash to ElasticSearch: I could install Logstash on the servers and have them deliver to a central server via AMQP, with the durability options turned on. However, for this to work I would need to use some of the clustering capable AMQP implementations, or fan out the deliver just as in the Flume case. AMQP seems to be a yet another moving part with several implementations and no real guidance on what works best this sort of setup. And I'm not entirely convinced that I could get actual end-to-end durability from logstash to elasticsearch, assuming crashing servers in between. The fan-out solutions run in to the deduplication problem again. The best solution that would seem to handle all the cases, would be Beetle, which seems to provide high availability and deduplication via a redis store. However, I haven't seen any guidance on how to set this up with Logstash and Redis is one more moving part again for something that shouldn't be terribly difficult. Logstash to ElasticSearch: I could run Logstash on all the servers, have all the filtering and processing rules in the servers themselves and just have them log directly to a removet ElasticSearch server. I think this should bring me reliable logging and I can use the ElasticSearch clustering features to share the database transparently. However, I am not sure if the setup actually survives Logstash restarts and intermittent network problems without duplicating messages in a failover case or similar. But this approach sounds pretty promising. rsync: I could just rsync all the relevant log files to two different servers. The reliability aspect should be perfect here, as the files should be identical to the source files after a sync is done. However, doing an rsync several times per second doesn't sound fun. Also, I need the logs to be untamperable after they have been sent, so the rsyncs would need to be in append-only mode. And log rotations mess things up unless I'm careful. rsyslog with RELP: I could set up rsyslog to send messages to two remote hosts via RELP and have a local queue to store the messages. There is the deduplication problem again, and RELP itself might also duplicate some messages. However, this would only handle the things that log via syslog. None of these solutions seem terribly good, and they have many unknowns still, so I am asking for more information here from people who have set up centralized reliable logging as to what are the best tools to achieve that goal.

Read the article
Multiple data centers and HTTP traffic: DNS Round Robin is the ONLY way to assure instant fail-over?

- by vmiazzo

Hi, Multiple A records pointing to the same domain seem to be used almost exclusively to implement DNS Round Robin as a cheap load balancing technique. The usual warning against DNS RR is that it is not good for high availability. When 1 IP goes down clients will continue to use it for minutes. A load balancer is often suggested as a better choice. Both claims are not completely true: When the traffic is HTTP then, most of the HTML browsers are able to automatically try the next A record if the previous is down, without a new DNS look-up. Read here chapter 3.1 and here. When multiple data centers are involved then, DNS RR is the only option to distribute traffic across them. So, is it true that, with multiple data centers and HTTP traffic, the use of DNS RR is the ONLY way to assure instant fail-over when one data center goes down? Thanks, Valentino Edit: Off course each data center has a local Load Balancer with hot spare. It's OK to sacrifice session affinity for an instant fail-over. AFAIK the only way for a DNS to suggest a data center instead of another is to reply with just the IP (or IPs) associated to that data center. If the data center becomes unreachable then all those IP are also unreachables. This means that, even if smart HTML browsers are able to instantly try another A record , all the attempts will fail until the local cache entry expires and a new DNS lookup is done, fetching the new working IPs (I assume DNS automatically suggests to a new data center when one fail). So, "smart DNS" cannot assure instant fail-over. Conversely a DNS round-robin permits it. When one data center fail, the smart HTML browsers (most of them) instantly try the other cached A records jumping to another (working) data center. So, DNS round-robin doesn't assure session affinity or the lowest RTT but seems to be the only way to assure instant fail-over when the clients are "smart" HTML browsers. Edit 2: Some people suggest TCP Anycast as a definitive solution. In this paper (chapter 6) is explained that Anycast fail-over is related to BGP convergence. For this reason Anycast can employ from 15 minutes to 20 seconds to complete. 20 seconds are possible on networks where the topology was optimized for this. Probably just CDN operators can grant such fast fail-overs. Edit 3:* I did some DNS look-ups and traceroutes (maybe some expert can double check) and: The only CDN using TCP Anycast seems to be CacheFly, other operators like CDN networks and BitGravity use CacheFly. Seems that their edges cannot be used as reverse proxies. Therefore, they cannot be used to grant instant failover. Akamai and LimeLight seems to use geo-aware DNS. But! They return multiple A records. From traceroutes seems that the returned IPs are on the same data center. So, I'm puzzled on how they can offer a 100% SLA when one data center goes down.

Read the article

Search Results

Search found 504 results on 21 pages for 'failover'.

Page 19/21 | < Previous Page | 15 16 17 18 19 20 21 | Next Page >

- by Most Valuable Yak (Rob Volk)

- by Ruma Sanyal

- by Tara Kizer

- by Tara Kizer

- by Steve Walker

- by Herve Roggero

- by floyd

- by Sirex

- by Jeff Atwood

- by James

- by longneck

- by user979276

- by mfinni

- by quanta

- by Campo

- by Adrian Hope-Bailie

- by Monica Kumar

- by pinaldave

- by Tara Kizer

- by Tarun Arora

- by Ashok_Ora

- by uckelman

- by Brendan ODonnell

- by Nakedible

- by vmiazzo

< Previous Page | 15 16 17 18 19 20 21 | Next Page >