Disaster recovery | Nancy Hidy Wilson's Blog

PASS Summit 2011 – What I Learned

Posted on October 28, 2011 by Nancy

In prior posts, I reflected on the things that I think make the PASS Summit such a great value – content, volunteers, Microsoft support, and networking opportunities. This post will concentrate on the content provided at this year’s Summit and basically what I learned over those 3 days.

In no particular order – here’s the brain dump (from my notes, of course, with occasional commentary thrown in!):

SQL Server Code Name “Denali” will officially be SQL Server 2012 and released in the first half of 2012.
Project Crescent is now Power View.
Microsoft is now promoting both on-premise databases and cloud-based databases as co-existing for the foreseeable future; but the cloud is getting bigger for SQL Azure (150 GB databases supported by year-end).
Big Data has arrived and Microsoft has taken notice – and action – by releasing the Microsoft SQL Server Connector for Apache Hadoop … with more coming in this arena.
I run the Houston Area SQL Server User Group so these next items are of importance to me personally:
- PASS may finally establish a speaker bureau to help local chapters find speakers
- PASS will be increasing services to chapter leaders, providing DNN (DotNetNuke) training, etc.
At (Principal Architect Escalation Engineer in Microsoft CSS) Bob Ward’s “Inside Tempdb” half-day session:
- The model database in Denali SQL Server 2012 will be 4MB up from the 2MB it has been forever. Why is this important? Well, recovery isn’t complete until tempdb is started and the default size for tempdb (if you don’t change it) is the size of model. 4MB isn’t much these days, but it is a change you should be aware of.
- tempdb is the “garbage dump of the engine”. It is used for user objects such as temp tables, table variables, temp procs, user-defined objects and online index space. It is also used by internal objects – sorts, work tables, work files (used for hash joins) and version store.
- Consider moving tempdb to its own storage volume if it has heavy I/O. SSD is an option, but as expensive as that is ensure you can’t make better use of it somewhere else first.
- Bob’s rule of thumb for how many data files to allocate for tempdb:
  - Less than 8 CPUs (cores), 1 file per CPU
  - 8 or more CPUs, start with 8 files and increase by sets of 4 until contention resides
  - Also see Paul Randal’s blog for a slight variation on this theme.
At the “What’s New in Manageability in Denali” Microsoft session:
- Since SSMS is now built on Visual Studio 2010, you will get multi-monitor support! Hooray!
- There is a new Database Recovery Advisor which will be able to build a restore plan for you based on available backup files (database, log, etc.) in a folder even if you don’t have any msdb backup history. It can also handle split backups. I’d cheer for this too, but seriously – this should have been in the product years ago!
- Log viewer will now work with offline instances and has improved performance.
- SCOM Management Pack (will be released at the same time as SQL Server 2012)
  - Ability to discover Availability Groups
  - Detailed knowledge of AlwaysOn tasks
  - Performance counter collections for AlwaysOn
  - Policy-Based Management integration, yes, integration – it’ll pull your PBM policies right into SCOM and alert on failures
  - Enhanced mirroring support – discovers and diagrams
  - 20 new rules for replication
  - Support for mount points (another “hooray/finally!”)
At the “AlwaysOn: Active Secondaries” Microsoft Session:
- You can offload backups to any replica so as to not impact production. The LSN is communicated back to the primary which then notifies all replicas so that all log backups in all replicas form a single log chain.
- If that sounds like a nightmare for a restore, remember the Database Recovery Advisor mentioned previously and restore is simple!
- Differential backups are not supported on secondaries.
- Only “Copy Only” full backup is supported on secondaries.
- Advisable to store backups centrally (so you can just point the Database Recovery Advisor to a single location to create the restore plan).
- Declarative policy to determine where backups occur automatically. This is advisory only, not enforced. Implement via a system function which returns a Boolean indicating if this is the preferred backup location.
- And, of course, probably the number one reason for multiple secondaries (besides general DBA paranoia) is to offload reporting workloads without having to use database snapshots.
In MVP Tim Ford’s “Periodic Table of DMOs” session:
- Using the format of the Periodic Table of the Elements from general science, Tim gives us a creative way to organize all the DMOs (Dynamic Management Objects – views and functions) now available to DBAs for troubleshooting and general understanding of what is happening within a database engine instance. I might actually use this to remember when and what to use.
SQL Server guru Paul Randal (blog) busted more DBA Myths in a spotlight session. As usual, not all myths were blatantly true or false, there were a few “it depends”! This session lasted until 6:30pm and I think most would have stayed until 9:30pm to listen to Paul’s in-depth, but humorous explanations about why these myths were true/false or especially “it depends”.
Allan Hirt, Clustering MVP, demonstrated “Denali on Windows Server Core”:
- Currently Denali is only supported on W2K8R2 SP1, no Windows 8 support announced yet (as of October 14, 2011).
- You’ll need to add .NET and Allan will post on his blog how to do this effectively
- You’ll need to add PowerShell 2.0. See KB976736 and use option 2!
- If installing a Windows Cluster, then all nodes must be Server Core or all must allow GUI.
Microsoft Senior Program Manager for the Database Engine Security features, Il-Sung Lee, presented “What’s New in Security for SQL Server Code Name Denali”:
- Windows groups can now have a default schema. The first group with the lowest principal id will be chosen if a user belongs to more than one group.
- User-defined schema roles
- All SKUs will now have the ability to specify server audits; database audits still require Enterprise Edition or higher.
- User-defined audit events (via sp_audit_write)
- Database Authentication (for contained databases)
- Lots of cryptography changes – deprecating older methods, supporting newer, more secure ones
- Il-Sung’s team blogs at: http://blogs.msdn.com/b/sqlsecurity/
SQLCAT brought in speakers from 4 companies who are early adopters of Denali & AlwaysOn to discuss their HA/DR requirements and migration paths to AlwaysOn. I’ll definitely be revisiting their solutions in the future and suspect there will be some whitepapers posted on SQLCAT’s website soon.
In “Where should I be encrypting my data”, MVP Denny Cherry provided a good refresher on some encryption basics:
- Encryption will almost always increase the size of stored data
- Encryption will decrease the usefulness of data deduplication
- Encryption will add CPU load
- Enabling TDE for an application database will also enable it for tempdb
- TDE is for “data at rest” including backups
- For “data on the wire” use IPSEC or SSL
- If your storage uses MPIO, then “data at rest” protection at the LUN level. If you copy a file to another server, it would be readable; if you detach and attach the LUN elsewhere it would not be readable.
I also attended one of the “Lightning Talk” sessions, where 8 speakers had 5 minutes each to convey their topic to the audience. Some were humorous, some were serious, and one was seriously humorous! Grant Fritchey’s “Backup Testing, The Rant” is not something any of us present will ever forget – nor will we ever forget the importance of actually testing your backup! I really wish that had been caught on video, because the audio on the DVD just won’t do the presentation justice. Two other speakers in the session covered topics near and dear to me – “Thinking of Hosting a SQLSaturday?” (John Sterett) and “Build Up” (Niko Neugebauer). Both encouraged attendees to get involved in their local SQL Server community activities – and if you don’t have a local group – start one!

Now to start making the list of all the interesting sessions I couldn’t get to that I’ll want to watch (or re-watch) as soon as the Conference DVDs arrive! I foresee team “lunch ‘n learns” being scheduled.

Filed under: PASS, SQL Server | Tagged: AlwaysOn, backup, Big Data, Denali, Disaster recovery, PASS Summit, Security, SQL Server, SQL Server 2012, tempdb | Comments Off on PASS Summit 2011 – What I Learned

My Denali First Glance – AlwaysOn Availability Groups

Posted on October 5, 2011 by Nancy

I’m finally finding some time to get more familiar with the new feature in Microsoft SQL Server Code-Named “Denali” called “AlwaysOn”. No, that is not a typo – capitalize “Always”, capitalize “On” and concatenate – “AlwaysOn”!

There are two flavors of AlwaysOn – failover clustering and availability groups. AlwaysOn Failover Clustering is exactly what it sounds like – enabling a SQL Server Denali Instance to failover to another node in a Windows Server Failover Cluster. This is also known as a Failover Cluster Instance (FCI). On the other hand, AlwaysOn Availability Groups can best be described as database mirroring on steroids – and that is what I’ll cover in this post. I’ll cover the differences in Denali’s FCI implementation from prior versions in a later post.

First, think of all the things you don’t like about database mirroring:

Only 1 mirror
The mirror is always “in recovery” and unusable outside of a snapshot
Failover is per user database (i.e. one at a time)
Have to choose between synchronous (high availability) and asynchronous (disaster recovery)
Requires a 3^rd instance as a witness if you want “high safety mode” and automatic failover

Here’s what AlwaysOn Availability Groups will provide:

Up to 4 replicas (the term “replica” essentially replaces “mirror”) in CTP3
Read-only access is provided to replicas
Multiple databases can be grouped together for failover defined as “Availability Groups”
Synchronous and Asynchronous replicas at the same time – covering both high availability and disaster recovery (hence HADR)
Windows Server Failover Clustering detects when failover is required – no third (or fourth or fifth) server required

Yes, you read that right; the Windows Failover Clustering feature is required to be installed on your Denali servers in order to use AlwaysOn Availability Groups. It does not mean, however, that you have to actually implement a Windows failover cluster of two or more nodes. But, it does mean that Windows Server 2008 Enterprise Edition is required….and that makes me believe that this feature will likely only be available in the Denali Enterprise Edition or higher once the final SKU feature sets are announced by Microsoft. However, SQL Server Enterprise Edition is required today to use asynchronous mode for database mirroring, so this shouldn’t come as a surprise.

Similar to database mirroring failover options, the availability failover modes are automatic or planned manual failover for synchronous replicas to prevent data loss or forced manual failover for asynchronous replicas which may incur data loss. If the secondary replica in synchronous mode is not synchronized with the primary replica, then a forced manual failover is also the only option.

For seamless client connectivity, when you establish an Availability Group you may also create a Virtual Network Name (VNN) which will be assigned a Virtual IP (VIP). Whenever the Availability Group fails over, the VNN will redirect connections to the new primary replica. Alternatively, if there are only two availability replicas in an Availability Group, then “old-style” database mirroring connection strings may be used until you can establish a VNN and update your application to use the VNN. If you want to add a third replica, then you must convert to VNN for your client connectivity.

In addition to having read-only access to secondary replicas, you can take backups of your secondary replicas. There are several options you will want to specify when setting up your Availability Groups (or alter afterwards) in order to properly control the backup of your primary and secondary replicas. Since this is still the CTP as I’m writing this, I’ll just point you to this BOL link for the most current info as it may change by the time you get around to implementing this.

The main purpose of this article is to make you aware of what is coming so that you can start planning for your future HADR requirements as you make the move to Denali when it is finally released. To dig deeper, check out the current BOL entry for AlwaysOn Availability Groups.

Filed under: SQL Server, SQLServerPedia Syndication | Tagged: AlwaysOn, Availability Groups, Denali, Disaster recovery, High availability, SQL Server, SQL Server 2012 | Comments Off on My Denali First Glance – AlwaysOn Availability Groups

TSQL2sday #19 – Disasters & Recovery (or Keep Your Chain Saw Handy)

Posted on June 14, 2011 by Nancy

What is TSQL2sday? Back in late 2009, Adam Machanic (blog | twitter) had this brilliant idea for a monthly SQL Server blogger event (the origin of TSQL2sday). This month’s event is hosted by Allen Kinsel (blog | twitter) and the selected topic is “Disasters & Recovery”.

Like Allen, I also live in the Greater Houston area – albeit far enough away from the coast that storm surge is not an issue in a hurricane like it is for Allen, but the rain, wind, and potential tornados are. Companies in Houston don’t just have Disaster Recovery (DR) plans, they have specific HURRICANE DR plans. I grew up hearing the stories of Hurricane Carla (1961) and experienced first-hand the aftermaths of Hurricane Alicia (1983) and Hurricane Rita (2005). But, Hurricane Ike (2008) was the first hurricane I stayed home for – and here is what I learned!

DR is all about preparedness. You have to think about what can happen in a disaster and then what you will need to survive and recover in the short-term and in the long-term. Short-term recovery is more about protecting assets from further damage. My neighborhood experienced a direct hit from Ike – we were in the “eye” for over 1.5 hours before the “backside” hit us. During that time, everyone did an initial assessment of the damage incurred during the “front-side” attack. Ike hit in the wee hours of the morning – so strong flashlights were a must have item. We were lucky – no damage could be seen to our roof or windows (we had boarded most, but started too late to get them all). However, several neighbors had trees fall through their roofs during this time. Those of us without damage helped out those with damage to quickly cover the holes with tarps before the backside hit. The problem was – we didn’t know we had 1.5 hours – for all we knew the backside would be on us in just a few minutes.

By the time that the worst of the storm had passed, it was just beginning to be daylight and we could start assessing the latest round of damage. We’d heard trees crashing to the ground all night long – including a house-shaking thud about 8am when a neighbor’s tree fell towards our house and just grazed our back porch. As the rain and wind subsided enough that we felt safe to venture outside, we were able to start assessing the damage to our house and neighborhood. However, we were not able to assess beyond our immediate neighbors’ houses due to multiple large trees which had fallen across the road in both directions.

My parents live about a half mile down the street, but decided to come “hunker down” at our house for the storm as we have a very large interior closet which could hold the four of us comfortably in case of tornadoes. We now needed to get to their house and check for damage, but due to the trees blocking the road – this was impossible via car. That was when we realized that our chain saw had been left at my parents’ house! My husband and father hiked the half mile over the trees and downed power lines. My parents’ house thankfully had no damage; so my father got on his tractor and my husband loaded the chain saws in the Kawasaki Mule and they began working their way back up the street, clearing the trees and debris to make the street passable.

Then it was time to hook up the generator. The houses in my neighborhood each have their own well and septic; so if we wanted water and basic facilities – we needed electricity. We had stock piled enough gasoline for 5-7 days to run the generator just enough each day to keep the refrigerator\freezer cold enough for our food and generate the water we needed. We cooked meals using the gas grill on the back porch – which we normally used 3-4 times a week. And, we had extra propane bottles for the grill.

Amazingly, for the first 2-3 days we actually still had the use of our landline phone. This was good, because we had no cell phone service those days! Then, about the time we started getting cell service again, the landline went out – I think someone cleaning up fallen trees wiped it out. Anyway, it was a good thing that we had both landline and cell services – I know a lot of people are giving up their landlines, but this experience will make me hold on to ours a little longer.

And of course, our cable modem for Internet connectivity was out of commission. But, that wasn’t a necessity in the immediate aftermath – especially as we did not have full electrical service.

All in all – we were very well prepared for our short-term recovery. We survived – we had shelter, food, and water – even ice!

At the end of the fourth day of cleanup with the sound of chain saws and generators constantly buzzing in my ears, we started the generator but could not get the water well pumping. It was already dark, and we decided that perhaps we’d not had proper power from the generator and had blown out our pump. I could live without A/C, but not water. We had finished as much cleanup as we could do, so we unloaded the contents of our freezer into an ice chest for our neighbor; and then we headed out of town to join my sister’s family at a hotel in Waco. My parents had already left to visit friends in the Texas Hill Country until power could be restored to the neighborhood.

From Waco, I was able to actually perform my job duties – all I needed was an Internet connection and my laptop! Our office building in Houston was officially closed except for essential personnel, and travel anywhere within the Houston area was still very risky due to all the down power lines and debris. Houston area government officials were still asking people to restrict their area travel due to these conditions. So, I surprised my international colleagues when I was able to participate in our regular weekly teleconference and catch up on email.

After a couple of days in Waco, we came back home to pack up and leave again – for Denver. My husband was already scheduled to attend a conference there and since power still wasn’t going to be restored to our neighborhood anytime soon, I decided to go with him (thanks to some frequent flyer miles!). Like Waco, Denver also has Internet connectivity and I had my laptop! J We did discover in the interim that our generator and water pump were okay – the circuit breaker on the generator had tripped and wasn’t providing the proper voltage to run the water well; we didn’t notice that in the dark. As our week in Colorado was drawing to a close, our neighbor called with the news that power had just been restored to the neighborhood – about 16 hours before we planned to be home. It had been 15 days since Ike invaded our lives.

So – what were the lessons learned?

Short-term recovery needs:

Have all of your necessary equipment with you (e.g. the chain saw).
Have redundant communication options (e.g. landline & cell – also you can use Onstar minutes, if you have it).
Understand fully how to operate and troubleshoot your equipment (e.g. the generator’s circuit breaker switches).
Physical resources (i.e. manpower and tools) – we got to know all of our neighbors much better as we all pitched in on the cleanup in our neighborhood. Those with less damage helped out those with more damage. Those with tractors and chain saws loaned them to neighbors without.

Long-term recovery needs:

Once basic necessities are met, the ability to find facilities which allow you to “return to work” (i.e. somewhere with Internet connectivity) to start regaining a sense of normalcy.
Consider a whole-house generator! (We considered and decided we can take several more trips to Colorado for the cost when assessed against the history of storms impacting the area. Of course, we recognize that similar to the warnings when investing in the stock market, past history is not a guaranteed indication of the future!)

These same lessons can be applied to your DR plan for your data center and SQL Servers. Do you have the proper redundancies available? Do you know the order in which all servers in the data center should be shut down and restarted, if needed? If you have limited power after the disaster, what are the critical servers required to be running? (e.g. in my household case it was the water first, then the refrigerator, then optional items). SQL Servers might be using Database Mirroring or Log Shipping to secondary data centers. Do you have scripts to stop or move processing between the primary and secondary sites, if an entire data center is likely to be down? Does your Operations staff understand what steps those scripts actually perform in case they need to troubleshoot, or perform the steps manually? That is, do they know how to reset the “circuit breaker”? Will your staff be able to work “remotely” if they can access the Internet? Do you know how long of an outage your data center can sustain on generator or other backup power? Do you have a plan if it unexpectedly goes out or exceeds its limit before main power is restored?

While there is sufficient warning to take precautions when hurricanes approach, other disasters (e.g. the recent massive tornados across the U.S. and earthquake in Japan) strike without warning. The time to plan for all disasters is now. Be sure that you have a family DR plan as well as one for your workplace!

Here’s hoping none of us has to implement either our personal or business hurricane plans this summer!

Filed under: T-SQL Tuesday | Tagged: Disaster recovery, SQL Server, TSQL2sday | Comments Off on TSQL2sday #19 – Disasters & Recovery (or Keep Your Chain Saw Handy)

Nancy Hidy Wilson's Blog

Categories

Recent Posts

Archives

Copyright Notice

PASS Summit 2011 – What I Learned

My Denali First Glance – AlwaysOn Availability Groups

TSQL2sday #19 – Disasters & Recovery (or Keep Your Chain Saw Handy)

Tags