What is TSQL2sday? Back in late 2009, Adam Machanic (blog | twitter) had this brilliant idea for a monthly SQL Server blogger event (the origin of TSQL2sday). This month’s event is hosted by Allen Kinsel (blog | twitter) and the selected topic is “Disasters & Recovery”.
Like Allen, I also live in the Greater Houston area – albeit far enough away from the coast that storm surge is not an issue in a hurricane like it is for Allen, but the rain, wind, and potential tornadoes are. Companies in Houston don’t just have Disaster Recovery (DR) plans; they have specific HURRICANE DR plans. I grew up hearing the stories of Hurricane Carla (1961) and experienced first-hand the aftermath of Hurricane Alicia (1983) and Hurricane Rita (2005). But Hurricane Ike (2008) was the first hurricane I stayed home for – and here is what I learned!
DR is all about preparedness. You have to think about what can happen in a disaster and then what you will need to survive and recover in the short term and in the long term. Short-term recovery is more about protecting assets from further damage. My neighborhood experienced a direct hit from Ike – we were in the “eye” for over 1.5 hours before the “backside” hit us. During that time, everyone did an initial assessment of the damage incurred during the “front-side” attack. Ike hit in the wee hours of the morning – so strong flashlights were a must-have item. We were lucky – no damage could be seen to our roof or windows (we had boarded most, but started too late to get them all). However, several neighbors had trees fall through their roofs during this time. Those of us without damage helped those with damage quickly cover the holes with tarps before the backside hit. The problem was – we didn’t know we had 1.5 hours – for all we knew the backside would be on us in just a few minutes.
By the time that the worst of the storm had passed, it was just beginning to be daylight and we could start assessing the latest round of damage. We’d heard trees crashing to the ground all night long – including a house-shaking thud about 8am when a neighbor’s tree fell towards our house and just grazed our back porch. As the rain and wind subsided enough that we felt safe to venture outside, we were able to start assessing the damage to our house and neighborhood. However, we were not able to assess beyond our immediate neighbors’ houses due to multiple large trees which had fallen across the road in both directions.
My parents live about a half mile down the street, but decided to come “hunker down” at our house for the storm as we have a very large interior closet which could hold the four of us comfortably in case of tornadoes. We now needed to get to their house and check for damage, but due to the trees blocking the road – this was impossible via car. That was when we realized that our chain saw had been left at my parents’ house! My husband and father hiked the half mile over the trees and downed power lines. My parents’ house thankfully had no damage; so my father got on his tractor and my husband loaded the chain saws in the Kawasaki Mule and they began working their way back up the street, clearing the trees and debris to make the street passable.
Then it was time to hook up the generator. The houses in my neighborhood each have their own well and septic; so if we wanted water and basic facilities – we needed electricity. We had stockpiled enough gasoline for 5-7 days to run the generator just enough each day to keep the refrigerator/freezer cold enough for our food and pump the water we needed. We cooked meals using the gas grill on the back porch – which we normally used 3-4 times a week. And we had extra propane bottles for the grill.
Amazingly, for the first 2-3 days we actually still had the use of our landline phone. This was good, because we had no cell phone service those days! Then, about the time we started getting cell service again, the landline went out – I think someone cleaning up fallen trees wiped it out. Anyway, it was a good thing that we had both landline and cell services – I know a lot of people are giving up their landlines, but this experience will make me hold on to ours a little longer.
And of course, our cable modem for Internet connectivity was out of commission. But, that wasn’t a necessity in the immediate aftermath – especially as we did not have full electrical service.
All in all – we were very well prepared for our short-term recovery. We survived – we had shelter, food, and water – even ice!
At the end of the fourth day of cleanup with the sound of chain saws and generators constantly buzzing in my ears, we started the generator but could not get the water well pumping. It was already dark, and we decided that perhaps we’d not had proper power from the generator and had blown out our pump. I could live without A/C, but not water. We had finished as much cleanup as we could do, so we unloaded the contents of our freezer into an ice chest for our neighbor; and then we headed out of town to join my sister’s family at a hotel in Waco. My parents had already left to visit friends in the Texas Hill Country until power could be restored to the neighborhood.
From Waco, I was able to actually perform my job duties – all I needed was an Internet connection and my laptop! Our office building in Houston was officially closed except for essential personnel, and travel anywhere within the Houston area was still very risky due to all the downed power lines and debris. Houston area government officials were still asking people to restrict their area travel due to these conditions. So, I surprised my international colleagues when I was able to participate in our regular weekly teleconference and catch up on email.
After a couple of days in Waco, we came back home to pack up and leave again – for Denver. My husband was already scheduled to attend a conference there and since power still wasn’t going to be restored to our neighborhood anytime soon, I decided to go with him (thanks to some frequent flyer miles!). Like Waco, Denver also has Internet connectivity and I had my laptop! :) We did discover in the interim that our generator and water pump were okay – the circuit breaker on the generator had tripped and wasn’t providing the proper voltage to run the water well; we didn’t notice that in the dark. As our week in Colorado was drawing to a close, our neighbor called with the news that power had just been restored to the neighborhood – about 16 hours before we planned to be home. It had been 15 days since Ike invaded our lives.
So – what were the lessons learned?
Short-term recovery needs:
- Have all of your necessary equipment with you (e.g. the chain saw).
- Have redundant communication options (e.g. landline & cell – you can also use OnStar minutes, if you have it).
- Understand fully how to operate and troubleshoot your equipment (e.g. the generator’s circuit breaker switches).
- Physical resources (i.e. manpower and tools) – we got to know all of our neighbors much better as we all pitched in on the cleanup in our neighborhood. Those with less damage helped out those with more damage. Those with tractors and chain saws loaned them to neighbors without.
Long-term recovery needs:
- Once basic necessities are met, the ability to find facilities which allow you to “return to work” (i.e. somewhere with Internet connectivity) to start regaining a sense of normalcy.
- Consider a whole-house generator! (We considered and decided we can take several more trips to Colorado for the cost when assessed against the history of storms impacting the area. Of course, we recognize that similar to the warnings when investing in the stock market, past history is not a guaranteed indication of the future!)
These same lessons can be applied to your DR plan for your data center and SQL Servers. Do you have the proper redundancies available? Do you know the order in which all servers in the data center should be shut down and restarted, if needed? If you have limited power after the disaster, which critical servers must be running? (In my household case it was the water first, then the refrigerator, then optional items.) SQL Servers might be using Database Mirroring or Log Shipping to secondary data centers. Do you have scripts to stop or move processing between the primary and secondary sites, if an entire data center is likely to be down? Does your Operations staff understand what steps those scripts actually perform, in case they need to troubleshoot or perform the steps manually? That is, do they know how to reset the “circuit breaker”? Will your staff be able to work “remotely” if they can access the Internet? Do you know how long an outage your data center can sustain on generator or other backup power? Do you have a plan if that backup power unexpectedly goes out or exceeds its limit before main power is restored?
While there is sufficient warning to take precautions when hurricanes approach, other disasters (e.g. the recent massive tornadoes across the U.S. and earthquake in Japan) strike without warning. The time to plan for all disasters is now. Be sure that you have a family DR plan as well as one for your workplace!
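To make the “limited power” question a little more concrete, here is a minimal Python sketch of how a restart order might be written down ahead of time. Everything in it is an assumption for illustration: the server names, the wattage figures, the power budget, and the simple walk-in-priority-order rule are all made up, and a real plan would also have to account for dependencies between servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str      # hypothetical server name
    priority: int  # 1 = most critical (the "water well" of the data center)
    watts: int     # assumed power draw, not a measured figure


def plan_startup(servers, budget_watts):
    """Split servers into (start, deferred) lists under a power budget.

    Servers are visited in priority order. A server that does not fit in
    the remaining budget is deferred, and the walk continues, so a smaller
    lower-priority server may still be started after a larger one is skipped.
    """
    start, deferred = [], []
    remaining = budget_watts
    for server in sorted(servers, key=lambda s: (s.priority, s.name)):
        if server.watts <= remaining:
            start.append(server.name)
            remaining -= server.watts
        else:
            deferred.append(server.name)
    return start, deferred


# Hypothetical inventory; replace with your own servers and numbers.
INVENTORY = [
    Server("reporting-sql", priority=3, watts=700),
    Server("domain-controller", priority=1, watts=400),
    Server("primary-sql", priority=2, watts=900),
    Server("dev-sql", priority=4, watts=500),
]

if __name__ == "__main__":
    start, deferred = plan_startup(INVENTORY, budget_watts=2000)
    print("Start in this order:", start)
    print("Leave off for now:", deferred)
```

The point is less the code than the exercise: if the priorities and power numbers are written down before the storm, nobody has to work them out in the dark.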
Here’s hoping none of us has to implement either our personal or business hurricane plans this summer!