Fire. *
It can shine forth as a candle lighting the darkness. Or it can burn everything in its path, leaving only ashes and ruin.
How does your team handle fire drills? When things go bump in the night (or smack dab in the middle of a work day), is it an opportunity to shine? Or just a tempestuous conflagration that destroys attitudes, team unity, confidence and business value?
I had the opportunity to “fire-test” our tech team this week. At HealthTalker, we are working toward running our entire infrastructure “in the cloud” using Citrix’s XenServer platform. Cloud computing provides all sorts of useful infrastructure features that make our platform that much more robust.
One component of this infrastructure is having SAN systems set up to provide fast, network-based disk for the virtual machines. Having the disk on a separate network device provides the backbone for much of the flexibility and redundancy of the cloud setup. Our SANs have some definite quirks that we are working through. Recently, the primary SAN required a system upgrade, which we performed. Unfortunately, the new kernel panicked, which left the SAN non-bootable and, because of the quirks of the enclosure, without any console, keyboard, or external drive access. In other words: bricked.
In an enterprise-grade setup, redundancy of components means that when (not if, but when!) this happens, there are failover options to keep things running with minimal or ideally no downtime. At HealthTalker, we are building out our infrastructure to be fully enterprise-grade across all aspects. However, like much in a start-up, we are building it out as we go, and at this time our backup SAN was not properly provisioned. Thus began The Fire Drill.
Except that it wasn’t a drill. It was the Real Thing™.
Get the systems back! Get the websites back! Get the data back, if possible. Or, if not possible, get the systems up with older data (which we had, but not as up-to-date as we would have liked). Can it be recovered? Can the SAN be restored? Can the SAN and the data be restored? Questions swirled. Frustrations mounted. Guesstimates were all over the map. 36+ hours later, scorched and weary, we were able to fully recover the SAN and the data thanks to the incredible dedication of our tech team, including Boris, our new IT guy (who had started only the day before this firestorm hit).
Lessons learned are legion:
* Fire. For Andy (who loves “The Boss”).