Thursday, January 04, 2007

Why DR fails [2]

What became clear is that correct maintenance of production systems can actually lead to an INCREASE in the number of DR gaps, since keeping the system working means more change, and more change means more gaps, and so on. Current Enterprise Management Consoles, SRM systems, etc. just don’t help, because the problem they were intended to solve actually conflicts with DR assurance.

Once we understood the issue at hand, our next step was to define requirements for a DR and data protection gap detection tool. We began by analyzing what could go wrong, figuring that if we could find “signatures” of gaps, we could build an engine that constantly searches for them. Think of your DR checklist, the things you need to check before starting a DR drill, and imagine how many tests you have there. Dozens? That sounds about right, and in fact, when we started working with customers it turned out that each could contribute their share. It also turned out that the overlap was significantly smaller than we had anticipated, so the accumulated list grew alarmingly fast to contain thousands of possible gaps.

As the magnitude of the problem became clearer, so did the conviction that no team of men or women can check everything manually. Some of the tests could take days to complete for just one server (!), even with top-of-the-line monitoring, automation and SRM tools. Imagine how much time it would take to search for hundreds of gaps in a deployment of hundreds or even thousands of DR-protected servers. The analogy that comes to mind is looking for viruses manually, with a checklist containing a printout of thousands of virus signatures.
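To make the “signature” idea a little more concrete, here is a minimal sketch in Python of how such an engine might be structured. It is an illustration only, not the actual tool: the `Server` fields, the two example signatures and the `scan` function are all hypothetical names, and a real engine would have to collect this information from storage arrays, hosts and databases rather than from hand-built objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Set, Tuple


@dataclass
class Server:
    """Hypothetical snapshot of one DR-protected server's configuration."""
    name: str
    volumes: Set[str] = field(default_factory=set)
    replicated_volumes: Set[str] = field(default_factory=set)
    patch_level: str = ""
    standby_patch_level: str = ""


@dataclass
class Signature:
    """A gap signature: an id plus a check returning a detail string or None."""
    gap_id: str
    check: Callable[[Server], Optional[str]]


def unreplicated_volume(server: Server) -> Optional[str]:
    # Assumed gap: a production volume that is missing from replication.
    missing = sorted(server.volumes - server.replicated_volumes)
    if missing:
        return "volumes not replicated: " + ", ".join(missing)
    return None


def patch_level_drift(server: Server) -> Optional[str]:
    # Assumed gap: production and standby hosts at different patch levels.
    if server.patch_level != server.standby_patch_level:
        return (f"production at {server.patch_level}, "
                f"standby at {server.standby_patch_level}")
    return None


SIGNATURES: List[Signature] = [
    Signature("unreplicated-volume", unreplicated_volume),
    Signature("patch-level-drift", patch_level_drift),
]


def scan(servers: Iterable[Server],
         signatures: Iterable[Signature] = SIGNATURES) -> List[Tuple[str, str, str]]:
    """Run every signature against every server and collect the findings."""
    signatures = list(signatures)
    findings = []
    for server in servers:
        for sig in signatures:
            detail = sig.check(server)
            if detail is not None:
                findings.append((server.name, sig.gap_id, detail))
    return findings
```

The work grows with the number of servers times the number of signatures, which is exactly why a checklist of thousands of gaps across hundreds of servers is hopeless by hand but routine for a machine that runs the scan continuously.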

The path to a solution was both challenging and rewarding, and it resulted in our first three patents. I hope to dedicate my next post to some more of our insights regarding DR gaps.

1 comment:

Unknown said...

About 80% of businesses fail after a major data disaster
happens to them. We saw this recently in the UK, when major
floods caused breakdowns at various IT and non-IT companies,
which lost their data and went out of business.

To avoid this, any business, whether SMB or enterprise, must
have a disaster recovery plan. Bare Metal Recovery is one
technology that is available on the market, but not
everyone knows about it. Check it out at

www.unitrends.co.uk

They are the originators of the term "Bare Metal". Using this
technology, one can restore the OS and data very quickly.