19-Nov-2003

Disaster recovery test

I've spent the last few days in New York doing our annual disaster recovery test.

This year, due to some intense pressure on the applications side (translation: they have higher-priority things to do), we did not have any applications team involvement to assist with the test.

This made my part of the test quite simple: I restored only my smallest cluster rather than all four of the critical clusters.

The big hitch this year was that, despite careful preparation by Computer Operations, a canister failed to ship to the remote site. Of course, this was the one that contained my documentation, so I had to do the recovery without any instructions.

Fortunately, I know the systems really well, and this was not much of a problem. However, if it had been one of the two huge production clusters, I think I would have been struggling.

Even without the documentation, I discovered some things on the cluster that can be improved. For example, the command procedure that mounts the disks is not on the system disk. This meant that I had to restore a second disk before I could figure out the logical-to-physical mapping of the disks. I made an educated guess as to which physical disk to restore it to, but I guessed wrong and ended up restoring the disk twice.

Other than that, and the network being delayed (because of the same missing canister), the test was 100% successful.

Posted at November 19, 2003 3:46 PM