Here's a nice, simple post about how to recover from a failed Windows cluster. I've been running Windows clusters at work on Windows 2000/2003 for both Exchange and SQL. They are nice, but a lot of the time they seem to add unneeded complexity.
It is possible to experience complete cluster failure, meaning both cluster nodes as well as the cluster disk fail. This article describes the steps to perform when repairing a failed cluster.
Repairing a failed cluster is a four-step process:
- Restore the first node.
- Restore the cluster disk.
- Restore the second node.
- Test the repaired cluster.
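The order of these four steps matters, since each one depends on the one before it. As a rough illustration only, here is a minimal Python sketch of a checklist that refuses steps attempted out of order. The `RepairChecklist` class and its method names are hypothetical and are not part of any Windows tooling.

```python
from dataclasses import dataclass, field

# The four phases, in the order given above.
REPAIR_STEPS = [
    "Restore the first node",
    "Restore the cluster disk",
    "Restore the second node",
    "Test the repaired cluster",
]


@dataclass
class RepairChecklist:
    """Hypothetical helper that tracks which repair phases are finished."""

    completed: list = field(default_factory=list)

    def next_step(self):
        """Return the next pending phase, or None when all are done."""
        if len(self.completed) < len(REPAIR_STEPS):
            return REPAIR_STEPS[len(self.completed)]
        return None

    def complete(self, step):
        """Mark a phase as done, rejecting any phase attempted out of order."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"Expected {expected!r}, got {step!r}")
        self.completed.append(step)
```

Calling `complete("Restore the second node")` before the cluster disk step raises a `ValueError`, which mirrors the rule that the second node is only restored once the Cluster Service is running on the first.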
The first step in repairing a failed cluster is to perform a restore of the first node. This can be done by following the process described below:
- Install a new copy of Windows 2000 Advanced Server (or whatever version of Windows was previously installed).
- Restore the system and boot partition, the system state data, and any other information.
- Restart the cluster node.
Once the first node has been restored, you can proceed to restore the cluster disk. Restoring the cluster disk entails restoring the disk signature file. This file contains information necessary to identify and mount volumes. If a disk has been replaced, the original signature has to be restored to the new disk. Here are the steps to restore the cluster disk:
- Use the Dumpcfg.exe utility. This resource kit utility can be used to extract the signature file from the Registry and restore it to a new disk.
- Stop the Cluster Service.
- Restore the cluster system state from backup. The contents of the Quorum disk are placed into a temporary directory.
- Use the Clusrest.exe utility to restore the contents of the temporary directory to the node's registry.
- Restart the Cluster Service.
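The stop, restore, restart ordering in the steps above can be sketched as a small Python wrapper. This is only a sketch under assumptions: it assumes the Cluster Service is registered under the service name `clussvc` and is controlled with `net stop` / `net start`. The `restore_step` callback is a hypothetical placeholder for the backup restore and resource kit utility steps, which are not scripted here.

```python
import subprocess

CLUSTER_SERVICE = "clussvc"  # assumed service name of the Cluster Service


def run_command(args):
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(args, check=True)


def restore_with_service_stopped(restore_step, run=run_command):
    """Stop the Cluster Service, run the restore, then restart the service.

    restore_step is a caller-supplied function (hypothetical here) that
    performs the actual restore work. If it raises, the service is
    deliberately left stopped so it does not start on a half-restored disk.
    """
    run(["net", "stop", CLUSTER_SERVICE])
    restore_step()
    run(["net", "start", CLUSTER_SERVICE])
```

Passing a different `run` function makes it possible to dry-run the sequence, for example by collecting the commands in a list instead of executing them.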
Once the cluster disk has been restored and the Cluster Service has successfully started on the first node, the second node can be restored. The steps required to restore the second node are identical to the steps used to restore the first node – the only difference being that the cluster disk does not have to be restored a second time.
The final step in the repair process is to test the restore. To confirm that it succeeded:
- Verify that the cluster resources are online.
- Verify that the different cluster groups and resources can successfully fail over.
- Verify that groups and resources can fail back to their preferred owner.
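The checks above can be planned with a short Python sketch. It assumes a two-node cluster and that you have already collected each resource's state and each group's preferred owner by hand or from Cluster Administrator. The node names, the function names, and the `cluster group ... /moveto:` command form are assumptions to check against `cluster /?` on your own system. Nothing here runs a command.

```python
def offline_resources(resource_states):
    """Return the names of resources whose state is not 'Online'."""
    return sorted(
        name for name, state in resource_states.items()
        if state.lower() != "online"
    )


def failover_test_plan(preferred_owners, nodes):
    """Build (group, target_node) moves for a two-node cluster.

    Each group is first moved to the other node (failover test) and
    then back to its preferred owner (failback test).
    """
    node_a, node_b = nodes
    plan = []
    for group, owner in preferred_owners.items():
        other = node_b if owner == node_a else node_a
        plan.append((group, other))   # fail over to the other node
        plan.append((group, owner))   # fail back to the preferred owner
    return plan


def move_command(group, node):
    """Format one move as an argument list (assumed cluster.exe syntax)."""
    return ["cluster", "group", group, f"/moveto:{node}"]
```

For example, `failover_test_plan({"Cluster Group": "NODE1"}, ("NODE1", "NODE2"))` yields a move to `NODE2` followed by a move back to `NODE1`. After each move, recheck that `offline_resources` returns an empty list.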
Repairing A Failed Cluster
Mon, 27 Oct 2008 17:00:00 GMT