With luck, you’ll never need any of the information below. If you do then (a) you have my sympathy, and (b) you’re welcome!
One of my clients has an aging Microsoft IIS installation comprising a variety of Windows 2000 & 2003 servers (SQL, IIS, Domain Controller, File Server, Linux VMWare) running on a mixture of Dell server hardware.
The main IIS installation runs on Windows 2000 Server on a Dell PowerEdge 1950 using a Dell SAS 5/iR disk controller, to which are connected two SATA drives running independently (not RAID-ed). In July, the controller card failed catastrophically, rendering the server useless.
Fortunately, though this is old hardware, we were able to source a replacement controller card on eBay. Less fortunately, when I installed the new controller and attempted to boot from it, the server crashed with the dreaded blue screen of death a few seconds after displaying the Win2K splash screen.
Not great. After some investigation, I realised that though the replacement looked identical in all respects to the failed card (even the discrete components were positioned identically on the PCB), it had a slightly newer BIOS. This, it appeared, was sufficient for Windows to treat the controller card as a new, unrecognised device.
If you have to replace a disk controller on a Windows server, the usual advice is to install the new controller first, allow Windows to detect it, install any needed drivers, then (and only then) shut down, remove the old controller, and connect the hard drives to the new controller. Windows then already has the drivers it needs to boot successfully from the new controller.
Of course, in this scenario we didn’t have that luxury – the old controller was dead, so Windows wouldn’t boot at all. We needed to somehow install the updated drivers on the Windows system disk offline.
This turned out to be … tricky! Here are a few of the things I tried before figuring it out. (Needless to say, I copied the disk onto a fresh drive and performed my experiments on the copy. This ensured the original was always available if I needed to start over.)
1. Update controller drivers – FAIL
I downloaded updated drivers for the SAS 5/iR controller from Dell’s website, extracted them, and manually copied the driver files to the Windows c:\winnt\system32\drivers folder, overwriting the older versions with the same names. This made no difference at all.
Then I discovered the C:\WinNT\NLDRVS sub-folder which holds the core third-party drivers used by Windows before the whole plug & play subsystem is up and running. This contains a series of numbered sub-folders, one for each driver. Again, I updated this to use the latest versions of the driver files, and again it made no difference.
2. Repair Windows Installation – FAIL
Next, I attempted a Windows 2000 repair, which was a lot harder than I expected.
After tracking down the original Windows 2000 installation CD, I was unable to press F6 to install additional drivers from floppy (remember that?) because the PowerEdge 1950 has no floppy drive. Windows won’t recognise a USB flash drive at this point either.
There are hardware floppy emulators around that will accept a USB memory stick and present the contents as a floppy drive using the old 34-pin floppy cable standard. Unfortunately, the PowerEdge doesn’t have an internal header to connect such a device to. You can also buy floppy drives with a USB interface but I didn’t have one of those to hand (or any floppy disks to use with such a device).
Eventually, I discovered nLite, an excellent utility that lets you build a custom Windows installation CD which includes your selection of third-party drivers, service packs, and other customisations. I also found WinSetupFromUSB, which lets you put a Windows installation CD on a USB stick in such a way that even the Windows 2000 installer can successfully boot from it. (Some deep magic is used to make this work.)
Between these, I was able to create a slip-streamed Windows 2000 SP4 installation CD with the latest Dell SAS 5/iR drivers pre-installed. Booting with this, I could get to the Repair Windows menu, find my Windows installation, and let the automatic repair try and fix it.
This was also unsuccessful – the automatic repair didn’t notice that the drivers it had booted with were different to the ones pre-installed on the original Windows disk, so it didn’t update them.
3. Perform an in-place Windows upgrade – Partial Success
By now, having spent a lot of effort trying various things, I figured there was only one thing for it – perform an in-place upgrade of Windows 2000 using the process outlined by this TechRepublic article. This is essentially a new Windows installation on top of the existing install. Windows is smart enough to replace the system files with fresh versions while preserving all existing third-party software and user profiles.
In principle, this allows you to resolve any hardware-related Windows glitches without having to re-install all your application software. This sounded good, because the mission-critical software running on this particular server is complex and the original designers and implementers were long since gone, leaving no documentation behind them. Recreating it from scratch on a clean Windows installation would have been unthinkable.
The re-install process went smoothly, albeit slowly, and once completed, Windows booted successfully. Hurray! Job done, right?
Well, not quite. The original installation had somehow ended up with the WINNT folder on E:\ while the tiny 2 GB FAT16 boot partition was on C:\. After the re-install, WINNT was located on C:\, along with Program Files and other system folders. This, of course, broke lots of things.
I was able to fix most of them by adding a scheduled task to SUBST drive E:\ to drive C:\ at startup, which made most of the system much happier. A few services started before this remapping occurred, and I located those in the Registry and updated their path references by hand. Yes, this is all ugly and horrible, but by this point, I just needed to get things working by any means!
(Word to the wise: be careful with removable USB backup drives, which usually grab the first available drive letter. If that happens to be E:, it stops the drive letter mapping working correctly and you’re back to square one.)
Microsoft Office was still a little unhappy, but became much happier after I carried out a Repair Install. I also had to re-assign appropriate drive letters to some of the data partitions.
Finally, after all of this … IIS started correctly, websites were accessible, and all was right with the world! Hurray, again!
4. When is a success not a success?
Not so fast. One of the critical components of the website was the ability to upload formatted Word documents which were then automatically converted to XML for processing by the content management system. This wasn’t working correctly; in fact, it wasn’t working at all.
The issue seemed to be related to a custom COM object that had been developed for the project, and a method in this object was failing during the conversion of the Word document. Everything I could see indicated it was somehow connected to the Microsoft Office installation (since presumably Word itself was involved in the conversion).
I spent more than a week trying to get to the bottom of this. I re-registered all COM objects and relevant DLLs, checked the system and application logs for errors, enabled IIS debugging, etc – all the usual things you would expect. When I dug deeper, using Microsoft’s ProcMon tool, the issue seemed to be related to an instance of Internet Explorer that was launched during the conversion.
After many hours poring over ProcMon, IIS and Event logs, checking for unexpected failures buried in the midst of the many, many expected failures, I had to admit defeat. The server was working, but it wasn’t working reliably. It also had a tendency to hang random services during startup, and Windows Update refused to start, neither of which inspired confidence.
5. The Easy Way
By this stage, and with the client’s patience starting to reach its limits, I decided to use the knowledge gained working through the above to have another go, starting from scratch with the original disk again.
A chance remark on a discussion forum about SCSI adapter BIOS signatures being used by Windows to help identify the correct drive led me to a rarely visited part of the Windows 2000 registry known as the CriticalDeviceDatabase. (This no longer exists on modern versions of Windows.)
Further research brought me to Michael Albert’s invaluable page on manually adding a mass storage device to an existing Windows installation. As one commenter rightly said, “Never delete this page!” The information it contains is not easily found elsewhere. So, thank you Michael!
The registry key HKEY_LOCAL_MACHINE / System / CurrentControlSet / Control / CriticalDeviceDatabase contains a series of sub-keys for all the devices needed to boot Windows. Third-party controller cards are referenced here by their PCI vendor, device and (crucially) subsystem code.
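To make the naming scheme concrete, here is a small Python sketch that splits a CriticalDeviceDatabase key name into its PCI identifiers. The helper name parse_cdd_key is made up for illustration, and the sketch assumes the usual PCI convention in which the eight hex digits after subsys_ are the subsystem ID followed by the subsystem vendor ID (1028 being Dell's vendor ID). It only handles the vendor/device style of key name shown in this article.

```python
import re

# Matches CriticalDeviceDatabase-style names such as
# "pci#ven_1000&dev_0054&subsys_1f091028" (case-insensitive).
# The subsys part is optional because some entries omit it.
_KEY_RE = re.compile(
    r"^pci#ven_([0-9a-f]{4})&dev_([0-9a-f]{4})"
    r"(?:&subsys_([0-9a-f]{4})([0-9a-f]{4}))?$",
    re.IGNORECASE,
)


def parse_cdd_key(name):
    """Split a key name into its PCI identifiers (hypothetical helper)."""
    match = _KEY_RE.match(name)
    if match is None:
        raise ValueError("not a PCI vendor/device key name: %r" % name)
    vendor, device, subsys_id, subsys_vendor = (
        group.lower() if group else None for group in match.groups()
    )
    return {
        "vendor": vendor,                   # chip maker
        "device": device,                   # chip model
        "subsystem_id": subsys_id,          # board-level ID (first 4 digits)
        "subsystem_vendor": subsys_vendor,  # board maker (last 4 digits)
    }


if __name__ == "__main__":
    print(parse_cdd_key("pci#ven_1000&dev_0054&subsys_1f091028"))
```

Read this way, a board revision that changes only the subsystem ID produces a different key name, even though the chip (vendor and device) is unchanged.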
First, I needed to get the PCI code for the SAS 5/iR controller. On most Windows installations, you can visit Device Manager, open the Properties pages for the controller, and under the Details pane select Hardware IDs. However, on Windows 2000 this information isn’t so easily available. Instead, you need to run MSINFO32 and find the controller there, usually under SCSI devices.
When I ran MSINFO32 on my flakey Windows re-installation, the SAS 5/iR entry looked like this:
Checking Regedit on the same machine, I could see the following matching entry in the registry:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1000&dev_0054&subsys_1f091028]
"Service"="SYMMPI"
"ClassGUID"="{4D36E97B-E325-11CE-BFC1-08002BE10318}"
However, there was a second, almost identical, entry:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\pci#ven_1000&dev_0054&subsys_1f061028]
"Service"="SYMMPI"
"ClassGUID"="{4D36E97B-E325-11CE-BFC1-08002BE10318}"
The only difference is the subsystem code, which has changed from 0x1f061028 to 0x1f091028. I concluded that the additional entry was the one used by the old controller card, and that it had survived the in-place Windows upgrade. For reasons best known to themselves, Dell must have revised the subsystem code when they updated the controller’s BIOS, possibly to provide an easy way for the driver to identify hardware with additional capability or some obscure hardware fix.
I went back to the original disk and copied the registry System hive from \WINNT\SYSTEM32\CONFIG\SYSTEM to my work computer, then loaded it into RegEdit by selecting HKEY_LOCAL_MACHINE, then using Load Hive and entering a temporary sub-key name (W2K-Recovery) to allow me to access it.
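I did this through the RegEdit menus, but the same mount and unmount can probably be scripted with the reg.exe command-line tool (reg load and reg unload) from an elevated prompt. The Python sketch below only builds the command lines. The helper names and the D:\recovery\SYSTEM path are placeholders for illustration, and W2K-Recovery is simply the temporary key name used above.

```python
import subprocess


def reg_load_cmd(hive_path, mount="W2K-Recovery"):
    """Command that mounts an offline hive file under HKLM\\<mount>."""
    return ["reg", "load", "HKLM\\" + mount, hive_path]


def reg_unload_cmd(mount="W2K-Recovery"):
    """Command that unmounts the hive again, flushing changes to the file."""
    return ["reg", "unload", "HKLM\\" + mount]


def run(cmd):
    """Run one of the commands above (Windows only, needs admin rights)."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder path: wherever the copied SYSTEM hive was saved.
    print(" ".join(reg_load_cmd(r"D:\recovery\SYSTEM")))
    print(" ".join(reg_unload_cmd()))
```

Whichever route is used, the hive has to be unloaded before the file is copied back, otherwise the edits may not have been written out.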
Drilling down there, I could see registry keys for ControlSet001 and ControlSet002 but no CurrentControlSet. This is normal when editing an offline registry hive: CurrentControlSet is created dynamically by the operating system but is not part of the hive itself. Instead, I checked under the Select key, which confirmed that the ‘Current’ value was set to 1 (indicating ControlSet001). And sure enough, under ControlSet001 / Control / CriticalDeviceDatabase, there was an entry for subsystem 0x1f061028 but not 0x1f091028.
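The mapping from the Select key to a control set name is simple enough to sketch in Python. This is an illustration rather than part of the original procedure: control_set_name and read_current are hypothetical helper names, and the second function assumes the hive is mounted under HKEY_LOCAL_MACHINE as W2K-Recovery on a Windows machine with Python installed.

```python
def control_set_name(current):
    """Map the Select key's Current value to a key name, e.g. 1 -> ControlSet001."""
    if not 0 < current < 1000:
        raise ValueError("unexpected Current value: %r" % (current,))
    return "ControlSet%03d" % current


def read_current(mount="W2K-Recovery"):
    """Read Select\\Current from a hive mounted under HKLM (Windows only)."""
    import winreg  # standard library, but only available on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, mount + "\\Select") as key:
        value, _value_type = winreg.QueryValueEx(key, "Current")
    return value


if __name__ == "__main__":
    # With Current set to 1, as in the hive described above:
    print(control_set_name(1))
```

Editing the wrong control set would leave the boot-time configuration untouched, so this check is worth doing before changing anything.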
I made a fresh clone of the original Windows drive, then manually added the CriticalDeviceDatabase entry for 0x1f091028 without changing anything else. (Again, I performed this by loading the System hive offline into RegEdit on my main work PC, making the modifications, then unloading it again and copying it back to the \WINNT\SYSTEM32\CONFIG folder on the target disk.)
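I added the entry by hand, but for anyone who prefers an importable file, the following Python sketch generates an equivalent .reg fragment aimed at the mounted hive. It is a minimal illustration under stated assumptions: the function name cdd_reg_fragment is invented, the Service and ClassGUID values are copied from the existing 0x1f061028 entry shown earlier, and the mount name and control set are the ones used in this article. The older REGEDIT4 header is used so the output can be saved as a plain ANSI text file.

```python
import re

# Copied from the existing entry for the old controller.
CLASS_GUID = "{4D36E97B-E325-11CE-BFC1-08002BE10318}"


def cdd_reg_fragment(subsys, mount="W2K-Recovery",
                     control_set="ControlSet001", service="SYMMPI"):
    """Build .reg text adding a CriticalDeviceDatabase entry to a mounted hive."""
    if not re.fullmatch(r"[0-9a-fA-F]{8}", subsys):
        raise ValueError("subsystem code must be exactly 8 hex digits")
    key = "\\".join([
        "HKEY_LOCAL_MACHINE", mount, control_set, "Control",
        "CriticalDeviceDatabase",
        "pci#ven_1000&dev_0054&subsys_" + subsys.lower(),
    ])
    lines = [
        "REGEDIT4",
        "",
        "[" + key + "]",
        '"Service"="' + service + '"',
        '"ClassGUID"="' + CLASS_GUID + '"',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(cdd_reg_fragment("1f091028"))
```

This only works because the SYMMPI driver service was already present on the original installation for the old card; the new entry simply points the revised hardware ID at that existing service.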
After this, the new drive booted straight into Windows with no issue. As it was the original Windows installation, everything was back to exactly the way it was before the disk controller died.
As with everything Windows related, an ounce of knowledge is worth a pound (or stone!) of experimenting! If you’ve made it this far, hopefully the information above will save you some wasted time and effort.