A cascade of technical glitches and mental errors

Posted on Tue 19 July 2005 in misc

This must be a very familiar tale, yet I feel the urge to tell it again... about two weeks ago my hard drive started making that horrible 'click' noise that indicates that it is about time for it to shuffle off its all too mortal coil (and spindle, and platter, and bits).

Your author/hero, being the diligently paranoid sort, ran his weekly backup and continued working away. The occasional pause and drive seek error did not deter him from pushing onwards in his quest to deliver the goods for his employer; no, nothing was going to prevent him from making his dates -- and with regular backups, he had no fear of the inevitable crash.

Surprisingly, the crash never came. However, the moment the hard drive started making a sound exactly like heavy, horror-show static, he put in a call to his local technicians for support: specifically, a request for a new hard drive and a drive image copy to rectify the damage with minimal downtime. The trusty technician happened to have a hard drive of just the right sort on hand, and within three hours the laptop was returned to your author in almost fully working condition. There were two unexpected events that set the stage for today's stress-filled adventure:

  • Lab guy noted that "I was surprised that you had some Linux partitions, but it looks like Ghost copied them over without any problems. I did have to fdisk /mbr though, so you might have to reinstall your bootloader." Okay, fair enough... that kind of thing is old hat, even though you would expect the lab support guys to run into more dual-boot machines by now.
  • Lab guy mentioned "I also updated some of your drivers for you." Oh, I thought, I really didn't want you to do that, but I didn't say anything other than "Thanks!" because, after all, he had saved my laptop from a nasty death with minimal interruption to my work.

Unfortunately, the updated drivers were a problem... you see, when I got my laptop, I had major problems getting the internal wireless card working with our Cisco LEAP access points, so I had to resort to our support guys -- and that support guy said "Heh, these cards don't work very well, let me replace the internal card with a Cisco mini-PCI." I acquiesced, he replaced the internal card, and I suddenly had a non-standard laptop. Now, the "updated" drivers turned out to be drivers for hardware that no longer existed in my laptop, plus our company's whiz-bang-location-aware network profile management software... which never really worked for me, and which I had laboriously replaced with standard Windows XP utilities. You can tell I wasn't really pleased about their return to my system.

I quickly uninstalled the drivers for the non-existent hardware in my laptop without any problem, then got down to work on resolving this double stack of networking drivers. Giving our internal software the benefit of the doubt, I decided to rip out the stuff that was known to work (MISTAKE #1) for the stuff our company preferred. That was carried off without too much of a problem, but I was now left with the need to configure my work wireless profile. As it turned out, there was a template profile that I could use simply by reinstalling this access connection software. I saw that the version listed in our internal docs was much older than the version the support guy installed on my laptop, but the docs said / claimed / stated that the installer would detect newer versions of the software and only install the template profiles. Great!

Imagine my horror when the installer happily went on an uninstalling binge. We're not just talking about uninstalling its newer self; this devilspawn installer rolled back a whole set of Windows updates! Well, to be fair, it didn't quite roll them all back; one Windows update happened to be uninstalled, but the installer was unable to reinstall the older version of the DLLs. It just so happened that the buggered DLL was dhcpcsvc.dll -- the DHCP client that was at the root of anything useful in my network interactions. And yes, I was suddenly without network access.

So I panicked, a little bit; it wouldn't be so bad if we ran our mail / calendars on plain old POP and iCal, but we rely on Lotus Notes, and there are ID files that you have to copy to enable a different client to connect to your mail, and all of this good and complex stuff. Hmm, thinks I again, my Linux partition had Notes running under Wine a few months back. Maybe I could rely on that until I can get the tech support guys to un-screw my laptop?

But of course, this is where the tech guy's fdisk /mbr command came back to bite me: it had overwritten my GRUB bootloader, forcing me to boot up with a Linux rescue CD and reinstall the bootloader. Here's basically what I did:

  • bash# mount /dev/hda8 /mnt/restore
  • bash# mount /dev/hda7 /mnt/restore/boot
  • bash# chroot /mnt/restore
  • bash# grub-install /dev/hda1

Anyone catch the mistake? Oh go ahead, call it a blunder. In my haste to get back to work, I installed the bootloader directly on the first partition of the hard drive (/dev/hda1), typically (and sadly in this case) the C: NTFS drive. The correct approach, by the way, is to install the bootloader on the master boot record (MBR) by pointing grub-install at the whole disk, /dev/hda, rather than at a partition. I have done this, oh, a couple of hundred times, but was so focused on the messed up C: drive that I messed it up even more.
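For anyone as hasty as I was, here is a minimal sketch of a guard that would catch this kind of slip before it does any damage. It assumes classic IDE/SCSI device names, where the whole disk is something like /dev/hda and a trailing digit (as in /dev/hda1) means a partition. The function names target_kind and safe_grub_install are made up for illustration; only grub-install itself is a real command.

```shell
#!/bin/sh
# Hypothetical helper: classify a device name as a whole disk or a partition.
# Assumes classic names such as /dev/hda or /dev/sda1, where a trailing digit
# marks a partition. Anything else (e.g. /dev/nvme0n1) is reported as unknown.
target_kind() {
    case "$1" in
        /dev/[hs]d[a-z])       echo disk ;;
        /dev/[hs]d[a-z][0-9]*) echo partition ;;
        *)                     echo unknown ;;
    esac
}

# Hypothetical wrapper: only pass whole disks on to grub-install, so the
# boot code lands in the MBR rather than in a partition's boot sector.
safe_grub_install() {
    kind=$(target_kind "$1")
    if [ "$kind" != disk ]; then
        echo "refusing: $1 is not a whole disk (detected: $kind)" >&2
        return 1
    fi
    grub-install "$1"
}
```

Inside the chroot, running safe_grub_install /dev/hda would go ahead, while safe_grub_install /dev/hda1 would stop with an error instead of scribbling on the Windows partition.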

At least Linux was happy to boot, and I was able to run Notes under Wine, and host my most important meeting of the week. Goody. (That was after about two hours of trying to get my Notes ID file, which comically required me to return to our Notes administrator multiple times as he first gave me his ID file, then gave me my ancient, no-longer-valid ID file, then finally gave me the right file).

All that I had to do now was figure out how to correct this little matter of the corrupted C: NTFS partition on my laptop so I could continue working on my OSCON tutorial. My support ticket showed that it would be a day before I was going to get any love, so I figured I was on my own for tonight; I was determined to get this going, at least so I could copy the latest versions of the files to my home desktop and continue working there. Searches on Google for 'corrupted NTFS drive', 'NTFS boot partition', etc. all turned up lots of useful advice for Windows NT and Windows 2000, which seemed close enough to XP that I was willing to try it out -- but unfortunately they all required a boot floppy, and my laptop has no floppy drive. Boot CD maybe? Grr...

Finally (truly, finally, this is it), I decided to trust in Microsoft's good heart and booted the laptop from my legitimate copy of the Windows XP Professional CD. I skipped the "Automated System Recovery" prompt -- I don't have that much trust, thank you -- then let it load its scads of drivers before it offered me the Recovery Console. I took that, scanned the list of DOS-like commands that were available, and gave FIXBOOT a shot. And the dang thing worked -- it recognized that the partition was NTFS, wrote a fresh Windows boot sector over the GRUB code I had put there, and let me reboot. Cool! I also restored DHCPCSVC.DLL to the C:\windows\system32 directory using the handy expand F:\i386\DHCPCSVC.DL_ DHCPCSVC.DLL command and got some level of networking functionality back.
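Back on the Linux side, one rough way to see which boot code is sitting in a partition's first sector (say, before and after a repair like this) is to look for GRUB's telltale string. The sketch below assumes that the installed GRUB boot code embeds the literal text "GRUB" in its first 512 bytes, which is an assumption about the GRUB version in use; the function name sector_has_grub is hypothetical. A miss only means the string was not found, not that the sector is healthy.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first 512-byte sector of a device
# (or an image file) contains the literal string "GRUB".
# Assumption: the GRUB boot code in use embeds that string in its first sector.
sector_has_grub() {
    # Read exactly one sector and search it; the exit status is grep's.
    dd if="$1" bs=512 count=1 2>/dev/null | LC_ALL=C grep -q 'GRUB'
}
```

Run as root from the rescue CD, something like sector_has_grub /dev/hda1 && echo "GRUB boot code found" would have flagged my blunder right away.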

So I'm in the game, again, and maybe, just maybe, this experience will help someone else who winds up in the same sad position as me.