System freezing up? Check your hardware

Thomas Hawk is trying a tech support experiment, in which he posts problems with his PC and then requests help. Instead of posting in his comments, I’m going to cover one of his problems here:

Problem number 1. My computer seems to be inexplicably freezing up (yes it’s a Windows machine, I know, I know, get a Mac) periodically. These are really bad freeze ups. Control-alt-delete does not return my PC. I can’t alt tab. Total freeze up. The only way to get my computer back is to restart. The last time it happened I had Pandora on in the background (but this is probably just coincidence) the music even stops and stutters as the freeze happens. The most recent thing I’ve installed is Windows new Live One Care. My next step is going to be to uninstall Live One Care and see if that helps me out at all.

By a curious coincidence, the same thing has happened to me within the past two weeks. Based on the symptoms, Thomas’s problem has nothing to do with software and everything to do with hardware. Here’s my story, and how I resolved it.

I have a Dell PowerEdge 600SC server running Windows Server 2003. It’s about 3-1/2 years old, and it has been running nonstop with virtually no problems for all that time. Over the years, I’ve added some big hard drives, and about a year and a half ago I replaced the original 2GB of RAM with 4GB so I could run multiple virtual machines on this box.

For the past month or so, this system has been responding slowly on some activities, especially file copies over the network. Then, about two weeks ago, the server froze up one day. Simply stopped responding. The power was still on, but the screen was black and the system didn’t respond to mouse input. I pressed the power button to restart, and when it came back on, I checked the System log in Event Viewer to see if there were any events captured there that might shed light on the error. Nope. Every recorded system event up until the crash was perfectly normal.

(Note to Thomas: Be sure to check Event Viewer. From Control Panel’s Classic view, double-click Administrative Tools, then double-click Event Viewer.)

The fact that there were no events listed is actually a crucial piece of troubleshooting information. It means that whatever happened was a complete surprise to the Windows code that’s running in kernel mode and supervising the whole system. Essentially, it means Windows was mugged.
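
That reasoning can be written down as a small check. What follows is only a minimal sketch: the event records are made up for illustration, the tuple layout is an assumption rather than any real Event Viewer export format, and the 30-minute window is an arbitrary choice.

```python
from datetime import datetime, timedelta

# Hypothetical event records: (timestamp, level, source, message).
# The layout and the values are assumptions made up for this illustration.
EVENTS = [
    (datetime(2000, 1, 1, 9, 0), "Information", "Service Control Manager", "Service started"),
    (datetime(2000, 1, 1, 9, 40), "Information", "W32Time", "Time synchronized"),
    (datetime(2000, 1, 1, 10, 5), "Information", "EventLog", "Event log service started"),
]

def suspicious_events(events, crash_time, window=timedelta(minutes=30)):
    """Return Warning/Error events logged in the window just before the crash."""
    start = crash_time - window
    return [e for e in events
            if start <= e[0] <= crash_time and e[1] in ("Warning", "Error")]

crash = datetime(2000, 1, 1, 10, 0)  # hypothetical time of the freeze
found = suspicious_events(EVENTS, crash)
verdict = "look at software first" if found else "nothing logged, suspect hardware"
print(verdict)
```

An empty result doesn’t prove the hardware is at fault, but it is a good reason to stop uninstalling programs and start running diagnostics.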

A few days later, it happened again. This time, when I restarted, I booted into Dell’s Diagnostic Utilities partition and ran its comprehensive series of diagnostics. They showed no hardware problems. I also ran a quick memory test that showed no problems. Baffled, I restarted the system. Maybe it’s a failing motherboard, I thought, or a system that’s overheating.

When it happened again the next day, I decided to run a more comprehensive memory test. And sure enough, when I ran the full set of memory tests included with Dell’s diagnostic suite, I found that one of the server’s error-correcting code (ECC) memory modules was producing unrecoverable errors. Now, an unrecoverable memory error is bad news and would completely explain why (1) the system was locking up and (2) the lockups had no apparent relation to any software running.
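
For anyone wondering what makes an error “unrecoverable,” here is a toy sketch of the single-error-correcting, double-error-detecting idea behind ECC. It is not how the Dell hardware or its diagnostics are implemented: real ECC memory does this in hardware on much wider words (typically 64 data bits plus 8 check bits), while this example protects just 4 data bits, and the function names are my own.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED word (a list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 = overall parity, 1..7 = Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2         # makes the total number of 1 bits even
    return code

def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'unrecoverable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    overall = sum(code) % 2             # 0 if total parity is still even
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: assume one flipped bit; the syndrome is its position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but a nonzero syndrome: two bits flipped, no way to fix.
        return "unrecoverable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble

word = encode(0b1011)
one_flip = list(word); one_flip[5] ^= 1
two_flips = list(word); two_flips[2] ^= 1; two_flips[6] ^= 1
print(decode(word), decode(one_flip), decode(two_flips))
```

A single flipped bit is silently repaired, which is why a marginal module can go unnoticed for a long time. Two flipped bits in the same word can be detected but not repaired, and at that point the system has no safe way to continue.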

Using another diagnostic tool, I ran a different suite of tests, which showed that the fault was in the memory module in DIMM slot A. This particular system has four slots, each with a 1GB stick of RAM in it. The RAM is installed in pairs. I wasn’t sure which slot was DIMM slot A, so I took out the modules on either end and then reseated the other two DIMMs in the remaining slots.

I restarted and ran another memory diagnostic. This time the system passed with flying colors. I am now highly confident that one of the two modules I removed is defective. They’re still under warranty, so I should be able to return them for replacement.
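
The same process of elimination can be sketched in a few lines. This version goes one step further than the story above by testing each pair on its own. The module labels are hypothetical, and `simulated_test` is only a stand-in for booting with one pair installed and running a real memory diagnostic.

```python
def isolate_faulty(pairs, passes_memory_test):
    """Run the memory test with only one pair installed at a time."""
    return [pair for pair in pairs if not passes_memory_test(pair)]

# Hypothetical labels for four 1GB modules installed as two pairs.
PAIRS = [("outer-1", "outer-2"), ("inner-1", "inner-2")]
DEFECTIVE = {"outer-1"}   # simulated fault; in real life this is the unknown

def simulated_test(installed):
    """Stand-in for a diagnostic run: fails if a defective module is present."""
    return not any(module in DEFECTIVE for module in installed)

suspects = isolate_faulty(PAIRS, simulated_test)
print(suspects)
```

Because the RAM in this kind of system has to be installed in pairs, the test can only narrow the fault down to a pair, not to a single stick.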

Lessons learned:

Most system and application failures are fairly easy to identify. Random failures often indicate hardware problems.

Bad RAM, overheating, and defective hard disks, in that order, are the most common hardware failures in my experience.

Hardware can fail over time. Most people assume that the problem is software because they haven’t changed any hardware lately.

Hope that helps, Thomas!

18 thoughts on “System freezing up? Check your hardware”

  1. Wow, real pro from Ed Bott. Thanks Ed. I’ve got some more comments up back at my post but I did find a memory checker from Microsoft to try and run on my memory and I’m going to try and use that once I can figure out how to get the .iso image onto a disc (thanks for the advice on that too).

    Are the Dell Diagnostic tools that you mentioned free by chance? I have 2 other Dell PCs but this one is not actually a Dell. Is there a place to get those Dell Diagnostic tools that I might try running on this PC.

    My freeze ups happened about 6 times in maybe 4 days. The only change that I believe I made to my PC since they started was installing Windows Live One Care (I wanted to test it out and do a write up on it maybe). I’ve subsequently uninstalled Windows Live One Care and it’s been about 10 hours or so without a freeze up. If they go away completely I’ll probably chalk it up to that. If they come back though I will need to dig deeper.

    Thanks for your feedback and advice on this. I’m hoping this experiment of blogging all of my technical problems going forward will help make me a more knowledgeable PC user as well as offer information in the archive for others having similar problems via Googling them in the future. I imagine my list is going to get pretty long as I seem to have at least one thing or two popping up each day.

  2. Thomas, the Dell diagnostics I used were preinstalled on my system, which was sold as a server. If you still have your factory installation, check next time you boot for a Utility menu. On this system it’s accessed via F10, I think, with F2 being Setup.

    I just did a quick search of the Dell site, and yes, they do indeed have Diagnostic Utilities available for desktops. Go to dell.com, choose Technical Support, click the Drivers and Downloads area, and then look under the Diagnostics category.

  3. Haven’t read the original post, but look for a bad/failing/overworked power supply, especially if you have a high-end video card made in the last few years.

  4. Just to clarify – the things that Ed talks about are a lot more likely, but if one of those doesn’t clear it up, look at the power supply.

  5. I had the same problem about 6 months ago. Unexplained crashes, freeze ups. Think Ed suggested I try checking the memory, which I did and sure enough – a bad RAM chip, which I replaced and all has been well ever since.

    Mark

  6. Spot on as always, Ed. I’ve found that random lockups with no corresponding dump are almost always caused by bad memory (or a bad / flaky video card), a faulty power supply or a hard drive going south. I’ve had all three in different measures over time, so I’ve gotten used to the “feel” (the pattern of failure) in each as well…

  7. I still wouldn’t rule out Windows One Care as the source. I had the same problem(s): repeated random freeze-ups w/BSOD. I tore out my hair trying to figure it out – checking hardware, memory, Event Viewer for clues, etc. Uninstalling One Care solved the problem two months ago. I really think that is a buggy program that should still be in beta.

  8. Ed,

    Excellent post. System freezes in Windows 2000, XP and 2003 are almost always hardware related, since these versions of Windows recover very well from software errors. This has been one of my longest-running struggles in getting through to people. You have no idea how many people blame Microsoft for their defective hardware. Overclockers are notorious. Which is why I made a simple troubleshooting guide:

    http://mywebpages.comcast.net/SupportCD/DiagnoseXP.html

  9. Ed,

    In a timely coincidence, my Dad brought me his XP system that was locking up bad just after he logged onto his desktop.

    Through a lengthy troubleshooting session, I ended up hunting down the culprit as being Microsoft’s Automatic Updates process (wuauclt.exe). I also worked out a “workaround” for him.

    Don’t know if this will help you, but it was very enlightening.

    I posted an extensive log of the troubleshooting steps on my blog:

    Thawing an XP System

    –Claus

  10. Wow! Bob. What is your secret?

    ‘I have a Dell PowerEdge 600SC server running Windows Server 2003. It’s about 3-1/2 years old, and it has been running nonstop with virtually no problems for all that time. Over the years, I’ve added some big hard drives, and about a year and a half ago I replaced the original 2GB of RAM with 4GB so I could run multiple virtual machines on this box.’

    How did you replace the hard drives and RAM without powering the system down? I think we should be told. This could revolutionise tech support : )

  11. MrG: if the hardware supports it, Windows Server 2003 Enterprise Edition supports hot-add memory (reference: http://www.microsoft.com/whdc/system/pnppwr/hotadd/hotaddmem.mspx). Virtually any server with SCSI, SATA or Serial Attached SCSI (SAS) drives supports hot-swap of disks, although of course if you’re not to lose any data, the disks must be part of a fault-tolerant set.

    If you’re running Exchange Server, it’s recommended to disable hot-add memory support. Exchange has some pretty serious memory management issues, so it’s best to a) dedicate a server to running Exchange and b) follow all of Microsoft’s memory configuration recommendations.

  12. Could it be that Windows Live One Care isn’t necessarily the problem, but rather that program revealed a hardware problem (e.g. maybe a piece of bad RAM is only being hit when that program is running)?

  13. I too had random windows XP freeze ups. This article was an interesting read and when I pulled 1GB out and saw my PC run without lockups on the 1GB left I was impressed. It is nice to see someone adding value to the net instead of regurgitating things elsewhere.

    Thanks a lot.

  14. I HAVE a similar problem … XP just freezes … I had two 512 MB RAM sticks installed .. microsoft’s windiag utility didn’t show any errors on any of my RAM sticks ….
    Yesterday I removed one of the memory sticks and computer didn’t freeze for the next 24 hours … after that I replaced that stick with the other stick that I thought was faulty … and still no freeze ups after 6 hours ….. My computer used to freeze in maximum 2.5 hours with the two sticks installed ….

  15. I might have to argue this theory as I have 2 computers both running Vista and both seem to be freezing up due to Pandora. Not sure what the issue is but it seems to be a software. It freezes up in both Mozilla and Firefox and the machines are a compaq, and a custom. At first I thought it had something to do with my GeForce MX 4000 card but now I do not think so. Sucks cause I LOVE PANDORA!

  16. I feel I should add that it’s possible some of the freezes people are experiencing (and there seem to be quite a lot of people, myself included who are experiencing them) could be down to early/flaky drivers.

    I’ve checked everything in my machine. New HD, swapped DIMMS, replaced GPU, disabled onboard sound, removed unnecessary cards, USB devices, everything, and still I get random freezes.

    XP never once failed for me, so I’ll be using that for a while until I get to the bottom of it.

    My current theory is motherboard drivers, since they’re the only thing I’ve not been able to try/replace (none available from what I can tell).

    Theoretically a driver failure would result in a BSOD (as I understand it), as that’s what happened with XP, which at least allowed you to diagnose the problem, but all I get with Vista is freezes, and a corresponding big fat nothing in the event log.

    It’s true that this type of freeze is highly likely to be hardware related, but given that the drivers are fairly closely involved in the kernel’s ability to do anything useful with the system, if they keel over in an unexpected way, it’s possible the kernel itself might keel over in an unexpected way too.

    Feel free to blast my theories back down to earth 🙂
