
Microsoft introduced the Reliability Monitor in Windows Vista, and improved it in Windows 7. This tool tracks errors and installations, and compiles data regularly to calculate a value called a Stability Index (SI) that ranges from 1.0 to 10.0 and represents system stability. This article explains how to keep the SIs on your PCs at or near 10.0.

In an ongoing quest for more and better systems management, Microsoft introduced a tool called the Reliability Monitor in Windows Vista. The Reliability Monitor (RM) is part of the Performance Monitor (perfmon.exe, often simply called “perfmon”) environment, a Microsoft Management Console (MMC) snap-in that provides tools to analyze system performance and reliability. The Reliability Monitor filters data that perfmon routinely gathers to track various types of runtime errors, system updates, and software and driver installations.

Understanding the Reliability Monitor

Here’s how Microsoft explains the Vista Reliability Monitor on TechNet: “Reliability Monitor provides a system stability overview and trend analysis with detailed information about individual events that may affect the system’s overall stability, such as software installations, operating system updates, and hardware failures. It begins collecting data at the time of system installation.”

To launch the Reliability Monitor in Windows Vista and Windows 7, type “reli” into the Start menu search box, then press Enter. In Vista you must then click the Reliability Monitor entry beneath the Monitoring Tools item in the left-hand column, as shown here (this maneuver isn’t necessary in Windows 7, where RM is a standalone tool):

Figure 1: By default, the Vista RM charts daily stability over the past 30 days.


The Windows 7 RM version trims the display area from 30 to 20 entries, but offers the ability to toggle between views by “Days” or “Weeks.” Alas, this tool omits the numeric SI value, even though it uses those values to draw its charts. (You can, however, use this window’s Save Reliability History option to create an XML file that contains a log of all the SI values that RM tracks, and view it in a Web browser.) Here’s what the Windows 7 version of RM looks like:

Figure 2: By contrast, the Windows 7 RM charts daily stability over the past 20 days.


What RM Reports

The headings for the information columns in RM vary between Windows 7 and Windows Vista, so I map them together with explanations in Table 1. These strings appear at the lower right of each chart, and are marked with red X, yellow exclamation point, or blue “I” icons to denote errors, warnings, and information items respectively.

Vista | Win7 | Explanation
Application Failures | Application failures | Indicates an application crash, freeze, hang, etc.
Windows Failures | Windows failures | Indicates some Windows component or the entire OS failed
Miscellaneous failures | Miscellaneous failures | Applies especially to improper Windows shutdowns
None | Warnings | Appears when applications or system components stop responding to input, or other Warning events occur (added in Windows 7 to separate failures or errors from warnings)
Software (Un)Installs | Information | Applies to system updates, plus driver and software installs, uninstalls, and configuration changes
Hardware Failures | None | Reports on hardware failures (dropped in Windows 7 because these appeared so seldom in Vista)

Table 1: Side-by-side listing, comparison, and explanation of RM reporting entries

For both Windows Vista and Windows 7, RM is a handy tool and a great way to assess system stability and reliability. Windows 7 RM offers two notable improvements over the Vista version, however. First and foremost, it makes RM data available via Windows Management Instrumentation (WMI). Also, it offers the ability to review stability data over a longer time horizon by selecting the Weeks entry in its “View by:” control, as depicted in the next screen capture.

The WMI export feature is very important to IT professionals. They can write PowerShell scripts or use WMI-related cmdlets to gather and track this information through a centralized dashboard or control center. This makes RM a much more useful tool in an IT professional’s management and monitoring toolset.
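To give a feel for what such a script might look like, here is a minimal sketch in Python that shells out to PowerShell and reads the stability data exposed through WMI. It assumes the WMI class Win32_ReliabilityStabilityMetrics with its TimeGenerated and SystemStabilityIndex properties, and a PowerShell version that provides Get-CimInstance and ConvertTo-Json (3.0 or later, which is newer than the version that ships with Windows 7). The function names are illustrative only.

```python
import json
import subprocess

# PowerShell pipeline; assumes PowerShell 3.0+ (Get-CimInstance, ConvertTo-Json).
PS_COMMAND = (
    "Get-CimInstance -ClassName Win32_ReliabilityStabilityMetrics | "
    "Select-Object @{n='Time';e={$_.TimeGenerated.ToString('o')}}, SystemStabilityIndex | "
    "ConvertTo-Json"
)


def parse_metrics(json_text):
    """Turn ConvertTo-Json output into a list of (time, index) tuples."""
    if not json_text.strip():
        return []
    data = json.loads(json_text)
    if isinstance(data, dict):  # a single record is emitted as a bare object
        data = [data]
    rows = [(item["Time"], float(item["SystemStabilityIndex"])) for item in data]
    # ISO 8601 strings sort chronologically as long as the UTC offsets match.
    return sorted(rows)


def fetch_metrics():
    """Query the local Windows machine (requires powershell.exe on PATH)."""
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True, text=True, check=True,
    )
    return parse_metrics(result.stdout)
```

Calling fetch_metrics() on a Windows PC returns the readings oldest first, so the last tuple holds the most recent SI value. Keeping the parsing separate from the query makes it easy to feed the same function with output collected from remote machines.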

Figure 3: The Windows 7 RM “View by Weeks” chart offers a longer-term view of system stability.


As the preceding figure attests, I’ve had plenty of problems shaking down the system from which the chart originates and coercing it into stable, trouble-free operation. Since I installed Windows 7 Ultimate x86 on this system on August 9, 2009, I’ve hit 10.0 values only in five of the 20 weeks of record available, and have dipped below 5.0 in 10 of those weeks. Along the way, I learned a few important lessons about how to keep SI values up and steady. In the remaining sections of this story, I’ll share them with you.

How To Maximize SI on User PCs

Having now experimented with RM on a couple of test and production systems for over 20 months, I’ve seen many different causes for dips in SI values over time. Such causes have included driver problems, faulty driver install defaults, software incompatibilities, and — to be completely frank and unflattering — unthinking if not downright unsafe computing behavior. In turn, this has led me to formulate the following set of rules to keep SI values as high on Windows Vista and Windows 7 desktops as possible. I’ll also provide a short form for each rule, which later serves as a heading for a more detailed discussion of that rule.

  1. Never install software on a production machine unless it is absolutely necessary for the user to get his or her job done. [Install only necessary software]
  2. Don’t install new device drivers until time and testing confirm that they work as advertised or expected. [If drivers ain’t broke, don’t fix ‘em]
  3. Hide unwanted updates, or push only necessary updates.
  4. If an application or the OS asks you to reboot the PC, don’t delay that action overmuch. [When asked to reboot, do so soon]
  5. It’s a bad idea to mix and match security software components from multiple vendors. [Stick to security suites or test like crazy]
  6. Keep hardware configurations as constant as possible.

Install Only Necessary Software

After paying close attention to RM and its SI reports on Vista since March, 2008, and Windows 7 since first installing a beta in January, 2009, I’ve been bitten by instability issues emanating from “test” or “experimental” installs of non-production software more often than by anything else. This argues eloquently for formulating group policies (GPOs) for end-user PCs that restrict (or block) users’ ability to install software on their computers. It also taught me to rigorously separate production from test systems, and to install non-production software only on test machines (and even then, preferably in a VM that could be easily replaced at need or on demand). Corollary: Don’t designate any software as production until it’s been thoroughly tested and proven stable over a 30-day period.

If Drivers Ain’t Broke, Don’t Fix ‘em

Hardware vendors release new drivers surprisingly often. Some graphics cards get new drivers as often as once a quarter, and sometimes even more often than that. I learned the hard way to wait at least a month to install new drivers, and then only after checking vendor and third-party Web forums to make sure they were stable, then testing them on target hardware that matched production configurations before deploying those drivers. I also observed at least one case where the installer for an otherwise stable driver detected the target hardware improperly and thus installed the wrong driver.

Testing will flush out these kinds of issues, and help you resolve them before production deployment can occur. The same 30 days of stability that’s recommended for possible production software applies to drivers as well.
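One way to make the 30-day rule checkable is a small gate like the sketch below. It assumes you record one SI reading per day on the test machine. The 9.0 floor that defines "stable" is purely an assumption for illustration, so pick whatever value matches your own policy.

```python
from datetime import date, timedelta

SOAK_DAYS = 30   # the 30-day stability period recommended above
SI_FLOOR = 9.0   # assumption: the lowest SI still counted as "stable"


def ready_for_production(installed_on, today, daily_si,
                         soak_days=SOAK_DAYS, si_floor=SI_FLOOR):
    """Return True if the candidate soaked long enough without an SI dip.

    daily_si maps a date to the Stability Index seen on the test machine.
    """
    if (today - installed_on).days < soak_days:
        return False
    readings = [si for day, si in daily_si.items()
                if installed_on <= day <= today]
    # No readings means no evidence of stability, so keep waiting.
    return bool(readings) and min(readings) >= si_floor


# Example: a driver installed on a test PC, with 31 days of clean readings.
installed = date(2010, 1, 1)
history = {installed + timedelta(days=i): 10.0 for i in range(31)}
print(ready_for_production(installed, installed + timedelta(days=30), history))
```

The same check works for software and for drivers. Only the install date and the readings change.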

Hide Unwanted Updates, or Push Only Necessary Updates

If you’re managing updates for your users, you can decide what to push; if not, you’ll want to issue recommendations when it’s safe for them to hide or ignore various items that will appear in their Windows Update listings. Here’s an excellent case in point: Although 32 language pack updates are available for Windows Vista and Windows 7 desktops, very few users need any (or many) of these. Get ‘em outta here!

When Asked to Reboot, Do So Soon

Any time systems or applications are updated, or drivers updated or replaced, it’s possible, if not likely, that the installer will request a system reboot when the update process completes. This enables that software to schedule one-time activities (RunOnce) after the next start-up to delete unwanted files, reset special registry entries, and so forth. Although it’s possible (and sometimes may even be necessary) to delay the reboot, I’ve learned that it’s a good idea to reboot as soon as possible to avoid potential mishaps. If system changes occur that the RunOnce activities will erase after the next reboot, you risk leaving systems in an anomalous state.
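If you want to spot machines that are sitting on a postponed reboot, a check along the following lines may help. This is a sketch under assumptions: the two registry keys below are locations commonly used by Windows Update and Component Based Servicing to signal a pending restart, but they are not the only signals and are not mentioned by the Reliability Monitor itself. The lookup function is injectable so the logic can be exercised off Windows.

```python
# Registry keys (under HKEY_LOCAL_MACHINE) commonly used to flag a pending
# restart. Assumption: this list is illustrative, not exhaustive.
PENDING_KEYS = [
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired",
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending",
]


def _hklm_key_exists(path):
    """Return True if the key exists under HKEY_LOCAL_MACHINE (Windows only)."""
    import winreg  # imported here because the module exists only on Windows
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path):
            return True
    except OSError:
        return False


def reboot_pending(key_exists=_hklm_key_exists):
    """Return the list of marker keys that are present (empty if none)."""
    return [path for path in PENDING_KEYS if key_exists(path)]
```

A non-empty result from reboot_pending() suggests the machine is overdue for a restart.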

If possible, use scripting tools that can force reboots as soon as possible after a reboot request occurs and keep running after the restart, or break scripts into portions that correspond to activities leading up to and then succeeding any necessary reboots.
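The "break scripts into portions" idea can be sketched as a tiny stage runner that bookmarks its progress in a file, so that a run after the restart picks up where the previous one stopped. Everything here is hypothetical scaffolding: the state file, the stage functions, and the request_reboot hook (which on Windows might call "shutdown /r /t 0") are names invented for illustration, and relaunching the script after the restart is left to whatever mechanism you already use.

```python
import json
from pathlib import Path


def run_stages(stages, state_file, request_reboot):
    """Run stages in order, resuming from the index saved in state_file.

    stages: list of (name, func); func() returns True when a reboot is
    needed before the next stage may run.
    request_reboot: callable that triggers the restart (hypothetical hook).
    Returns the name of the stage that requested a reboot, or None once
    every stage has completed.
    """
    state_file = Path(state_file)
    next_index = 0
    if state_file.exists():
        next_index = json.loads(state_file.read_text())["next_index"]

    for index in range(next_index, len(stages)):
        name, func = stages[index]
        needs_reboot = func()
        # Save progress before rebooting so the next run skips finished work.
        state_file.write_text(json.dumps({"next_index": index + 1}))
        if needs_reboot:
            request_reboot()
            return name

    state_file.unlink(missing_ok=True)  # all done: clear the bookmark
    return None
```

Because progress is written before the reboot request, an interrupted restart does not cause a completed stage to run twice.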

Stick to Security Suites or Test Like Crazy

Although “best of breed” security applications (such as antivirus, antispyware, rootkit detectors, firewalls, spam filters, pop-up blockers, and so forth) seldom all come from the same vendor, there is risk involved in mixing and matching various components from multiple vendors. I observed this with numerous combinations of antivirus, antispyware, and heuristic malware detection tools while seeking to lock down Vista and Windows 7 systems over an extended period of time. Eventually, I settled on a reasonably capable security suite instead of a combination of such tools because that eliminated a major cause of system instability (occasional bluescreens, more frequent system hangs or performance slowdowns).

You want your production environments to be as rock-solid as possible. Because security software ties into network and system I/O at a fairly low level, incompatibilities among security software components can, and sometimes will, cause problems. Avoid mixing components from multiple vendors unless you have no other choice, then test and tweak like crazy until test systems stabilize sufficiently to warrant production deployment.

Keep Hardware Configurations as Constant as Possible

Every time a single system component changes in a “standard” user configuration, you really must test the whole shebang thoroughly and completely before deploying into production. Of course, this applies mostly to device drivers, but occasionally hardware changes can provoke apparently odd or unrelated software problems as well (often, because some device drivers don’t get along well with low-level system or security software components). Again: the only way to make sure things will work properly in production is to test them thoroughly before deploying them. Ignore all temptations to do otherwise as much as you can, or this will turn around and bite you on the hindquarters.

Keep a Weather Eye on the SI to Stay Ahead of Trouble

All in all, access to stability information from Windows Vista and 7 is a big plus for IT professionals. Even though it does make more work for us, it also provides insight into the health and well-being of user systems (and their behaviors in working with their systems). If you keep a close eye on RM reports, you’ll be able to identify computers (or users) with stability issues, and proactively work to address them. Over time, this will help you create a more stable and reliable desktop environment for your users, and help you keep up with the constant flow of hardware and software changes and updates. That’s a big win-win in any workplace!
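As a rough sketch of that kind of watchfulness, the snippet below takes SI readings already collected per PC and lists the machines whose latest value has slipped below a threshold. The PC names and the 5.0 cutoff are assumptions for illustration. How you gather the readings is up to you.

```python
def flag_unstable(readings_by_pc, threshold=5.0):
    """Return (pc, latest_si) pairs below threshold, lowest SI first.

    readings_by_pc maps a PC name to its SI readings, oldest first.
    """
    flagged = []
    for pc, readings in readings_by_pc.items():
        # Skip machines with no data rather than guessing at their state.
        if readings and readings[-1] < threshold:
            flagged.append((pc, readings[-1]))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical fleet data: names and values are made up for the example.
fleet = {
    "PC-A": [10.0, 9.8],
    "PC-B": [8.0, 4.2],
    "PC-C": [6.0, 2.5],
}
for pc, si in flag_unstable(fleet):
    print(pc, si)
```

Sorting lowest first puts the machines that need attention most urgently at the top of the list.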


COMMENTS

  • [...] This post was mentioned on Twitter by Esther Schindler, sjvn and topsy_top20k, topsy_top20k_en. topsy_top20k_en said: RT @ExpertVoice: Maximize the Stability Index on Your PCs by Ed Tittel: http://bit.ly/7CPuIe Tips for troubleshooting #Win7 and #Vista [...]

  • [...] Production Systems Settle Down At Long Last Anybody who’s been following this blog for any length of time knows I’ve been battling incessant stability problems on my production PC for some time now. In fact, I have two PCs that alternate between test and production roles (a hardware configuration table follows later in this posting), both of which have settled down completely in the last three weeks or so. I upgraded both machines in the first half of August, 2009, but it’s taken me some time to work out all the kinks I’ve encountered along the way. I even wrote a story on this topic for my good friend and colleague Esther Schindler at ITExpertVoice.com; it’s entitled “Maximize the Stability  Index on Your PCs.” [...]

  • [...] PowerShell ISE also can tap into the Stability Index and connect with various Windows management tools and group policies. [...]


FM IT Expert Voice is a partnership between Dell® and Federated Media.