Welcome to WebProNews Breaking eBusiness and Search News

Troubleshooting Mistakes

A.P. Lawrence
Expert Author
Published: 2006-08-21



The very first part of troubleshooting is identifying the problem. That's not always easy even for skilled professionals.

It's definitely not easy for the typical computer user, so when you get the call (we're assuming that you are the professional who gets called with problems), what you are told may not match reality. This isn't to imply that users are stupid, or ignorant, or careless (though some are all of those things), but simply that they may misinterpret symptoms and miss seeing the real problem.

Professionals do the same thing. In my career I've had more than one telephone call where someone describes themselves as a competent Windows administrator but apologizes for not "knowing Unix". Sometimes we end up having an easy conversation where the problem really is simply that they need a little (sometimes a very little) Unix guidance to help them fix their issue. Sometimes it's a little more involved: they've hit a tough nut and they'd really have needed years of experience to have any hope of fixing things.

Sometimes it's not like that at all. More than once the immediate problem was a dead, non-booting machine. I don't mean that Linux or Unix was trying to load and failing along the way, I mean that you could push the power button and the lights would come on and that was it. Nothing more. No BIOS display, no disk spin up, no beeps, nothing. Just dead. And yet here we have a supposedly competent Windows support person asking me what to do. What's that have to do with me? It's not a Unix issue - we haven't got that far yet. It might become a Unix issue: if the hard drive has been damaged by whatever caused the stubborn nothingness being seen now, we might need a data recovery firm with knowledge of Unix/Linux file systems. Even if it's just a missing boot sector, repairing that certainly requires OS specific knowledge. But right now? This is a low level hardware issue. Maybe a failed motherboard, power supply or missing/unplugged cables. Whatever is going on, right now it has nothing to do with Linux or Unix.

If you are dealing with a user who is not computer savvy, remember that they may not understand things that seem obvious to you. For example, the user may understand that the hard drive stores his operating system and files, but may not realize that the initial BIOS information that flashes by at boot doesn't come from there. So while you would think very differently about a machine that displays BIOS information but does not continue versus a machine that displays no BIOS data at all, the user of that machine might not. You need to interpret problem reports with an eye toward the reporter's knowledge.

But you know that. You also know that if it is your suspicion that somebody did something they shouldn't have, the user may not be willing to admit to it. You are going to take everything with a big helping of salt, and decide for yourself what the problem is. After all, you are the professional.

OK. But professionals also misinterpret things. You probably know this too: what you think you know can hurt you more than what you don't know. Do I assume too much? Maybe so: I know I make mistakes like that, and I've sure seen other people do the same thing, but you could be different. If so, you can either skip the rest of this post or read it with relish while you savor your superiority.

I can remember the first bad troubleshooting mistake I made. It cost me a good customer - not because they were angry with me, but because I had them switch to hardware and software I did not support. I advised them to switch because I thought the OS and hardware they were running on had reached their performance limits. They were running a product called Glovia on a SCO Unix 80386 box. There were only about twenty users, but the background code had been getting more and more complicated over the years, and the system was slowing down badly. I tried increasing swap, adding more memory, and everything else that I could think of, but it kept getting worse. As their Glovia programmer was constantly adding new features, I assumed that these new routines were simply overtaxing the system: heck, I could see it in the sar reports: both the cpu and disk i/o were under excessive load. Basically, I just gave up and agreed with the advice they had received from Glovia and their programmer: upgrade to a big HP-UX system. They did, and performance returned to acceptable levels, but because I didn't know much about HP-UX, another consultant took over my position. I felt good about it overall: I had done the right thing, and I had more clients than I needed anyway. All parties pleased, time to move on.

But I was very, very wrong. I assumed the increasing load was from the heavy new tasks being added weekly, so I just didn't look far enough. I had done some "ps" runs, but had missed seeing something very important. The clutter of Glovia processes blinded me: I didn't see the big lumbering elephant in the crowd of dancing lambs. What I missed was an MMDF process called "deliver". I probably missed it because I was looking for processes that were gaining time right now: I'd take two "ps" snapshots and "diff" them (this procedure is covered in more detail in a later chapter). The processes that popped out had used cpu time between the snapshots. If I had been lucky, "deliver" would have been in that list, but my timing was unfortunate: although "deliver" was using a lot of cpu, it didn't happen to be sucking any at the times I happened to look.
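
The two-snapshot comparison can be sketched in a few lines of Python. This is only an illustration, not the procedure from the later chapter: it assumes a ps that accepts "-e" and "-o" and prints TIME as [dd-]hh:mm:ss or mm:ss, and the function names are made up for the example.

```python
import subprocess


def parse_cpu_time(text):
    """Convert a ps TIME field such as '1-02:03:04' or '03:04' to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    seconds = 0.0
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return days * 86400 + seconds


def parse_snapshot(output):
    """Map pid -> (cpu_seconds, command) from 'pid time command' lines."""
    table = {}
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or truncated lines
        pid, cpu, command = parts
        table[int(pid)] = (parse_cpu_time(cpu), command)
    return table


def busy_between(before, after):
    """List processes whose accumulated CPU time grew, largest gain first."""
    grown = []
    for pid, (cpu_after, command) in after.items():
        if pid in before:
            delta = cpu_after - before[pid][0]
            if delta > 0:
                grown.append((delta, pid, command))
    return sorted(grown, reverse=True)


def take_snapshot():
    """Run ps once (assumed option set) and parse its output."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "time=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return parse_snapshot(result.stdout)
```

Calling take_snapshot() twice, a minute or so apart, and passing both results to busy_between() gives the list of processes that "popped out". A process that happens to be idle during that interval will not appear at all, however much time it has used overall, which is exactly the trap described above.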

I know this because I accidentally saved some printouts from that system. For some reason I had tucked them in my briefcase and forgot all about them. When I found them a few years later while searching for something else, I happened to take a quick glance, recognized where they were from, and immediately had an awful feeling in the pit of my stomach. I got that feeling because I noticed "deliver" and saw that it had a lot of accumulated time. In the intervening years, I had seen that at other SCO Unix jobs, and I knew what it meant. It meant that there were thousands, perhaps tens of thousands, of mail messages backed up on the system. That "deliver" process was trying desperately to run through them to see if they could now be delivered. It would do a lot of disk i/o and consume a lot of cpu in the process, and then it would go away until it was scheduled to run again. Eventually there were so many messages that it was almost always running - except when I ran my snapshots, of course. Just my luck, I guess - or more likely I just didn't run enough of them because I saw all those Glovia processes and "knew" they had to be the problem.
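
A single snapshot sorted by accumulated time would have shown "deliver" whether or not it was running at that instant. A minimal sketch, assuming ps output of the form "pid time command" with TIME as [dd-]hh:mm:ss or mm:ss; the function names and the sample process list are hypothetical.

```python
def cpu_seconds(field):
    """Convert a ps TIME value ([dd-]hh:mm:ss or mm:ss) to seconds."""
    days, _, clock = field.rpartition("-")
    total = 0.0
    for part in clock.split(":"):
        total = total * 60 + float(part)
    return int(days or 0) * 86400 + total


def top_accumulated(ps_output, count=5):
    """Return the processes with the most accumulated CPU time."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            pid, cpu, command = parts
            rows.append((cpu_seconds(cpu), int(pid), command))
    return sorted(rows, reverse=True)[:count]


# Hypothetical sample data, not taken from the system in the article.
sample = "1 00:00:05 init\n42 13:27:40 deliver\n77 00:02:10 app_worker\n"
for seconds, pid, command in top_accumulated(sample, count=2):
    print(pid, command, seconds)
```

Sorting by the total rather than by the change between snapshots trades one blind spot for another: long-lived but harmless daemons float to the top, so the two views are best read together.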

Why didn't the customer notice backed up email? Because it was root's messages that were being delayed (due to a lock file on root's mail folder) and nobody cared about root's mail. Users' mail, of course, was getting slower and slower, but I took that as a symptom rather than something closer to the cause. My loss: I saw what I expected to see, I didn't see anything else, and I gave away a good account years before I had to.

*Originally published at APLawrence.com



About the Author:
A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com
