Welcome to WebProNews Breaking eBusiness and Search News

Another RAID Failure

A.P. Lawrence
Expert Author
Published: 2004-11-18



There must be something in the air. I've had another RAID failure. This time, it was a hardware RAID, specifically a seven-year-old DPT controller (DPT was subsequently bought by Adaptec).

The "Windows consultant" called me first, saying that he had come in and found the machine beeping, and realized this must be a drive failure. He also said that the backup had failed, and gave the usual apologetic "I'm not a Unix guy" (funny, though - he runs his own website on a Linux box). I understand the concern, but as I pointed out to him, this isn't an OS issue at all - the RAID is OS independent. However, I understood his worry about the backup, because a failed backup could indicate something more serious, like a controller or motherboard problem.

This is too important a system to leave to chance, so I cleared my schedule and drove down to the site. It's not that I don't trust the Windows guy, but I didn't want information filtered through a telephone - either from him or from me to him. Too easy to make an awful, irretrievable mistake.

Upon arrival, I ran "dptmgr" and confirmed that ID 3 did indeed show as failed. I also looked at the Microlite Edge printout and could see that the failure was in just one file - a Hard Read Error 6. It happened to be a log file, so if that was all it was, I wasn't too concerned. However, there are two places that error could come from - either real read errors from the array, or file system inconsistency, such as an inode containing pointers to impossible blocks. I explained to the customer that the failed drive by itself wouldn't cause real read errors - the RAID reconstructs the missing data. Therefore, if these really were bad reads, we had a very serious problem.
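To illustrate why a single failed drive should not produce read errors, here is a minimal sketch of parity reconstruction. It assumes a RAID 5-style XOR parity layout; the article does not say which RAID level the DPT array used, and the function names and block contents are invented for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_missing(surviving_blocks):
    """Rebuild the block that lived on the failed drive.

    With XOR parity, the missing block (data or parity) is the XOR of
    every surviving block in the same stripe.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"LOGS", b"DATA", b"MORE"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] has failed.
survivors = [data[0], data[2], parity]
rebuilt = reconstruct_missing(survivors)
```

In this model a second failed drive leaves too few survivors to reconstruct anything, which is why a degraded array is so exposed.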

However, nothing in the system logs (messages) showed any disk read errors, so file system damage looked like the more likely cause. This would most likely be related to the RAID failure - the disk might not have failed instantly, and may have caused some corruption as it died. If it truly was confined to that one file, we'd be fortunate indeed. I ran an "fsck -ofull" (SCO system) and sure enough, it identified problems with the same file Microlite BackupEDGE had complained about, and was able to clear everything out and give us back a good filesystem. That was a relief.

Now, of course, we needed to fix the failed drive. We had a bit of low comedy there - the last time I had seen the cabinet the drives were in was seven years ago, and I don't think the Windows guy had ever seen it. We couldn't figure out how to open it to get at the drives! But that wasn't what really bothered me. It was the replacement drives he had that had me worried. When we had originally installed these drives, we had tagged each drive with a paper sticky tag giving its ID. The drive he was proposing to replace the failed one with had such a tag on it, making me suspect that it was a bad drive previously removed from this box. However, we had nothing else - it's hard to find SCSI-3 drives off the shelf nowadays, so after finally figuring out how to get the old drive out, we put in the replacement and started the rebuild process. Based on the percentage counter, I knew it would take close to three hours for a rebuild. There's no reason the system couldn't be used while rebuilding, but the customer and the Windows guy said they'd prefer to just wait. I went along, and we went for a long lunch.
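As a rough illustration of how a percentage counter turns into a time estimate, here is a small sketch that extrapolates linearly. The reading used below (5% after 9 minutes) is made up; the article only gives the conclusion of close to three hours, and a real rebuild may not progress at a constant rate.

```python
def estimate_remaining_minutes(percent_done, elapsed_minutes):
    """Linearly extrapolate the time left from a rebuild percentage counter."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be greater than 0 and at most 100")
    total_minutes = elapsed_minutes * 100.0 / percent_done
    return total_minutes - elapsed_minutes


# Hypothetical reading: the counter shows 5% after 9 minutes.
remaining = estimate_remaining_minutes(5, 9)
total_hours = (remaining + 9) / 60.0
```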

Shortly after we came back, the rebuild failed. I wasn't overly surprised. By now, we had found new drives which were on their way by FedEx, but there was little more we could do today. I told the customer to let people back on but to warn them that there was a small possibility of losing whatever they posted that day (if we lost another drive, we'd be dead). I left.

The next morning, I called the customer again. He said that the backup had failed again. I asked for specifics, but was told there was no printout. I checked the Edge logs, and it looked to me like it had been interrupted partway through the verify. I asked if the database was "up" this morning (we shut down the database before the backup and restart it when it is done). I was told no, the Windows guy had rebooted the machine this morning because the database wasn't running. I wish people wouldn't reboot machines - it's simple to start the database and I just can't stand the Windows "reboot fixes everything" mentality. Anyway, I could tell from the logs what happened - because the RAID was running degraded, the backup was much slower. It just hadn't finished its verify by the time the workday started - and to make it worse, some people had come in early because they lost so much time the day before. Since it hadn't finished, it hadn't restarted the database. I couldn't be 100% certain that the verify would have passed, but the backup had no errors, and the verify was OK up to the reboot anyway. I explained that much to the customer, and we reset the backup to start earlier as a temporary fix.
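The timing problem can be sketched as simple arithmetic: the database stays down until backup plus verify completes, so a slower array pushes the restart later. All times, durations, and the slowdown factor below are hypothetical, as are the function names; the article gives no actual figures.

```python
from datetime import datetime, timedelta


def backup_finish_time(start, backup_hours, verify_hours, slowdown=1.0):
    """Return when backup plus verify completes, scaled by a slowdown factor."""
    total_hours = (backup_hours + verify_hours) * slowdown
    return start + timedelta(hours=total_hours)


def finishes_before(start, deadline, backup_hours, verify_hours, slowdown=1.0):
    """True if the database would be restarted by the deadline."""
    finish = backup_finish_time(start, backup_hours, verify_hours, slowdown)
    return finish <= deadline


# Hypothetical schedule: backup starts at 23:00, people arrive at 07:00.
start = datetime(2000, 1, 3, 23, 0)
workday = datetime(2000, 1, 4, 7, 0)

healthy = finishes_before(start, workday, 3, 2)
degraded = finishes_before(start, workday, 3, 2, slowdown=2.0)

# Temporary fix: start three hours earlier while the array is degraded.
earlier = finishes_before(start - timedelta(hours=3), workday, 3, 2, slowdown=2.0)
```

With these invented numbers the healthy run finishes in time, the degraded run does not, and moving the start earlier restores the margin.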

The new drive should be there tomorrow. Unless something really unfortunate happens, we should get this back in shape then.

*Originally published at APLawrence.com


About the Author:
A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com
