Debugging A Network Failure
Like much of the U.S. right now, New England is experiencing extreme heat. Of course that means high electrical loads from air conditioning, and it also means late afternoon thunderstorms with lightning.
This morning I had a call from a Boston customer. "Everything's down", he said, "Internet and our servers. We can't get to anything. But.. the guys who came in early are still working."
That last bit told me that the servers weren't down. Most likely machines weren't getting IP addresses from the router. People who had come in early had been able to get addresses and still had them. Latecomers didn't. So.. dead router? Maybe. We tried power cycling it, no change. The lights on it looked right, but things still weren't working.
I had him walk back to his office and try "ipconfig/renew" from a DOS window. No response, just hung. As I couldn't reach his router remotely, a dead router was certainly a possibility. But we needed to be sure.
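A side note for anyone who wants to script this kind of check: a Windows machine that gets no answer from a DHCP server will typically give itself a link-local address in 169.254.0.0/16, so the address a machine reports is a quick hint as to whether it ever got a lease. Here is a minimal Python sketch of that test. It assumes you already have the address as a string, and the function name is made up for illustration.

```python
import ipaddress


def has_usable_lease(addr):
    """Return False for addresses that suggest DHCP never answered.

    Hypothetical helper: treats link-local (169.254.x.x) and
    unspecified (0.0.0.0) addresses as "no lease".
    """
    ip = ipaddress.ip_address(addr)
    return not (ip.is_link_local or ip.is_unspecified)
```

This only tells you that a machine has no lease, not why. The router, the wiring and the switch are all still suspects.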
I asked him if he had a laptop in the building; he didn't. So, if a computer won't come to the router, we'll pick up the router and bring it to a computer. "Doesn't it need to be connected to the T1?", he asked. Well, sure, for Internet access, but not to hand out IP's. So I had him unplug the router and carry it to his office. Unplug his machine from the wall, plug that wire into a LAN port on the router, do the "ipconfig/renew" again. Bingo - he had an IP address. The router is not dead, at least not on the LAN side. So I had him bring it back and plug it back in where it belonged. Just to be sure, I had him walk back to his office and do "ipconfig/renew" again. No luck.
Ok, we have a wiring problem or a switch port problem. I knew the wiring was new and I had tested it myself, so I doubted that. But sometimes mice will chew wires, so it might have to be checked. Before trying that, I asked him to trace the wire from the LAN side of the router to his office switch. He had free ports there, so I had him move the wire to one of them. I sent him walking back to his office (poor guy was getting a lot of exercise this morning) to try "ipconfig/all". It worked, telling us that a dead port on his switch was the problem. I had him reboot and try to access his server. That now worked.
But the Internet still didn't, and I still couldn't access his router. That was no surprise: a dead port on the LAN side of his system wouldn't have kept me from reaching his router over the Internet in the first place, so fixing it couldn't have restored my access. There had to be a second fault on the WAN side. Back to look at the router.
The T1 unit plugs into a five-port switch and one wire runs from that to the WAN side of the router. The other ports are for other devices with public IP's, but those are still on an older DSL line. As soon as whoever handles those devices gets their act together and reconfigures them, they can plug into this little switch too, but right now the other ports are empty. I asked him to pull both wires out of the little switch and put them back into unused ports. Ayup, instant success: I could access his router.
So, what happened? Probably a power surge of some kind early this morning. Did he have surge protectors? Yes, but.. some were old. And who knows if a surge protector will really work anyway? If only a small spike gets through, that may not bother some equipment at all but could kill other equipment dead. Or it might just temporarily confuse it: unplug the switch for ten seconds and it might be fine. But.. it might also be weakened by the experience, so considering the cost, I recommend just replacing anything suspect. In this case, he needs new switches anyway and it wouldn't hurt to have an electrician look over the wiring rat's nest and improve it.
This kind of debugging and fault resolution is no different from any other: test, isolate, test, repeat. It can be easier if you have a laptop and spare equipment, but mostly it's just old-fashioned logic.
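If you can get to a working machine on site, the same test-and-isolate routine can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the function names are my own, and every address and port below is a placeholder, not this customer's. The idea is to test the nearest hop first and stop at the first thing that fails.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


def first_failure(checks):
    """Run (label, test) pairs in order, nearest hop first.

    Returns the label of the first check that fails, or None if
    every check passes. Later checks are not run after a failure.
    """
    for label, test in checks:
        if not test():
            return label
    return None


def office_checks():
    """Hypothetical check list; substitute your own hosts and ports."""
    return [
        ("router LAN side", lambda: can_connect("192.168.1.1", 80)),
        ("file server", lambda: can_connect("192.168.1.10", 445)),
        ("Internet", lambda: can_connect("example.com", 80)),
    ]
```

Calling first_failure(office_checks()) from a desk machine gives you the first broken hop, or None if everything answers. It won't tell you whether the culprit is a wire, a port or the box itself. For that you still have to swap things around by hand, as we did here.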
Good time to review your electrical systems, isn't it?
*Originally published at APLawrence.com
About the Author:
A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com