Welcome to WebProNews Breaking eBusiness and Search News

Fixing 404 Errors

A.P. Lawrence
Expert Author
Published: 2005-01-18

A 404 error is what you get when your browser tries to access a page that doesn't exist. Maybe you mistyped something, or the link you followed was mistyped by someone else, or maybe the webmaster moved it or renamed it or just deleted it. It's annoying for you, and sites that care about your visit try to avoid it happening.

Well, we can't eliminate 404's entirely, and frankly dealing with them is an annoyance for those of us maintaining the website too. It's bad enough that other sites cause us problems with incorrect links, but it is really annoying when we cause our own problems.

Unfortunately, tracking these things down and fixing them is a bit of a pain. My "Custom 404" page and its associated script correct a lot of common errors automatically, and try to offer help when they can't just redirect you to the right page, but I need to keep updating the script as I find new sources of errors. Sometimes the fix is as simple as making a symbolic link, but if the error comes from an outside source, I want to correct it if I can. Even if it was caused by my own error, I may still want to add correction code in case that original error gets picked up by someone else.
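A correction step of this kind might be sketched as follows. This is not the site's actual Custom 404 script, which is not shown here; the `%fixes` table, the `correct_path` name, the example paths, and the trailing-punctuation rule are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table of known-bad paths and where they should go.
my %fixes = (
    '/old/page.html' => '/new/page.html',
    '/new/page.htm'  => '/new/page.html',
);

# Return the corrected path, or nothing when no rule applies and the
# visitor should be shown a help page instead of being redirected.
sub correct_path {
    my ($path) = @_;
    return $fixes{$path} if exists $fixes{$path};

    # Generic rule (an assumption): strip trailing punctuation that was
    # picked up from a sentence, as in "/new/page.html)." in an email.
    (my $trimmed = $path) =~ s/[.,;:)'"]+$//;
    return $trimmed if $trimmed ne $path && $trimmed =~ /\.html$/;

    return;
}

for my $p ('/old/page.html', '/new/page.html).', '/no/such/page') {
    my $fix = correct_path($p);
    print defined $fix ? "$p -> $fix\n" : "$p: no automatic fix\n";
}
```

An exact-match table handles one-off mistakes, while a pattern rule catches a whole family of them; anything that falls through both is what ends up needing investigation.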

So, to help me find errors, I have a Perl script that reads in the error_log, and compares it to a log of "corrections" already made by the Custom 404 script (this is necessary because the 404 ends up in my logs even though it was corrected). The script ignores pages that have already been corrected, and spits out a list of 404's I need to at least investigate. Many of these will be confused web spiders - it's really amazing how dumb some of these things are. For example, /MacOSX/macosxcupstofile.html contains this text:

sudo lpadmin -p tofile -E -v socket://localhost:12000 -m raw

Dumb spiders regularly think that is a link:

[Sun Jul 11 07:07:05 2004] [error] [client 217.107.152.79] File does not exist: /usr/local/www/vhosts/vps.pcunix.com/htdocs/MacOSX/socket://localhost:12000/
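Requests like this can often be recognized mechanically, because a URL scheme shows up in the middle of the path. The following is a rough heuristic sketch, not part of the script described in this article; the function name `looks_like_spider_noise` and the list of schemes it checks are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True (1) when a path segment starts with a URL scheme such as
# "socket://", "news:" or "mailto:", which suggests a spider misread
# page text as a relative link. The scheme list is an assumption.
sub looks_like_spider_noise {
    my ($path) = @_;
    return $path =~ m{/(?:[a-z][a-z0-9+.-]*://|news:|mailto:)}i ? 1 : 0;
}

for my $path ('/MacOSX/socket://localhost:12000/',
              '/SCOFAQ/news:comp.unix.admin',
              '/Books/creatingcoolwebsites.html') {
    printf "%-40s %s\n", $path,
        looks_like_spider_noise($path) ? 'probably a spider' : 'investigate';
}
```

A check like this only sorts the list into "probably noise" and "worth a look"; it would be risky to discard entries outright on the strength of a pattern match.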


I have the script count the number of uncorrected 404 occurrences so that I can devote immediate effort to the more serious problems. The output of the script might look something like this:

/blog/b930.html 2
/SCOFAQ/news:comp.unix.admin 1
/cgi-bin/fmail.pl 1
/Books/creatingcoolwebsites.html 10
/e51/SCOFAQ/FAQ_scotec8xsession.html 1


Obviously I need to jump on that "creatingcoolwebsites.html" problem right away.
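With a few hundred entries, the big counts are easier to spot if the list is sorted. Here is a small sketch of that idea using the sample counts above as data; the `by_count_desc` helper is an illustrative name, not something from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Counts copied from the sample output above.
my %count = (
    '/blog/b930.html'                      => 2,
    '/SCOFAQ/news:comp.unix.admin'         => 1,
    '/cgi-bin/fmail.pl'                    => 1,
    '/Books/creatingcoolwebsites.html'     => 10,
    '/e51/SCOFAQ/FAQ_scotec8xsession.html' => 1,
);

# Return the paths with the highest count first; ties are broken
# alphabetically so the output order is stable from run to run.
sub by_count_desc {
    my ($counts) = @_;
    return sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;
}

print "$_ $count{$_}\n" for by_count_desc(\%count);
```

Run against this data, the /Books/creatingcoolwebsites.html line comes out first, followed by /blog/b930.html and then the single-hit entries.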

See that "fmail.pl"? That's a script kiddy trying to break in:

205.158.224.234 - - [12/Jul/2004:12:22:04 +0000] "POST /cgi-bin/fmail.pl HTTP/1.0" 404 2317 " http://aplawrence.com/" "-"


Checking his other attempts proves it:

205.158.224.234 - - [12/Jul/2004:12:21:05 +0000] "POST /cgi-bin/formmail.pl HTTP/1.0" 404 2320 " http://aplawrence.com/" "-"
205.158.224.234 - - [12/Jul/2004:12:22:04 +0000] "POST /cgi-bin/fmail.pl HTTP/1.0" 404 2317 " http://aplawrence.com/" "-"


Nothing to worry about there.
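Pulling up everything a given client asked for is easy to script. The sketch below assumes the access log is in Apache's "combined" format with one entry per line; the `requests_from` function is a hypothetical helper, and the middle sample line (from a documentation-range address) is made up to show that other clients are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [time, method, path, status] for every request made by $ip.
# Lines that do not look like combined-format entries are ignored.
sub requests_from {
    my ($ip, @lines) = @_;
    my @hits;
    for my $line (@lines) {
        next unless $line =~
            /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) /;
        next unless $1 eq $ip;
        push @hits, [$2, $3, $4, $5];
    }
    return @hits;
}

# Sample data: the two entries quoted above plus one invented request.
my @log = (
    '205.158.224.234 - - [12/Jul/2004:12:21:05 +0000] "POST /cgi-bin/formmail.pl HTTP/1.0" 404 2320 " http://aplawrence.com/" "-"',
    '192.0.2.10 - - [12/Jul/2004:12:21:30 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "-"',
    '205.158.224.234 - - [12/Jul/2004:12:22:04 +0000] "POST /cgi-bin/fmail.pl HTTP/1.0" 404 2317 " http://aplawrence.com/" "-"',
);

for my $hit (requests_from('205.158.224.234', @log)) {
    my ($time, $method, $path, $status) = @$hit;
    print "$time $method $path $status\n";
}
```

In practice the lines would come from reading the access log file rather than from a list in the script.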

The actual script is pretty simple:

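#!/usr/bin/perl
# ck404.pl - list uncorrected 404's from the error log, with counts
use strict;
use warnings;

open(my $log,  '<', "www/logs/error_log")   or die "error_log: $!";
open(my $corr, '<', "www/data/corrections") or die "corrections: $!";
my %corrected = ();   # paths the Custom 404 script already handles
my %missing   = ();   # uncorrected path => number of occurrences

# Corrections log: drop everything from "->" onward, keep the path.
while (<$corr>) {
    chomp;
    s/->.*//;
    s/^ *//;
    s/ *$//;
    $corrected{$_} = $_;
}
close $corr;

# Error log: reduce each line to the site-relative path requested.
while (<$log>) {
    chomp;
    s/.*htdocs//;
    s/.*cgi-bin/\/cgi-bin/;
    s/^ *//;
    s/ *$//;
    next if $corrected{$_};
    $missing{$_}++;
}
close $log;

foreach (keys %missing) {
    print "$_ $missing{$_}\n";
}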


This does generate some extra garbage now and then; it doesn't need to be perfect - it's just a helper script that saves me time.

Well, I've got a few hundred 404's I need to go look at. Most of them will probably be spider errors or things I can easily fix, but invariably there will be some new 404 mixup to deal with, and the Custom 404 code will grow some more.

*Originally published at APLawrence.com

About the Author:
A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com
