
Handling Missing Data In Inputs

A.P. Lawrence
Expert Author
Published: 2004-10-22



Missing data can be very annoying to a programmer. In fact, it is so annoying that very often we'll write separate programs to clean up the data and eliminate unpleasant conditions so that the main program doesn't have to deal with them. Here, I'll show some examples of the kinds of problems we see.

Let's take a common data format, a TAB-delimited file. A simplistic Perl program to read such a file might be:
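Here's a minimal sketch of such a reader, assuming three fields per record. The sample lines are made up, and a real program would read them with while (my $line = <>) instead. The detail that matters is the third argument to split: a limit of -1 keeps trailing empty fields, which split would otherwise drop.

```perl
use strict;
use warnings;

# Split one TAB-delimited record into fields. The -1 limit keeps
# trailing empty fields; without it, split silently discards them.
sub parse_tab_line {
    my ($line) = @_;
    chomp $line;
    return split /\t/, $line, -1;
}

# Hypothetical sample input: the second record has an empty middle field.
my @input = ("apple\t10\t1.50\n", "pear\t\t2.25\n");

for my $line (@input) {
    my ($name, $count, $price) = parse_tab_line($line);
    print "name=$name count=$count price=$price\n";
}
```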



An equivalent shell script might be:



The Perl script works, but the shell script doesn't. Here's the output if the input file looks like this:



The Perl script produces:



but the shell script messes up:
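The shell script most likely trips over the shell's own field splitting: when IFS contains whitespace characters such as TAB, a run of them counts as a single separator, so an empty field between two TABs simply vanishes and the later values shift left. Perl's split ' ' (awk-style splitting) does the same thing, which makes the effect easy to see in a small demonstration on a hypothetical record:

```perl
use strict;
use warnings;

# A hypothetical record with an empty middle field.
my $line = "pear\t\t2.25";

# awk-style split: any run of whitespace is ONE separator, so the
# empty field disappears, just as it does in the shell.
my @by_whitespace = split ' ', $line;

# Splitting on exactly one TAB keeps the empty field in place.
my @by_tab = split /\t/, $line, -1;

print scalar(@by_whitespace), " fields: ", join("|", @by_whitespace), "\n";  # 2 fields: pear|2.25
print scalar(@by_tab),        " fields: ", join("|", @by_tab),        "\n";  # 3 fields: pear||2.25
```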



If this were a problem with Perl, we'd handle it like this:
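One defensive way to handle missing fields in Perl is to split on single TABs, check how many fields came back, and substitute a placeholder for anything empty. This is only a sketch; the expected field count of 3 and the 'N/A' placeholder are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $EXPECTED = 3;       # assumption: every record should have 3 fields
my $DEFAULT  = 'N/A';   # assumption: placeholder for an empty field

# Split one TAB-delimited record, report a wrong field count, pad or
# trim to the expected count, and replace empty fields with $DEFAULT.
sub normalize_record {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line, -1;
    if (@f != $EXPECTED) {
        warn "expected $EXPECTED fields, got ", scalar(@f), ": $line\n";
        push @f, ('') x ($EXPECTED - @f) if @f < $EXPECTED;   # pad short records
        splice @f, $EXPECTED if @f > $EXPECTED;               # drop extra fields
    }
    return map { length $_ ? $_ : $DEFAULT } @f;
}

# Hypothetical records: an empty middle field, and a short record.
for my $line ("pear\t\t2.25\n", "plum\t4\n") {
    print join(", ", normalize_record($line)), "\n";
}
```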



But things can be worse. For example, if we are processing data that was originally formatted as a report, there may be no delimiters at all, just blank space. We might see something like this:



You can't process that with delimiters, but you can use unpack:
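Here's a minimal unpack sketch, assuming a hypothetical layout: a 10-character name column, a 6-character quantity column, and an 8-character price column. The A template letter takes a space-padded text field and strips its trailing spaces, so a column that is entirely blank comes back as an empty string instead of shifting the other values.

```perl
use strict;
use warnings;

# Hypothetical column layout: name 1-10, quantity 11-16, price 17-24.
my $TEMPLATE = 'A10 A6 A8';

sub parse_fixed {
    my ($line) = @_;
    chomp $line;
    my @fields = unpack $TEMPLATE, $line;
    s/^\s+// for @fields;   # "A" trims only trailing spaces; trim leading ones too
    return @fields;
}

# Build sample report lines with sprintf so the column widths are exact.
my @report = (
    sprintf("%-10s%-6s%-8s", "apple", "10", "1.50"),
    sprintf("%-10s%-6s%-8s", "pear",  "",   "2.25"),   # quantity missing
);

for my $line (@report) {
    my ($name, $qty, $price) = parse_fixed($line);
    $qty = '(none)' if $qty eq '';
    print "$name: qty=$qty price=$price\n";
}
```

Because the fields are located by position rather than by delimiter, a blank quantity leaves the price in its proper place.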



That produces:



Comma-separated value (CSV) files can be annoying if they also contain commas within quoted fields, because then you can't simply split on commas. There are at least two ways to handle that: either use the Text::ParseWords module:
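A short sketch using parse_line from Text::ParseWords, with a made-up sample record. The first argument is the delimiter, and a second argument of 0 tells parse_line to remove the quotes from the returned fields.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Split one CSV record, honouring double-quoted fields that contain commas.
# parse_line(DELIMITER, KEEP, LINE): KEEP = 0 strips the quotes.
sub parse_csv_line {
    my ($line) = @_;
    chomp $line;
    return parse_line(',', 0, $line);
}

# Hypothetical record with a comma inside a quoted field.
my @fields = parse_csv_line(qq{Smith,"Springfield, IL",42\n});
print join("|", @fields), "\n";   # Smith|Springfield, IL|42
```

One caveat worth checking against your own data: parse_line also treats single quotes and backslashes as quoting characters, so an apostrophe inside an unquoted field (a name like O'Brien, say) can make it return an empty list.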



Or, assuming the data is regular enough, replace the commas that are not inside quotes with a different delimiter and then split on that. I think Text::ParseWords is easier.
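The delimiter-swapping approach can be sketched with a single substitution, assuming the data really is regular: no TAB characters anywhere in the input and no escaped quotes inside quoted fields. TAB as the replacement delimiter is an arbitrary choice for this illustration.

```perl
use strict;
use warnings;

# Turn every comma that is NOT inside a double-quoted string into a TAB,
# then split on TABs. Assumes no TABs in the data and no escaped quotes.
sub split_csv_by_swap {
    my ($line) = @_;
    chomp $line;
    # Each match is either a whole quoted string (put back unchanged)
    # or a bare comma (replaced by a TAB).
    $line =~ s/("[^"]*")|,/defined $1 ? $1 : "\t"/ge;
    my @fields = split /\t/, $line, -1;
    s/^"(.*)"$/$1/s for @fields;   # drop the surrounding quotes
    return @fields;
}

print join("|", split_csv_by_swap(qq{Smith,"Springfield, IL",42\n})), "\n";
```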

But sometimes none of that is going to work either. I'm working on a project right now where the input data can have up to three fields, but any of the three can be missing, and there are no delimiters and no fixed spacing. The only way to determine what we have is to know that field one, if present, is alphabetic, field two is a whole number, and field three always contains a decimal point. So



means that I have fields 1 and 3 on line 1, only field 2 on line 2, and only field 3 on line 3. It's actually much worse than this: there are other fields, some of which are always present and some of which are not, and it is quite a challenge to normalize this stuff well enough to massage the data. The way to handle it is to split on spaces (split / /) and then determine what we got. So it's something like this:
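A sketch of that classification step, covering only the three optional fields described above. The exact patterns are assumptions: letters only for field one, an optional minus sign and digits for field two, and digits around a required decimal point for field three. It uses split ' ' rather than split / / so that runs of several spaces don't produce empty tokens, and the hash keys are hypothetical names.

```perl
use strict;
use warnings;

# Assign each whitespace-separated token to a field by its shape alone:
#   field 1: letters only; field 2: whole number; field 3: has a decimal point.
# The hash keys (alpha, integer, decimal) are hypothetical names.
sub classify_fields {
    my ($line) = @_;
    my %rec;
    for my $tok (split ' ', $line) {   # ' ' ignores leading and repeated spaces
        if    ($tok =~ /^[A-Za-z]+$/)  { $rec{alpha}   = $tok }
        elsif ($tok =~ /^-?\d+$/)      { $rec{integer} = $tok }
        elsif ($tok =~ /^-?\d*\.\d+$/) { $rec{decimal} = $tok }
        else                           { warn "unrecognized token: $tok\n" }
    }
    return \%rec;
}

# Hypothetical lines: fields 1 and 3; only field 2; only field 3.
for my $line ("abc 12.50", "42", "7.25") {
    my $r = classify_fields($line);
    my @shown = map { $r->{$_} // '-' } qw(alpha integer decimal);
    print "$line => f1=$shown[0] f2=$shown[1] f3=$shown[2]\n";
}
```

Because each value identifies itself by its shape, this sketch doesn't depend on where a token sits on the line, which is what makes missing fields tolerable.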



*Originally Published at http://www.aplawrence.com



About the Author:
A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com
