Welcome to WebProNews Breaking eBusiness and Search News
Advertise | Newsletter | Sitemap | News Feeds News Feed 
 WebProNews Search Part of the iEntry network iEntry inc. 

Perl Sorting

A.P. Lawrence
Expert Author
Published: 2004-04-14

WebProNews RSS Feed


Arrays often need sorting. Perl has built-in ways to help you, but as usual, there's more than one way to do it.

To play with the examples shown here, you'll need a file containing a few lines of words. I'll use this:


Each
and
everyone
listen:
Exactly
one
way
is
not
the
only
way
except
when
it
is


and we'll call it "unsorted". Our first Perl sort is simple:

#!/usr/bin/perl
my @words=<>;

foreach(sort @words) {
print;
}


and in our usually highly creative way, we'll call it "t.pl". Running it with "unsorted" as an argument gives us these results:

$ ./t.pl unsorted
Each
Exactly
and
everyone
except
is
is
it
listen:
not
one
only
the
way
way
when


In some cases, that might be good enough for our needs, but it might not be (and if it were, we'd have to end this article here). So what do we need to do if we need more? Let's say we want to ignore the upper/lower case distinctions and sort in "dictionary" order, like command line "sort -f".

Well, Perl provides a way for us to do part of the sorting. That is, we can provide a subroutine that the Perl "sort" will call to decide whether one thing is greater or smaller than another. Perl will still take care of shuffling things around for us.

The subroutine is a little strange because we don't have to process arguments. For efficiency reasons, the two things we get are always $a and $b and it's up to us to compare them and return a numeric result indicating equality (0), that $a is less than $b (a negative number) or greater (a positive number). So here's our new t.pl:

#!/usr/bin/perl
my @words=<>;

foreach(sort mysort @words) {
print;
}

sub mysort {

lc($a) cmp lc($b);


The "cmp" does the job of comparing, and "lc" translates to lower case so that the result is case insensitive:

and
Each
everyone
Exactly
except
is
is
it
listen:
not
one
only
the
way
way
when


We can get as complicated as we like in the comparing subroutine. Here's one that sorts in order of the number of "e"'s in each word. A pretty artificial example, but it shows what you can do:

#!/usr/bin/perl
my @words=<>;

foreach(sort mysort @words) {
print;
}

sub mysort {
$aa=$a;$bb=$b;
($aa =~ tr/eE/eE/) <=> ($bb=~ tr/eE/eE/) || lc($a) cmp lc($b);


The counting of the "e"'s is done using the "tr" operator and the comparison needs to use the arithmetic compare <=>. We copy $a and $b to temporary variables because if we didn't, tr would change them and we'd just get numbers as our result (though in other situations, that might be just what you wanted).

By using "||" and retaining the dictionary method, words with the same number of "e"'s stay in order. Without that, we'd get this:

not
and
only
way
it
is
way
is
Each
the
listen:
Exactly
one
when
except
everyone


but with it:

and
is
is
it
not
only
way
way
Each
Exactly
listen:
one
the
when
except
everyone


Now we get to the more complicated ways. If our "unsorted" were very large, it could take a while to run. At http://perlmonks.thepen.com/145659.html, I found this:

#!/usr/bin/perl
my @words=<>;
#Guttman Rosler Transform or GRT
my @sorted=map { substr($_,4) }
sort
map { pack("LA*",tr/eE/eE/,$_) } @words;
foreach(@sorted) {
print;


This needs a lot of explanation. First, "map". Perl's "map" function is like a "foreach" loop: whatever you want to do to an array is in the first argument, the array you work on is its second. It's better than a foreach loop though, because it gives us back a new array. As a very simple example:

@lcwords=map {lc($_)} @words;

So the "map { pack("LA*",tr/eE/eE/,$_) } @words;" part of the example above returns a new array that consists of a packed 4 byte number followed by the orginal contents of each line. That number is, of course, the count provided by "tr". We use pack here because it's very quick, but if we had more complex needs, we could use sprintf or even build the string ourselves. We just have to make sure we can un-build it at the end.

It's then ordered by the "sort" on the previous line, and then the first "map { substr($_,4) }" works on that array, stripping out the packed 4 byte number that made our previous sorting work.

In spite of the double map use, this will actually run much faster than what we did before. Even on a relatively small file, "time" will show that this can be twice as fast.

Receive Our Daily Email of Breaking eBusiness News


About the Author:
A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com

WebProNews RSS Feed

More Articles

Contact WebProNews
Advertisement





TOP NEWS

Targeted Information for Business
WebProNews is part of the iEntry network

Internet Business: Marketing: Small Business:
WebProNews MarketingNewz SmallBusinessNewz
WebProWorld AdvertisingDay PromoteNews
EcommNewz SalesNewz EntrepreneurNewz

Software: Search Engines: Web Design:
WebMasterFree Jayde B2B DesignNewz
NetworkingFiles SearchZA FlashNewz
SecurityConfig SearchNewz WebSiteNotes

Developer: IT Management: Security:
DevWebPro ITManagement SecurityProNews
DevNewz SysAdminNews SecurityConfig
TheDevWeb NetworkingFiles NetworkNewz

The iEntry Network consists of over 100 web publications reaching millions of Internet Professionals. Contact us to advertise.
eBUSINESS RESOURCES






 Advertise | Contact Us | Corporate | Newsletter | Sitemap | Submit an Article | News Feeds
 WebProNews is an iEntry, Inc. ® publication - $line) { echo $line ; } ?> All Rights Reserved
About WebProNews
WebProNews is the number one source for eBusiness News. Over 5 million eBusiness professionals read WebProNews and other iEntry business and tech publications.

WebProNews provides real-time coverage of internet business.

Free Email Newsletters:
WebProNews SearchNewz
WebProWorld DevWebPro
Marketing SecurityNews
Plus over 100 other newsletters!

Send me relevant info on products and services.


WebProWorld
Ten most recent posts.

NetworkingFiles
Featured Software

WebProNews in the News
View all recent mentions of WebProNews from around the world!

Recent Articles On ...
Google eBusiness
Yahoo Ask Jeeves
MSN Blogs
Search Engines Blogging
Affiliate Programs Marketing
eCommerce Advertising
eBay Sun Microsystems
AOL Adsense
Microsoft Adwords
Oracle IBM
Amazon Apple
SEM Mac
SEO iPod
Adsense XBox
PR Adobe



iEntry.com WebProWorld RSS Feed WebProWorld Contact WebProNews Print Version Email a friend Bookmark us