Random Errors In Perl
There is a simple mistake I make frequently with random numbers. I'll use Perl to illustrate, but trust me: I can screw this up in just about any language. And I have...
To make it both easy and obvious, let's say we want to randomly choose one of three things. Again to make it easy, we'll say that the integers 0, 1 and 2 will work for whatever it is we have in mind. So here's some code:
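while (1) {
    $x[int(rand(2) + .5)]++;    # the "x" method: add .5, then int()
    $y[int(rand(3))]++;         # the "y" method: int() alone
    last if $count++ > 10000;
}
print "x: $x[0] $x[1] $x[2]\n";
print "y: $y[0] $y[1] $y[2]\n";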
If you think that both the x and y arrays ought to contain roughly similar numbers in each element, you've made the mistake too. In fact, the code above will output something very much like this:
x: 2475 5033 2494
y: 3276 3343 3383
The "x" method is strongly biased toward the value "1". That's because rand(2) returns values from 0 up to (but never) 2, so rand(2) + .5 lands somewhere from 0.5 up to (but never) 2.5. int() always "rounds down": it returns "1" for any sum from 1 up to (but not including) 2. That slice is half of the whole range, while "0" and "2" each get only a quarter of it. If that doesn't make sense to you, pretend for a moment that rand() could only generate integers or the .5 value between two integers. In other words, it could produce only 0, 0.5, 1.0 and 1.5 for rand(2), and 0, 0.5, 1.0, 1.5, 2.0 and 2.5 for rand(3). Here's what happens:
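| Value | 0 | 0.5 | 1.0 | 1.5 | 2.0 | 2.5 |
|---|---|---|---|---|---|---|
| int(value), the "y" method with rand(3) | 0 | 0 | 1 | 1 | 2 | 2 |
| int(value+.5), the "x" method with rand(2) | 0 | 1 | 1 | 2 | n/a | n/a |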
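Here is a small Perl sketch that checks the table without doing it by hand. `grid_tally` is a hypothetical helper, not part of the original code. It pretends rand() can only return evenly spaced values, `$per_unit` of them per whole number. It then applies int(value + offset) to each value and counts how often each result occurs.

```perl
use strict;
use warnings;

# Hypothetical helper: pretend rand($range) can only return the grid values
# 0, 1/$per_unit, 2/$per_unit, ... (never $range itself), apply
# int($value + $offset) to each one, and count how often each result occurs.
sub grid_tally {
    my ($range, $offset, $per_unit) = @_;
    my @counts;
    for my $i (0 .. $range * $per_unit - 1) {
        my $value = $i / $per_unit;
        $counts[ int($value + $offset) ]++;
    }
    return map { $_ // 0 } @counts;    # treat untouched buckets as 0
}

# "x" method, int(rand(2) + .5), with rand() limited to steps of .5
print "x: ", join(" ", grid_tally(2, .5, 2)), "\n";    # x: 1 2 1
# "y" method, int(rand(3)), with the same .5 steps
print "y: ", join(" ", grid_tally(3, 0, 2)), "\n";     # y: 2 2 2
```

The same helper also covers the two-choice case. `grid_tally(1, .5, 4)` steps rand(1) in increments of .25 and returns `2 2`, an even split.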
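Do you see it now? int() just chops off the fraction. Adding .5 first turns that into round-to-nearest, so "1" collects every rand() value from 0.5 up to 1.5, while "0" and "2" get only the half-width slices at either end. That's why the x values cluster around "1".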
I've carelessly and unwittingly made that mistake many times. I know better, but I find myself doing this quite often. No doubt it comes from a desire to "round up", even though that's completely unnecessary for the task at hand. The most likely explanation is that seeing "int()" triggers memories of rounding and overrides my sense of what I'm actually trying to accomplish. I'm a bit on "automatic pilot" there, letting my subconscious fill in the details, and it wants to add .5 when it sees "int()".
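When you are choosing among three or more items, this mistake still does damage, but the damage is limited to the lowest and highest elements. For example, suppose we pick among six values with $x[int(rand(5) + .5)]++ and $y[int(rand(6))]++ and print all six elements of each array. The output will look something like: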
x: 994 2017 1985 2013 2009 984
y: 1697 1660 1687 1675 1622 1661
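The bias in these runs can be worked out exactly rather than read off the samples. The derivation below assumes the x method for k choices is int(rand(k-1) + .5), so k = 3 in the first run and k = 6 here. The argument to int() is then uniform on [0.5, k-0.5), an interval of width k-1. int() maps [0.5, 1) to 0, each [j, j+1) to j for 0 < j < k-1, and [k-1, k-0.5) to k-1. That gives:

$$
P(x = 0) = P(x = k-1) = \frac{1/2}{k-1} = \frac{1}{2(k-1)}, \qquad
P(x = j) = \frac{1}{k-1} \quad (0 < j < k-1)
$$

For k = 3 the shares are 1/4, 1/2 and 1/4, or about 2,500/5,000/2,500 over roughly 10,000 draws. For k = 6 the two ends get 1/10 and each middle value gets 1/5, or about 1,000 and 2,000. Both match the counts above. With k = 2 there are no middle values at all, so both outcomes get 1/2.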
Only 0 and 5 get shorted. Not great, but at least we do get some "randomness" (biased as it is). However, when limited to just two elements, as with int(rand(1) + .5), we may not even notice the mistake at all, because in that case it introduces no bias. To see that, this time pretend that rand() can only pick in increments of .25:
| Value | 0 | 0.25 | 0.50 | 0.75 |
|---|---|---|---|---|
| int(value+.5) | 0 | 0 | 1 | 1 |
So if you're just picking between two choices, the mistake is harmless. But because it is harmful for larger sets, you really shouldn't get into the habit.
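The habit to form instead is to let the truncation do its job and never add .5. Below is a minimal Perl sketch of that. `pick` is a hypothetical helper name. The `$array[rand @array]` subscript it relies on is the idiom perlfaq4 gives for selecting a random array element.

```perl
use strict;
use warnings;

# Hypothetical helper: return one of its arguments, chosen uniformly.
# In "rand @items", @items is evaluated in scalar context (the element
# count), so rand returns a float from 0 up to but not including that
# count. Using the float as an array subscript truncates it, just like
# int(), so every index gets an equal-width slice. No .5 is added.
sub pick {
    my @items = @_;
    return $items[ rand @items ];
}

# Tally 30,000 picks; each color should come up roughly 10,000 times.
my %tally;
$tally{ pick(qw(red green blue)) }++ for 1 .. 30_000;
print "$_: $tally{$_}\n" for sort keys %tally;
```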
*Originally published at APLawrence.com
About the Author:
A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com