No. 2653   2010-05-02

1 {scene: the water cooler}
1 Mercutio: I'm running an analysis of our network traffic.
2 Mercutio: To get an estimate of the maximum hourly variance, I'm using a bootstrap resampling of data logged from different days.
3 Mercutio: Being a discrete stochastic process, the number of packets inherently follows a Poisson distribution.
4 Ophelia: Sounds... fascinating.
4 Mercutio: I call this the Poiss-in-Boots method.
4 Ophelia: Okay, kill me now.

This strip's permanent URL: http://www.irregularwebcomic.net/2653.html

Bootstrap resampling is a method of statistical analysis that estimates the distribution of a statistic, such as an average, using only a single sample of data. Without it, that sample would give us just one estimate of the statistic, with no indication of how far off it might be.

To give an example, let's say we want to estimate the average height of people in a certain population, and give a possible range of heights so that we're 90% sure that the average height lies within that range. We go away and measure the heights of 100 people. Calculating the average height of our sample is easy - add the heights up and divide by 100. But the actual average height of the entire population could be different, because we haven't measured everyone.

To give ourselves some idea of how different the actual average height could be from our average, we can resample the data we have. To do this, we randomly pick a number of the measurements we've already made, and we allow each measurement to be picked any number of times. For example, 10 random picks from our sample of 100 heights might be measurement numbers {12, 28, 28, 39, 41, 55, 87, 89, 93, 99}. The 28th measurement is in this sample twice - that's fine. Each time we make a new random pick, we choose from the entire set of measurements, including any we might already have picked. (We use 10 picks here to keep the example short. In practice a bootstrap resample is usually the same size as the original sample, 100 in this case, because smaller resamples make the averages look more variable than they really are.)
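The picking step described above can be sketched in a few lines of Python. This is only an illustration: the heights are made-up stand-in data, and the function name `draw_resample` and the seed are assumptions chosen for the example, not anything from the comic.

```python
import random

def draw_resample(measurements, size, rng):
    """Pick `size` values from `measurements`, with replacement.

    Every pick is made from the full list, so the same measurement
    can turn up more than once in the result.
    """
    return [rng.choice(measurements) for _ in range(size)]

rng = random.Random(2653)  # fixed seed so the run is repeatable

# Made-up stand-in data: 100 heights in cm (not real measurements).
heights = [rng.gauss(170.0, 10.0) for _ in range(100)]

# One resample of 10 picks, as in the example above, and its average.
resample = draw_resample(heights, 10, rng)
resample_mean = sum(resample) / len(resample)
```

Passing `len(heights)` instead of 10 as the size gives a resample as large as the original sample, which is the usual choice in practice.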

Okay, this sample is our bootstrap resample. We calculate the average height of those 10 measurements.

Then we take another resampling of 10 heights, and calculate the average of those.

And we do it again.

And again.

And again.

And we keep on doing it for maybe hundreds, or even thousands of resamples. This sort of calculation is usually done on a computer, so it's not too tedious and time-consuming.

When we're done, we'll have a few hundred or thousand different average heights, each one calculated from a bootstrap resampling of our original 100 measurements. And these averages will cover a range of values that gives us a good estimate of how different the actual average height might be from our initial estimate of it. If we want to give a range of heights for which we're 90% sure that the actual average height lies within it, we pick a range that includes 90% of our bootstrap resampled averages - for example, by sorting the averages and cutting off the lowest 5% and the highest 5%!
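Putting the whole procedure together, here is a minimal sketch in Python. It assumes made-up height data, and the names `bootstrap_means` and `percentile_interval` are hypothetical helpers written for this example. The range is found by sorting the averages and trimming 5% from each end, which is one simple way of picking a range that holds 90% of them. Each resample defaults to the size of the original sample, the usual choice; pass `size=10` to mirror the small example above.

```python
import random

def bootstrap_means(data, n_resamples, rng, size=None):
    """Return the average of each of `n_resamples` bootstrap resamples."""
    size = len(data) if size is None else size
    means = []
    for _ in range(n_resamples):
        # rng.choices picks with replacement from the whole data set.
        resample = rng.choices(data, k=size)
        means.append(sum(resample) / size)
    return means

def percentile_interval(values, coverage=0.90):
    """Return (low, high) bounding the central `coverage` share of values."""
    ordered = sorted(values)
    tail = (1.0 - coverage) / 2.0
    lo_index = round(tail * len(ordered))
    hi_index = round((1.0 - tail) * len(ordered)) - 1
    return ordered[lo_index], ordered[hi_index]

rng = random.Random(2653)  # fixed seed so the run is repeatable

# Made-up stand-in data: 100 heights in cm (not real measurements).
heights = [rng.gauss(170.0, 10.0) for _ in range(100)]
sample_mean = sum(heights) / len(heights)

means = bootstrap_means(heights, 2000, rng)
low, high = percentile_interval(means, coverage=0.90)
print(f"sample average: {sample_mean:.1f} cm")
print(f"90% bootstrap range: {low:.1f} cm to {high:.1f} cm")
```

More refined ways of turning bootstrap results into an interval exist, but this one matches the description above most directly.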

How neat is that?

