
1 {scene: the water cooler}
1 Mercutio: I'm running an analysis of our network traffic.
2 Mercutio: To get an estimate of the maximum hourly variance, I'm using a bootstrap resampling of data logged from different days.
3 Mercutio: Being a discrete stochastic process, the number of packets inherently follows a Poisson distribution.
4 Ophelia: Sounds... fascinating.
4 Mercutio: I call this the PoissinBoots method.
4 Ophelia: Okay, kill me now.
This strip's permanent URL: http://www.irregularwebcomic.net/2653.html

Bootstrap resampling is a method of statistical analysis that can estimate the distribution of a calculated quantity, such as an average, from a single sample of data, when otherwise only a single value of that quantity could be calculated.
To give an example, let's say we want to estimate the average height of people in a certain population, and give a possible range of heights so that we're 90% sure that the average height lies within that range. We go away and measure the heights of 100 people. Calculating the average height of our sample is easy: add the heights up and divide by 100. But the actual average height of the entire population could be different, because we haven't measured everyone.
To give ourselves some idea of how different the actual average height could be from our average, we can resample the data we have. To do this, we randomly pick measurements from the ones we've already made, allowing each measurement to be picked any number of times. Normally we make as many picks as we originally made measurements, in this case 100, because an average of fewer measurements would vary more than an average of 100 does. For example, the first 10 random picks from our sample of 100 heights, sorted into order, might be measurement numbers {12, 28, 28, 39, 41, 55, 87, 89, 93, 99}. The 28th measurement has been picked twice, and that's fine. Each time we pick a measurement, we choose from the entire set of measurements, including any we might already have picked.
Okay, this sample is our bootstrap resample. We calculate the average height of those 100 measurements.
Then we take another resample of 100 heights, and calculate the average of those.
And we do it again.
And again.
And again.
And we keep on doing it for maybe hundreds, or even thousands of resamples. This sort of calculation is usually done on a computer, so it's not too tedious and time-consuming.
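Here is a minimal Python sketch of that computer calculation. It assumes made-up data: the `heights` list is invented for illustration, and `bootstrap_means` is a hypothetical helper name, not something from a library. Each resample uses the standard library's `random.Random.choices`, which picks with replacement, and by default takes as many values as the original sample.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, resample_size=None, seed=None):
    """Return the averages of n_resamples bootstrap resamples of data.

    Hypothetical helper for illustration. Each resample draws values from
    data *with replacement*, so one measurement can be picked several times.
    By default each resample is the same size as the original sample.
    """
    if resample_size is None:
        resample_size = len(data)
    rng = random.Random(seed)  # seeded generator so runs are repeatable
    means = []
    for _ in range(n_resamples):
        # choices() samples with replacement from the whole data set each pick
        resample = rng.choices(data, k=resample_size)
        means.append(statistics.fmean(resample))
    return means


# Made-up example data: 100 invented heights in centimetres.
data_rng = random.Random(1)
heights = [data_rng.gauss(170, 10) for _ in range(100)]

means = bootstrap_means(heights, n_resamples=2000, seed=42)
print(len(means), min(means), max(means))
```

The spread of the values in `means` is what stands in for our uncertainty about the true average.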
When we're done, we'll have hundreds or thousands of different average heights, each one calculated from a bootstrap resampling of our original 100 measurements. The spread of these averages gives us a good estimate of how different the actual average height might be from our initial estimate of it. If we want a range of heights within which we're 90% sure the actual average height lies, we pick the range that includes the middle 90% of our bootstrap resampled averages, leaving out the lowest 5% and the highest 5%!
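Picking out the middle 90% of the resampled averages can be sketched like this, again as a hypothetical helper (`percentile_interval` is an illustrative name, not a library function). It sorts the averages and drops the same number of values from each end. This simple rank-based rule is just one choice; functions such as NumPy's `percentile` interpolate between neighbouring values instead.

```python
def percentile_interval(values, level=0.90):
    """Return (low, high) bounding the middle `level` fraction of values.

    Hypothetical helper for illustration. Sorts the values and drops the
    same number from each end: with level=0.90 that is roughly the lowest
    5% and the highest 5%.
    """
    if not values:
        raise ValueError("need at least one value")
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    # How many values to drop from each end, e.g. 5% of 2000 averages = 100.
    k = round(n * (1 - level) / 2)
    # Never trim past the middle of the sorted list.
    k = min(k, (n - 1) // 2)
    return ordered[k], ordered[n - 1 - k]


# Stand-in for 100 resampled averages, for illustration only.
print(percentile_interval(list(range(1, 101))))  # → (6, 95)
```

Applied to a list of bootstrap averages, the two returned numbers are the ends of the 90% range.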
How neat is that?
LEGO® is a registered trademark of the LEGO Group of companies, which does not sponsor, authorise, or endorse this site. This material is presented in accordance with the LEGO® Fair Play Guidelines.