Home | Installation | Control Streams | Bootstrap | Randomization Test | Visual Predictive Check | Autocovariate | Files | References
Last Updated: 3 April 2012
The bootstrap was invented by Bradley Efron, Professor of Statistics at Stanford, in the 1970s. Davison & Hinkley have published an excellent text: Davison, A. C. & Hinkley, D. V., Bootstrap Methods and Their Application (Cambridge Series in Statistical and Probabilistic Mathematics, No. 1), Cambridge University Press, 1997.
My interest in applying the bootstrap to NONMEM was stimulated by John Parke and led to a publication based on an early implementation of the nmbs procedure (Parke, Holford & Charles 1999). A subsequent publication used it to describe the time course of washout of anti-parkinsonian drug effect (Hauser & Holford 2002).
The purpose of the bootstrap is to learn about the statistical parameters of a distribution, e.g. its mean or standard error, when the true distribution is unknown and one has only a set of observations. The key idea is to use the set of observations as an empirical representation of the true distribution.
Suppose the true distribution is called F and the empirical distribution is Fhat. If we sample from Fhat we can use this sample to compute a statistic, Thetastar, which is an estimate of the true value of Theta, a parameter of F. If we sample many times from Fhat we can learn about the distribution of Thetastar and describe its properties, e.g. its standard error or its 95% confidence interval.
This kind of sampling is "sampling with replacement", i.e. it is possible to get one of the original observations more than once in the bootstrap sample.
#Data is the empirical distribution vector Fhat[] of length n
#Let NBOOT be the number of bootstrap samples
#Should be at least 100 for the se and perhaps 1000 or more for a 95% CI
for (i=1; i <= NBOOT; i++ ) {
   #Sample the elements of Fhat n times using a uniform random distribution
   for ( j=1; j <= n; j++ ) {
      isub=int(n*rand())+1 # randomly sample a subject
      BS[j] = Fhat[isub]
   }
   #Estimate the parameter from the bootstrap sample e.g. the average
   Thetastar[i] = average(BS)
}
#Describe the distribution of Thetastar e.g. its standard error
se = stdev(Thetastar)
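The pseudocode above can be sketched as runnable Python. This is an illustrative translation, not part of the awk implementation that follows; the function name `bootstrap_se` and its defaults are my own.

```python
import random

def bootstrap_se(fhat, nboot=100, seed=None):
    """Bootstrap standard error of the mean of the sample fhat.

    fhat is the empirical distribution (a list of observations);
    nboot is the number of bootstrap samples (>= 100 for the SE).
    """
    rng = random.Random(seed)
    n = len(fhat)
    thetastar = []
    for _ in range(nboot):
        # sample n observations from fhat WITH replacement
        bs = [fhat[rng.randrange(n)] for _ in range(n)]
        # the statistic here is the average of the bootstrap sample
        thetastar.append(sum(bs) / n)
    # the standard deviation of the bootstrap means is the SE
    mean = sum(thetastar) / nboot
    var = sum((t - mean) ** 2 for t in thetastar) / (nboot - 1)
    return var ** 0.5
```

For a sample of 50 observations uniformly spread over 1..10 this returns a value close to the analytic standard error of the mean, sd/sqrt(n), about 0.41.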
An implementation of the bootstrap using awk (bs.awk):
awk -f bs.awk -v NBOOT=100 -v SEED=981124 cep.dat
nboot 100 Mean 0.0714337 SE 0.0114872
#bs.awk
#Usage: awk -f bs.awk -v NBOOT= -v SEED= datafile
#If SEED is left blank, srand is not called and awk's RNG
#starts at the same place every run; supplying a value for
#SEED sets the starting place
BEGIN {
   if ( NBOOT == "" ) NBOOT=100
   if ( SEED != "" ) srand(SEED)
   for (i=1; i <= 10; i++ ) #warm up the RNG
      rand()
   MT="." #missing value
}
#main program
{
   # read the data
   n=getFhat()
   # compute the bootstrap
   bootstrap(n,NBOOT)
   exit
}
function getFhat()
{
   n=0
   do
   {
      n++
      Fhat[n]=$NF # record the current line (including the first) before reading the next
      ok=getline
   }
   while ( ok > 0 )
   return n
}
function bootstrap(n,nboot,   i) {
   for ( i=1; i <= nboot; i++ ) {
      Thstar[i]=theta(n,Fhat)
   }
   stats(nboot,Thstar)
}
function theta(n,Fhat,   j,fhat) {
   sum=0; nobs=0
   for ( j=1; j <= n; j++ ) {
      fhat=Fhat[int(n*rand())+1]
      if ( fhat != MT ) {
         # your favourite "statistic" gets calculated from the data
         sum=sum+fhat
         nobs++
      }
   }
   if ( nobs > 0 ) {
      # finish the calculation of the statistic
      return sum/nobs
   }
   else {
      return 0
   }
}
function stats(nboot,Thstar)
{
   sum=0
   for ( i=1; i <= nboot; i++ ) {
      sum=sum+Thstar[i]
   }
   # NB The "data" in this case is the mean of each BS sample
   # so the std deviation of this set of means is the BS
   # standard error!
   if ( nboot > 0 ) {
      mean=sum/nboot
      if ( nboot > 1 ) {
         var=0
         for (i=1; i<=nboot; i++) {
            resid=Thstar[i]-mean
            var=var+resid*resid
         }
         sd=sqrt(var/(nboot-1))
      } else {
         sd=MT
      }
   }
   else {
      mean=MT
      sd=MT
   }
   printf("nboot %5s\tMean %8s\tSE %8s\n",nboot,mean,sd)
}
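The awk script reports only the bootstrap standard error, but as noted above, with 1000 or more bootstrap samples one can also read a 95% confidence interval directly off the sorted bootstrap estimates (the percentile method). A hedged Python sketch, with the function name `bootstrap_ci` and its defaults my own:

```python
import random

def bootstrap_ci(fhat, nboot=1000, alpha=0.05, seed=None):
    """Percentile (1-alpha) confidence interval for the mean,
    from nboot bootstrap resamples of the observations fhat."""
    rng = random.Random(seed)
    n = len(fhat)
    # bootstrap means, sorted so percentiles can be read off directly
    thetastar = sorted(
        sum(fhat[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(nboot)
    )
    lo = thetastar[int(nboot * alpha / 2)]
    hi = thetastar[int(nboot * (1 - alpha / 2)) - 1]
    return lo, hi
```

With nboot=1000 and alpha=0.05 the interval is simply the 25th and 975th of the sorted bootstrap means, which is why far more resamples are needed for a CI than for a standard error.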