Bootstrap


Home | Installation | Control Streams | Bootstrap | Randomization Test | Visual Predictive Check | Autocovariate | Files | References

 

Last Updated: 3 April 2012

 

The bootstrap was invented by Brad Effron, Prof Statistics at Stanford in the 1970s.  Davison & Hinkley have published an excellent text Bootstrap Methods and Their Application (Cambridge Series in Statistical and Probabilistic Mathematics , No 1)  by A. C. Davison, D. V. Hinkley (Contributor) (Paperback - November 1997) .

 

My interest in applying the bootstrap to NONMEM was stimulated by John Parke and led to a publication based on an early implementation of the nmbs procedure (Parke, Holford & Charles 1999). A subsequent publication used it to describe the time course of washout of anti-parkinsonian drug effect (Hauser & Holford 2002).

 

The purpose of the bootstrap is to try and learn about the statistical parameters of a distribution e.g. mean, standard error when the true distribution is unknown and one just has a set of observations. The key idea is to use the set of observations as an empirical representation of the true distribution.

 

Suppose the true distribution is called F and the empirical distribution is Fhat. If we sample from Fhat we can use this sample to compute a statistic, Thetastar, which is an estimate of the true value of Theta as a parameter of F. If we sample many times from Fhat we can learn about the distribution of Thetastar and describe its properties e.g. its standard error or its 95% confidence interval.

 

This kind of sampling is "sampling with replacement" i.e. it is possible to get one of the original observations more than once in the bootstrap sample.
 

Algorithm

#Data is the empirical dist vector Fhat[] of length n

#Let NBOOT be the number of bootstrap samples
#Should be at least 100 for se and perhaps 1000 or more
#for 95% CI

for (i=1; i <= NBOOT; i++ ) {

 

#Sample the elements of Fhat n times using a uniform random distribution

    for ( j=1; j <= n; j++ )  {

        isub=int(n*rand())+1 # randomly sample a subject

        BS[j] = Fhat[isub]

   }

    

#Estimate parameter from the bootstrap sample e.g. the average

    Thetastar[i] = average(BS)

}

#Describe the distribution of Thetastar e.g. standard error

se = stdev(Thetastar)

 

An implementation of the bootstrap using awk (bs.awk):

awk -f bs.awk -v NBOOT=100 -v SEED=981124 cep.dat

nboot 100 Mean 0.0714337 SE 0.0114872

#bs -v NBOOT= -v SEED=
#awk srand will start RNG at same place if SEED is left blank
#value of SEED does not change starting place
 

BEGIN {
    if ( NBOOT == "" ) NBOOT=100
    if ( SEED != "" ) srand(SEED)
    for (i=1; i <= 10; i++ ) #warm up RNG
        rand()
    MT="." #missing value
}

#main program
{
    # read the data
    n=getFhat()
    # compute the bootstrap
    bootstrap(n,NBOOT)
    exit
}

function getFhat() {
    n=0
    ok=getline
    do {
        n++
        Fhat[n]=$NF
        ok=getline
    } while ( ok > 0 )
    return n
}

function bootstrap(n,nboot, i) {
    for ( i=1; i <= nboot; i ++ ) {
        Thstar[i]=theta(n,Fhat)
    }
    stats(nboot,Thstar)
}

function theta(n,Fhat, j,pwr,fhat) {
    sum=0; nobs=0
    for ( j=1; j <= n; j++ ) {
        fhat=Fhat[int(n*rand())+1]
        if ( fhat != MT ) {
            # your favourite "statistic" gets calculated from the data
            sum=sum+fhat
            nobs++
        }
    }
    if ( nobs > 0 ) {
        # finish the calculation of the statistic
        return sum/nobs
    } else {
        return 0
    }
}

function stats(nboot,Thstar) {
    sum=0
    ssq=0
    for ( i=1; i <= nboot ; i++ ) {
        thstar=Thstar[i]
        sum=sum+thstar
        ssq=ssq+thstar*thstar
    }

    # NB The "data" in this case is the mean of each BS sample
    # so the std deviation of this set of means is the BS
    # standard error!
    if ( nboot > 0 ) {
        mean=sum/nboot
        if ( nboot > 1 ) {
            var=0
            for (i=1; i<=nboot; i++) {
                resid=Thstar[i]-mean
                var=var+resid*resid
            }
            sd=sqrt(var/(nboot-1))
        } else {
            sd=MT
        }
    } else {
        mean= MT
    }
    printf("nboot %5s\tMean %8s\tSE %8s\n",nboot,mean,sd)
}
 

Home | Installation | Control Streams | Bootstrap | Randomization Test | Visual Predictive Check | Autocovariate | Files | References