Implementations of bootstrapping for computing linguistic frequencies

This is a companion page for the paper

Quantifying variation and estimating the effects of sample size on the frequencies of linguistic variables. Heikki Mannila, Terttu Nevalainen and Helena Raumolin-Brunberg. In xxx, Cambridge University Press, 2011, to appear.

Below is a simple Matlab function for obtaining the bootstrap estimates. Other implementations will be added. (April 21, 2011)

function [bs025,bsmedian,bs975] = bs(D,S);

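% D is a two-column data set with the frequencies of the forms A and B,
% one row per observation.
% The function computes the bootstrap estimate for the frequency of A
% using S bootstrap samples; for each sample it takes the average of
% averages, i.e. the mean over observations of A/(A+B).
% It returns the 2.5%, 50% and 97.5% points of the S estimates.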
% 

[n,m] = size(D);

if m ~= 2
    error('the data should have two columns');
else
    
    bsall = zeros(1,S); % preallocate the S bootstrap estimates
    for i = 1:S % for S bootstrap samples
      indices = floor((n)*rand(n,1))+1;
      % indices is a list of n random integers from [1,n]
      
      Dbs = D(indices,:);
      % Dbs: the bootstrap sample 
      
      nonzero = sum(Dbs,2)>0;
      % the observations that have at least one data point
      
      bsaveave = mean(Dbs(nonzero,1)./(Dbs(nonzero,1)+Dbs(nonzero,2)));
      % Compute the average of averages 
      % (written in a very Matlab-specific way)
      
      bsall(i) = bsaveave;
    end

    aveaves = sort(bsall);
    % take the 2.5%, 50% and 97.5% points; max(1,...) avoids index 0 for small S
    bs025 = aveaves(max(1,round(0.025*S)));
    bsmedian = aveaves(max(1,round(0.5*S)));
    bs975 = aveaves(max(1,round(0.975*S)));
end
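
The line that computes bsaveave in the function above is compact, so the following sketch spells out what "average of averages" means on a small example. The data matrix D here is made up purely for illustration and is not taken from the paper; the variable names aveave_loop and pooled are likewise introduced only for this sketch. The pooled frequency is shown only as a contrast, to make visible that the two quantities can differ.

```matlab
% Hypothetical counts: one row per observation,
% columns = occurrences of form A and of form B.
D = [12 3;
      0 0;    % no occurrences at all: this row is left out
      5 5;
      1 9];

% Vectorized form, as in bs: keep the rows with at least one occurrence,
% take the share of A in each of them, then average the shares.
nonzero = sum(D,2) > 0;
aveave = mean(D(nonzero,1) ./ (D(nonzero,1) + D(nonzero,2)));

% The same computation written as an explicit loop.
total = 0;
count = 0;
for r = 1:size(D,1)
    a = D(r,1);
    b = D(r,2);
    if a + b > 0
        total = total + a/(a+b);
        count = count + 1;
    end
end
aveave_loop = total / count;

% For contrast: the pooled frequency, where every occurrence counts equally.
pooled = sum(D(:,1)) / sum(D(:));

fprintf('average of averages: %.4f (loop: %.4f), pooled: %.4f\n', ...
        aveave, aveave_loop, pooled);
```

Here the shares of A in the three nonempty rows are 0.8, 0.5 and 0.1, so the average of averages is about 0.4667, while the pooled frequency is 18/35, about 0.5143. With the function saved as bs.m, the bootstrap itself would be run as [bs025,bsmedian,bs975] = bs(D,S), for instance with S = 1000; a four-row matrix like this one is too small for that to be meaningful, since some bootstrap samples would contain only the empty row.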