Resampling

Assignment 2

Posted by Xiaolu on March 21, 2019

1. Definition

In statistics, resampling is a relative broad concept of re-construct samples. Any method doing one of the following can be called resampling:

  • Estimating the precision of sample statistics (medians, variances) by using subsets of available data or drawing randomly with replacement from a set of data points.
  • Exchanging labels on data points when performing significance tests.
  • Validating models by using random subsets.

Here in this article, we mainly focus on the first one, in which the most common techniques are bootstrapping and jackknifing.

2. Methods

2.1 Bootstrap

In statistics, bootstrapping is any test or metric that relies on random sampling with replacement.

Approach of bootstrapping

Alt text

Formally, the bootstrap works by treating inference of the true probability distribution J, given the original data, as being analogous to inference of the empirical distribution Ĵ, given the resampled data. The accuracy of inferences regarding Ĵ using the resampled data can be assessed because we know Ĵ. If Ĵ is a reasonable approximation to J, then the quality of inference on J can in turn be inferred. In simple words, bootstrap is to randomly select data from original sample with replacement so that the new generated samples will be asymptotic to underlying population.

2.2 Jackknife

Jackknife, unlike bootstrap, only takes one point out of the original sample to generate new samples. Jackknife pre-dates bootstrap and it is essentially a linear approximation of the bootstrap.

Approach of jackknifing

Alt text

The jackknife estimate of a parameter can be found by estimating the parameter for each subsample omitting the i-th observation. For example, if we intend to estimate the population mean of x, we compute the mean for each subsample consisting of all but the i-th data point.

Alt text

Then for each point we do the same procedure, and the mean of this sampling distribution is the average of these n estimates.

Alt text

3. Advantages

  • No need of the theoretical distribution of a statistic. Because the resampling procedure is distribution-independent. It provides an indirect method to assess the properties of the distribution underlying the sample and the parameters of interest that are derived from this distribution.
  • No need of large sample size for statistical inference. Since resampling generates samples itself and then use the new larger samples to estimate statistics.

Here is an example showing that advantages of resampling (Bootstrapping). The green-bar histogram below, which is clearly left-skewed, shows how well 159 participants performed on a L1 German vocabulary test (WST). We intend to calculate the sample mean, but we want to disregard the lowest 10% of scores. For this original sample, this yields a value of 32.7 (the bottom 10% of this sample (16 values) is dropped). However, we cannot trust the result because data points are dropped. And then we run bootstrapping to verify it.

Alt text

Using the approach mentioned above, we randomly draw data form the original set to generate new samples. The three new samples in middle row obtained by resampling with replacement from the original set and we drop bottom 10% (smaller than red dash line) and record their asymmetrically trimmed means. After 1000 times, blue histogram shows that to what extent the trimmed means vary in the new, randomly created samples. To construct a 80% confidence interval around the asymmetrically trimmed mean of 32.7, we can sort the values in the blue histogram from low to high and look up the values that demarcate the middle 80% (roughly 31.9 and 33.6).

4. Limitation

Resampling automatically assume that the original sample appropriately reflects the properties of the underlying population. However, this assumption is not always true because sample loses the properties of true distribuion more or less. If our sample is too small, or apparently violates the prabable distribution that we presume, it is better not to use resampling.

References

Resampling

Bootstrap

Jackknife

janhove.github.io