Title: | Resampling Functions |
---|---|
Description: | Bootstrap, permutation tests, and jackknife, featuring easy-to-use syntax. |
Authors: | Tim Hesterberg |
Maintainer: | Tim Hesterberg <[email protected]> |
License: | BSD_3_clause + file LICENSE |
Version: | 0.6 |
Built: | 2025-02-14 03:40:11 UTC |
Source: | https://github.com/cran/resample |
Resampling functions, including one- and two-sample bootstrap and permutation tests, with an easy-to-use syntax.
See library(help = resample) for version number, date, etc.
A list of datasets is at resample-data.
The main resampling functions are: bootstrap, bootstrap2, permutationTest, permutationTest2.
Methods for generic functions include: print.resample, plot.resample, hist.resample, qqnorm.resample, and quantile.resample.
Functions that calculate confidence intervals for bootstrap and bootstrap2 objects: CI.bca, CI.bootstrapT, CI.percentile, CI.t.
Functions that generate indices for random samples: samp.bootstrap, samp.permute.
This is called by the main resampling functions, but can also be called directly: resample.
I will post the newest versions to https://www.timhesterberg.net/r-packages. See that page to join a list for announcements of new versions.
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
data(Verizon)
ILEC <- with(Verizon, Time[Group == "ILEC"])
CLEC <- with(Verizon, Time[Group == "CLEC"])

#### Sections in this set of examples
### Different ways to specify the data and statistic
### Example with plots and confidence intervals.

### Different ways to specify the data and statistic
# This code is flexible; there are different ways to call it,
# depending on how the data are stored and on the statistic.

## One-sample Bootstrap
# Ordinary vector, give statistic as a function
bootstrap(CLEC, mean)
# Vector by name, give statistic as an expression
bootstrap(CLEC, mean(CLEC))
# Vector created by an expression, use the name 'data'
bootstrap(with(Verizon, Time[Group == "CLEC"]), mean(data))
# A column in a data frame; use the name of the column
temp <- data.frame(foo = CLEC)
bootstrap(temp, mean(foo))
# Put function arguments into an expression
bootstrap(CLEC, mean(CLEC, trim = .25))
# Put function arguments into a separate list
bootstrap(CLEC, mean, args.stat = list(trim = .25))

## One-sample jackknife
# Syntax is like bootstrap, e.g.
jackknife(CLEC, mean)

## One-sample permutation test
# To test H0: two variables are independent, exactly
# one of them must be permuted. For the CLEC data,
# we'll create an artificial variable.
CLEC2 <- data.frame(Time = CLEC, index = 1:length(CLEC))
permutationTest(CLEC2, cor(Time, index), resampleColumns = "index")
# Could permute "Time" instead.
# resampleColumns not needed for variables outside 'data'
permutationTest(CLEC, cor(CLEC, 1:length(CLEC)))

### Two-sample problems
## Different ways to specify data and statistic
## Two-sample bootstrap
# Two data objects (one for each group)
bootstrap2(CLEC, data2 = ILEC, mean)
# data frame containing y variable(s) and a treatment variable
bootstrap2(Verizon, mean(Time), treatment = Group)
# treatment variable as a separate object
temp <- Verizon$Group
bootstrap2(Verizon$Time, mean, treatment = temp)

## Two-sample permutation test
# Like bootstrap2, e.g.
permutationTest2(CLEC, data2 = ILEC, mean)

### Example with plots and confidence intervals.
boot <- bootstrap2(CLEC, data2 = ILEC, mean)
perm <- permutationTest2(CLEC, data2 = ILEC, mean, alternative = "greater")
par(mfrow = c(2,2))
hist(boot)
qqnorm(boot)
qqline(boot$replicates)
hist(perm)

# P-value
perm
# Standard error, and bias estimate
boot

# Confidence intervals
CI.percentile(boot) # Percentile interval
CI.t(boot) # t interval using bootstrap SE
# CI.bootstrapT and CI.bca don't currently support two-sample problems.

# Statistic can be multivariate.
# For CI.bootstrapT, it must have the estimate first, and a standard
# error second (don't need to divide by sqrt(n), that cancels out).
bootC <- bootstrap(CLEC, mean, seed = 0)
bootC2 <- bootstrap(CLEC, c(mean = mean(CLEC), sd = sd(CLEC)), seed = 0)
identical(bootC$replicates[, 1], bootC2$replicates[, 1])

CI.percentile(bootC)
CI.t(bootC)
CI.bca(bootC)
CI.bootstrapT(bootC2)
# The bootstrapT is the most accurate for skewed data, especially
# for small samples.

# By default the percentile and BCa intervals are "expanded", for
# better coverage in small samples. To turn this off:
CI.percentile(bootC, expand = FALSE)
Basic resampling. Supply the data and statistic to resample.
bootstrap(data, statistic, R = 10000,
          args.stat = NULL, seed = NULL, sampler = samp.bootstrap,
          label = NULL, statisticNames = NULL, block.size = 100,
          trace = FALSE)
bootstrap2(data, statistic, treatment, data2 = NULL, R = 10000,
           ratio = FALSE,
           args.stat = NULL, seed = NULL, sampler = samp.bootstrap,
           label = NULL, statisticNames = NULL, block.size = 100,
           trace = FALSE)
permutationTest(data, statistic, R = 9999,
                alternative = "two.sided", resampleColumns = NULL,
                args.stat = NULL, seed = NULL, sampler = samp.permute,
                label = NULL, statisticNames = NULL, block.size = 100,
                trace = FALSE, tolerance = .Machine$double.eps ^ 0.5)
permutationTest2(data, statistic, treatment, data2 = NULL, R = 9999,
                 alternative = "two.sided", ratio = FALSE, paired = FALSE,
                 args.stat = NULL, seed = NULL, sampler = samp.permute,
                 label = NULL, statisticNames = NULL, block.size = 100,
                 trace = FALSE, tolerance = .Machine$double.eps ^ 0.5)
data |
vector, matrix, or data frame. |
statistic |
a function, or expression (e.g. |
R |
number of replicates (bootstrap samples or permutation resamples). |
treatment |
a vector with two unique values.
For two-sample applications, supply either |
data2 |
an object like |
alternative |
one of |
ratio |
logical, if |
resampleColumns |
integer, or character (a subset of the column names of |
args.stat |
a list of additional arguments to pass to |
paired |
logical, if |
seed |
old value of .Random.seed, or argument to set.seed. |
sampler |
a function for resampling, see |
label |
used for labeling plots (in a future version). |
statisticNames |
a character vector the same length as the vector returned by
|
block.size |
integer. The |
trace |
logical, if |
tolerance |
when computing P-values, differences smaller than |
There is considerable flexibility in how you specify the data and statistic.
For the statistic, you may supply a function, or an expression.
For example, if data = x, you may specify any of
statistic = mean
statistic = mean(x)
statistic = mean(data)
If data is a data frame, the expression may refer to columns in the data frame, e.g.
statistic = mean(x)
statistic = mean(myData$x)
statistic = mean(myData[, "x"])
If data is not just the name of an object, e.g. data = subset(myData, age > 17), or if data2 is supplied, then use the name 'data', e.g.
statistic = colMeans(data)
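The function and expression forms described above can be illustrated in base R. This is only a sketch of the idea, not the package's implementation: it assumes that an expression is evaluated with the resampled values bound to the name 'data' (or to the column names of a data frame). The vector x and the index vector idx are made up for illustration.

```r
# Hypothetical data and one bootstrap resample of it (base R only).
set.seed(1)
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3)
idx <- sample.int(length(x), replace = TRUE)
resampled <- x[idx]

# Statistic supplied as a function: call it on the resampled data.
stat_fun <- mean
from_function <- stat_fun(resampled)

# Statistic supplied as an expression: evaluate it with the
# resampled values bound to the name 'data'.
stat_expr <- quote(mean(data))
from_expression <- eval(stat_expr, list(data = resampled))

# Data frame case: the expression may refer to a column name.
df_resampled <- data.frame(x = resampled)
from_column <- eval(quote(mean(x)), df_resampled)

# All three routes give the same number for this resample.
all.equal(from_function, from_expression)
all.equal(from_function, from_column)
```

Evaluating a quoted expression against a list or data frame is one plausible way to support both calling styles with the same resampling loop.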
a list with class "bootstrap", "bootstrap2", "permutationTest", or "permutationTest2", that inherits from "resample", with components:
observed |
the value of the statistic for the original data. |
replicates |
a matrix with |
n |
number of observations in the original data, or vector of length 2 in two-sample problems. |
p |
|
R |
number of replications. |
seed |
the value of the seed at the start of sampling. |
call |
the matched call. |
statistics |
a data frame with |
The two-sample versions have an additional component:
resultsBoth |
a list containing two components,
the results from resampling each of the two samples. These are
|
There are functions for printing and plotting these objects, in particular print, hist, qqnorm, plot (currently the same as hist), quantile.
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
resample-package, samp.bootstrap, CI.percentile, CI.t.
# See full set of examples in resample-package, including different
# ways to call the functions depending on the structure of the data.
data(Verizon)
CLEC <- with(Verizon, Time[Group == "CLEC"])
bootC <- bootstrap(CLEC, mean)
bootC
hist(bootC)
qqnorm(bootC)
Call cat, with sep="" and/or newline at end.
cat0(...)
cat0n(...)
catn(...)
... |
R objects, like for |
cat0 and cat0n call cat with sep = "".
catn and cat0n print a final newline.
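As a rough illustration of that behavior, base-R stand-ins might look like the following. The names cat0_sketch, cat0n_sketch and catn_sketch are hypothetical, and the package's own definitions may differ in detail.

```r
# Hypothetical stand-ins written with base R's cat(); not the package source.
cat0_sketch  <- function(...) cat(..., sep = "")              # no separator
cat0n_sketch <- function(...) cat(..., "\n", sep = "")        # no separator, final newline
catn_sketch  <- function(...) { cat(...); cat("\n") }         # default separator, final newline

cat0n_sketch("10,", "000")   # prints 10,000 followed by a newline
catn_sketch("Print", "this") # prints Print this followed by a newline
```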
None (invisible NULL).
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
cat("Print this")
# That printed without a final newline.
catn("Print this")
cat0n("10,", "000")
Bootstrap confidence intervals - percentile method or t interval.
CI.percentile(x, confidence = 0.95, expand = TRUE, ...,
              probs = sort(1 + c(-1, 1) * confidence) / 2)
CI.t(x, confidence = 0.95, expand = TRUE,
     probs = sort(1 + c(-1, 1) * confidence) / 2)
CI.bca(x, confidence = 0.95, expand = TRUE, L = NULL,
       probs = sort(1 + c(-1, 1) * confidence) / 2)
CI.bootstrapT(x, confidence = 0.95,
              probs = sort(1 + c(-1, 1) * confidence) / 2)
x |
|
confidence |
confidence level, between 0 and 1. The default 0.95 gives a 95% two-sided interval. |
expand |
logical, if |
... |
additional arguments to pass to |
probs |
probability values, between 0 and 1. |
L |
vector of length |
CI.bootstrapT assumes the first dimension of the statistic is an estimate, and the second is proportional to a SE for the estimate. E.g. for bootstrapping the mean, they could be the mean and s. This is subject to change.
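To make the "estimate first, SE-proportional quantity second" convention concrete, here is a base-R sketch of a textbook bootstrap t interval for a mean. It is an illustration under stated assumptions (simulated data, 2000 resamples, type 6 quantiles), not a reproduction of what CI.bootstrapT computes internally.

```r
# Base-R sketch of a bootstrap t interval for a mean (illustrative only).
set.seed(0)
x <- rexp(30)            # hypothetical skewed sample
n <- length(x)
R <- 2000

# Each replicate returns an estimate first and a quantity
# proportional to its standard error second (here the mean and s).
replicates <- t(replicate(R, {
  xb <- sample(x, n, replace = TRUE)
  c(mean = mean(xb), sd = sd(xb))
}))

# Studentized pivots. The factor sqrt(n) is omitted because it appears
# in both the bootstrap SE and the original SE, so it cancels.
tstar <- (replicates[, "mean"] - mean(x)) / replicates[, "sd"]

# Interval: reversed quantiles of the pivot, scaled by the observed s.
q <- quantile(tstar, c(0.975, 0.025), type = 6)
ci <- mean(x) - q * sd(x)
names(ci) <- c("2.5%", "97.5%")
ci
```

Because the upper quantile of the pivot sets the lower endpoint, the interval can be asymmetric around the mean, which is what helps with skewed data.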
CI.bca and CI.bootstrapT currently only support a single sample.
a matrix with one column for each value in probs and one row for each statistic.
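For orientation, the percentile and t intervals can be sketched in base R as follows. This is a minimal illustration with simulated data and no "expand" adjustment; the use of n - 1 degrees of freedom for the t interval is an assumption, and the rows here are the two methods rather than one row per statistic.

```r
# Base-R sketch: percentile interval and t interval with bootstrap SE.
set.seed(0)
x <- rexp(30)                      # hypothetical sample
R <- 2000
probs <- c(0.025, 0.975)

# Bootstrap distribution of the mean.
boot_means <- replicate(R, mean(sample(x, replace = TRUE)))

# Percentile interval: quantiles of the bootstrap distribution.
percentile <- quantile(boot_means, probs, type = 6)

# t interval: observed statistic plus t quantiles times the bootstrap SE
# (degrees of freedom n - 1 assumed here).
t_int <- mean(x) + qt(probs, df = length(x) - 1) * sd(boot_means)

ci <- rbind(percentile = percentile, t = t_int)
colnames(ci) <- paste0(100 * probs, "%")
ci
```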
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
This discusses the expanded percentile interval: Hesterberg, Tim (2014), What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum, https://arxiv.org/abs/1411.5279.
bootstrap, bootstrap2, ExpandProbs (for the expanded intervals).
# See full set of examples in resample-package, including different
# ways to call all four functions depending on the structure of the data.
data(Verizon)
CLEC <- with(Verizon, Time[Group == "CLEC"])
bootC <- bootstrap(CLEC, mean, seed = 0)
bootC2 <- bootstrap(CLEC, c(mean = mean(CLEC), sd = sd(CLEC)), seed = 0)
CI.percentile(bootC)
CI.t(bootC)
CI.bca(bootC)
CI.bootstrapT(bootC2)
Quick and dirty function for column variances and standard deviations.
colVars(x, na.rm = FALSE)
colStdevs(x, ...)
x |
data frame, matrix, or vector. These versions do not support higher-dimensional arrays. |
na.rm |
logical. Should missing values (including |
... |
other arguments passed to |
A numeric or complex array of suitable size, or a vector if the result is one-dimensional. The dimnames (or names for a vector result) are taken from the original array.
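For comparison, the same quantities can be computed in base R with apply. This is an equivalent calculation on a made-up matrix, shown only to clarify what is being returned; it is not the package's code.

```r
# Column variances and standard deviations with base R only.
x <- matrix(c(1, 2, 3, 4,
              2, 4, 6, 8,
              1, 1, 1, 5), nrow = 4)   # three columns of four values

col_vars <- apply(x, 2, var)   # one variance per column
col_sds  <- sqrt(col_vars)     # one standard deviation per column

col_vars   # 5/3, 20/3 and 4
col_sds
```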
There are better versions of these functions in the aggregate package
https://www.timhesterberg.net/r-packages.
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
x <- matrix(rnorm(12), 4)
colVars(x)
colStdevs(x)
Deprecated functions
... |
arguments to pass to the replacement functions. |
limits.percentile, limits.t and limits.bootstrapT have been renamed "CI.*".
See the replacement functions.
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
CI.percentile, CI.t, CI.bootstrapT.
Compute modified quantile levels, for more accurate confidence intervals. Using these levels gives wider intervals, with coverage closer to the desired level.
ExpandProbs(probs, n)
probs |
vector of numerical values between 0 and 1. |
n |
number of observations. |
Bootstrap percentile confidence intervals for a sample mean correspond roughly to
$\bar{x} \pm z_{\alpha} \hat{\sigma} / \sqrt{n}$
instead of
$\bar{x} \pm t_{\alpha, n-1} s / \sqrt{n}$
where
$\hat{\sigma} = \sqrt{(n-1)/n} \; s$
is like s but computed using a divisor of n instead of n-1. Similarly for other statistics, the bootstrap percentile interval is too narrow, typically by roughly the same proportion.
This function finds modified probability levels probs2, such that
$z_{probs2} \sqrt{(n-1)/n} = t_{probs, n-1}$
so that for symmetric data, the bootstrap percentile interval approximately matches the usual $t$ confidence interval.
A vector like probs, but with values closer to 0 and 1.
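The relation between probs2 and probs stated above can be solved directly in base R. The helper name expand_probs_sketch is hypothetical, and this is a sketch derived from that relation rather than the package's source.

```r
# Solve z_probs2 * sqrt((n-1)/n) = t_{probs, n-1} for probs2:
#   z_probs2 = t_{probs, n-1} * sqrt(n / (n-1)),  probs2 = pnorm(z_probs2)
expand_probs_sketch <- function(probs, n) {
  pnorm(qt(probs, df = n - 1) * sqrt(n / (n - 1)))
}

probs <- c(0.025, 0.975)
expand_probs_sketch(probs, n = 10)     # noticeably closer to 0 and 1
expand_probs_sketch(probs, n = 1000)   # nearly unchanged for large n
```

The adjustment matters most for small n, where both the t quantile and the sqrt(n / (n-1)) factor differ appreciably from their large-sample values.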
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
This discusses the expanded percentile interval: Hesterberg, Tim (2014), What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum, https://arxiv.org/abs/1411.5279.
probs <- c(0.025, 0.975)
n <- c(5, 10, 20, 40, 100, 200, 1000)
outer(probs, n, ExpandProbs)
This is equivalent to {if(test) yes else no}. The advantages of using this function are better formatting, and a more natural syntax when the result is being assigned; see examples below.
With 5 arguments, this is equivalent to {if(test1) yes else if(test2) u else v} (where arguments are given by name, not position).
IfElse(test, yes, no, ...)
test |
logical value; if |
yes |
any object; this is returned if |
no |
normally any object; this is returned if |
... |
there should be 3, 5, 7, etc. arguments to this function; arguments 1, 3, 5, etc. should be logical values; the other arguments (even numbered, and last) are objects that may be returned. |
test should be a scalar logical, and only one of yes or no is evaluated, depending on whether test = TRUE or test = FALSE, and yes and no may be any objects.
In contrast, for ifelse, test is normally a vector, both yes and no are evaluated, even if not used, and yes and no are vectors the same length as test.
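The contrast can be seen with base R alone. The sketch below uses the plain if/else forms that IfElse is described as equivalent to, alongside the vectorized ifelse; the values are made up for illustration.

```r
# Scalar test: only the selected branch is evaluated, so the
# stop() in the unused branch never runs.
test <- TRUE
result <- if (test) "cat" else stop("never evaluated")

# Chained form corresponding to five arguments (test1, yes, test2, u, v).
test1 <- FALSE
test2 <- TRUE
chained <- if (test1) "one" else if (test2) "two" else "three"

# Vectorized ifelse(): test is a vector and the result has the same length.
v <- ifelse(c(TRUE, FALSE, TRUE), "yes", "no")

result    # "cat"
chained   # "two"
v         # "yes" "no" "yes"
```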
With three arguments, one of yes or no.
With k arguments, one of arguments 2, 4, ..., k-1, k.
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
IfElse(TRUE, "cat", "dog")
IfElse(FALSE, "one", TRUE, "two", "three")
IfElse(FALSE, "one", FALSE, "two", "three")
Basic resampling. Supply the data and statistic to resample.
jackknife(data, statistic, args.stat = NULL, label = NULL, statisticNames = NULL, trace = FALSE)
data |
vector, matrix, or data frame. |
statistic |
a function, or expression (e.g. |
args.stat |
a list of additional arguments to pass to |
label |
used for labeling plots (in a future version). |
statisticNames |
a character vector the same length as the vector returned by
|
trace |
logical, if |
a list with class "jackknife" that inherits from "resample", with components:
observed |
the value of the statistic for the original data. |
replicates |
a matrix with |
n |
number of observations in the original data, or vector of length 2 in two-sample problems. |
p |
|
R |
number of replications. |
seed |
the value of the seed at the start of sampling. |
call |
the matched call. |
statistics |
a data frame with |
There are functions for printing and plotting these objects, in particular print, plot, hist, qqnorm, quantile.
The current version only handles a single sample.
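For readers new to the method, single-sample leave-one-out replicates and the usual jackknife standard error can be sketched in base R. This shows the standard textbook formula on made-up data; it is not claimed to match the package's output format.

```r
# Base-R sketch of the one-sample jackknife for the mean.
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3)   # hypothetical data
n <- length(x)

# One replicate per observation: the statistic with that observation removed.
replicates <- vapply(seq_len(n), function(i) mean(x[-i]), numeric(1))

# Standard jackknife standard error.
jack_se <- sqrt((n - 1) / n * sum((replicates - mean(replicates))^2))

# For the mean, this reproduces the usual formula s / sqrt(n).
all.equal(jack_se, sd(x) / sqrt(n))
```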
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
# See full set of examples in resample-package
data(Verizon)
CLEC <- with(Verizon, Time[Group == "CLEC"])
jackknife(CLEC, mean)
Methods for common generic functions. The methods operate primarily on the replicates (resampled statistics).
## S3 method for class 'resample'
print(x, ...)
## S3 method for class 'resample'
hist(x, ..., resampleColumns = 1:x$p, xlim = NULL, xlab = NULL, main = "",
     col = "blue", border = 0, breaks = "FD",
     showObserved = TRUE, legend = TRUE, args.legend = NULL)
## S3 method for class 'resample'
plot(x, ...)
## S3 method for class 'resample'
qqnorm(y, ..., resampleColumns = 1:y$p, ylab = NULL,
       pch = if(y$R < 100) 1 else ".")
## S3 method for class 'resample'
quantile(x, ...)
x , y
|
a |
... |
additional arguments passed to the corresponding generic function. |
resampleColumns |
integer subscripts, or names of statistics. When a statistic is a vector, resampleColumns may be used to select which resampling distributions to plot. |
xlim |
limits for the x axis. |
xlab , ylab
|
x and y axis labels. |
main |
main title |
col |
color used to fill bars, see |
border |
color of the border around the bars, see |
breaks |
method for computing breaks, see |
showObserved |
logical, if |
legend |
logical, if |
args.legend |
|
pch |
plotting character, see |
hist.resample displays a histogram overlaid with a density plot, with the observed value of the statistic indicated.
plot.resample currently just calls hist.resample.
For quantile.resample, a matrix with one row for each statistic and one column for each value in probs. This uses type=6 when calling quantile, for wider (more accurate) quantiles than the usual default.
The other functions are not called for their return values.
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
resample-package, bootstrap, bootstrap2, jackknife, permutationTest, permutationTest2, quantile.
# See full set of examples in resample-package
data(Verizon)
CLEC <- with(Verizon, Time[Group == "CLEC"])
bootC <- bootstrap(CLEC, mean, seed = 0)
print(bootC)
hist(bootC)
qqnorm(bootC)
quantile(bootC, probs = c(.025, .975))
# That is the percentile interval with expand = FALSE
CI.percentile(bootC)
Front end to quantile, using type = 6 (appropriate for resampling)
Quantile(x, ..., type = 6)
x |
|
... |
Other arguments passed to |
type |
With |
This is a front end to quantile.
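The effect of type = 6 can be checked with base R's quantile alone, whose default is type = 7. The small vector below is only an illustration.

```r
x <- 1:9

# Base R default (type = 7).
default_q <- quantile(x, c(0.2, 0.8))
# Type 6, the type this front end uses by default.
type6_q <- quantile(x, c(0.2, 0.8), type = 6)

default_q   # 2.6 and 7.4
type6_q     # 2 and 8: further from the center than the default
```

Type 6 places the quantiles further into the tails, which is why intervals built from it are wider than those from the default.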
A vector or matrix of quantiles.
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
quantile(1:9, .2)
Quantile(1:9, .2)
This function is called by bootstrap and other resampling functions to actually perform resampling, but may also be called directly.
resample(data, resampleFun, sampler, R = 10000, seed = NULL,
         statisticNames = NULL, block.size = 100,
         trace = FALSE, ..., observedIndices = 1:n,
         call = match.call())
data |
vector, matrix, or data frame. |
resampleFun |
a function with argument |
sampler |
a function like |
R |
number of resamples. |
seed |
old value of .Random.seed, or argument to set.seed. |
statisticNames |
a character vector the same length as the vector returned by
|
block.size |
integer. The |
trace |
logical, if |
... |
additional arguments passed to |
observedIndices |
integer vector of indices, used for calculating the observed value.
When this is called by |
call |
typically the call to |
This is called by bootstrap, bootstrap2, permutationTest, and permutationTest2 to actually perform resampling. The results are passed back to the calling function, which may add additional components and a class, which inherits from "resample".
This may also be called directly. In contrast to the other functions, where you have flexibility in how you specify the statistic, here resampleFun must be a function.
an object of class "resample"; this is a list with components:
observed |
the observed statistic, length |
replicates |
a matrix with |
n |
number of observations |
p |
the length of the statistic returned by |
R |
number of resamples. |
seed |
the value of |
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
bootstrap, bootstrap2, permutationTest, permutationTest2, samp.bootstrap, samp.permute.
For an overview of all functions in the package, see resample-package.
# See full set of examples in resample-package, including different
# ways to call all the functions depending on the structure of the data.
data(Verizon)
CLEC <- with(Verizon, Time[Group == "CLEC"])
bootC <- bootstrap(CLEC, mean, seed = 0)
bootC
Data sets for use in examples.
TV has measurements of minutes of commercials per half-hour, for "Basic" and "Extended" (extra-cost) cable TV stations.
Verizon has repair times, with two groups, CLEC and ILEC, customers of the "Competitive" and "Incumbent" local exchange carrier.
TV: 10 observations: Time, Cable
Verizon: 1687 observations: Time, Group
The TV and Verizon datasets are used in What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum
Hesterberg, Tim (2014), What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum, https://arxiv.org/abs/1411.5279.
See resample-package for an overview of resampling functions.
data(TV); summary(TV)
Basic <- with(TV, Time[Cable == "Basic"])
Extended <- with(TV, Time[Cable == "Extended"])

data(Verizon); summary(Verizon)
ILEC <- with(Verizon, Time[Group == "ILEC"])
CLEC <- with(Verizon, Time[Group == "CLEC"])
Generate indices for resampling.
samp.bootstrap(n, R, size = n - reduceSize, reduceSize = 0)
samp.permute(n, R, size = n - reduceSize, reduceSize = 0,
             groupSizes = NULL, returnGroup = NULL)
n |
sample size. For two-sample permutation tests, this is the sum of the two sample sizes. |
R |
number of vectors of indices to produce. |
size |
size of samples to produce. For example, to do "what-if" analyses, to estimate the variability of a statistic had the data been a different size, you may specify the size. |
reduceSize |
integer; if specified, then |
groupSizes |
|
returnGroup |
|
To obtain disjoint samples without replacement, call this function multiple times, after setting the same random number seed, with the same groupSizes but different values of returnGroup. This is used for two-sample permutation tests.
If groupSizes is supplied then size is ignored.
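The same-seed idea can be sketched in base R. This illustration assumes that each group is a consecutive slice of a single random permutation; how the package lays out the groups internally is not stated here, so treat the slicing as an assumption.

```r
# Disjoint samples without replacement, taken from one permutation.
n <- 7
groupSizes <- c(2, 5)

# Resetting the seed before each draw reproduces the same permutation,
# so the two slices cannot overlap.
set.seed(0)
perm1 <- sample.int(n)
group1 <- perm1[seq_len(groupSizes[1])]

set.seed(0)
perm2 <- sample.int(n)
group2 <- perm2[groupSizes[1] + seq_len(groupSizes[2])]

intersect(group1, group2)   # integer(0): the groups share no index
sort(c(group1, group2))     # 1 2 3 4 5 6 7
```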
matrix with size rows and R columns (or groupSizes[returnGroup] rows). Each column contains indices for one bootstrap sample, or one permutation.
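A matrix of this shape is easy to build and use in base R. The sketch below draws bootstrap indices with sample.int and applies a statistic to each column; the data vector is made up, and this is not the package's sampler.

```r
# Build a size-by-R matrix of bootstrap indices and use it.
set.seed(0)
n <- 7
R <- 8
size <- n

# Each column holds the indices of one bootstrap sample (with replacement).
indices <- matrix(sample.int(n, size * R, replace = TRUE),
                  nrow = size, ncol = R)

# Hypothetical data of length n; one statistic per column of indices.
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8)
boot_means <- apply(indices, 2, function(i) mean(x[i]))

dim(indices)         # 7 8
length(boot_means)   # 8
```

Generating indices for a block of resamples at once, rather than one sample at a time, lets the statistic be applied column by column.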
The value passed as R to this function is typically the block.size argument to bootstrap and other resampling functions.
Tim Hesterberg [email protected],
https://www.timhesterberg.net/bootstrap-and-resampling
This discusses reduced sample size: Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. Smoothing, Proceedings of the Section on Statistics and the Environment, American Statistical Association, 2924-2930, https://drive.google.com/file/d/1eUo2nDIrd8J_yuh_uoZBaZ-2XCl_5pT7.
samp.bootstrap(7, 8)
samp.bootstrap(7, 8, size = 6)
samp.bootstrap(7, 8, reduceSize = 1)

# Full permutations
set.seed(0)
samp.permute(7, 8)

# Disjoint samples without replacement = subsets of permutations
set.seed(0)
samp.permute(7, 8, groupSizes = c(2, 5), returnGroup = 1)
set.seed(0)
samp.permute(7, 8, groupSizes = c(2, 5), returnGroup = 2)