| Title: | Data Sets for "Mathematical Statistics with Resampling and R" (3rd Ed) |
|---|---|
| Description: | Data sets for Chihara and Hesterberg (2022, ISBN: 978-1-119-87404-1) "Mathematical Statistics with Resampling and R" (3rd Ed). |
| Authors: | Laura Chihara [aut], Tim Hesterberg [aut, cre] |
| Maintainer: | Tim Hesterberg <[email protected]> |
| License: | CC0 |
| Version: | 1.0 |
| Built: | 2026-05-18 07:28:52 UTC |
| Source: | https://github.com/cran/resampledata3 |
Data sets for Chihara and Hesterberg (2022, ISBN: 978-1-119-87404-1) "Mathematical Statistics with Resampling and R" (3rd Ed). https://github.com/lchihara/MathStatsResamplingR
# For a list of datasets do: library(help = resampledata3)
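As a minimal sketch of how the data sets might be listed and loaded from a script, assuming the package is installed: the helper name `list_datasets` is hypothetical, and the guard simply skips the example when the package is not available.

```r
# Hypothetical helper: return the names of the data sets shipped with a package.
# utils::data(package = ...) returns an object whose $results matrix has an
# "Item" column holding the data set names.
list_datasets <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))  # package not installed: nothing to list
  }
  unname(utils::data(package = pkg)$results[, "Item"])
}

datasets <- list_datasets("resampledata3")
print(datasets)

if (length(datasets) > 0) {
  # Load one data set by name and inspect its structure.
  utils::data("Beerwings", package = "resampledata3")
  str(Beerwings)
}
```

If the package is missing, `datasets` is an empty character vector and nothing is loaded.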
Alcohol content and calories for a sample of ale and lager beers.
Alelager
A data frame with 31 observations on the following 4 variables.
ID: Subject ID
Type: Beer: ale or lager
Alcohol: Percentage alcohol content
Calories: Number of calories
Levels of arsenic, chlorine and cobalt in a sample of 271 wells in Bangladesh.
Bangladesh
A data frame with 271 observations on the following 3 variables.
Arsenic: Arsenic level, ppb
Chlorine: Chlorine level, ppb
Cobalt: Cobalt level, ppb
https://www2.bgs.ac.uk/groundwater/health/arsenic/Bangladesh/data.html
Reproduced with the permission of the British Geological Survey, copyright UKRI. All Rights Reserved.
Beer and hotwings consumption by a sample of patrons at a Minneapolis bar.
Beerwings
A data frame with 30 observations on the following 4 variables.
ID: Subject ID
Hotwings: Number of hotwings consumed
Beer: Ounces of beer consumed
Gender: Gender of patron (M/F)
Data collected by Nicole Catchpole in 2004 (private communication).
Price of textbooks at a college bookstore.
BookPrices
A data frame with 44 observations on the following 3 variables.
Subject: Biology, Chemistry, Computer Science, Economics, Educational Studies, Geology, Mathematics, Physics, Political Science, Psychology, SOAN
Area: Classification of subject as either Math & Science or Social Sciences
Price: Price in U.S. dollars
Data collected by R. Hien and S. Becker in 2010 (private communication).
Fish supply (kg) and demand for bushmeat in Ghana.
Bushmeat
A data frame with 30 observations on the following 4 variables.
Fish: Fish supply (kg) per capita
Biomass: Biomass
Year: Year
Change: Percent change in biomass
Biomass of large mammals was calculated for each year by multiplying the number of animals observed in 700 walking counts of 10 to 15 km each by species-specific body weights. The products of these calculations were then summed across all species.
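The biomass calculation described above can be sketched as follows. The species names, counts, and body weights are hypothetical values chosen for illustration; they are not taken from the Bushmeat data or from the original study.

```r
# Hypothetical counts and body weights for three species in one year.
# Illustrative values only; not from the Bushmeat data.
counts <- c(duiker = 120, buffalo = 15, monkey = 300)        # animals observed
body_weight_kg <- c(duiker = 20, buffalo = 500, monkey = 8)  # kg per animal

# Multiply each count by its species-specific body weight,
# then sum the products across all species.
biomass <- sum(counts * body_weight_kg[names(counts)])
biomass
```

Indexing the weights by `names(counts)` keeps each count paired with the weight of the same species even if the two vectors are stored in different orders.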
Brashares, Arcese, Sam, Coppolillo, Sinclair, Balmford (2004). Bushmeat hunting, wildlife declines, and fish supply in West Africa. Science, 12 November 2004.
Nutritional data on meals served in a college cafeteria.
Cafeteria
A data frame with 41 observations on the following 9 variables.
ID: a numeric vector
Type: type of meal, Meat or Vegetarian
Calories: number of calories
Carbohydrates: number of carbohydrates
Fiber: fiber content
Fat: fat content
Cholesterol: cholesterol
Protein: protein
Sodium: sodium
Stephenson (private communication).
Nutritional data on a sample of cereals.
Cereals
A data frame with 43 observations on the following 5 variables.
ID: a numeric vector
Age: target consumer, adult or children
Shelf: location of cereal, bottom, middle, or top shelf
Sodiumgram: sodium content in grams
Proteingram: protein content in grams
Data on O-rings in 23 space shuttle flights prior to the Challenger shuttle disaster of January 1986.
Challenger
A data frame with 23 observations on the following 3 variables.
Date: Date of launch
Temperature: Air temperature at launch (F)
Incident: Binary variable, 1 if one of the O-rings on one of the booster rockets was damaged, 0 otherwise
https://archive.ics.uci.edu/ml/datasets/Challenger+USA+Space+Shuttle+O-Ring
Dalal, S. R., Fowlkes, E. B., Hoadley, B. (1989). Risk analysis of the space shuttle: pre-Challenger prediction of failure. Journal of the American Statistical Association, 84, 945-957.
Times from a sample of men who completed the Chicago marathon in 2015.
data("ChiMarathonMen")
A data frame with 80 observations on the following 4 variables.
name: Name of competitor
Division: Age group
Finish: Finish time
FinishMin: Time in minutes
https://chicago-history.r.mikatiming.com/
Female cuckoos lay their eggs on the ground and then move them to the nests of other birds. Latter gathered data on the lengths of the cuckoo eggs found in these foster nests.
data("Cuckoos")
A data frame with 120 observations on the following 2 variables.
Eggs: Lengths of eggs (mm) of cuckoos
Bird: Species of birds: HedgeSparrow, MeadowPipit, PiedWagtail, Robin, TreePipit, Wren
Tippett, L. H. C. (1952). The Methods of Statistics, 4th Edition. Wiley.
Latter, O. (1902). An enquiry into the dimensions of the Cuckoo's egg and the relation of the variations to the size of eggs of the foster-parent, with notes on coloration. Biometrika 1 (2): 164-176.
Scores of 12 female divers (10 m platform) in the 2017 FINA World Championships.
data("Diving2017")
A data frame with 12 observations on the following 4 variables.
Name: Name of competitor
Country: Country
Semifinal: Score in the semi-finals
Final: Score in the finals
Competitors perform 5 dives in each round and the sum of these 5 dives determines who moves on to the next round.
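The scoring rule above amounts to a simple sum. A minimal sketch, using hypothetical scores for one diver (the values are illustrative and not taken from Diving2017):

```r
# Hypothetical scores for one diver's five dives in a single round.
# Illustrative values only; not from the Diving2017 data.
dive_scores <- c(72.0, 68.4, 81.6, 76.5, 79.2)
stopifnot(length(dive_scores) == 5)

# The round score is the sum of the five dive scores.
round_score <- sum(dive_scores)
round_score
```

Under this reading, the `Semifinal` and `Final` columns each hold one such five-dive total per competitor.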
https://www.fina.org/competitions/213/17th-fina-world-championships-2017/results?disciplines=DV
Measurements of eyes of 40 people.
data("Eyes")
A data frame with 40 observations on the following 6 variables.
ID: Subject ID
age: Age of subject
hand: Dominant hand of subject, left or right
eye: Dominant eye of subject, left or right
leftPD: Left pupillary distance (mm)
rightPD: Right pupillary distance (mm)
Westfield (private communication).
A random sample of driver fatalities in 2009 in Pennsylvania.
Fatalities
A data frame with 100 observations on the following 3 variables.
ID: Subject ID
Alcohol: Alcohol involved? 1 = yes, 0 = no
Age: Age
The drivers were driving a car, SUV, or light pickup truck (vehicles such as motor homes, convertibles, or commercial vehicles are excluded).
http://www.nhtsa.gov/FARS
Mercury levels (ppm) in a sample of fish caught in Minnesota.
FishMercury
A data frame with 30 observations on the following variable.
Mercury: Mercury level in ppm
Minnesota Pollution Control Agency.
Length of delays for flights on American Airlines and United Airlines in 2009.
data("FlightDelays")
A data frame with 4029 observations on the following 10 variables.
ID: Subject ID
Carrier: Airline: American Airlines AA or United Airlines UA
FlightNo: Flight number
Destination: Destination: BNA, DEN, DFW, IAD, MIA, ORD, STL
DepartTime: Departure time: 4-8am, 4-8pm, 8-Mid, 8-Noon, Noon-4pm
Day: Day of week
Month: Month: May or June
FlightLength: Length of flight
Delay: Delay time (in minutes)
Delayed30: Delayed more than 30 minutes? No or Yes
All departures of AA or UA flights from LaGuardia Airport in May or June of 2009.
https://www.bts.gov/topics/airlines-and-airports/quick-links-popular-air-carrier-statistics
Data on births of a random sample of girls in Alaska or Wyoming in 2004.
data("Girls2004")
A data frame with 80 observations on the following 6 variables.
ID: Subject ID
State: State: AK or WY
MothersAge: Age of mother: 15-19, 20-24, 25-29, 30-34, 35-39, 40-44
Smoker: Mother a smoker? No or Yes
Weight: Weight of baby (grams)
Gestation: Gestation time (weeks)
http://wonder.cdc.gov/natality-current.html
Prices of a sample of grocery items at Target or Walmart.
Groceries
A data frame with 30 observations on the following 4 variables.
Product: Grocery item
Size: Package size
Target: Price at Target
Walmart: Price at Walmart
General Social Survey data from 2018.
GSS2018
A data frame with 2348 observations on the following 17 variables.
ID: Subject ID
Region: Midwest, Northeast, South, West
GenderNow: Gender of subject: A gender not listed here, Man, Not applicable, Transgender, Woman
Age: Age
Marital: Marital status: Divorced, Married, Never married, Separated, Widowed
Degree: Education: Bachelor, Graduate, High school, Junior college, Less than high school
Employed: Employed? No or Yes
Income: Income level
Polviews: Political views: Conservative, Extremely liberal, Extremely conservative, Liberal, Moderate, Slightly conservative, Slightly liberal
Pres16: Voted for whom in presidential election of 2016? Clinton, Other, Trump
DeathPenalty: Opinion on death penalty: Favor, Oppose
Courts: How courts deal with criminals: About right, Dont know, Not harsh enough, Too harsh
Attend: Attendance at religious services: Monthly, Never, Occasionally, Weekly
Postlife: Believe in life after death? Dont know, No, Yes
Happy: General happiness level: Not too happy, Pretty happy, Very happy
Satfin: Satisfaction with financial situation: More or less, Not at all, Satisfied
Energy: Government spending on developing alternative energy sources: About right, Dont know, Too little, Too much
https://gss.norc.org
Nutritional information on a sample of ice cream.
data("IceCream")
A data frame with 39 observations on the following 7 variables.
Brand: Brand of ice cream
VanillaCalories: Calories in vanilla ice cream
VanillaFat: Fat (gm) in vanilla ice cream
VanillaSugar: Sugar (gm) in vanilla ice cream
ChocolateCalories: Calories in chocolate ice cream
ChocolateFat: Fat (gm) in chocolate ice cream
ChocolateSugar: Sugar (gm) in chocolate ice cream
Birth weight of boys born in Illinois.
ILBoys
A data frame with 241 observations on the following 2 variables.
MothersAge: Age range of mother: 15-19, 20-24, 25-29
Weight: Weight of baby (gm)
Random sample of boys born to mothers in Illinois in 2004. Births are restricted to single births only and gestation lengths of at least 37 weeks.
Data on female illiteracy in a sample of countries where illiteracy is more than 5%.
Illiteracy
A data frame with 94 observations on the following 4 variables.
ID: Country ID
Country: Name of country
Illit: Percentage of women over 15 years old who are illiterate (2003)
Births: Number of births per woman in that country (2005)
www.unesco.org, www.data.worldbank.org
Winning lottery numbers for Fantasy 5 in California.
Lottery
A data frame with 500 observations on the following variable.
Win: Number
In Fantasy 5, a lottery game in California, a player tries to match 5 numbers chosen from 1 through 39. These data are the winning numbers for the daily games from 5 May 2010 through 15 August 2010.
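As a rough sketch of the arithmetic behind the game and of one way the winning numbers might be examined: the helper `lottery_uniform_test` is hypothetical, the simulated vector only stands in for the real data, and the sketch assumes the 5 numbers in a draw are distinct. Because numbers within a draw are then not independent, the chi-square test is only an approximation.

```r
# Number of distinct 5-number combinations when choosing from 1 through 39.
n_tickets <- choose(39, 5)
n_tickets

# Hypothetical helper: chi-square goodness-of-fit test of the hypothesis that
# the numbers 1..n_numbers are equally likely. `wins` is a vector of winning
# numbers pooled over all draws.
lottery_uniform_test <- function(wins, n_numbers = 39) {
  observed <- table(factor(wins, levels = seq_len(n_numbers)))
  chisq.test(observed, p = rep(1 / n_numbers, n_numbers))
}

# Simulated stand-in for the data: 100 draws of 5 distinct numbers each.
set.seed(1)
sim_wins <- as.vector(replicate(100, sample(39, 5)))
result <- lottery_uniform_test(sim_wins)
result$parameter  # degrees of freedom
```

With the package loaded, the same helper could be applied to the single column of `Lottery` in place of `sim_wins`.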
http://www.calottery.com/play/draw-games/fantasy-5
Data from a study on math anxiety in a sample of primary and secondary school students in Italy.
MathAnxiety
A data frame with 599 observations on the following 6 variables.
Age: Age
Gender: Gender: Boy, Girl
Grade: Grade: Secondary, Primary
AMAS: Score on Abbreviated Math Anxiety Scale
RCMAS: Score on Revised Children's Manifest Anxiety Scale
Arith: Score on arithmetic test
Hill, Mammarella, Devine, et al (2016). Maths anxiety in primary and secondary school students: gender differences, developmental changes and anxiety specificity. Learning and Individual Differences 48: 45-53
Average CO2 levels (ppm) for the month of May from 1990 to 2010.
Maunaloa
A data frame with 21 observations on the following 3 variables.
ID: Subject ID
Year: Year
Level: Carbon dioxide level (ppm)
https://www.esrl.noaa.gov/gmd/ccgg/trends
Measurements on water quality in wells in Minnesota.
MnGroundwater
A data frame with 895 observations on the following 10 variables.
County: Minnesota county
Aquifer.Group: Type of aquifer: buried Quaternary, Cambrian, Cretaceous, Devonian, Ordovician, Precambrian, surficial Quaternary
Water.Level: Water level
Alkalinity: Alkalinity
Aluminum: Aluminum
Arsenic: Arsenic
Chloride: Chloride
Lead: Lead
pH: pH level
Basin.Name: Basin name
Minnesota Pollution Control Agency
Google experiment on effectiveness of certain recommendations for bidding on ads.
MobileAds
A data frame with 655 observations on the following 40 variables.
Campaign: a numeric vector
m.impr_post: a numeric vector
m.impr_pre: a numeric vector
m.click_post: a numeric vector
m.click_pre: a numeric vector
m.cost_post: a numeric vector
m.cost_pre: a numeric vector
m.conv_post: a numeric vector
m.conv_pre: a numeric vector
m.value_post: a numeric vector
m.value_pre: a numeric vector
m.cpm_pre: a numeric vector
m.cpm_post: a numeric vector
m.cpc_pre: a numeric vector
m.cpc_post: a numeric vector
m.cpa_pre: a numeric vector
m.cpa_post: a numeric vector
m.cpr_pre: a numeric vector
m.cpr_post: a numeric vector
mult.change: a numeric vector
d.impr_post: a numeric vector
d.impr_pre: a numeric vector
d.click_post: a numeric vector
d.click_pre: a numeric vector
d.cost_post: a numeric vector
d.cost_pre: a numeric vector
d.conv_post: a numeric vector
d.conv_pre: a numeric vector
d.value_post: a numeric vector
d.value_pre: a numeric vector
d.cpm_pre: a numeric vector
d.cpm_post: a numeric vector
d.cpc_pre: a numeric vector
d.cpc_post: a numeric vector
d.cpa_pre: a numeric vector
d.cpa_post: a numeric vector
d.cpr_pre: a numeric vector
d.cpr_post: a numeric vector
error.cpr_pre: a numeric vector
error.cpr_post: a numeric vector
Subset of experimental data for one advertiser. See Chihara and Hesterberg textbook for more information.
Ed Lee (Google)
Chihara and Hesterberg, Mathematical Statistics with Resampling and R (2022). Wiley.
Opening and closing stock prices for a random sample of 50 stock funds on NASDAQ on 1 December 2017.
Nasdaq
A data frame with 50 observations on the following 4 variables.
Symbol: Stock symbol
Open: Opening price
Close: Closing price
Volume: Number of shares traded
https://finance.yahoo.com
Basketball statistics for a sample of NBA players from 4 teams for the 2016-2017 season.
data("NBA1617")
A data frame with 68 observations on the following 13 variables.
Name: Player name
Position: Position: C (center), PF (power forward), PG (point guard), SF (small forward), SG (shooting guard)
Team: Team: Brooklyn, Charlotte, Cleveland, San Antonio
Games: Number of games played
Minutes: Number of minutes played
PercFG: Field goal percentage
Perc3P: 3-point field goal percentage
Perc2P: 2-point field goal percentage
PercFT: Free throw percentage
OffReb: Offensive rebounds
DefReb: Defensive rebounds
Assists: Assists
Blocks: Blocks
Players in this data set played a minimum of 100 minutes during the 2016-2017 season.
https://www.basketball-reference.com/
Birth weights of babies born in North Carolina in 2004.
NCBirths2004
A data frame with 1009 observations on the following 7 variables.
ID: Subject ID
MothersAge: Mother's age level
Smoker: Mother a smoker? No, Yes
Alcohol: Mother consumed alcohol during pregnancy? No, Yes
Gender: Baby's gender
Weight: Baby's weight (gm)
Gestation: Gestation length (weeks)
Babies in this random sample had a gestation period of at least 37 weeks and were single births (that is, not one of a twin or triplet).
http://wonder.cdc.gov/natality-current.html
Chihara and Hesterberg, Mathematical Statistics with Resampling and R, 2022 (Wiley).
Data on a sample of athletes competing in the 2012 London Olympics.
Olympics2012
A data frame with 42 observations on the following 7 variables.
Name: Name of athlete
Country: Country
Age: Age
Sex: Sex: F, M
Height: Height (inches)
Weight: Weight (lb)
Sport: Sport
Age and gender of Academy Award winners.
Oscars
A data frame with 188 observations on the following 6 variables.
Year: Year of award
Actor: Name of actor
Movie: Movie
Gender: Gender: Man, Woman
Birthyear: Birth year of actor
Age: Age at time of award
https://www.oscars.org/
Baseball data for Philadelphia Phillies during the 2009 season.
Phillies2009
A data frame with 162 observations on the following 8 variables.
Date: Date of game
Location: Game played where: Away, Home
Outcome: Outcome of game: Lose, Win
Outcome2: Outcome recoded: 1 = win, 0 = lose
Hits: Number of hits
Doubles: Number of doubles
Homeruns: Number of homeruns
StrikeOuts: Number of strikeouts
https://www.baseball-reference.com/
Time between earthquakes for all earthquakes of magnitude 6 or greater (1970-2009).
data("Quakes")
A data frame with 805 observations on the following 2 variables.
ID: Subject ID
TimeDiff: Time (days)
http://earthquakes.usgs.gov/earthquakes/eqarchives
Heights of nests and snags for the quetzal (bird).
Quetzal
A data frame with 21 observations on the following 3 variables.
Country: Country: Costa Rica, Guatemala
Nest: Height of nest (meters)
Snag: Height of snag (meters)
The quetzal typically nests in abandoned woodpecker nests in dead tree trunks (snags).
Siegfried, D., Linville, D., Hille, D. (2010). Analysis of nest sites and the resplendent quetzal (Pharomachrus mocinno): relationship between nest and snag heights. Wilson Journal of Ornithology 122: 608-11.
Data on baseball players (excluding pitchers) who played for the Texas Rangers or Minnesota Twins.
data("RangersTwins2016")
A data frame with 27 observations on the following 17 variables.
Name: Name of player
Team: Team: Rangers, Twins
Pos: Player's position
Age: Age in years
Games: Number of games played
AtBats: Number of at bats
Runs: Runs
Hits: Hits
Doubles: Doubles
Triples: Triples
HR: Homeruns
RBI: Runs batted in
SB: Stolen bases
CS: Caught stealing
BB: Base on balls
SO: Strike outs
BA: Batting average
The players in this data set played at least 50 games. During the 2016 season, the Rangers had the best winning percentage (0.586) in the American League while the Twins had the worst (0.364).
www.baseball-reference.com
Recidivism data from Iowa.
Recidivism
A data frame with 17022 observations on the following 7 variables.
Gender: Gender: F, M
Age: Age group: 25-34, 35-44, 45-54, 55 and Older, Under 25
Age25: Over or under 25 years of age? Over 25, Under 25
Offense: Type of offense: Felony, Misdemeanor
Recid: Recidivated? No, Yes
Type: Reason: New (new crime), No Recidivism (did not recidivate), Tech (technical violation, such as a parole violation)
Days: Number of days to recidivism; NA if no recidivism
All offenders convicted of either a misdemeanor or felony who were released from an Iowa prison during the 2010 fiscal year ending in June.
https://data.iowa.gov/Public-Safety/3-Year-Recidivism-for-Offenders-Released-from-Pris/mw8r-vqy4
Salaries of a random sample of baseball players from 1985 and 2015.
Salaries
A data frame with 70 observations on the following 3 variables.
League: League: American, National
Salary: Salary (in millions) in 2015 dollars
Year: Year: 1985 or 2015
Time to be served at a college snack bar.
Service
A data frame with 174 observations on the following 2 variables.
ID: Subject ID
Times: Time in minutes
Haynor, Lojovich, Syed (private communication, 2010).
Measurement of testosterone levels in males in a skateboard experiment.
Skateboard
A data frame with 71 observations on the following 3 variables.
Age: Age in years
Experimenter: Treatment (gender of experimenter): Female, Male
Testosterone: Testosterone level
Results from an experiment where male skateboarders performed tricks in front of either a female or male.
Ronay and von Hippel (2010). The presence of an attractive woman elevates testosterone and physical risk taking in young men. Social Psychological and Personality Science 1:57-64.
Short and free skate scores for male figure skaters in the 2010 Winter Olympics (Vancouver).
Skating2010
A data frame with 24 observations on the following 5 variables.
Country: Country of skater
Name: Name
Short: Short program score
Free: Free skate score
Total: Total
https://skatingscores.com/0910/oly/
Measurements from an experiment on the growth of black spruce seedlings.
Spruce
A data frame with 72 observations on the following 9 variables.
Tree: Subject ID
Competition: Treatment: C (competition), NC (no competition)
Fertilizer: Treatment: F (fertilizer), NF (no fertilizer)
Height0: Height of seedling at start
Height5: Height of seedling after 5 years
Diameter0: Diameter of seedling at start
Diameter5: Diameter of seedling after 5 years
Ht.change: Change in height
Di.change: Change in diameter
Experiment on the growth of black spruce seedlings under two treatments: fertilizer or no fertilizer, and competition or no competition (weeding).
Camill, Chihara, Adams, et al (2010). Early life history transitions and recruitment of Picea mariana in thawed boreal permafrost peatlands. Ecology 2:448-459.
Number of wins by a sample of Korean players in Starcraft, a strategy video game.
Starcraft
A data frame with 45 observations on the following 4 variables.
ID: Subject ID
Race: Chosen race of player: Protoss, Terran, Zerg
Age: Age of player
Wins: Number of wins
Evans, private communication. http://www.teamliquid.net/tipd/players
Subset of Titanic data.
Titanic
A data frame with 658 observations on the following 3 variables.
ID: Subject ID
Survived: Survival status: 1 = survived, 0 = died
Age: Age of passenger
Subset of passenger data on the Titanic.
https://data.world/nrippner/titanic-disaster-dataset
Average daily wind speeds (2010) from Carleton College turbine.
data("Turbine")
A data frame with 168 observations on the following 4 variables.
Date2010: Date
AveKW: Average kilowatts
AveSpeed: Average speed (m/s)
Production: Energy output (kilowatt hours)
Carleton College, Northfield MN.
Chihara and Hesterberg (2022). Mathematical Statistics with Resampling and R. (Wiley)
Lengths of television commercials on basic and extended cable TV channels.
data("TV")
A data frame with 20 observations on the following 3 variables.
ID: Subject ID
Times: Time (min)
Cable: Cable: Basic, Extended
Lengths of TV commercials during any given half-hour time period.
Rodgers, Robinson (private communication).
Weights of babies born in Texas in 2004.
TXBirths2004
A data frame with 1587 observations on the following 8 variables.
ID: Subject ID
MothersAge: Mother's age: 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, under 15
Smoker: Mother smokes? No, Yes
Gender: Gender of baby: Female, Male
Weight: Weight of baby (g)
Gestation: Gestation length (weeks)
Number: Baby a single birth (1), twin (2), etc.
Multiple: Part of multiple birth (e.g., twin, triplet)? No, Yes
Random sample of babies born in Texas in 2004.
http://wonder.cdc.gov/natality-current.html
Repair times by Verizon for its customers or customers of other telephone companies.
Verizon
A data frame with 1687 observations on the following 2 variables.
Time: Repair time (h)
Group: Customer: CLEC (competing local exchange carrier), ILEC (incumbent local exchange carrier)
Verizon is responsible for providing repair service to both its own customers (ILEC) and the customers of its competitors (CLEC).
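One common way to compare repair times between the two customer groups is a permutation test on the difference in means. The sketch below is only an illustration: the helper `perm_test_mean_diff` is hypothetical, the toy data are simulated rather than taken from the Verizon data, and the one-sided direction (is the ILEC mean smaller?) is an assumption.

```r
# Hypothetical helper: one-sided permutation test for a difference in means,
# mean(group a) - mean(group b), with small values counting as evidence.
perm_test_mean_diff <- function(values, groups, a, b, n_perm = 9999) {
  keep <- groups %in% c(a, b)
  values <- values[keep]
  groups <- groups[keep]
  n_a <- sum(groups == a)
  observed <- mean(values[groups == a]) - mean(values[groups == b])
  perm_diffs <- replicate(n_perm, {
    idx <- sample(length(values), n_a)  # random relabeling of group a
    mean(values[idx]) - mean(values[-idx])
  })
  # Add 1 to numerator and denominator so the observed statistic
  # counts as one of the permutations.
  p_value <- (sum(perm_diffs <= observed) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

# Toy data standing in for repair times in hours; not the Verizon data.
set.seed(42)
toy <- data.frame(
  Time  = c(rexp(40, rate = 1 / 8), rexp(10, rate = 1 / 16)),
  Group = rep(c("ILEC", "CLEC"), times = c(40, 10))
)
res <- perm_test_mean_diff(toy$Time, toy$Group, a = "ILEC", b = "CLEC")
res
```

With the package loaded, the call would presumably be `perm_test_mean_diff(Verizon$Time, Verizon$Group, "ILEC", "CLEC")`, using the two columns documented above.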
Chihara and Hesterberg (2022). Mathematical Statistics with Resampling and R (Wiley).
Data on a sample of Division I women volleyball teams.
Volleyball2009
A data frame with 30 observations on the following 4 variables.
Team: Team
HitPercent: Hitting percentage
Assts: Assists
Kills: Kills
http://www.ncaa.org/championships/statistics/womens-volleyball-statistics
Lengths and weights of a sample of walleye caught in Minnesota lakes (1990s).
Walleye
A data frame with 60 observations on the following 2 variables.
Length: Length (inches)
Weight: Weight (pounds)
Monson, Minnesota Pollution Control Agency (private communication)
Relationship between the depth of the watertable and survival status of black spruce seedlings.
Watertable
A data frame with 360 observations on the following 2 variables.
Depth: Depth of watertable (cm)
Alive: Status of seedling: 1 = alive, 0 = dead
Part of the data from an experiment to study factors associated with the growth of black spruce seedlings under various treatments. Status of seedling at the end of the second year of the experiment is noted here.
Camill, Chihara, Adams, et al (2010). Early life history transitions and recruitment of Picea mariana in thawed boreal permafrost peatlands. Ecology 2:448-459.
Chihara and Hesterberg (2022). Mathematical Statistics with Resampling and R (Wiley).