Due: Tuesday 2020-10-13 at 5pm
In this lab we will work with one package: Stat2Data.
If you need to install this package, run install.packages("Stat2Data") once in your Console.
If you’d like to run your code in the Console as well, you’ll also need to load the package there. To do so, run library(Stat2Data) in the Console.
Note that the package is also loaded with the same command in your R Markdown document.
The data frame we will be working with today is called Backpack, and it is in the Stat2Data package. To load the data, add data(Backpack) to your .Rmd file.
To find out more about the dataset, type the following in your Console: ?Backpack. A question mark before the name of an object will always bring up its help file. This command must be run in the Console.
Remember: The Console is at the bottom of your RStudio workspace. Things you type in the Console will not be in your final report. This is a good place to peek at data (try typing Backpack in the Console) and look at help files with ?.
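Besides typing the name of a data frame, a few base R functions give a quicker overview in the Console. The sketch below uses the built-in mtcars data as a stand-in (it is not the lab data); the same calls should work on any data frame.

```r
# Illustration on the built-in mtcars data, used here only as a stand-in.
head(mtcars, 3)   # first three rows
dim(mtcars)       # number of rows, then number of columns
names(mtcars)     # variable names
str(mtcars)       # type and first few values of every variable
```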
1. Describe what a 95% confidence interval means.
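One way to build intuition before writing your answer is a small simulation. The sketch below is an illustration only, assuming normally distributed data with a known mean (the values 10, 2, and 30 are arbitrary choices, not from the lab). It builds many 95% intervals for a mean and records how often they contain the true value.

```r
set.seed(1)
true_mean <- 10   # arbitrary "true" parameter for the simulation

# Draw 1000 samples of size 30 and check whether each 95% interval covers the truth
covered <- replicate(1000, {
  x <- rnorm(30, mean = true_mean, sd = 2)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

coverage <- mean(covered)   # proportion of intervals that contain true_mean
coverage
```

The proportion should land near 0.95, though it will vary a little from run to run. It does not replace a written interpretation.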
2. Find a formula relating \(R^2\) and the \(F\)-statistic from the overall \(F\)-test. Show all steps, typeset in LaTeX.
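As a possible starting point (not the completed derivation), the block below shows how a multi-step calculation can be typeset in an R Markdown document using an aligned environment. It states only the standard definitions. The symbol \(k\) for the number of predictors is a notational assumption, chosen to avoid clashing with \(p\), which this lab uses for the number of columns.

```latex
$$
\begin{aligned}
R^2 &= \frac{SSModel}{SSTotal} = 1 - \frac{SSE}{SSTotal}, \\
F   &= \frac{SSModel / k}{SSE / (n - k - 1)}
\end{aligned}
$$
```

Each further step of your derivation can go on its own line, ending in `\\` and aligned at the `&`.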
3. The hypothesis tests we have considered in this class so far require the assumption of normal errors. Another way to complete hypothesis testing is via permutation tests, which do not require this assumption. To do this:
Describe the ways in which this method is similar to and different from the bootstrap method we discussed previously.
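The mechanical difference in the resampling step can be seen with base R's sample(). This is a minimal sketch on a made-up vector, not the lab data. It shows only how the two kinds of resample are drawn, not how each is used for inference.

```r
set.seed(2)
y <- c(3.1, 4.7, 5.0, 6.2, 7.9)   # made-up values for illustration

# Permutation: shuffle without replacement, so every value appears exactly once
perm_y <- sample(y)

# Bootstrap: draw with replacement, so values can repeat or be left out
boot_y <- sample(y, replace = TRUE)

perm_y
boot_y
```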
4. Based on the help file, how many rows (n) and how many columns (p) does the Backpack data have? What are the variables included in the data frame? Add your responses to your lab report.
5. Fit a regression model predicting a student’s backpack weight from their body weight, year in school, whether they are a graduate or undergraduate student, and the number of credits they are taking that quarter. Use an F-test to determine whether collectively these four predictors have a relationship to the response. Interpret this result.
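For the syntax of an overall F-test, here is a hedged sketch on the built-in mtcars data, with a different response and different predictors from the lab's model. summary() of an lm fit stores the F-statistic and its degrees of freedom, and pf() converts them to a p-value.

```r
# Stand-in model on built-in data; not the model requested in this exercise
fit <- lm(mpg ~ wt + hp + cyl, data = mtcars)

f <- summary(fit)$fstatistic   # named vector: value, numdf, dendf
p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

f
p_value
```

The same F-statistic and p-value appear on the last line of the printed summary(fit).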
6. Using the model created in Exercise 5, interpret the coefficient for body weight.
7. Compute 90% and 95% confidence intervals for the coefficient of body weight. Interpret these.
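Base R's confint() computes t-based intervals for the coefficients of an lm fit. A minimal sketch on the built-in mtcars data, where the model and the predictor wt are stand-ins for the ones in this lab:

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)

ci90 <- confint(fit, parm = "wt", level = 0.90)   # 1 x 2 matrix: lower, upper
ci95 <- confint(fit, parm = "wt", level = 0.95)

ci90
ci95
```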
8. Compute 90% and 95% bootstrap confidence intervals (using the percentile method) for the coefficient of body weight. How do these compare to those calculated in Exercise 7?
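One common form of the percentile bootstrap can be sketched as follows. It assumes that whole rows are resampled with replacement, which may differ in detail from the version covered in class, and it uses the built-in mtcars data with 2000 resamples as arbitrary stand-ins.

```r
set.seed(3)
n <- nrow(mtcars)

# Refit the model on 2000 bootstrap samples of rows and keep the slope for wt
boot_slopes <- replicate(2000, {
  rows <- sample(n, replace = TRUE)
  coef(lm(mpg ~ wt + hp, data = mtcars[rows, ]))[["wt"]]
})

# Percentile method: take the middle 90% and 95% of the bootstrap slopes
boot90 <- quantile(boot_slopes, c(0.05, 0.95))
boot95 <- quantile(boot_slopes, c(0.025, 0.975))

boot90
boot95
```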
9. Test the null hypothesis that the coefficient for body weight is 0. What is the test statistic? What is the result of this hypothesis test?
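The t-statistic and p-value for a single coefficient are stored in the coefficient table of summary(). The sketch below, again on the built-in mtcars data as a stand-in, pulls them out and reproduces them by hand to show where the numbers come from.

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- summary(fit)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)

t_stat <- coefs["wt", "t value"]
p_val  <- coefs["wt", "Pr(>|t|)"]

# The same quantities by hand: estimate / standard error, then a two-sided t tail area
t_by_hand <- coefs["wt", "Estimate"] / coefs["wt", "Std. Error"]
p_by_hand <- 2 * pt(abs(t_by_hand), df = df.residual(fit), lower.tail = FALSE)

c(t_stat = t_stat, p_val = p_val)
```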
10. Fill in the code below to use a permutation test to test the null hypothesis that the coefficient for body weight is 0. Interpret this result.
library(ggplot2) ## needed for the histogram below

model <- ## insert your model here from exercise 5
t <- summary(model)$coef["BodyWeight", 3] ## this will pull out the t-stat for BodyWeight

perm_test <- function() {
  m <- lm(sample(BackpackWeight) ~ ---, data = ---) ## fill in with the same model as exercise 5, except sample from BackpackWeight
  return(c(t = summary(m)$coef["BodyWeight", 3]))
}

perms <- purrr::map_df(1:1000, ~ perm_test())

## plot a histogram of the permutations
ggplot(perms, aes(x = t)) +
  geom_histogram(bins = 30) +
  geom_vline(xintercept = c(t, -t))

## calculate the proportion of permutations that are as extreme or more extreme than the one originally calculated
## (abs() keeps this a two-sided comparison whether the observed t is positive or negative)
mean(abs(perms$t) >= abs(t))