Lab 03 - Multiple Regression Inference

Due: Tuesday 2020-10-13 at 5pm

Getting started

Packages

In this lab we will work with one package: Stat2Data.

If you need to install this package, run the following code once in your Console:

install.packages("Stat2Data")

If you’d like to run your code in the Console as well you’ll also need to load the packages there. To do so, run the following in the console.

library(Stat2Data) 

Note that the packages are also loaded with the same commands in your R Markdown document.

Data

The data frame we will be working with today is called Backpack and it is in the Stat2Data package. To load the data add the following to your .Rmd file:

data("Backpack")

To find out more about the dataset, type the following in your Console: ?Backpack. A question mark before the name of an object will always bring up its help file. This command must be ran in the Console.

Remember: The Console is at the bottom of your RStudio workspace. Things you type in the Console will not be in your final report. This is a good place to peek at data (try typing Backpack in the Console) and look at help files with the ?.

Conceptual Exercises

  1. Describe what a 95% confidence interval means.

  2. Find a formula relating \(R^2\) and the \(F\)-test. Show all steps completed using Latex.

  3. The hypothesis tests we considered in this class so far require the assumption of normal errors. Another way to complete hypothesis testing is via permutation tests, which do not require this assumption. To do this:

Describe the ways in which this method is similar to and different from the bootstrap method we discussed previously.

Backpack Data

  1. Based on the help function, how many rows (n) and how many columns (p) does the Backpack data have? What are the variables included in the data frame? Add your responses to your lab report.

  2. Fit a regression model predicting a student’s backpack weight from their body weight, year in school, whether they are a graduate or undergraduate student, and the number of credits they are taking that quarter. Use an F-test to determine whether collectively these four predictors have a relationship to the response. Interpret this result.

  3. Using the model created in Exercise 5, interpret the coefficient for body weight.

  4. Compute 90% and 95% confidence intervals for the predictor body weight. Interpret these.

  5. Compute 90% and 95% bootstrap confidence intervals (using the percentile method) for the predictor body weight. How do these compare to those calculated in Exercise 7?

  6. Test the null hypothesis that the coefficient for body weight is 0. What is the test statistic? What is the result of this hypothesis test?

  7. Fill in the code below to use a permutation test to test the null hypothesis that the coefficient for body weight is 0. Interpret this result.

model <- ## insert your model here from exercise 5
t <- summary(model)$coef["BodyWeight", 3] ## this will pull out the t-stat for BodyWeight

perm_test <- function() {
  m <- lm(sample(BackpackWeight) ~ ---, data = ---) ## fill in with the same model as exercise 5, except sample from BackpackWeight
  return(c(t = summary(m)$coef["BodyWeight", 3]))
}


perms <- purrr::map_df(1:1000, ~ perm_test())

## plot a histogram of the permutations

ggplot(perms, aes(x = t)) +
  geom_histogram(bins = 30) +
  geom_vline(xintercept = c(t, -t))

## calculate the proportion of permutations that are as extreme or more extreme than the one originally calculated
mean(perms$t >= t | perms$t <= - t)