class: center, middle, inverse, title-slide

# Confidence intervals for regression coefficients
### Dr. D’Agostino McGowan

---
layout: true

<div class="my-footer">
<span>
Dr. Lucy D'Agostino McGowan
</span>
</div>

---
class: middle

# confidence intervals

If we use the same sampling method to select different samples and compute an interval estimate for each sample, we would expect the true population parameter ( `\(\beta_1\)` ) to fall within the interval estimates 95% of the time.

---

## Confidence interval for `\(\hat\beta\)`

.center[
`\(\Huge \hat\beta_i \pm t^* \times SE_{\hat\beta_i}\)`
]

---

## Confidence interval for `\(\hat\beta\)`

.center[
`\(\Huge \hat\beta_i \pm t^* \times SE_{\hat\beta_i}\)`
]

* `\(t^*\)` is the critical value for the `\(t_{n-p-1}\)` density curve to obtain the desired confidence level

--

* Often we want a **95% confidence level**.

---

## Let's do it in R

```r
# Design matrix: a column of 1s for the intercept, then the predictor
X <- matrix(c(1, 1, 1, 1, 1, 2, 3, 5, 1, 3), ncol = 2)
y <- c(1, 2, 5, 4, 2)
```

---

## Let's do it in R

```r
(beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y)
```

```
##           [,1]
## [1,] 1.5909091
## [2,] 0.4318182
```

---

## Let's do it in R

```r
e <- y - X %*% beta_hat                 # residuals
var_e <- sum(e^2) / (5 - 2)             # residual variance: n = 5, p + 1 = 2 coefficients
var_beta <- var_e * solve(t(X) %*% X)   # estimated covariance matrix of beta_hat
(se_beta <- sqrt(diag(var_beta)))       # standard errors
```

```
## [1] 1.8249873 0.5890121
```

---

## Let's do it in R

.center[
`\(\Huge \hat\beta_i \pm t^* \times SE_{\hat\beta_i}\)`
]

```r
(t_star <- qt(0.975, 5 - 2))
```

```
## [1] 3.182446
```

--

.question[Why 0.975? I thought we want a 95% CI?]

--

* We want a two-sided interval! The remaining 5% is split across both tails, 2.5% in each, so the upper cutoff is the 0.975 quantile.
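---

## Let's do it in R

One way to check the 0.975 reasoning numerically is sketched below. It uses only base R's `qt()` and `pt()`; the names `central_area` and `upper_tail` are made up for this illustration and do not appear elsewhere in the slides.

```r
# Critical value for a 95% interval with 5 - 2 = 3 degrees of freedom
t_star <- qt(0.975, df = 5 - 2)

# Area under the t density between -t_star and t_star
central_area <- pt(t_star, df = 5 - 2) - pt(-t_star, df = 5 - 2)

# Area above t_star (by symmetry, the same area lies below -t_star)
upper_tail <- 1 - pt(t_star, df = 5 - 2)

c(central = central_area, upper_tail = upper_tail)
```

The central area comes out to 0.95 and each tail to 0.025, which is why the 0.975 quantile gives a 95% interval.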
---

## t-distribution

![](09-ci-beta_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---

## t-distribution

![](09-ci-beta_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

---

## t-distribution

![](09-ci-beta_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

---

## Let's do it in R

.center[
`\(\Huge \hat\beta_i \pm t^* \times SE_{\hat\beta_i}\)`
]

.small[

```r
(t_star <- qt(0.975, 5 - 2))
```

```
## [1] 3.182446
```

```r
(lb <- beta_hat - t_star * se_beta)
```

```
##           [,1]
## [1,] -4.217015
## [2,] -1.442681
```

```r
(ub <- beta_hat + t_star * se_beta)
```

```
##          [,1]
## [1,] 7.398833
## [2,] 2.306318
```
]

--

* 95% CI for `\(\beta_0\)`: (-4.22, 7.40)
* 95% CI for `\(\beta_1\)`: (-1.44, 2.31)

---

## Let's do it in R

```r
lm_fit <- lm(y ~ X[,2])
confint(lm_fit)
```

```
##                 2.5 %   97.5 %
## (Intercept) -4.217015 7.398833
## X[, 2]      -1.442681 2.306318
```

---

## Confidence Regions

You can also calculate a _confidence region_ for more than one parameter.

`$$(\hat\beta-\beta)^T\mathbf{X}^T\mathbf{X}(\hat\beta-\beta)\leq (p+1)\hat\sigma^2F^{\alpha}_{p+1, n-(p+1)}$$`

--

* This forms an _ellipsoidal_ region

---

## <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M512 64v256H128V64h384m16-64H112C85.5 0 64 21.5 64 48v288c0 26.5 21.5 48 48 48h416c26.5 0 48-21.5 48-48V48c0-26.5-21.5-48-48-48zm100 416H389.5c-3 0-5.5 2.1-5.9 5.1C381.2 436.3 368 448 352 448h-64c-16 0-29.2-11.7-31.6-26.9-.5-2.9-3-5.1-5.9-5.1H12c-6.6 0-12 5.4-12 12v36c0 26.5 21.5 48 48 48h544c26.5 0 48-21.5 48-48v-36c0-6.6-5.4-12-12-12z"/></svg> `Application Exercise`

y | x_1 | x_2
---|----|----
1 | 3 | 2
3 | 3 | 6
5 | 1 | 8
6 | 7 | 9
1 | 1 | 2

* Fit a model predicting `y` from `X` (the columns `x_1` and `x_2`)
* Calculate the confidence intervals "by hand"
* Check with the `confint` function
* Interpret these intervals
* Calculate the _joint_ confidence region for `\(\beta_1\)` and `\(\beta_2\)`
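---

## Confidence Regions

The confidence-region inequality can be checked numerically. The sketch below reuses the five-observation example from the earlier slides (not the exercise data) and assumes `\(\alpha = 0.05\)`, so the F critical value is `qf(0.95, ...)`. The names `k`, `cutoff`, and `in_region()` are hypothetical helpers introduced for this illustration.

```r
# Data from the running example
X <- matrix(c(1, 1, 1, 1, 1, 2, 3, 5, 1, 3), ncol = 2)
y <- c(1, 2, 5, 4, 2)

n <- nrow(X)
k <- ncol(X)   # p + 1 coefficients (intercept plus one slope)

XtX <- t(X) %*% X
beta_hat <- solve(XtX) %*% t(X) %*% y
e <- y - X %*% beta_hat
sigma2_hat <- sum(e^2) / (n - k)

# Right-hand side of the inequality: (p + 1) * sigma2_hat * F critical value
cutoff <- k * sigma2_hat * qf(0.95, df1 = k, df2 = n - k)

# Hypothetical helper: is a candidate coefficient vector inside the 95% region?
in_region <- function(beta) {
  d <- beta_hat - beta
  as.numeric(t(d) %*% XtX %*% d) <= cutoff
}

in_region(c(0, 0))
in_region(c(10, 0))
```

For these five observations, the point (0, 0) falls inside the region and (10, 0) falls outside it. Tracing the boundary where the quadratic form equals `cutoff` would draw the ellipse itself.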