# Confidence intervals for regression coefficients
### Dr. Lucy D'Agostino McGowan

---

# Confidence intervals

If we use the same sampling method to select different samples and compute a 95% interval estimate from each sample, we would expect about 95% of those intervals to contain the true population parameter (`$\beta_1$`).
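The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch: the true coefficients, sample size, error standard deviation, and number of repetitions below are assumptions chosen for illustration, not values from these slides.

```r
set.seed(1)          # arbitrary seed, for reproducibility only

beta_0 <- 1          # assumed true intercept
beta_1 <- 0.5        # assumed true slope
n      <- 30         # assumed sample size
n_sims <- 1000       # number of repeated samples

# For each simulated sample, fit the model and record whether the
# 95% interval for the slope contains the true value beta_1
covered <- replicate(n_sims, {
  x  <- runif(n, 0, 10)
  y  <- beta_0 + beta_1 * x + rnorm(n, sd = 2)
  ci <- confint(lm(y ~ x), "x", level = 0.95)
  ci[1] <= beta_1 && beta_1 <= ci[2]
})

coverage <- mean(covered)   # proportion of intervals containing beta_1
coverage
```

The proportion should land close to 0.95, with some simulation noise.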

---

## Confidence interval for `$\beta$`

`$$\hat\beta \pm t^* \times SE_{\hat\beta}$$`

* `$t^*$` is the critical value for the `$t_{n-p-1}$` density curve to obtain the desired confidence level
--

* Often we want a **95% confidence level**.
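As a rough illustration of how the critical value depends on the confidence level, the sketch below computes `$t^*$` for three levels. It assumes `n = 5` observations and `p = 1` predictor, which gives 3 degrees of freedom; the levels other than 95% are included only for comparison.

```r
n <- 5                                  # assumed number of observations
p <- 1                                  # assumed number of predictors
conf_level <- c(0.90, 0.95, 0.99)

# A two-sided interval leaves alpha / 2 in each tail,
# so the critical value is the 1 - alpha / 2 quantile
alpha  <- 1 - conf_level
t_star <- qt(1 - alpha / 2, df = n - p - 1)
names(t_star) <- conf_level
t_star
```

Higher confidence levels give larger critical values and therefore wider intervals.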

---

## Let's do it in R

```r
# Design matrix: a column of 1s for the intercept, then the predictor
X <- matrix(c(1, 1, 1, 1, 1,
              2, 3, 5, 1, 3), ncol = 2)
# Outcome
y <- c(1, 2, 5, 4, 2)
```

---

## Let's do it in R

```r
(beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y)
```

```
##           [,1]
## [1,] 1.5909091
## [2,] 0.4318182
```

---

## Let's do it in R

```r
e <- y - X %*% beta_hat                  # residuals
var_e <- sum(e^2) / (5 - 2)              # error variance: SSE / (n - p - 1)
var_beta <- var_e * solve(t(X) %*% X)    # covariance matrix of beta_hat
(se_beta <- sqrt(diag(var_beta)))        # standard errors
```

```
## [1] 1.8249873 0.5890121
```

---

## Let's do it in R

```r
(t_star <- qt(0.975, 5 - 2))
```

```
## [1] 3.182446
```

* We use 0.975 rather than 0.95 because the interval is two-sided: 2.5% in each tail!
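One way to see why the 0.975 quantile goes with a 95% interval is to check the area between `-t_star` and `t_star`. The variable names `central_area` and `lower_crit` are introduced here for illustration only.

```r
t_star <- qt(0.975, df = 5 - 2)

# Area under the t density between -t_star and t_star
central_area <- pt(t_star, df = 5 - 2) - pt(-t_star, df = 5 - 2)
central_area

# The lower critical value is the mirror image of the upper one
lower_crit <- qt(0.025, df = 5 - 2)
lower_crit
```

The central area is 0.95, with the remaining 5% split evenly between the two tails.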

---

## t-distribution

![](09-ci-beta_files/figure-html/unnamed-chunk-6-1.png)

---

## t-distribution

![](09-ci-beta_files/figure-html/unnamed-chunk-7-1.png)

---

## t-distribution

![](09-ci-beta_files/figure-html/unnamed-chunk-8-1.png)

---

## Let's do it in R

```r
(t_star <- qt(0.975, 5 - 2))
```

```
## [1] 3.182446
```

```r
(lb <- beta_hat - t_star * se_beta)
```

```
##           [,1]
## [1,] -4.217015
## [2,] -1.442681
```

```r
(ub <- beta_hat + t_star * se_beta)
```

```
##          [,1]
## [1,] 7.398833
## [2,] 2.306318
```

* 95% CI for `$\beta_0$`: (-4.22, 7.40)
* 95% CI for `$\beta_1$`: (-1.44, 2.31)

---

## Let's do it in R

```r
lm_fit <- lm(y ~ X[,2])
confint(lm_fit)
```

```
##                 2.5 %   97.5 %
## (Intercept) -4.217015 7.398833
## X[, 2]      -1.442681 2.306318
```

---

## Confidence Regions

You can also calculate a _confidence region_ for more than one parameter.

`$$(\hat\beta-\beta)^T\mathbf{X}^T\mathbf{X}(\hat\beta-\beta)\leq (p+1)\hat\sigma^2F^{\alpha}_{p+1, n-(p+1)}$$`

* This will form an _ellipsoid_ region
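A minimal sketch of how the inequality could be checked for a candidate value of `$\beta$`, using the same small data set as the earlier slides. It assumes `$\alpha = 0.05$` and reads `$F^{\alpha}$` as the upper `$\alpha$` critical value, that is `qf(1 - alpha, ...)`. The helper `in_region()` is hypothetical, written only for this example.

```r
X <- matrix(c(1, 1, 1, 1, 1,
              2, 3, 5, 1, 3), ncol = 2)
y <- c(1, 2, 5, 4, 2)

n <- nrow(X)
p <- ncol(X) - 1                 # predictors, excluding the intercept

XtX        <- t(X) %*% X
beta_hat   <- solve(XtX, t(X) %*% y)
sigma2_hat <- sum((y - X %*% beta_hat)^2) / (n - p - 1)

# Right-hand side of the inequality for a 95% region (assumed alpha = 0.05)
cutoff <- (p + 1) * sigma2_hat * qf(0.95, p + 1, n - p - 1)

# Hypothetical helper: TRUE if the candidate beta lies inside the region
in_region <- function(beta) {
  d <- beta_hat - beta
  drop(t(d) %*% XtX %*% d) <= cutoff
}

in_region(beta_hat)              # the estimate itself is always inside
in_region(c(100, 100))           # a point far from the estimate
```

The set of all `beta` values for which `in_region()` returns `TRUE` is the ellipsoid, centered at `beta_hat`.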

---

## `Application Exercise`

y | x_1 | x_2
---|----|----
1 | 3 | 2
3 | 3 | 6
5 | 1 | 8
6 | 7 | 9
1 | 1 | 2

* Fit a model predicting `y` from `x_1` and `x_2`
* Calculate the confidence intervals "by hand"
* Check with the `confint` function
* Interpret these intervals
* Calculate the _joint_ confidence region for `$\beta_1$` and `$\beta_2$`