# Confidence Intervals
### Dr. D’Agostino McGowan

---

<div class="my-footer">
  <span>
  Dr. Lucy D'Agostino McGowan
</span>
</div>

---

# Confidence intervals

If we used the same sampling method to select many different samples and computed an interval estimate from each one, we would expect 95% of those intervals to contain the true population parameter ( `$\beta_1$` ).
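One way to write this down, using `$L$` and `$U$` as our own labels (not notation from the slides) for the lower and upper endpoints computed from a sample:

`$$P(L \le \beta_1 \le U) = 0.95$$`

The probability here is over repeated samples: `$L$` and `$U$` change from sample to sample, while `$\beta_1$` does not.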

---

## What it is **NOT**

* 95% CI: (5, 9)

_95% of the time, we would expect the true population parameter to fall between 5 and 9_

---

## What it is **NOT**

* 95% CI: (5, 9)

~~95% of the time, we would expect the true population parameter to fall between 5 and 9~~

* The "true population parameter" is **FIXED**!

---

## The true parameter is **FIXED**!

* This holds when we are talking about confidence intervals, which rely on frequentist theory. If you take a Bayesian inference class, you will learn about _credible intervals_, which rest on different assumptions.

---

## Example

We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land.

`$$Wage = 2 \times Age + \epsilon$$`

* The true slope is `$\beta_1 = 2$`!

---

## Example

We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land.

`$$Wage = 2 \times Age + \epsilon$$`

* Age ~ Normal(30, 10)
* `$\epsilon$` ~ Normal(0, 10)
* Sample `$n = 100$`

```r
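set.seed(7)  # fix the random seed so the sample is reproducible
n <- 100     # sample size
# Age ~ Normal(mean = 30, sd = 10)
sample <- data.frame(
  Age = rnorm(n, 30, 10)
)
# Wage = 2 * Age + epsilon, with epsilon ~ Normal(mean = 0, sd = 10)
sample$Wage <- 2 * sample$Age + rnorm(n, 0, 10)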
```

---

## Example

We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land.

`$$Wage = 2 \times Age + \epsilon$$`

```r
head(sample)
```

```
##        Age      Wage
## 1 52.87247 110.93553
## 2 18.03228  41.93996
## 3 23.05707  45.32082
## 4 25.87707  40.01053
## 5 20.29327  43.67375
## 6 20.52720  25.01562
```

---

## Example

We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land.

`$$Wage = 2 \times Age + \epsilon$$`

```r
# Draw a second, independent sample from the same "truth"
n <- 100
sample2 <- data.frame(
  Age = rnorm(n, 30, 10)
)
sample2$Wage <- 2 * sample2$Age + rnorm(n, 0, 10)
```

---

## Example

We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land.

`$$Wage = 2 \times Age + \epsilon$$`

.pull-left[

```r
head(sample)
```

```
##        Age      Wage
## 1 52.87247 110.93553
## 2 18.03228  41.93996
## 3 23.05707  45.32082
## 4 25.87707  40.01053
## 5 20.29327  43.67375
## 6 20.52720  25.01562
```

]

.pull-right[

```r
head(sample2)
```

```
##        Age      Wage
## 1 50.23344 105.71950
## 2 38.62492  74.49258
## 3 29.75091  60.04891
## 4 36.00635  68.13020
## 5 42.16481  80.15938
## 6 18.23468  24.82762
```

]

---

## Example

We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land.

`$$Wage = 2 \times Age + \epsilon$$`

Fit a linear model to `sample`:

|term        |  estimate| std.error|  statistic|   p.value|  conf.low| conf.high|
|:-----------|---------:|---------:|----------:|---------:|---------:|---------:|
|(Intercept) | 0.9950715| 3.2797052|  0.3034027| 0.7622261| -5.513397|  7.503540|
|Age         | 2.0098559| 0.0999765| 20.1032929| 0.0000000|  1.811456|  2.208256|

### 95% CI: 1.81, 2.21
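The slides do not show the code behind this table, so the following is only a minimal sketch in base R of how such an interval could be obtained. It re-creates the sample with the seed and parameters shown earlier, then computes the interval for the `Age` slope twice: with `confint()` and by hand as estimate ± t* × standard error (roughly 2.01 ± 1.98 × 0.10 for the values in the table). The names `fit`, `ci_builtin`, and `ci_manual` are our own, and whether the numbers match the table exactly depends on the seed and call order being the same as the author's, which is an assumption.

```r
# Re-create the first sample (same seed and parameters as in the slides).
set.seed(7)
n <- 100
sample <- data.frame(Age = rnorm(n, 30, 10))
sample$Wage <- 2 * sample$Age + rnorm(n, 0, 10)

# Fit the simple linear regression of Wage on Age.
fit <- lm(Wage ~ Age, data = sample)

# Built-in 95% confidence interval for the Age slope.
ci_builtin <- confint(fit, "Age", level = 0.95)

# The same interval by hand: estimate +/- t* x standard error.
coefs <- coef(summary(fit))
est <- coefs["Age", "Estimate"]
se <- coefs["Age", "Std. Error"]
t_star <- qt(0.975, df = df.residual(fit))  # n - 2 = 98 degrees of freedom
ci_manual <- c(est - t_star * se, est + t_star * se)

ci_builtin
ci_manual
```

Both approaches should agree, since `confint()` on an `lm` fit uses the same t-based formula.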

---

## Example

We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land.

`$$Wage = 2 \times Age + \epsilon$$`

Fit a linear model to `sample2`:

|term        |  estimate| std.error|  statistic|   p.value|  conf.low| conf.high|
|:-----------|---------:|---------:|----------:|---------:|---------:|---------:|
|(Intercept) | -2.819931| 3.0542175| -0.9232909| 0.3581233| -8.880926|  3.241064|
|Age         |  2.078026| 0.0968743| 21.4507408| 0.0000000|  1.885782|  2.270270|

### 95% CI: 1.89, 2.27

---

## Example

![](12-ci_files/figure-html/unnamed-chunk-12-1.png)

---

## Example

![](12-ci_files/figure-html/unnamed-chunk-13-1.png)

---

## Example

![](12-ci_files/figure-html/unnamed-chunk-14-1.png)
--

## 48 / 50 = 96%

---

## Example

![](12-ci_files/figure-html/unnamed-chunk-15-1.png)

## 95 / 100 = 95%
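The counting behind these figures can be sketched as a small simulation. This is not the code used to draw the plots; `one_interval()` and `coverage()` are hypothetical helpers, the seed is arbitrary, and we use 1000 repetitions rather than 50 or 100 so the proportion is less noisy.

```r
# Draw one sample from the "Lucy-land" truth and return the
# 95% confidence interval for the Age slope as c(lower, upper).
one_interval <- function(n = 100, slope = 2) {
  age <- rnorm(n, 30, 10)
  wage <- slope * age + rnorm(n, 0, 10)
  confint(lm(wage ~ age), "age", level = 0.95)[1, ]
}

# Proportion of `reps` intervals that contain the true slope.
coverage <- function(reps, slope = 2) {
  intervals <- t(replicate(reps, one_interval(slope = slope)))
  mean(intervals[, 1] <= slope & slope <= intervals[, 2])
}

set.seed(1)  # arbitrary seed, not the one used for the slides
coverage_1000 <- coverage(1000)
coverage_1000
```

With only 50 or 100 repetitions the proportion will bounce around 95% (as with 48 / 50 above); it settles closer to 95% as the number of repetitions grows.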

---

## `Application Exercise`

* Watch me code this up in the next lecture video
* Code along with me, or repeat the same steps to build an understanding of confidence intervals