class: center, middle, inverse, title-slide # Confidence Intervals ### Dr. D’Agostino McGowan --- layout: true <div class="my-footer"> <span> Dr. Lucy D'Agostino McGowan </span> </div> --- class: middle # confidence intervals If we use the same sampling method to select different samples and computed an interval estimate for each sample, we would expect the true population parameter ( `\(\beta_1\)` ) to fall within the interval estimates 95% of the time. --- ## What it is **NOT** * 95% CI: (5-9) _We would expect the true population parameter to fall within 5 to 9 95% of the time_ -- <img src = "https://media.tenor.com/images/b9df0c0c41ff93838e8f837ffc54615a/tenor.gif" width = 400></img> --- ## What it is **NOT** * 95% CI: (5-9) ~~We would expect the true population parameter to fall within 5 - 9 95% of the time~~ <img src = "https://media.tenor.com/images/b9df0c0c41ff93838e8f837ffc54615a/tenor.gif" width = 400></img> --- ## What it is **NOT** * 95% CI: (5-9) ~~We would expect the true population parameter to fall within 5 - 9 95% of the time~~ .question[ Why? ] -- * The "true population parameter" is **FIXED**! --- ## The true parameter is **FIXED**! <img src="https://media2.giphy.com/media/3orieOTNvHHCF41C9y/200w.gif" width = 500></img> -- * *when we are talking about confidence intervals, which rely on Frequentist theory. If you take a Bayesian inference class, you will learn about _credible intervals_ which have different assumptions --- ## Example We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land. `$$Wage = 2 \times Age + \epsilon$$` -- .question[ What is the "true parameter" for the relationship between `Age` and `Wage`? ] -- * 2! --- ## Example We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land. `$$Wage = 2 \times Age + \epsilon$$` * Age ~ Normal(30, 10) * `\(\epsilon\)` ~ Normal(0, 10) * Sample `\(n = 100\)` ```r set.seed(7) n <- 100 sample <- data.frame( Age = rnorm(n, 30, 10) ) sample$Wage <- 2 * sample$Age + rnorm(n, 0, 10) ``` --- ## Example We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land. `$$Wage = 2 \times Age + \epsilon$$` ```r head(sample) ``` ``` ## Age Wage ## 1 52.87247 110.93553 ## 2 18.03228 41.93996 ## 3 23.05707 45.32082 ## 4 25.87707 40.01053 ## 5 20.29327 43.67375 ## 6 20.52720 25.01562 ``` --- ## Example We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land. `$$Wage = 2 \times Age + \epsilon$$` ```r n <- 100 sample2 <- data.frame( Age = rnorm(n, 30, 10) ) sample2$Wage <- 2 * sample2$Age + rnorm(n, 0, 10) ``` --- ## Example We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land. `$$Wage = 2 \times Age + \epsilon$$` .pull-left[ ```r head(sample) ``` ``` ## Age Wage ## 1 52.87247 110.93553 ## 2 18.03228 41.93996 ## 3 23.05707 45.32082 ## 4 25.87707 40.01053 ## 5 20.29327 43.67375 ## 6 20.52720 25.01562 ``` ] .pull-right[ ```r head(sample2) ``` ``` ## Age Wage ## 1 50.23344 105.71950 ## 2 38.62492 74.49258 ## 3 29.75091 60.04891 ## 4 36.00635 68.13020 ## 5 42.16481 80.15938 ## 6 18.23468 24.82762 ``` ] --- ## Example We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land. `$$Wage = 2 \times Age + \epsilon$$` Fit a linear model on the `sample` -- |term | estimate| std.error| statistic| p.value| conf.low| conf.high| |:-----------|---------:|---------:|----------:|---------:|---------:|---------:| |(Intercept) | 0.9950715| 3.2797052| 0.3034027| 0.7622261| -5.513397| 7.503540| |Age | 2.0098559| 0.0999765| 20.1032929| 0.0000000| 1.811456| 2.208256| -- ### 95% CI: 1.81, 2.21 --- ## Example .question[ What percent of the time does the "true parameter" fall within this interval? ] |term | estimate| std.error| statistic| p.value| conf.low| conf.high| |:-----------|---------:|---------:|----------:|---------:|---------:|---------:| |(Intercept) | 0.9950715| 3.2797052| 0.3034027| 0.7622261| -5.513397| 7.503540| |Age | 2.0098559| 0.0999765| 20.1032929| 0.0000000| 1.811456| 2.208256| ### 95% CI: 1.81, 2.21 --- ## Example .question[ What percent of the time does 2 fall within this interval? ] |term | estimate| std.error| statistic| p.value| conf.low| conf.high| |:-----------|---------:|---------:|----------:|---------:|---------:|---------:| |(Intercept) | 0.9950715| 3.2797052| 0.3034027| 0.7622261| -5.513397| 7.503540| |Age | 2.0098559| 0.0999765| 20.1032929| 0.0000000| 1.811456| 2.208256| ### 95% CI: 1.81, 2.21 --- ## Example We are interested in the relationship between **Age** and **Wage**. To demonstrate what a _confidence interval_ is, I am going to construct a "truth" for the relationship in Lucy-land. `$$Wage = 2 \times Age + \epsilon$$` Fit a linear model on the `sample2` -- |term | estimate| std.error| statistic| p.value| conf.low| conf.high| |:-----------|---------:|---------:|----------:|---------:|---------:|---------:| |(Intercept) | -2.819931| 3.0542175| -0.9232909| 0.3581233| -8.880926| 3.241064| |Age | 2.078026| 0.0968743| 21.4507408| 0.0000000| 1.885782| 2.270270| -- ### 95% CI: 1.89, 2.27 --- ## Example .question[ What percent of the time does 2 fall within this interval? ] |term | estimate| std.error| statistic| p.value| conf.low| conf.high| |:-----------|---------:|---------:|----------:|---------:|---------:|---------:| |(Intercept) | -2.819931| 3.0542175| -0.9232909| 0.3581233| -8.880926| 3.241064| |Age | 2.078026| 0.0968743| 21.4507408| 0.0000000| 1.885782| 2.270270| ### 95% CI: 1.89, 2.27 --- ## Example ![](12-ci_files/figure-html/unnamed-chunk-12-1.png)<!-- --> --- ## Example ![](12-ci_files/figure-html/unnamed-chunk-13-1.png)<!-- --> -- .question[ What percent of the intervals contain the "true parameter"? ] --- ## Example ![](12-ci_files/figure-html/unnamed-chunk-14-1.png)<!-- --> -- .question[ What percent of the intervals contain the "true parameter" ] -- ## 48 / 50 = 96% --- ## Example ![](12-ci_files/figure-html/unnamed-chunk-15-1.png)<!-- --> ## 95 / 100 = 95% --- class: inverse ## <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M512 64v256H128V64h384m16-64H112C85.5 0 64 21.5 64 48v288c0 26.5 21.5 48 48 48h416c26.5 0 48-21.5 48-48V48c0-26.5-21.5-48-48-48zm100 416H389.5c-3 0-5.5 2.1-5.9 5.1C381.2 436.3 368 448 352 448h-64c-16 0-29.2-11.7-31.6-26.9-.5-2.9-3-5.1-5.9-5.1H12c-6.6 0-12 5.4-12 12v36c0 26.5 21.5 48 48 48h544c26.5 0 48-21.5 48-48v-36c0-6.6-5.4-12-12-12z"/></svg> `Applicaton Exercise` * Watch me code this up in the next lecture video * Code along with me, or repeat the same steps to build an understanding about confidence intervals