class: center, middle, inverse, title-slide

# Hypothesis Testing
### Dr. D’Agostino McGowan

---

layout: true

<div class="my-footer">
<span>
Dr. Lucy D'Agostino McGowan
</span>
</div>

---

## Assumptions

* If you **just** want to estimate the parameters `\(\beta\)`, you don't have to make **any** distributional assumptions for `\(\epsilon\)`.

--

* This may sound different from what you learned in previous classes! The reason we have distributional assumptions is so that we can create **confidence intervals** and do **hypothesis testing**.

---

## Assumptions

* Since we are doing **least squares**, we assume the errors are **independent** and **identically distributed** with a **mean of 0** and a **variance of `\(\sigma^2\)`**.

--

* For hypothesis testing + confidence intervals, let's assume the distribution is normal:

`$$\epsilon \sim N(0,\sigma^2\mathbf{I})$$`

--

* Since `\(y = \mathbf{X}\beta+\epsilon\)`, we know `\(y\sim N(\mathbf{X}\beta,\sigma^2\mathbf{I})\)`

--

* From this it follows that

`$$\hat\beta=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^Ty\sim N(\beta,(\mathbf{X}^T\mathbf{X})^{-1}\sigma^2)$$`

---

## Hypothesis testing

* Often we want to compare a _small_ model to a _larger_ one

--

* _Parsimony_ is great! We want to fit the smallest model that can best explain our outcome. But how do we determine if the small model is good enough?

--

* We can compare the RSS for the two models!

--

* If `\(RSS_{small}-RSS_{larger}\)` is small, then the fit of the small model is almost as good as the larger one!

---

## Hypothesis testing

The test statistic of interest is:

`$$\frac{RSS_{small}-RSS_{larger}}{RSS_{larger}}$$`

--

.question[
How do we decide what value is "good enough"?
]

--

`$$\frac{RSS_{small}-RSS_{larger}}{RSS_{larger}}> \textrm{some constant}$$`

--

.question[
What constant should we use?
]

--

* Let's create an F-statistic so we can compare it to the F-distribution!

---

## The F-Statistic

`$$F = \frac{(RSS_{small} - RSS_{larger}) / (df_{small}- df_{larger})}{RSS_{larger}/df_{larger}}$$`

--

* If the **small** model has 3 variables and the **larger** model has 5, what are the degrees of freedom?

--

* **small**: n - 4
* **larger**: n - 6

---

## The F-Statistic

`$$F = \frac{(RSS_{small} - RSS_{larger}) / (df_{small}- df_{larger})}{RSS_{larger}/df_{larger}}\sim F_{df_{small}- df_{larger}, df_{larger}}$$`

--

We will **reject the null hypothesis** (that the small model fits as well as the larger one) if `\(F>F^{\alpha}_{df_{small}- df_{larger}, df_{larger}}\)`

---

## The F-Statistic

.question[
What if you want to compare your full model to an intercept-only model?
]

--

`$$H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$$`

--

.question[
What is RSS for the small model?
]

--

* TSS

--

`$$F = \frac{(TSS-RSS)/p}{RSS/(n-(p+1))}$$`
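
---

## Trying it in R

A minimal sketch (not from the original slides), assuming the built-in `mtcars` data and a hypothetical small vs. larger model; `anova()` carries out the nested-model F-test described on the previous slides.

```{r}
# Hypothetical example: small model with 3 variables, larger model with 5
small  <- lm(mpg ~ wt + hp + disp, data = mtcars)
larger <- lm(mpg ~ wt + hp + disp + drat + qsec, data = mtcars)

# anova() computes (RSS_small - RSS_larger) / (df_small - df_larger)
# divided by RSS_larger / df_larger and compares it to the F-distribution
anova(small, larger)
```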
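
---

## Trying it in R

The same F-statistic computed "by hand" (a sketch using the same assumed `mtcars` models), following the formula from the earlier slides:

```{r}
rss_small  <- sum(resid(small)^2)
rss_larger <- sum(resid(larger)^2)
df_small   <- df.residual(small)    # n - 4
df_larger  <- df.residual(larger)   # n - 6

f_stat <- ((rss_small - rss_larger) / (df_small - df_larger)) /
  (rss_larger / df_larger)
f_stat

# p-value: P(F > f_stat) on (df_small - df_larger, df_larger) degrees of freedom
pf(f_stat, df_small - df_larger, df_larger, lower.tail = FALSE)
```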
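
---

## Trying it in R

For the comparison against an intercept-only model (again a sketch with the assumed `mtcars` example), the small model's RSS is the TSS, and the resulting F-statistic is the overall F-test reported at the bottom of `summary()`:

```{r}
intercept_only <- lm(mpg ~ 1, data = mtcars)   # RSS of this model is the TSS

anova(intercept_only, larger)

# the same overall F-test appears in the last line of summary(larger)
summary(larger)
```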