class: center, middle, inverse, title-slide

# Comparing Prediction Models

### Dr. D'Agostino McGowan

---

layout: true

<div class="my-footer">
<span>
Dr. Lucy D'Agostino McGowan
</span>
</div>

---

## Comparison metrics

* Adjusted `\(R^2\)`
* Mallows' Cp

---

## `\(R^2\)`

.question[
How do we estimate `\(R^2\)`?
]

`$$\Huge 1 - \frac{RSS}{TSS}$$`

--

.question[
How do you interpret this?
]

---

## `\(R^2\)`

`$$\Huge 1 - \frac{RSS}{TSS}$$`

* Adding a new variable to a model can **never increase** the RSS

--

.question[
What does this mean for `\(R^2\)`?
]

---

## `\(R^2\)`

* This means that `\(R^2\)` by itself is not a good criterion for determining model fit, because we'd always just pick the largest model!

--

* Enter Adjusted `\(R^2\)`!

---

## Adjusted `\(R^2\)`

`$$\Large R^2_{adj} = 1 - \frac{RSS/(n-(p+1))}{TSS/(n-1)}$$`

--

* Dividing by the degrees of freedom penalizes larger models, so this will only favor a larger model if the added variables have some predictive value

---

## Mallows' Cp

* This estimates the average mean squared error of prediction:

`$$\Large\frac{1}{\sigma^2}\sum_{i=1}^n E[(\hat{y}_i-E[y_i])^2]$$`

--

* This can be estimated by

`$$C_p = \frac{RSS_p}{\hat{\sigma}^2}+ 2(p+1)-n$$`

where `\(\hat\sigma^2\)` is estimated from the full model and `\(RSS_p\)` is the RSS from a reduced model with `\(p\)` predictors.

---

class: inverse

## <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 576 512"><path d="M402.6 83.2l90.2 90.2c3.8 3.8 3.8 10 0 13.8L274.4 405.6l-92.8 10.3c-12.4 1.4-22.9-9.1-21.5-21.5l10.3-92.8L388.8 83.2c3.8-3.8 10-3.8 13.8 0zm162-22.9l-48.8-48.8c-15.2-15.2-39.9-15.2-55.2 0l-35.4 35.4c-3.8 3.8-3.8 10 0 13.8l90.2 90.2c3.8 3.8 10 3.8 13.8 0l35.4-35.4c15.2-15.3 15.2-40 0-55.2zM384 346.2V448H64V128h229.8c3.2 0 6.2-1.3 8.5-3.5l40-40c7.6-7.6 2.2-20.5-8.5-20.5H48C21.5 64 0 85.5 0 112v352c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48V306.2c0-10.7-12.9-16-20.5-8.5l-40 40c-2.2 2.3-3.5 5.3-3.5 8.5z"/></svg> `Application Exercise`

Show that Mallows' Cp for the full model (where the full model and the reduced model have the same `\(p\)` predictors) is equal to `\(p+1\)`.
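
---

## Numerical sketch

A minimal numerical illustration of the ideas above (not from the original slides: simulated data, and Python rather than R; all names are illustrative). It fits a one-predictor model and a model with an added pure-noise predictor, and computes `\(R^2\)`, adjusted `\(R^2\)`, and Mallows' Cp by hand from the formulas on the previous slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x_noise = rng.normal(size=n)            # pure-noise predictor
y = 2 + 3 * x1 + rng.normal(size=n)

def fit_metrics(X, y):
    """Least-squares fit with intercept; returns RSS, R^2, adjusted R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    p = X.shape[1]                       # number of predictors
    r2 = 1 - rss / tss
    adj_r2 = 1 - (rss / (len(y) - (p + 1))) / (tss / (len(y) - 1))
    return rss, r2, adj_r2

rss_small, r2_small, adj_small = fit_metrics(x1.reshape(-1, 1), y)
rss_full, r2_full, adj_full = fit_metrics(np.column_stack([x1, x_noise]), y)

# R^2 can never decrease when a predictor is added,
# while adjusted R^2 penalizes the useless predictor.
print(r2_full >= r2_small)

# Mallows' Cp: sigma^2 is estimated from the full model (p = 2 here)
sigma2_hat = rss_full / (n - (2 + 1))
cp_small = rss_small / sigma2_hat + 2 * (1 + 1) - n
cp_full = rss_full / sigma2_hat + 2 * (2 + 1) - n
print(round(cp_full, 6))  # the full model's Cp equals p + 1 = 3
```

The last line previews the application exercise numerically: because `\(\hat\sigma^2 = RSS_{full}/(n-(p+1))\)`, the full model's `\(C_p\)` reduces algebraically to `\(p+1\)` regardless of the data.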