class: center, middle, inverse, title-slide

# Unusual Observations
### Dr. D'Agostino McGowan

---
layout: true

<div class="my-footer">
<span>
Dr. Lucy D'Agostino McGowan
</span>
</div>

---

## Leverage

* _leverage_ is the amount of influence an observation has on the estimation of `\(\hat\beta\)`

--

* Mathematically, we can define this as the _diagonal elements of the hat matrix_.

--

.question[
What is the hat matrix?
]

--

* `\(\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\)`

---

## Leverage

* _leverage_ is the amount of influence an observation has on the estimation of `\(\hat\beta\)`
* Mathematically, we can define this as the _diagonal elements of the hat matrix_.

`$$h_i = H_{ii}\\ h_i = [\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T]_{ii}$$`

---

## Leverage

.question[
What do we use the diagonal of the hat matrix for?
]

--

* Recall that the variance of the residuals is

`$$\textrm{var}(e_i) = \sigma^2(1 - h_i)$$`

* so _large_ leverage points will pull the fit towards `\(y_i\)`

---

## Leverage

* The _leverage_, `\(h_i\)`, will always be between 0 and 1

.question[
How do we know this? Let's show it using the fact that the hat matrix is _idempotent_ and _symmetric_.
]

--

`$$\begin{align}h_i &= \sum_{j}H_{ij}H_{ji}\end{align}$$`

---

## Leverage

* The _leverage_, `\(h_i\)`, will always be between 0 and 1

.question[
How do we know this? Let's show it using the fact that the hat matrix is _idempotent_ and _symmetric_.
]

`$$\begin{align}h_i &= \sum_{j}H_{ij}H_{ji}\\ &=\sum_{j}H_{ij}^2\end{align}$$`

---

## Leverage

* The _leverage_, `\(h_i\)`, will always be between 0 and 1

.question[
How do we know this? Let's show it using the fact that the hat matrix is _idempotent_ and _symmetric_.
]

`$$\begin{align}h_i &= \sum_{j}H_{ij}H_{ji}\\ &=\sum_{j}H_{ij}^2\\ &=H_{ii}^2+\sum_{j\neq i}H_{ij}^2\end{align}$$`

---

## Leverage

* The _leverage_, `\(h_i\)`, will always be between 0 and 1

.question[
How do we know this? Let's show it using the fact that the hat matrix is _idempotent_ and _symmetric_.
]

`$$\begin{align}h_i &= \sum_{j}H_{ij}H_{ji}\\ &=\sum_{j}H_{ij}^2\\ &=H_{ii}^2+\sum_{j\neq i}H_{ij}^2\\ &=h_i^2 + \sum_{j\neq i}H_{ij}^2\end{align}$$`

---

## Leverage

* The _leverage_, `\(h_i\)`, will always be between 0 and 1

`$$\begin{align}h_i &= \sum_{j}H_{ij}H_{ji}\\ &=\sum_{j}H_{ij}^2\\ &=H_{ii}^2+\sum_{j\neq i}H_{ij}^2\\ &=h_i^2 + \sum_{j\neq i}H_{ij}^2\end{align}$$`

* This means that `\(h_i\)` must be **at least as large as** `\(h_i^2\)`, implying that `\(h_i\)` will always be between 0 and 1 ✅

---

## Leverage

* `\(\sum_i h_i = p+1\)` (remember when we calculated the trace of `\(\mathbf{H}\)`?)

--

* This means an _average_ value for `\(h_i\)` is `\((p+1)/n\)`

--

* 👍 A rule of thumb: leverages greater than `\(2(p+1)/n\)` should get an extra look
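---

## Leverage in R

A quick numerical check (a sketch, not part of the original slides): build the hat matrix by hand for the `mpg ~ disp` model used later in these slides, pull off its diagonal, and compare it to `hatvalues()`.

```r
mod <- lm(mpg ~ disp, data = mtcars)
X <- model.matrix(mod)                   # n x (p + 1) design matrix
H <- X %*% solve(t(X) %*% X) %*% t(X)    # hat matrix X(X^TX)^{-1}X^T
h <- diag(H)                             # leverages h_i

all.equal(unname(h), unname(hatvalues(mod)))  # matches the built-in function
sum(h)                                        # equals p + 1 (here, 2)
mean(h)                                       # average leverage, (p + 1)/n
which(h > 2 * mean(h))                        # leverages above 2(p + 1)/n get an extra look
```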
---

## Standardized residuals

* We can use the _leverages_ to standardize the residuals

--

* Instead of plotting the residuals, `\(e\)`, we can plot the _standardized residuals_

`$$r_i = \frac{e_i}{\hat\sigma\sqrt{1-h_i}}$$`

--

* 👍 A rule of thumb for standardized residuals: those greater than 4 in absolute value would be very unusual and should get an extra look

---
class: inverse

## <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M512 64v256H128V64h384m16-64H112C85.5 0 64 21.5 64 48v288c0 26.5 21.5 48 48 48h416c26.5 0 48-21.5 48-48V48c0-26.5-21.5-48-48-48zm100 416H389.5c-3 0-5.5 2.1-5.9 5.1C381.2 436.3 368 448 352 448h-64c-16 0-29.2-11.7-31.6-26.9-.5-2.9-3-5.1-5.9-5.1H12c-6.6 0-12 5.4-12 12v36c0 26.5 21.5 48 48 48h544c26.5 0 48-21.5 48-48v-36c0-6.6-5.4-12-12-12z"/></svg> `Application Exercise`

y | x
--|--
1 | 0
5 | 4
2 | 2
2 | 1
11 | 10

Using the data above, calculate:

* The leverage for each observation. Are any "unusual"?
* The standardized residuals, `\(r_i\)`.

---

## Doing it in R

* It is good to understand how to calculate these standardized residuals _by hand_, but there is an R function that does this for you (`rstandard()`)
* There is also an R function to calculate the _leverage_ (`hatvalues()`)

---

## Standardized residuals

.small[

```r
mod <- lm(mpg ~ disp, data = mtcars)
d <- data.frame(
*  standardized_resid = rstandard(mod),
  fit = fitted(mod)
)
ggplot(d, aes(fit, standardized_resid)) +
  geom_point() +
  geom_hline(yintercept = 0) +
  labs(y = "Standardized Residual")
```

![](18-unusual-observations_files/figure-html/unnamed-chunk-3-1.png)<!-- -->
]

---

## Outliers

* An _outlier_ is a point that doesn't fit the current model well

--

![](18-unusual-observations_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

---

## Outliers

* An _outlier_ is a point that doesn't fit the current model well

![](18-unusual-observations_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

* In this first plot, we see a point that is definitely an _outlier_, but it doesn't have much _leverage_ or _influence_ over the fit

---

## Outliers

* An _outlier_ is a point that doesn't fit the current model well

![](18-unusual-observations_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

* In this second plot, we see a point that has large _leverage_ but is not an outlier and doesn't have much _influence_ over the fit

---

## Outliers

* An _outlier_ is a point that doesn't fit the current model well

![](18-unusual-observations_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

* In the third plot, the point is both an _outlier_ and very _influential_. Not only is the residual for this point large, but it inflates the residuals for the other points

---

## Outliers

* To detect points like this third example, it can be prudent to _exclude_ the point and recompute the estimates to get `\(\hat\beta_{(i)}\)` and `\(\hat\sigma^2_{(i)}\)`

--

`$$\hat{y}_{(i)} = x_i^T\hat\beta_{(i)}$$`

--

* If `\(\hat{y}_{(i)}- y_i\)` is large, then observation `\(i\)` is an _outlier_.

--

.question[
How do we determine "large"? We need to scale it using the variance!
]
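---

## Leaving one out in R

A small sketch (not part of the original slides) to make `\(\hat{y}_{(i)}\)` concrete with the same `mpg ~ disp` model: drop observation `\(i\)`, refit, predict the held-out point, and see how far the prediction falls from `\(y_i\)`. The observation index below is an arbitrary choice.

```r
mod <- lm(mpg ~ disp, data = mtcars)

i <- 17                                          # an arbitrary observation to hold out
mod_i <- lm(mpg ~ disp, data = mtcars[-i, ])     # refit without observation i
yhat_i <- predict(mod_i, newdata = mtcars[i, ])  # leave-one-out fit, y-hat_(i)

mtcars$mpg[i] - yhat_i   # y_i - y-hat_(i): is this "large"?
sigma(mod_i)             # sigma-hat_(i), used to scale the difference
```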
---
class: inverse

## <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 576 512"><path d="M402.6 83.2l90.2 90.2c3.8 3.8 3.8 10 0 13.8L274.4 405.6l-92.8 10.3c-12.4 1.4-22.9-9.1-21.5-21.5l10.3-92.8L388.8 83.2c3.8-3.8 10-3.8 13.8 0zm162-22.9l-48.8-48.8c-15.2-15.2-39.9-15.2-55.2 0l-35.4 35.4c-3.8 3.8-3.8 10 0 13.8l90.2 90.2c3.8 3.8 10 3.8 13.8 0l35.4-35.4c15.2-15.3 15.2-40 0-55.2zM384 346.2V448H64V128h229.8c3.2 0 6.2-1.3 8.5-3.5l40-40c7.6-7.6 2.2-20.5-8.5-20.5H48C21.5 64 0 85.5 0 112v352c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48V306.2c0-10.7-12.9-16-20.5-8.5l-40 40c-2.2 2.3-3.5 5.3-3.5 8.5z"/></svg> `Application Exercise`

Show that

`$$\hat{\textrm{var}}(y_i-\hat{y}_{(i)}) = \hat{\sigma}^2_{(i)}(1+x_i^T(\mathbf{X}_{(i)}^T\mathbf{X}_{(i)})^{-1}x_i)$$`

y | x
--|--
1 | 0
5 | 4
2 | 2
2 | 1
11 | 10

Using the data above, calculate `\(\hat{\textrm{var}}(y_i-\hat{y}_{(i)})\)` for observation 5.

---

## Studentized residuals

`$$t_i = \frac{y_i - \hat{y}_{(i)}}{\hat\sigma_{(i)}(1+x_i^T(\mathbf{X}_{(i)}^T\mathbf{X}_{(i)})^{-1}x_i)^{1/2}}$$`

These have a `\(t\)` distribution with `\((n-1)-(p+1) = n-p-2\)` degrees of freedom if the model is correct and `\(\epsilon \sim N(0, \sigma^2\mathbf{I})\)`.

--

* There is an easier way to compute these from the standardized residuals!

`$$t_i = r_i\left(\frac{n-p-2}{n-p-1-r_i^2}\right)^{1/2}$$`

---
class: inverse

## <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M512 64v256H128V64h384m16-64H112C85.5 0 64 21.5 64 48v288c0 26.5 21.5 48 48 48h416c26.5 0 48-21.5 48-48V48c0-26.5-21.5-48-48-48zm100 416H389.5c-3 0-5.5 2.1-5.9 5.1C381.2 436.3 368 448 352 448h-64c-16 0-29.2-11.7-31.6-26.9-.5-2.9-3-5.1-5.9-5.1H12c-6.6 0-12 5.4-12 12v36c0 26.5 21.5 48 48 48h544c26.5 0 48-21.5 48-48v-36c0-6.6-5.4-12-12-12z"/></svg> `Application Exercise`

y | x
--|--
1 | 0
5 | 4
2 | 2
2 | 1
11 | 10

Calculate the studentized residuals for the data above.

---

## Studentized residuals

It is good to understand how to calculate these studentized residuals _by hand_, but there is an R function that does this for you (`rstudent()`)

---

## Influential points

* A common measure to determine _influential_ points is Cook's D

`$$D_i = \frac{(\hat{y}-\hat{y}_{(i)})^T(\hat{y}-\hat{y}_{(i)})}{(p+1)\hat\sigma^2}$$`

--

* `\((\hat{y}-\hat{y}_{(i)})\)` is the _change in the fit_ after leaving observation `\(i\)` out.

--

* This can be calculated using:

`$$\frac{1}{p+1}r_i^2\frac{h_i}{1-h_i}$$`

--

* 👍 A rule of thumb is to give observations with Cook's Distance > `\(4/n\)` an extra look

---

## Cook's Distance

![](18-unusual-observations_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

---
class: inverse

## <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M512 64v256H128V64h384m16-64H112C85.5 0 64 21.5 64 48v288c0 26.5 21.5 48 48 48h416c26.5 0 48-21.5 48-48V48c0-26.5-21.5-48-48-48zm100 416H389.5c-3 0-5.5 2.1-5.9 5.1C381.2 436.3 368 448 352 448h-64c-16 0-29.2-11.7-31.6-26.9-.5-2.9-3-5.1-5.9-5.1H12c-6.6 0-12 5.4-12 12v36c0 26.5 21.5 48 48 48h544c26.5 0 48-21.5 48-48v-36c0-6.6-5.4-12-12-12z"/></svg> `Application Exercise`

y | x
--|--
1 | 0
5 | 4
2 | 2
2 | 1
11 | 10

Calculate Cook's Distance for the data above and plot it with the row number on the x-axis and Cook's Distance on the y-axis.

---

## Cook's Distance

It is good to understand how to calculate these _by hand_, but there is an R function that does this for you (`cooks.distance()`)
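---

## Checking it in R

To tie the pieces together, here is a sketch (not part of the original slides) that computes the studentized residuals and Cook's Distance by hand for the `mpg ~ disp` model and checks them against `rstudent()` and `cooks.distance()`.

```r
mod <- lm(mpg ~ disp, data = mtcars)

n <- nrow(mtcars)
p <- 1                      # one predictor (disp)
h <- hatvalues(mod)         # leverages
r <- rstandard(mod)         # standardized residuals

ti <- r * sqrt((n - p - 2) / (n - p - 1 - r^2))  # studentized residuals
all.equal(ti, rstudent(mod))

d <- r^2 * h / ((p + 1) * (1 - h))               # Cook's Distance
all.equal(d, cooks.distance(mod))

which(d > 4 / n)            # observations worth an extra look
```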