Prediction Intervals

Dr. D’Agostino McGowan

Predictions

Once we have built a model, \(\hat{\mathbf{y}} = \mathbf{X}\hat\beta\), we can calculate predicted y, \(\hat{\mathbf{y}}_0\) values for a new set of predictors, \(\mathbf{x}_0\).

$$\hat{\mathbf{y}}_0=\mathbf{x}_0^T\hat\beta$$

For example, if we fit a model \(\hat{y} = 1.2+2.5x_1+3x_2\)

And would like to know the predicted value for someone with \(x_1 = 3\) and \(x_2 = 2\), we would calculate

$$\hat{\mathbf{y}}_0 = \begin{bmatrix}1&3&2\end{bmatrix} \begin{bmatrix}1.2\\2.5\\3\end{bmatrix} = 1.2 + 2.5(3) + 3(2) = 14.7$$
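
The same calculation can be checked numerically. The snippet below is a minimal sketch in plain Python (the slides themselves do not prescribe a language); the variable names `beta_hat`, `x0`, and `y0_hat` are chosen here for illustration.

```python
# Coefficients from the fitted model on the slide: y-hat = 1.2 + 2.5*x1 + 3*x2
beta_hat = [1.2, 2.5, 3.0]

# New predictor vector; the leading 1 multiplies the intercept
x0 = [1.0, 3.0, 2.0]

# Inner product x0^T beta_hat
y0_hat = sum(x * b for x, b in zip(x0, beta_hat))

print(round(y0_hat, 1))  # 14.7
```

Note that the leading 1 in `x0` is what picks up the intercept term.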

Application Exercise

We are interested in predicting a chicken's weight based on its diet, using the chickwts dataset.

  • Fit the model of interest and extract the estimated \(\beta\) coefficients.
  • Construct \(\mathbf{x}_0\) for a chicken that is eating "sunflower".
  • Find the predicted weight for a chicken eating sunflowers.
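
As a hint for the second step, here is a minimal sketch in Python (not R) of how an indicator-coded \(\mathbf{x}_0\) could be built. It assumes treatment (dummy) coding with the first factor level as the reference, which is R's default for an unordered factor. The helper name `make_x0` is hypothetical; the level names are the six feed types in chickwts, listed alphabetically.

```python
# Feed levels in chickwts, in the alphabetical order R would use for the factor
levels = ["casein", "horsebean", "linseed", "meatmeal", "soybean", "sunflower"]

def make_x0(feed, levels):
    """Intercept plus one indicator per non-reference level.

    Assumption: the first entry of `levels` is the reference category.
    """
    if feed not in levels:
        raise ValueError(f"unknown feed: {feed}")
    return [1] + [1 if feed == level else 0 for level in levels[1:]]

x0 = make_x0("sunflower", levels)
print(x0)  # [1, 0, 0, 0, 0, 1]
```

Under this coding, \(\mathbf{x}_0^T\hat\beta\) reduces to the intercept plus the sunflower coefficient.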

Predictions

There are two kinds of predictions that can be made from regression models:

  • A predicted mean response
  • A prediction of a future observation

The distinction matters when estimating the uncertainty of the prediction.

Example

  • What would a chicken who eats sunflowers weigh on average?
  • Suppose you want to feed your chicken sunflowers. What will your chicken's predicted weight be?

What is the difference?

  • One is a prediction for an average; the other is a prediction for an individual.

Prediction of the mean response

Example: What would a chicken who eats sunflowers weigh on average?

The target is the mean response \(\mathbf{x}_0^T\beta\), estimated by \(\mathbf{x}_0^T\hat\beta\).

What is the variance of this estimate?

Application Exercise

Show that the variance of \(\mathbf{x}_0^T\hat\beta\) is

$$\textrm{var}(\mathbf{x}_0^T\hat\beta) = \mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0\sigma^2$$
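
One possible route, sketched under the usual linear-model assumption that \(\textrm{var}(\hat\beta) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\) and treating \(\mathbf{x}_0\) as a fixed vector, is to pull the constant vector out of the variance on both sides:

$$\begin{aligned}\textrm{var}(\mathbf{x}_0^T\hat\beta) &= \mathbf{x}_0^T\,\textrm{var}(\hat\beta)\,\mathbf{x}_0 \\ &= \mathbf{x}_0^T\,\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\,\mathbf{x}_0 \\ &= \mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0\sigma^2\end{aligned}$$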

Confidence interval for a mean response

$$\hat{\mathbf{y}}_0\pm t^*\hat\sigma\sqrt{\mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0}$$
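
To make the ingredients of this interval concrete, here is a minimal sketch in plain Python. The data are hypothetical toy values (not from the slides), the model has a single predictor so that \((\mathbf{X}^T\mathbf{X})^{-1}\) can be written with the closed-form inverse of a 2 x 2 matrix, and the critical value `t_star` is assumed to be read from a t table (97.5th percentile, 3 degrees of freedom).

```python
import math

# Hypothetical toy data (not from the slides): one predictor, five observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, p = len(x), 2  # p counts the intercept and the slope

# Entries of X^T X and X^T y for a design matrix with rows [1, x_i]
sx = sum(x)
sxx = sum(xi * xi for xi in x)
sy = sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# (X^T X)^{-1} via the closed-form inverse of a 2 x 2 matrix
det = n * sxx - sx * sx
xtx_inv = [[sxx / det, -sx / det],
           [-sx / det, n / det]]

# Least-squares estimate beta_hat = (X^T X)^{-1} X^T y
beta_hat = [xtx_inv[0][0] * sy + xtx_inv[0][1] * sxy,
            xtx_inv[1][0] * sy + xtx_inv[1][1] * sxy]

# Residual standard error: sqrt(RSS / (n - p))
rss = sum((yi - beta_hat[0] - beta_hat[1] * xi) ** 2 for xi, yi in zip(x, y))
sigma_hat = math.sqrt(rss / (n - p))

# New point x0 = [1, 3], its fitted value, and x0^T (X^T X)^{-1} x0
x0 = [1.0, 3.0]
y0_hat = x0[0] * beta_hat[0] + x0[1] * beta_hat[1]
h0 = sum(x0[i] * xtx_inv[i][j] * x0[j] for i in range(2) for j in range(2))

# Assumed critical value: 97.5th percentile of t with n - p = 3 df (from a table)
t_star = 3.182

half_width = t_star * sigma_hat * math.sqrt(h0)
ci = (y0_hat - half_width, y0_hat + half_width)
print(ci)
```

In practice a statistics package would supply the critical value and the matrix algebra; the point of the sketch is only to show where each term of the formula comes from.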

Prediction of a future value

Example: Suppose you want to feed your chicken sunflowers. What will your chicken's predicted weight be?

A future observation is predicted by \(\mathbf{x}_0^T\hat\beta + \epsilon\), where \(\epsilon\) is the error of the new observation.

What is the expected value? What is the variance?

Application Exercise

  • What is the expected value of \(\mathbf{x}_0^T\hat\beta + \epsilon\)?
  • Show that its variance is

$$(1 + \mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0)\sigma^2$$
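
A sketch of one derivation, reading the predictor as \(\mathbf{x}_0^T\hat\beta + \epsilon\) and assuming that the future error \(\epsilon\) has mean 0 and variance \(\sigma^2\) and is independent of \(\hat\beta\) (that is, of the data used to fit the model), so that the two variances add:

$$\begin{aligned}\textrm{var}(\mathbf{x}_0^T\hat\beta + \epsilon) &= \textrm{var}(\mathbf{x}_0^T\hat\beta) + \textrm{var}(\epsilon) \\ &= \mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0\sigma^2 + \sigma^2 \\ &= (1 + \mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0)\sigma^2\end{aligned}$$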

Prediction Intervals

$$\hat{\mathbf{y}}_0\pm t^*\hat\sigma\sqrt{1+\mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0}$$
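
The only change from the interval for a mean response is the extra 1 under the square root. The sketch below, in plain Python, computes both intervals from the same ingredients so the two can be compared side by side. The function name `interval` and all input values are hypothetical, chosen only for illustration; `h0` stands for \(\mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0\).

```python
import math

def interval(y0_hat, sigma_hat, h0, t_star, future_observation):
    """Return (lower, upper) around y0_hat.

    h0 is x0^T (X^T X)^{-1} x0. For a future observation the variance
    factor is 1 + h0; for the mean response it is h0 alone.
    """
    factor = 1.0 + h0 if future_observation else h0
    half_width = t_star * sigma_hat * math.sqrt(factor)
    return (y0_hat - half_width, y0_hat + half_width)

# Hypothetical inputs, chosen only for illustration
y0_hat, sigma_hat, h0, t_star = 6.02, 0.19, 0.2, 3.182

conf_int = interval(y0_hat, sigma_hat, h0, t_star, future_observation=False)
pred_int = interval(y0_hat, sigma_hat, h0, t_star, future_observation=True)

print("mean response:     ", conf_int)
print("future observation:", pred_int)
```

Both intervals are centered on the same point estimate; only the width differs.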

Prediction Intervals

  • There is an important conceptual difference between a confidence interval and a prediction interval.
  • Parameters (like \(\beta_1\), \(\beta_2\), etc.) are considered fixed but unknown (they are not random), which is why a confidence interval is interpreted through the long-run coverage of the procedure rather than as a probability statement about the parameter.
  • A future observation is a random variable. Therefore, for a 95% prediction interval, we are saying there is a 95% chance that the future value falls within the interval.
  • This is NOT the correct interpretation of a parameter's confidence interval, but it is the correct interpretation of a prediction interval.

Prediction Intervals

Which is larger?

$$(\mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0)\sigma^2$$

or

$$(1 + \mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0)\sigma^2$$

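  • The second is larger by exactly \(\sigma^2\), so a prediction interval is always wider than the confidence interval for the mean response at the same \(\mathbf{x}_0\) and confidence level.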