The following table lists midterm and final exam grades for randomly selected students in a statistics course:

Midterm (x) | 82 | 65 | 93 | 70 | 80 |
---|---|---|---|---|---|
Final (y) | 94 | 77 | 94 | 79 | 91 |

a. Plot the scatter diagram for the paired data.

b. At the 5% level of significance ($\alpha = 0.05$), use a hypothesis test based on the linear correlation coefficient $r$ to determine whether there is a linear correlation between midterm and final grades.

c. Find the equation of the regression line and plot it on your scatter diagram, using the points at $x = 80$ and $x = 93$.

d. Determine the residual for $x = 80$ and explain its meaning.

#### Solution

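a. The scatter diagram plots the five pairs $(82, 94)$, $(65, 77)$, $(93, 94)$, $(70, 79)$, and $(80, 91)$, with Midterm grades on the horizontal axis and Final grades on the vertical axis. The points rise from left to right, suggesting a positive linear relationship.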

b. Let $x$ denote Midterm grades and $y$ denote Final grades.

$x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|
82 | 94 | 6724 | 8836 | 7708 |
65 | 77 | 4225 | 5929 | 5005 |
93 | 94 | 8649 | 8836 | 8742 |
70 | 79 | 4900 | 6241 | 5530 |
80 | 91 | 6400 | 8281 | 7280 |
$\sum x = 390$ | $\sum y = 435$ | $\sum x^2 = 30898$ | $\sum y^2 = 38123$ | $\sum xy = 34265$ |
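
As a quick check on the arithmetic, the following minimal Python sketch rebuilds each row of the table from the raw grades and sums the columns. The variable names `midterm`, `final`, `rows`, and `totals` are illustrative choices, not part of the original solution.

```python
# Paired grades from the problem statement
midterm = [82, 65, 93, 70, 80]  # x
final = [94, 77, 94, 79, 91]    # y

# One row (x, y, x^2, y^2, xy) per student, then the column totals
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(midterm, final)]
totals = [sum(col) for col in zip(*rows)]

for row in rows:
    print(*row, sep=" | ")
print(*totals, sep=" | ")  # 390 | 435 | 30898 | 38123 | 34265
```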

The correlation coefficient $r$ is given by

$$
\begin{aligned}
r &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\big(n\sum x^2 - (\sum x)^2\big)\big(n\sum y^2 - (\sum y)^2\big)}}\\
&= \frac{5(34265) - (390)(435)}{\sqrt{\big(5(30898) - 390^2\big)\big(5(38123) - 435^2\big)}}\\
&= \frac{1675}{\sqrt{2390 \times 1390}}\\
&\approx 0.919.
\end{aligned}
$$
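
The computational formula for $r$ can be evaluated directly from the column totals. This sketch assumes the totals from the table above and uses only the Python standard library; the intermediate names `numerator`, `den_x`, and `den_y` are illustrative.

```python
import math

# Column totals from the table (n = 5 pairs)
n = 5
sum_x, sum_y = 390, 435
sum_x2, sum_y2, sum_xy = 30898, 38123, 34265

# Pieces of the computational formula for Pearson's r
numerator = n * sum_xy - sum_x * sum_y  # 1675
den_x = n * sum_x2 - sum_x ** 2         # 2390
den_y = n * sum_y2 - sum_y ** 2         # 1390
r = numerator / math.sqrt(den_x * den_y)

print(round(r, 3))  # 0.919
```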

Here $n = 5$, $r = 0.919$, and $\alpha = 0.05$.

**State the hypothesis testing problem**

The hypotheses are $H_0: \rho = 0$ (no linear correlation) against $H_a: \rho \neq 0$ (a linear correlation exists), where $\rho$ is the population correlation coefficient.

**Define the test statistic**

The test statistic for testing the above hypotheses is

$$
\begin{aligned}
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\\
&= \frac{0.919\sqrt{5-2}}{\sqrt{1-0.919^2}}\\
&\approx 4.037.
\end{aligned}
$$
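
Once $r$ is known, the test statistic is a one-line computation. This sketch plugs in the rounded value $r = 0.919$ used in the text, so its result matches the reported 4.037.

```python
import math

n = 5
r = 0.919  # rounded correlation coefficient used in the text

# t statistic for H0: rho = 0; under H0 it has n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
df = n - 2

print(round(t, 3), df)  # 4.037 3
```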

Under $H_0$, the test statistic $t$ follows Student's $t$ distribution with $n - 2 = 5 - 2 = 3$ degrees of freedom.

The level of significance is $\alpha = 0.05$.

**Determine the critical values**

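For the specified $\alpha$, the two-tailed critical region is defined by

$$
P\big(t < t_{\alpha/2,\,n-2} \text{ or } t > t_{1-\alpha/2,\,n-2}\big) = \alpha,
$$

where $t_{p,\,n-2}$ is the value with area $p$ to its left under the $t$ distribution with $n-2$ degrees of freedom.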

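From the $t$ table with 3 degrees of freedom, the critical values are $t_{\alpha/2,\,n-2} = t_{0.025,\,3} = -3.182$ and $t_{1-\alpha/2,\,n-2} = t_{0.975,\,3} = 3.182$, so the critical region is $t < -3.182$ or $t > 3.182$.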
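
The tabulated critical value 3.182 can also be reproduced without a $t$ table. For exactly 3 degrees of freedom, Student's $t$ CDF has the closed form $F(t) = \tfrac{1}{2} + \tfrac{1}{\pi}\big[\tfrac{u}{1+u^2} + \arctan u\big]$ with $u = t/\sqrt{3}$, which the sketch below inverts by bisection. The helpers `t_cdf_df3` and `t_quantile_df3` are hypothetical names, and the closed form holds only for 3 degrees of freedom; for other cases a library routine such as SciPy's `scipy.stats.t.ppf` is the usual choice.

```python
import math

def t_cdf_df3(t):
    """CDF of Student's t distribution with exactly 3 degrees of freedom."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi

def t_quantile_df3(p, lo=-100.0, hi=100.0, tol=1e-10):
    """Invert the (strictly increasing) CDF by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf_df3(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
upper = t_quantile_df3(1 - alpha / 2)  # t_{1-alpha/2, 3}
lower = -upper                         # t_{alpha/2, 3}, by symmetry
t_obs = 4.037                          # observed statistic from the text

print(round(lower, 3), round(upper, 3))  # -3.182 3.182
print("reject H0" if abs(t_obs) > upper else "fail to reject H0")  # reject H0
```

Bisection is used here only because it is simple and guaranteed to converge for a monotone CDF; any root finder would do.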

**Decision**

Since the observed value $t = 4.037$ is greater than $3.182$, it falls inside the critical region, so we reject the null hypothesis.

We conclude that there is a linear correlation between Midterm and Final grades.

c. The slope $b_1$ is given by

$$
\begin{aligned}
b_1 &= \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}\\
&= \frac{5(34265) - (390)(435)}{5(30898) - 390^2}\\
&= \frac{1675}{2390}\\
&\approx 0.7008.
\end{aligned}
$$
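
The slope can be checked with exact rational arithmetic, which avoids carrying rounding error into the intercept and the predictions. This sketch uses Python's standard `fractions.Fraction`; note that $1675/2390$ reduces to $335/478$.

```python
from fractions import Fraction

n = 5
sum_x, sum_y = 390, 435
sum_x2, sum_xy = 30898, 34265

# Least-squares slope from the computational formula, kept as an exact fraction
b1 = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)

print(b1)                   # 335/478 (1675/2390 in lowest terms)
print(round(float(b1), 4))  # 0.7008
```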

The estimate of the intercept, using the unrounded slope $b_1 = 1675/2390$, is

$$
b_0 = \bar{y} - b_1\bar{x} = \frac{435}{5} - \frac{1675}{2390}\cdot\frac{390}{5} = 87 - \frac{1675}{2390}(78) \approx 32.3347.
$$
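
Computing the intercept from the unrounded slope reproduces 32.3347, while plugging in the rounded slope 0.7008 gives a slightly different value. A minimal sketch of both, with illustrative variable names:

```python
from fractions import Fraction

n = 5
sum_x, sum_y = 390, 435
b1 = Fraction(1675, 2390)   # unrounded slope 1675/2390

x_bar = Fraction(sum_x, n)  # 78
y_bar = Fraction(sum_y, n)  # 87
b0 = y_bar - b1 * x_bar     # intercept: b0 = y_bar - b1 * x_bar

print(round(float(b0), 4))  # 32.3347

# For comparison: using the rounded slope shifts the intercept slightly
b0_from_rounded = 87 - 0.7008 * 78
print(round(b0_from_rounded, 4))  # 32.3376
```

Carrying the exact slope forward is what keeps the later prediction and residual consistent to three decimals.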

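The fitted least-squares regression line is

$$
\hat{y} = 32.3347 + 0.7008x.
$$

To draw it on the scatter diagram, evaluate the line at the two given $x$ values, $\hat{y}_{x=80} \approx 88.40$ and $\hat{y}_{x=93} \approx 97.51$, plot the points $(80, 88.40)$ and $(93, 97.51)$, and join them with a straight line.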
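
For part c, the line only needs two points. This sketch evaluates the fitted line at $x = 80$ and $x = 93$; the helper `predict` is a hypothetical name, and the coefficients are rebuilt exactly from $b_1 = 1675/2390$, $\bar{x} = 78$, and $\bar{y} = 87$.

```python
from fractions import Fraction

# Exact coefficients: b1 = 1675/2390 and b0 = y_bar - b1 * x_bar (x_bar = 78, y_bar = 87)
b1 = Fraction(1675, 2390)
b0 = Fraction(87) - b1 * 78

def predict(x):
    """Fitted value y_hat = b0 + b1 * x on the least-squares line."""
    return float(b0 + b1 * x)

# Two points that fix the regression line on the scatter diagram
for x in (80, 93):
    print(x, round(predict(x), 2))
# 80 88.4
# 93 97.51
```

These two points can then be drawn over the scatter points with any plotting tool, for example `matplotlib.pyplot.plot([80, 93], [88.40, 97.51])`.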

d. The predicted final grade for $x = 80$, using the unrounded slope, is

$$
\hat{y}_{x=80} = 32.3347 + \frac{1675}{2390}(80) \approx 88.402.
$$

The residual for $x = 80$ is the observed final grade minus the predicted one:

$$
y - \hat{y}_{x=80} = 91 - 88.402 = 2.598.
$$

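A residual of $2.598$ means that the student who scored 80 on the midterm scored about 2.6 points higher on the final than the regression line predicts. It serves as an estimate of the true (unobservable) error term for that observation.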
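
The residual can also be computed for every student at once. This sketch (standard library only, names illustrative) confirms the value at $x = 80$ and checks a known property of least-squares fits that include an intercept: the residuals sum to zero.

```python
from fractions import Fraction

midterm = [82, 65, 93, 70, 80]  # x
final = [94, 77, 94, 79, 91]    # y

# Exact least-squares coefficients: b1 = 1675/2390, b0 = y_bar - b1 * x_bar
b1 = Fraction(1675, 2390)
b0 = Fraction(87) - b1 * 78

# Residual = observed y minus fitted y_hat, kept exact for every student
exact_residuals = [y - (b0 + b1 * x) for x, y in zip(midterm, final)]

e80 = float(exact_residuals[midterm.index(80)])
print(round(e80, 3))          # 2.598
print(sum(exact_residuals))   # 0 (residuals of a fit with an intercept sum to zero)
```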
