The following table lists midterm and final exam grades for randomly selected students in a statistics course:

Midterm (x) 82 65 93 70 80
Final (y) 94 77 94 79 91

a. Plot the scatter diagram for the paired data.
b. At the 5% level of significance ($\alpha = 0.05$), use a hypothesis test based on the linear correlation coefficient $r$ to determine whether there is a linear correlation.
c. Find the equation of the regression line and plot it on your scatter diagram using an x of 80 and an x of 93.
d. Determine the residual value for an x of 80 and explain its meaning.

Solution

a. Scatter diagram

Scatter plot

b. Let $x$ denote Midterm grades and $y$ denote Final grades.

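$$
\begin{array}{l|rrrrr}
 & x & y & x^2 & y^2 & xy \\
\hline
 & 82 & 94 & 6724 & 8836 & 7708 \\
 & 65 & 77 & 4225 & 5929 & 5005 \\
 & 93 & 94 & 8649 & 8836 & 8742 \\
 & 70 & 79 & 4900 & 6241 & 5530 \\
 & 80 & 91 & 6400 & 8281 & 7280 \\
\hline
\text{Total} & 390 & 435 & 30898 & 38123 & 34265
\end{array}
$$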

The correlation coefficient $r$ is given by
$$ \begin{aligned} r & = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\big(n(\sum x^2) -(\sum x)^2\big)\times \big(n(\sum y^2) -(\sum y)^2\big)}}\\ & = \frac{5*34265-(390)(435)}{\sqrt{\big(5*(30898)-(390)^2\big)\times\big(5*(38123)-(435)^2\big)}}\\ &= 0.919. \end{aligned} $$
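The arithmetic above can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names (`midterm`, `final`, `sum_xy`, and so on) are illustrative choices, not part of the original solution.

```python
import math

# Paired data from the problem statement.
midterm = [82, 65, 93, 70, 80]   # x
final = [94, 77, 94, 79, 91]     # y

n = len(midterm)
sum_x = sum(midterm)
sum_y = sum(final)
sum_x2 = sum(x * x for x in midterm)
sum_y2 = sum(y * y for y in final)
sum_xy = sum(x * y for x, y in zip(midterm, final))

# Computational formula for the linear correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 390 435 30898 38123 34265
print(round(r, 3))                           # 0.919
```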

Given that $n = 5$, $r=0.919$, $\alpha =0.05$.

State the hypothesis testing problem

The hypothesis testing problem is $H_0: \rho = 0$ against $H_a: \rho \neq 0$.

Define the test statistic

The test statistic for testing the above hypothesis is

$$ \begin{aligned} t&=\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\\ &=\frac{0.919\sqrt{5 -2}}{\sqrt{1-0.919^2}}\\ &=4.037 \end{aligned} $$

Under $H_0$, the test statistic $t$ follows Student's $t$ distribution with $n-2=5-2 =3$ degrees of freedom.
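As a quick check of the test statistic, the formula can be evaluated directly. The sketch below plugs in the rounded value $r = 0.919$ used in the text; the names `t_stat` and `df` are illustrative.

```python
import math

n = 5       # number of paired observations
r = 0.919   # correlation coefficient, rounded as in the text

# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(df, round(t_stat, 2))  # 3 4.04
```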

The level of significance is $\alpha = 0.05$.

Determine the critical values
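For the specified value of $\alpha$, determine the critical region. The test is two-tailed, so the critical region satisfies

$$ \begin{aligned} P(t < -t_{\alpha/2,n-2} \text{ or } t > t_{\alpha/2,n-2}) = \alpha. \end{aligned} $$

t-critical

The critical values are $-t_{\alpha/2,n-2}=-3.182$ and $t_{\alpha/2,n-2}=3.182$, where $t_{\alpha/2,n-2}$ denotes the upper $\alpha/2$ point of the $t$ distribution with $n-2=3$ degrees of freedom.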
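The critical value is normally read from a $t$ table, but it can also be reproduced numerically. The sketch below relies on the closed-form CDF of the $t$ distribution that holds specifically for 3 degrees of freedom and finds the critical value by bisection; the function names `t_cdf_3df` and `upper_critical_value` are hypothetical helpers written for this illustration.

```python
import math

def t_cdf_3df(t):
    """CDF of Student's t distribution with exactly 3 degrees of freedom."""
    u = t / math.sqrt(3)
    return 0.5 + (math.atan(u) + u / (1 + u * u)) / math.pi

def upper_critical_value(alpha, lo=0.0, hi=50.0, tol=1e-9):
    """Bisection for c such that P(T > c) = alpha / 2 (two-tailed test)."""
    target = 1 - alpha / 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf_3df(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_crit = upper_critical_value(0.05)
t_stat = 4.037  # observed test statistic from the text

# Reject H0 when the observed statistic lies in either tail.
reject_h0 = abs(t_stat) > t_crit

print(round(t_crit, 3), reject_h0)  # 3.182 True
```

For other degrees of freedom this closed form does not apply, and a table or a statistics library would be needed instead.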

Decision

As the observed value of the test statistic, $t = 4.037$, exceeds the upper critical value $3.182$, it falls inside the critical region, so we reject the null hypothesis.

At the 5% level of significance, we conclude that there is a linear correlation between Midterm and Final grades.

c. The slope $b_1$ is given by

$$ \begin{aligned} b_1 & = \frac{n \sum xy - (\sum x)(\sum y)}{n(\sum x^2) -(\sum x)^2}\\ & = \frac{5*34265-(390)(435)}{5*(30898)-(390)^2}\\ &= \frac{1675}{2390}\\ &= 0.7008. \end{aligned} $$

The estimate of the intercept $b_0$ is

$$ b_0=\overline{y}-b_1\overline{x}=\bigg(\frac{435}{5}\bigg)-0.7008*\bigg(\frac{390}{5}\bigg) \approx 32.3347, $$

where the final value is computed with the unrounded slope $1675/2390$.

The best fitted linear regression equation is

$$ \hat{y} = 32.3347+ (0.7008)*x $$

Fitted Line
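The slope, the intercept, and the two points needed to draw the line (at $x = 80$ and $x = 93$) can be reproduced as follows. This is a sketch with illustrative names (`b1`, `b0`, `predict`); it keeps the slope unrounded, which is why the intercept comes out as 32.3347.

```python
# Paired data from the problem statement.
midterm = [82, 65, 93, 70, 80]   # x
final = [94, 77, 94, 79, 91]     # y

n = len(midterm)
sum_x, sum_y = sum(midterm), sum(final)
sum_x2 = sum(x * x for x in midterm)
sum_xy = sum(x * y for x, y in zip(midterm, final))

# Least-squares slope and intercept.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

def predict(x):
    """Predicted final grade for a midterm grade x."""
    return b0 + b1 * x

print(round(b1, 4), round(b0, 4))  # 0.7008 32.3347

# Two points on the fitted line, as requested in part c.
for x in (80, 93):
    print(x, round(predict(x), 3))
```

Joining the two printed points with a straight line on the scatter diagram gives the fitted regression line.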

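d. The estimated value for an $x$ of 80 is

$$ \hat{y}_{x=80} = 32.3347+ (0.7008)*80 \approx 88.402 $$

(computed with the unrounded coefficients). The observed Final grade for $x=80$ is $y = 91$, so the residual is

$y-\hat{y}_{x=80} = 91-88.402 =2.598$.

The residual is the estimate of the error for a Midterm score of 80: the student's actual Final grade is about $2.598$ points higher than the grade predicted by the regression line.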
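To see this residual alongside the others, the sketch below recomputes the fitted line and lists the residual $y - \hat{y}$ for every student. The names are illustrative; the last line checks the general least-squares property that residuals sum to zero when an intercept is fitted.

```python
# Paired data from the problem statement.
midterm = [82, 65, 93, 70, 80]   # x
final = [94, 77, 94, 79, 91]     # y

n = len(midterm)
sum_x, sum_y = sum(midterm), sum(final)
sum_x2 = sum(x * x for x in midterm)
sum_xy = sum(x * y for x, y in zip(midterm, final))

# Least-squares slope and intercept (unrounded).
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Residual = observed y minus predicted y, keyed by midterm grade.
residuals = {x: y - (b0 + b1 * x) for x, y in zip(midterm, final)}

print(round(residuals[80], 3))               # 2.598
print(abs(sum(residuals.values())) < 1e-9)   # True
```

A positive residual means the observed grade lies above the line, a negative one below it.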

Further Reading