Researchers have collected data on the hours of television watched in a day and the age of a person. You are given the data below.

| Hours of Television | Age |
|---|---|
| 1 | 45 |
| 3 | 30 |
| 4 | 22 |
| 3 | 25 |
| 6 | 5 |

a. Determine which variable is the dependent variable.
b. Compute the least squares estimated line.
c. Is there a significant relationship between the two variables? Use a .05 level of significance. Be sure to state the null and alternative hypotheses.
d. Compute the coefficient of determination. How would you interpret this value?

Solution

Let $x$ denote age and $y$ denote hours of television watched in a day.

a. Hours of television watched ($y$) is the dependent variable, and age ($x$) is the independent variable.

The general form of simple linear regression is

$$ \begin{equation*} \hat{y}=b_0+b_1 x \end{equation*} $$
where $b_0$ is the $y$-intercept and $b_1$ is the slope.

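| Obs. | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|---|
| 1 | 45 | 1 | 2025 | 1 | 45 |
| 2 | 30 | 3 | 900 | 9 | 90 |
| 3 | 22 | 4 | 484 | 16 | 88 |
| 4 | 25 | 3 | 625 | 9 | 75 |
| 5 | 5 | 6 | 25 | 36 | 30 |
| Total | 127 | 17 | 4059 | 71 | 328 |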

b. The slope $b_1$ is given by

$$ \begin{eqnarray*} b_1 & = & \frac{n \sum xy - (\sum x)(\sum y)}{n(\sum x^2) -(\sum x)^2}\\ & = & \frac{5*328-(127)(17)}{5*(4059)-(127)^2}\\ &=& \frac{-519}{4166}\\ &=& -0.1246. \end{eqnarray*} $$

The estimate of the intercept is

$$ \begin{equation*} b_0=\overline{y}-b_1\overline{x}=\bigg(\frac{17}{5}\bigg)-\bigg(\frac{-519}{4166}\bigg)\bigg(\frac{127}{5}\bigg)=6.5643. \end{equation*} $$

The fitted least squares regression line is

$$ \begin{equation*} \hat{y} = 6.5643 - 0.1246\,x \end{equation*} $$
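The slope and intercept formulas above can be checked numerically. The following is a minimal sketch in plain Python; the function name `least_squares` and the variable names are illustrative choices, not part of the original solution.

```python
# Data from the problem: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]

def least_squares(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), as in part b.
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: y_bar - b1 * x_bar.
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

b0, b1 = least_squares(ages, hours)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")  # b0 = 6.5643, b1 = -0.1246
```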

| Obs. | $x$ | $y$ | $\hat{y}$ | $(y-\hat{y})^2$ | $(\hat{y}-\overline{y})^2$ | $(y-\overline{y})^2$ |
|---|---|---|---|---|---|---|
| 1 | 45 | 1 | 0.9582333 | 0.0017445 | 5.9622245 | 5.76 |
| 2 | 30 | 3 | 2.8269323 | 0.0299524 | 0.3284066 | 0.16 |
| 3 | 22 | 4 | 3.8235718 | 0.0311269 | 0.1794130 | 0.36 |
| 4 | 25 | 3 | 3.4498320 | 0.2023488 | 0.0024832 | 0.16 |
| 5 | 5 | 6 | 5.9414306 | 0.0034304 | 6.4588696 | 6.76 |
| Total | 127 | 17 | 17.0000000 | 0.2686030 | 12.9313970 | 13.2000000 |

Explained variation $ = SSR = \sum (\hat{y}-\overline{y})^2 =12.9314$

Unexplained variation $ = SSE = \sum (y-\hat{y})^2 =0.2686$

Total variation $ = SST = \sum (y-\overline{y})^2 =13.2$
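As a cross-check on the three sums of squares, the sketch below recomputes them from the raw data and confirms that $SSR + SSE = SST$. It assumes the unrounded coefficients $b_1 = -519/4166$ and $b_0 = \overline{y} - b_1\overline{x}$ from part b; the variable names are illustrative.

```python
# Data from the problem: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]
n = len(ages)

x_bar = sum(ages) / n
y_bar = sum(hours) / n

# Unrounded coefficients from part b.
b1 = -519 / 4166
b0 = y_bar - b1 * x_bar

# Fitted values on the regression line.
fitted = [b0 + b1 * x for x in ages]

sse = sum((y - f) ** 2 for y, f in zip(hours, fitted))  # unexplained variation
ssr = sum((f - y_bar) ** 2 for f in fitted)             # explained variation
sst = sum((y - y_bar) ** 2 for y in hours)              # total variation

print(f"SSE = {sse:.4f}, SSR = {ssr:.4f}, SST = {sst:.4f}")
```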

c. Testing the significance of the relationship between the two variables:

$$ \begin{eqnarray*} s^2 = MSE &= & \frac{\sum(y-\hat{y})^2}{n-2}\\ & = &\frac{SSE}{n-2}\\ & = &\frac{0.2686}{3}\\ & = & 0.0895\\ s& = & 0.2992 \end{eqnarray*} $$

The standard error of the slope estimate $b_1$ is

$$ \begin{eqnarray*} s_{b_1}& = & \frac{s}{\sqrt{\sum(x-\overline{x})^2}}\\ & = &\frac{0.2992}{\sqrt{833.2}}\\ &=& 0.0104 \end{eqnarray*} $$

where $\sum(x-\overline{x})^2 = \sum x^2 - (\sum x)^2/n = 4059 - (127)^2/5 = 833.2$.

The hypotheses concern the population slope $\beta_1$: $H_0: \beta_1 = 0$ (no linear relationship) against $H_1 : \beta_1 \neq 0$.

The test statistic, computed from the unrounded values of $b_1$ and $s_{b_1}$, is
$$ \begin{eqnarray*} t & = &\frac{b_1}{s_{b_1}}\\ & = & \frac{-0.1246}{0.0104}\\ &=& -12.0179 \end{eqnarray*} $$

The two-tailed critical value of $t$ at the 0.05 level of significance with $n-2=3$ degrees of freedom is $t_{0.025,\,3}=3.1824$.

Since $|t| = 12.0179$ is greater than the critical value 3.1824, we reject the null hypothesis. That is, there is a significant relationship between age and hours of television watched.
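The test in part c can be reproduced with the short sketch below. It uses only the standard library and takes the critical value 3.1824 directly from the text rather than computing it; the variable names are illustrative.

```python
import math

# Data from the problem: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]
n = len(ages)

x_bar = sum(ages) / n
y_bar = sum(hours) / n

# Centered sums used for the slope and its standard error.
sxx = sum((x - x_bar) ** 2 for x in ages)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, hours))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual sum of squares and standard error of estimate.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, hours))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for H0: beta_1 = 0.
se_b1 = s / math.sqrt(sxx)
t_stat = b1 / se_b1

t_crit = 3.1824  # two-tailed critical value quoted in the text (alpha = 0.05, 3 df)
reject_h0 = abs(t_stat) > t_crit

print(f"s = {s:.4f}, se_b1 = {se_b1:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Carrying full precision through the calculation avoids the small discrepancy that appears when the rounded values $-0.1246$ and $0.0104$ are divided by hand.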

d. Coefficient of determination $=R^2 =\frac{SSR}{SST} = \frac{12.9314}{13.2}=0.9797 $

About $97.97$ percent of the variation in the dependent variable (hours of television watched) is explained by the independent variable (age).
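In simple linear regression, $R^2$ also equals the square of the sample correlation coefficient between $x$ and $y$, which gives an independent check on the value above. The sketch below is one way to verify this; the variable names are illustrative.

```python
import math

# Data from the problem: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]
n = len(ages)

x_bar = sum(ages) / n
y_bar = sum(hours) / n

sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in hours)  # this is SST
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, hours))

# Sample correlation coefficient and its square.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

The negative sign of $r$ matches the negative slope: older respondents in this sample watch fewer hours of television.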
