Researchers have collected data on the hours of television watched in a day and the age of a person. You are given the data below.

Hours of Television | 1 | 3 | 4 | 3 | 6 |
---|---|---|---|---|---|
Age | 45 | 30 | 22 | 25 | 5 |

a. Determine which variable is the dependent variable.

b. Compute the least squares estimated line.

c. Is there a significant relationship between the two variables? Use a .05 level of significance. Be sure to state the null and alternative hypotheses.

d. Compute the coefficient of determination. How would you interpret this value?

#### Solution

Let $x$ denote age and $y$ denote hours of television watched in a day.

a. Hours of television watched ($y$) is the dependent variable, because viewing time is modeled as depending on age ($x$), the independent variable.

The general form of simple linear regression is

` $$ \begin{equation*} \hat{y}=b_0+b_1 x \end{equation*} $$ `

where $b_0$ is the $y$-intercept and $b_1$ is the slope.

Obs. | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 45 | 1 | 2025 | 1 | 45 |
2 | 30 | 3 | 900 | 9 | 90 |
3 | 22 | 4 | 484 | 16 | 88 |
4 | 25 | 3 | 625 | 9 | 75 |
5 | 5 | 6 | 25 | 36 | 30 |
Total | 127 | 17 | 4059 | 71 | 328 |

b. The slope $b_1$ is given by

` $$ \begin{eqnarray*} b_1 & = & \frac{n \sum xy - (\sum x)(\sum y)}{n(\sum x^2) -(\sum x)^2}\\ & = & \frac{5*328-(127)(17)}{5*(4059)-(127)^2}\\ &=& \frac{-519}{4166}\\ &=& -0.1246. \end{eqnarray*} $$ `

The estimate of the intercept is

` $$ \begin{equation*} b_0=\overline{y}-b_1\overline{x}=\bigg(\frac{17}{5}\bigg)-(-0.12458)\bigg(\frac{127}{5}\bigg)\approx 3.4+3.1643=6.5643. \end{equation*} $$ `

The least squares estimated line is

` $$ \begin{equation*} \hat{y} = 6.5643 - 0.1246\,x \end{equation*} $$ `
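The slope and intercept arithmetic above can be reproduced with a short script. This is a minimal sketch in plain Python; the function name `least_squares_line` and the variable names are illustrative choices, not part of the original solution.

```python
# Data from the problem statement: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]


def least_squares_line(x, y):
    """Return (b0, b1) for y_hat = b0 + b1 * x using the raw-sum formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: y_bar - b1 * x_bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares_line(ages, hours)
print(f"y_hat = {b0:.4f} + ({b1:.4f}) * x")
```

Carrying full precision through the calculation is what yields the intercept 6.5643; rounding the slope to four decimals first would shift the last digit slightly.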

Obs. | $x$ | $y$ | $\hat{y}$ | $(y-\hat{y})^2$ | $(\hat{y}-\overline{y})^2$ | $(y-\overline{y})^2$ |
---|---|---|---|---|---|---|
1 | 45 | 1 | 0.9582333 | 0.0017445 | 5.9622245 | 5.76 |
2 | 30 | 3 | 2.8269323 | 0.0299524 | 0.3284066 | 0.16 |
3 | 22 | 4 | 3.8235718 | 0.0311269 | 0.1794130 | 0.36 |
4 | 25 | 3 | 3.4498320 | 0.2023488 | 0.0024832 | 0.16 |
5 | 5 | 6 | 5.9414306 | 0.0034304 | 6.4588696 | 6.76 |
Total | 127 | 17 | 17.0000000 | 0.2686030 | 12.9313970 | 13.20 |

Explained variation $ = SSR = \sum (\hat{y}-\overline{y})^2 =12.9314$

Unexplained variation $ = SSE = \sum (y-\hat{y})^2 =0.2686$

Total variation $ = SST = \sum (y-\overline{y})^2 =13.2$
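The three sums of squares can be checked numerically, along with the identity $SST = SSR + SSE$ that ties them together. The sketch below is plain Python with illustrative variable names; it refits the line in deviation form, which is algebraically equivalent to the raw-sum formula used for $b_1$.

```python
# Data from the problem statement: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(hours) / n

# Slope and intercept in deviation form.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, hours))
s_xx = sum((x - x_bar) ** 2 for x in ages)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values and the three sums of squares.
fitted = [b0 + b1 * x for x in ages]
sse = sum((y - f) ** 2 for y, f in zip(hours, fitted))  # unexplained
ssr = sum((f - y_bar) ** 2 for f in fitted)             # explained
sst = sum((y - y_bar) ** 2 for y in hours)              # total

print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}")
```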

c. Testing the significance of the relationship between the two variables:

` $$ \begin{eqnarray*} s^2 = MSE &= & \frac{\sum(y-\hat{y})^2}{n-2}\\ & = &\frac{SSE}{n-2}\\ & = &\frac{0.2686}{3}\\ & = & 0.0895\\ s& = & 0.2992 \end{eqnarray*} $$ `

The estimated standard error of $b_1$ uses $\sum(x-\overline{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n} = 4059 - \frac{(127)^2}{5} = 833.2$:

` $$ \begin{eqnarray*} s_{b_1}& = & \frac{s}{\sqrt{\sum(x-\overline{x})^2}}\\ & = &\frac{0.2992}{\sqrt{833.2}}\\ &=& 0.0104 \end{eqnarray*} $$ `

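The hypotheses are $H_0: \beta_1 = 0$ (no linear relationship) against $H_1 : \beta_1 \neq 0$, where $\beta_1$ is the population slope estimated by $b_1$.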

The test statistic is

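` $$ \begin{eqnarray*} t & = &\frac{b_1}{s_{b_1}}\\ & = & \frac{-0.1246}{0.0104}\\ &\approx& -12.0179 \end{eqnarray*} $$ `

The value $-12.0179$ is obtained by carrying unrounded values of $b_1$ and $s_{b_1}$; the four-decimal figures shown give about $-11.98$, which leads to the same conclusion.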

The critical value of $t$ for a two-tailed test at the 0.05 level of significance with $n-2=3$ degrees of freedom is $t_{0.025,\,3}=3.1824$.

Since $|t|$ is greater than the critical value, we reject the null hypothesis. That is, there is a significant relationship between age and hours of television watched.
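The test can be traced step by step in code. The sketch below uses only the Python standard library; the critical value 3.1824 is copied from the solution above rather than computed (a statistics library would be needed to derive it), and all names are illustrative.

```python
import math

# Data from the problem statement: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(hours) / n

s_xx = sum((x - x_bar) ** 2 for x in ages)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, hours))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Mean squared error with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, hours))
mse = sse / (n - 2)

# Standard error of the slope and the t statistic for H0: beta_1 = 0.
se_b1 = math.sqrt(mse / s_xx)
t_stat = b1 / se_b1

# Assumption: two-tailed critical value for alpha = 0.05, df = 3,
# taken from the worked solution rather than computed here.
T_CRITICAL = 3.1824
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"se(b1) = {se_b1:.4f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```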

d. The coefficient of determination is $R^2 =\frac{SSR}{SST} = \frac{12.9314}{13.2}=0.9797$.

About $97.97$ percent of the variation in the dependent variable (hours of television watched) is explained by its linear relationship with the independent variable (age).
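As a cross-check, in simple linear regression $R^2$ equals the square of the sample correlation coefficient $r$ between $x$ and $y$. The sketch below computes both routes in plain Python; the variable names are illustrative and the correlation coefficient itself is not part of the original solution.

```python
import math

# Data from the problem statement: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(hours) / n

s_xx = sum((x - x_bar) ** 2 for x in ages)
s_yy = sum((y - y_bar) ** 2 for y in hours)  # this is SST
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, hours))

# Route 1: R^2 = SSR / SST, with SSR computed from the fitted line.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in ages)
r_squared = ssr / s_yy

# Route 2: square of the sample correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"R^2 = {r_squared:.4f}, r = {r:.4f}, r^2 = {r ** 2:.4f}")
```

The negative sign of $r$ matches the negative slope: older respondents in this sample watch fewer hours of television.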
