Researchers have collected data on the hours of television watched in a day and the age of a person. You are given the data below.
Hours of Television | 1 | 3 | 4 | 3 | 6 |
---|---|---|---|---|---|
Age | 45 | 30 | 22 | 25 | 5 |
a. Determine which variable is the dependent variable.
b. Compute the least squares estimated line.
c. Is there a significant relationship between the two variables? Use a .05 level of significance. Be sure to state the null and alternative hypotheses.
d. Compute the coefficient of determination. How would you interpret this value?
Solution
Let $x$ denote age and $y$ denote hours of television watched in a day.
a. Hours of television watched, $y$, is the dependent variable; age, $x$, is the independent variable.
The general form of simple linear regression is
$$ \begin{equation*} \hat{y}=b_0+b_1 x \end{equation*} $$
where $b_0$ is the $y$-intercept and $b_1$ is the slope.
Obs. | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 45 | 1 | 2025 | 1 | 45 |
2 | 30 | 3 | 900 | 9 | 90 |
3 | 22 | 4 | 484 | 16 | 88 |
4 | 25 | 3 | 625 | 9 | 75 |
5 | 5 | 6 | 25 | 36 | 30 |
Total | 127 | 17 | 4059 | 71 | 328 |
b. The slope $b_1$ is given by
$$ \begin{eqnarray*} b_1 & = & \frac{n \sum xy - (\sum x)(\sum y)}{n(\sum x^2) -(\sum x)^2}\\ & = & \frac{5*328-(127)(17)}{5*(4059)-(127)^2}\\ &=& \frac{-519}{4166}\\ &=& -0.1246. \end{eqnarray*} $$
The estimate of the intercept, computed with the unrounded slope $-519/4166$, is
$$ \begin{equation*} b_0=\overline{y}-b_1\overline{x}=\bigg(\frac{17}{5}\bigg)-(-0.1246)*\bigg(\frac{127}{5}\bigg)=6.5643. \end{equation*} $$
The fitted least squares regression line is
$$ \begin{equation*} \hat{y} = 6.5643+ (-0.1246)*x \end{equation*} $$
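The slope and intercept above can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names (`ages`, `hours`, `b0`, `b1`) are chosen here for illustration and do not come from the original solution.

```python
# Data from the problem statement: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]

n = len(ages)
sum_x = sum(ages)                                   # 127
sum_y = sum(hours)                                  # 17
sum_xy = sum(x * y for x, y in zip(ages, hours))    # 328
sum_x2 = sum(x * x for x in ages)                   # 4059

# Computational formulas for the least squares slope and intercept.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
```

Rounded to four decimals, the script should agree with the hand calculation: a slope of about -0.1246 and an intercept of about 6.5643.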
Obs. | $x$ | $y$ | $\hat{y}$ | $(y-\hat{y})^2$ | $(\hat{y}-\overline{y})^2$ | $(y-\overline{y})^2$ |
---|---|---|---|---|---|---|
1 | 45 | 1 | 0.9582333 | 0.0017445 | 5.9622245 | 5.76 |
2 | 30 | 3 | 2.8269323 | 0.0299524 | 0.3284066 | 0.16 |
3 | 22 | 4 | 3.8235718 | 0.0311269 | 0.1794130 | 0.36 |
4 | 25 | 3 | 3.4498320 | 0.2023488 | 0.0024832 | 0.16 |
5 | 5 | 6 | 5.9414306 | 0.0034304 | 6.4588696 | 6.76 |
Total | 127 | 17 | 17.0000000 | 0.2686030 | 12.9313970 | 13.20 |
Explained variation $ = SSR = \sum (\hat{y}-\overline{y})^2 =12.9314$
Unexplained variation $ = SSE = \sum (y-\hat{y})^2 =0.2686$
Total variation $ = SST = \sum (y-\overline{y})^2 =13.2$
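The three sums of squares can be reproduced directly from the fitted values. The sketch below refits the line from the raw data, so it does not depend on the rounded coefficients; the names `sse`, `ssr`, and `sst` are illustrative.

```python
# Data from the problem statement: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(hours) / n

# Least squares fit in deviation form (equivalent to the formulas above).
s_xx = sum((x - x_bar) ** 2 for x in ages)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, hours))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in ages]

sse = sum((y - f) ** 2 for y, f in zip(hours, fitted))   # unexplained variation
ssr = sum((f - y_bar) ** 2 for f in fitted)              # explained variation
sst = sum((y - y_bar) ** 2 for y in hours)               # total variation

print(f"SSE = {sse:.4f}, SSR = {ssr:.4f}, SST = {sst:.4f}")
```

A useful sanity check is the identity SST = SSR + SSE, which holds for any least squares line that includes an intercept.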
c. Testing the significance of the relationship between the two variables:
$$ \begin{eqnarray*} s^2 = MSE &= & \frac{\sum(y-\hat{y})^2}{n-2}\\ & = &\frac{SSE}{n-2}\\ & = &\frac{0.2686}{3}\\ & = & 0.0895\\ s& = & 0.2992 \end{eqnarray*} $$
The standard error of the estimate of $b_1$, using $\sum(x-\overline{x})^2 = \sum x^2 - (\sum x)^2/n = 4059 - 127^2/5 = 833.2$, is
$$ \begin{eqnarray*} s_{b_1}& = & \frac{s}{\sqrt{\sum(x-\overline{x})^2}}\\ & = &\frac{0.2992}{\sqrt{833.2}}\\ &=& 0.0104 \end{eqnarray*} $$
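Hypotheses: $H_0: \beta_1 = 0$ (no linear relationship) against $H_1 : \beta_1 \neq 0$, where $\beta_1$ is the population slope estimated by $b_1$.
The test statistic, computed from the unrounded values of $b_1$ and its standard error, is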
$$ \begin{eqnarray*} t & = &\frac{b_1}{s_{b_1}}\\ & = & \frac{-0.1246}{0.0104}\\ &=& -12.0179 \end{eqnarray*} $$
The two-tailed critical value of $t$ at the 0.05 level of significance with $n-2=3$ degrees of freedom is 3.1824.
Since $|t| = 12.0179$ is greater than the critical value 3.1824, we reject the null hypothesis. That is, there is a significant relationship between age and hours of television watched.
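The test can be retraced numerically as follows. This is a sketch under one stated assumption: the critical value 3.1824 is copied from the text above rather than computed, because the Python standard library has no $t$ distribution. All variable names are illustrative.

```python
import math

# Data from the problem statement: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(hours) / n

s_xx = sum((x - x_bar) ** 2 for x in ages)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, hours))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual standard error s = sqrt(SSE / (n - 2)).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, hours))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for H0: slope = 0.
se_b1 = s / math.sqrt(s_xx)
t_stat = b1 / se_b1

# Assumption: two-tailed critical value for alpha = 0.05, df = 3, taken from the text.
t_critical = 3.1824
reject_h0 = abs(t_stat) > t_critical

print(f"s = {s:.4f}, se_b1 = {se_b1:.4f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```

Because the script carries full precision throughout, its $t$ value matches the -12.0179 reported above, whereas dividing the rounded values -0.1246 and 0.0104 gives roughly -11.98.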
d. Coefficient of determination $=R^2 =\frac{SSR}{SST} = \frac{12.9314}{13.2}=0.9797 $
About $97.97$ percent of the variation in the dependent variable (hours of television watched) is explained by the independent variable (age).
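As a cross-check, in simple linear regression the coefficient of determination equals the square of the sample correlation coefficient. The sketch below computes it that way from the raw data; the names `r` and `r_squared` are illustrative.

```python
import math

# Data from the problem statement: x = age, y = hours of television per day.
ages = [45, 30, 22, 25, 5]
hours = [1, 3, 4, 3, 6]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(hours) / n

s_xx = sum((x - x_bar) ** 2 for x in ages)
s_yy = sum((y - y_bar) ** 2 for y in hours)   # same quantity as SST
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, hours))

# Sample correlation coefficient and its square.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

The negative sign of `r` reflects the negative slope: in this sample, older people watch fewer hours of television.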