A company supplies hardened steel struts to the aircraft industry, and you have been asked to investigate the relationship between the percentage of carbon in the steel (X) and the hardening temperature (Y). The following results have been obtained:

Carbon (%) | Hardening Temp ($^\circ$C) |
---|---|
0.35 | 890 |
0.45 | 880 |
0.55 | 860 |
0.70 | 830 |
0.85 | 810 |
1.00 | 790 |
1.15 | 770 |

a. Determine the equation of the regression line of hardening temperature on percentage of carbon, assuming a linear relationship.

b. Determine the product moment correlation coefficient.

#### Solution

Let $x$ denote the percentage of carbon in the steel and $y$ denote the hardening temperature (in $^\circ$C).

[Figure: scatter diagram of the data]

The sums needed for the calculations are given in the following table.

No. | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 0.35 | 890 | 0.1225 | 792100 | 311.5 |
2 | 0.45 | 880 | 0.2025 | 774400 | 396.0 |
3 | 0.55 | 860 | 0.3025 | 739600 | 473.0 |
4 | 0.70 | 830 | 0.4900 | 688900 | 581.0 |
5 | 0.85 | 810 | 0.7225 | 656100 | 688.5 |
6 | 1.00 | 790 | 1.0000 | 624100 | 790.0 |
7 | 1.15 | 770 | 1.3225 | 592900 | 885.5 |
Total | 5.05 | 5830 | 4.1625 | 4868100 | 4125.5 |
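The column totals can be checked with a few lines of Python. This is a minimal sketch that assumes, as the table does, that $x$ is the carbon percentage and $y$ is the hardening temperature; the variable names are illustrative only.

```python
# Data from the problem statement.
carbon = [0.35, 0.45, 0.55, 0.70, 0.85, 1.00, 1.15]  # x: carbon (%)
temp = [890, 880, 860, 830, 810, 790, 770]           # y: hardening temp (deg C)

n = len(carbon)

# The five sums that appear in the "Total" row of the table.
sum_x = sum(carbon)
sum_y = sum(temp)
sum_x2 = sum(x * x for x in carbon)
sum_y2 = sum(y * y for y in temp)
sum_xy = sum(x * y for x, y in zip(carbon, temp))

# Round the float sums only for display; keep full precision for later use.
print(n, round(sum_x, 4), sum_y, round(sum_x2, 4), sum_y2, round(sum_xy, 4))
```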

a. Let the simple linear regression model of $Y$ on $X$ be

$$y=\beta_0 + \beta_1x +e$$

By the method of least squares, the estimates of $\beta_1$ and $\beta_0$ are respectively

$$
\begin{aligned}
\hat{\beta}_1 &= \frac{n \sum xy - (\sum x)(\sum y)}{n(\sum x^2) -(\sum x)^2}
\end{aligned}
$$

and

$$
\begin{aligned}
\hat{\beta}_0 &= \overline{y}-\hat{\beta}_1\overline{x}
\end{aligned}
$$

The sample mean of $x$ is

$$
\begin{aligned}
\overline{x} &= \frac{1}{n} \sum_{i=1}^n x_i\\
&= \frac{5.05}{7}\\
&= 0.7214
\end{aligned}
$$

The sample mean of $y$ is

$$
\begin{aligned}
\overline{y} &= \frac{1}{n} \sum_{i=1}^n y_i\\
&= \frac{5830}{7}\\
&= 832.8571
\end{aligned}
$$

The estimate of $\beta_1$ is given by

$$
\begin{aligned}
b_1 &= \frac{n \sum xy - (\sum x)(\sum y)}{n(\sum x^2) -(\sum x)^2}\\
&= \frac{7\times 4125.5-(5.05)(5830)}{7\times 4.1625-(5.05)^2}\\
&= \frac{-563}{3.635}\\
&= -154.8831.
\end{aligned}
$$

The estimate of intercept is

$$
\begin{aligned}
b_0 &= \overline{y}-b_1\overline{x}\\
&= 832.8571-(-154.8831)\times 0.7214\\
&= 944.5898.
\end{aligned}
$$

The fitted simple linear regression model to predict the hardening temperature ($y$) from the percentage of carbon in the steel ($x$) is

$$
\begin{aligned}
\hat{y} &= 944.5898 - 154.8831\,x
\end{aligned}
$$
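The least-squares calculation above can be reproduced with a short Python sketch. It assumes carbon percentage as $x$ and hardening temperature as $y$, matching the table columns; the helper name `fit_line` and the example carbon value of 0.60% are illustrative assumptions, not part of the original problem. Because the code works at full precision, its intercept may differ from the hand-rounded figure in the third decimal place.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), as in the formula for b1.
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: y_bar - b1 * x_bar.
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1


carbon = [0.35, 0.45, 0.55, 0.70, 0.85, 1.00, 1.15]  # x: carbon (%)
temp = [890, 880, 860, 830, 810, 790, 770]           # y: hardening temp (deg C)

b0, b1 = fit_line(carbon, temp)
print(f"temp_hat = {b0:.4f} + ({b1:.4f}) * carbon")

# Hypothetical usage: fitted temperature at 0.60% carbon (inside the data range).
predicted = b0 + b1 * 0.60
print(f"fitted temperature at 0.60% carbon: {predicted:.1f}")
```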

b. Correlation coefficient:

The sample variance of $X$ is

$$
\begin{aligned}
s_{x}^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i -\overline{x})^2\\
&= \frac{1}{n-1}\bigg(\sum_{i=1}^n x_i^2 - \frac{(\sum_{i=1}^n x_i)^2}{n}\bigg)\\
&= \frac{1}{7 -1}\bigg(4.1625-\frac{5.05^2}{7}\bigg)\\
&= 0.0865.
\end{aligned}
$$

The sample variance of $Y$ is

$$
\begin{aligned}
s_{y}^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(y_i -\overline{y})^2\\
&= \frac{1}{n-1}\bigg(\sum_{i=1}^n y_i^2 - \frac{(\sum_{i=1}^n y_i)^2}{n}\bigg)\\
&= \frac{1}{7 -1}\bigg(4868100-\frac{5830^2}{7}\bigg)\\
&= 2090.4762.
\end{aligned}
$$

The covariance between $X$ and $Y$ is

$$
\begin{aligned}
s_{xy} &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i -\overline{x})(y_i-\overline{y})\\
&= \frac{1}{n-1}\bigg(\sum_{i=1}^n x_iy_i - \frac{(\sum_{i=1}^n x_i)(\sum_{i=1}^n y_i)}{n}\bigg)\\
&= \frac{1}{7-1}\bigg(4125.5 - \frac{5.05\times 5830}{7} \bigg)\\
&= -13.4048.
\end{aligned}
$$
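As a check on the shortcut formulas, the sketch below computes the two sample variances and the sample covariance both ways: from the sums and from deviations about the mean. It again assumes carbon percentage as $x$ and hardening temperature as $y$; the variable names are illustrative. `statistics.variance` from the Python standard library uses the same $n-1$ divisor.

```python
from statistics import mean, variance

carbon = [0.35, 0.45, 0.55, 0.70, 0.85, 1.00, 1.15]  # x: carbon (%)
temp = [890, 880, 860, 830, 810, 790, 770]           # y: hardening temp (deg C)
n = len(carbon)

# Shortcut (computational) forms, built from the column sums.
s2_x = (sum(x * x for x in carbon) - sum(carbon) ** 2 / n) / (n - 1)
s2_y = (sum(y * y for y in temp) - sum(temp) ** 2 / n) / (n - 1)
s_xy = (sum(x * y for x, y in zip(carbon, temp))
        - sum(carbon) * sum(temp) / n) / (n - 1)

# Definitional form of the covariance: deviations about the means.
x_bar, y_bar = mean(carbon), mean(temp)
s_xy_def = sum((x - x_bar) * (y - y_bar)
               for x, y in zip(carbon, temp)) / (n - 1)

print(round(s2_x, 4), round(s2_y, 4), round(s_xy, 4))
print(round(variance(carbon), 4), round(variance(temp), 4), round(s_xy_def, 4))
```

Both printed rows should agree, since the two forms are algebraically identical.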

The product moment correlation coefficient, with $s_x^2$ carried to five decimal places ($0.08655$) to limit rounding error, is

$$
\begin{aligned}
r &= \frac{\operatorname{Cov}(X,Y)}{\sqrt{V(X)\, V(Y)}} \\
&= \frac{s_{xy}}{\sqrt{s_x^2\times s_y^2}}\\
&= \frac{-13.4048}{\sqrt{0.08655\times 2090.4762}}\\
&= -0.9966.
\end{aligned}
$$

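Since $r$ is close to $-1$, there is a strong negative linear relationship between the percentage of carbon in the steel and the hardening temperature: the hardening temperature falls as the carbon content rises.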
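The correlation coefficient can also be computed directly from deviations about the means, as in the minimal Python sketch below. The function name `pearson_r` is an illustrative choice. At full precision the result is about $-0.9966$; hand calculations that round intermediate values can differ from this in the fourth decimal place.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products about the means; the common
    # 1/(n-1) factor cancels between numerator and denominator.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


carbon = [0.35, 0.45, 0.55, 0.70, 0.85, 1.00, 1.15]  # carbon (%)
temp = [890, 880, 860, 830, 810, 790, 770]           # hardening temp (deg C)

r = pearson_r(carbon, temp)
print(f"r = {r:.4f}")

# r is symmetric: swapping the roles of the two variables leaves it unchanged.
r_swapped = pearson_r(temp, carbon)
```

Unlike the regression line, $r$ does not depend on which variable is treated as $x$ and which as $y$.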
