A company provides hardened steel struts for the aircraft industry, and you have been asked to investigate the relationship between hardening temperature (Y) and the percentage of carbon in steel (X). The following results have been obtained:

| Carbon (%) | Hardening Temp ($^\circ$C) |
|---|---|
| 0.35 | 890 |
| 0.45 | 880 |
| 0.55 | 860 |
| 0.70 | 830 |
| 0.85 | 810 |
| 1.00 | 790 |
| 1.15 | 770 |

a. Determine the equation of the regression line of hardening temperature on percentage of carbon, assuming a linear relationship.
b. Determine the product moment correlation coefficient.

Solution

Let $x$ denote the percentage of carbon in steel and $y$ denote the hardening temperature in $^\circ$C.

The scatter diagram is

[Figure: scatterplot]

| | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|---|
| 1 | 0.35 | 890 | 0.1225 | 792100 | 311.5 |
| 2 | 0.45 | 880 | 0.2025 | 774400 | 396.0 |
| 3 | 0.55 | 860 | 0.3025 | 739600 | 473.0 |
| 4 | 0.70 | 830 | 0.4900 | 688900 | 581.0 |
| 5 | 0.85 | 810 | 0.7225 | 656100 | 688.5 |
| 6 | 1.00 | 790 | 1.0000 | 624100 | 790.0 |
| 7 | 1.15 | 770 | 1.3225 | 592900 | 885.5 |
| Total | 5.05 | 5830 | 4.1625 | 4868100 | 4125.5 |
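
The column totals in the table can be checked with a few lines of Python. This is only a minimal sketch: the names `carbon`, `temp` and `totals` are chosen here for illustration and are not part of the original solution.

```python
# Data from the table: x = carbon (%), y = hardening temperature (deg C).
carbon = [0.35, 0.45, 0.55, 0.70, 0.85, 1.00, 1.15]
temp = [890, 880, 860, 830, 810, 790, 770]

# Column totals used later in the least squares formulas.
totals = {
    "x": sum(carbon),
    "y": sum(temp),
    "x^2": sum(x * x for x in carbon),
    "y^2": sum(y * y for y in temp),
    "xy": sum(x * y for x, y in zip(carbon, temp)),
}

for name, value in totals.items():
    print(f"sum of {name} = {value:.4f}")
```

The printed totals should agree with the "Total" row, up to floating-point rounding in the last displayed digit.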

a. Let the simple linear regression model of $Y$ on $X$ be

$$y=\beta_0 + \beta_1x +e$$

By the method of least squares, the estimates of $\beta_1$ and $\beta_0$ are respectively

$$ \begin{aligned} \hat{\beta}_1 & = \frac{n \sum xy - (\sum x)(\sum y)}{n(\sum x^2) -(\sum x)^2} \end{aligned} $$

and

$$ \begin{aligned} \hat{\beta}_0&=\overline{y}-\hat{\beta}_1\overline{x} \end{aligned} $$

The sample mean of $x$ is

$$ \begin{aligned} \overline{x}&=\frac{1}{n} \sum_{i=1}^n x_i\\ &=\frac{5.05}{7}\\ &=0.7214 \end{aligned} $$

The sample mean of $y$ is

$$ \begin{aligned} \overline{y}&=\frac{1}{n} \sum_{i=1}^n y_i\\ &=\frac{5830}{7}\\ &=832.8571 \end{aligned} $$

The estimate of $\beta_1$ is given by

$$ \begin{aligned} b_1 & = \frac{n \sum xy - (\sum x)(\sum y)}{n(\sum x^2) -(\sum x)^2}\\ & = \frac{7\times 4125.5-(5.05)(5830)}{7\times 4.1625-(5.05)^2}\\ &= \frac{-563}{3.635}\\ &= -154.8831. \end{aligned} $$

The estimate of intercept is

$$ \begin{aligned} b_0&=\overline{y}-b_1\overline{x}\\ &=832.8571-(-154.8831)\times 0.7214\\ &=944.5898. \end{aligned} $$

The fitted simple linear regression model to predict hardening temperature from the percentage of carbon in steel is

$$ \begin{aligned} \hat{y} &= 944.5898 - 154.8831\,x \end{aligned} $$
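
The slope and intercept formulas above can be reproduced in Python as a cross-check. The sketch below treats carbon percentage as $x$ and hardening temperature as $y$, as in the table. The helper `predict_temp` is a hypothetical name added for illustration. Because the code keeps full precision, the slope matches the hand calculation, while the intercept agrees only to two decimal places: the hand calculation rounds $\overline{x}$ to 0.7214 before multiplying.

```python
# Data from the table: x = carbon (%), y = hardening temperature (deg C).
carbon = [0.35, 0.45, 0.55, 0.70, 0.85, 1.00, 1.15]
temp = [890, 880, 860, 830, 810, 790, 770]

n = len(carbon)
sum_x = sum(carbon)
sum_y = sum(temp)
sum_x2 = sum(x * x for x in carbon)
sum_xy = sum(x * y for x, y in zip(carbon, temp))

# Least squares estimates of slope and intercept.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * (sum_x / n)


def predict_temp(carbon_pct):
    """Fitted hardening temperature (deg C) for a given carbon percentage.

    Hypothetical helper, not part of the original solution.
    """
    return b0 + b1 * carbon_pct


print(f"slope b1     = {b1:.4f}")
print(f"intercept b0 = {b0:.4f}")
print(f"fitted temperature at 0.70% carbon = {predict_temp(0.70):.2f}")
```

Predictions from the fitted line are only meaningful inside the observed carbon range (0.35% to 1.15%); extrapolating beyond it is not supported by these data.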

b. Correlation coefficient:

The sample variance of $X$ is

$$ \begin{aligned} s_{x}^2 &=\frac{1}{n-1}\sum_{i=1}^{n}(x_i -\overline{x})^2\\ &= \frac{1}{n-1}\bigg(\sum_{i=1}^n x_i^2 - \frac{(\sum_{i=1}^n x_i)^2}{n}\bigg)\\ &= \frac{1}{7 -1}\big(4.1625-\frac{5.05^2}{7}\big)\\ &= 0.0865. \end{aligned} $$

The sample variance of $Y$ is

$$ \begin{aligned} s_{y}^2 &=\frac{1}{n-1}\sum_{i=1}^{n}(y_i -\overline{y})^2\\ &= \frac{1}{n-1}\bigg(\sum_{i=1}^n y_i^2 - \frac{(\sum_{i=1}^n y_i)^2}{n}\bigg)\\ &= \frac{1}{7 -1}\big(4868100-\frac{5830^2}{7}\big)\\ &= 2090.4762. \end{aligned} $$

The covariance between $X$ and $Y$ is

$$ \begin{aligned} s_{xy}&=\frac{1}{n-1}\sum_{i=1}^{n}(x_i -\overline{x})(y_i-\overline{y})\\ &= \frac{1}{n-1}\bigg(\sum_{i=1}^n x_iy_i - \frac{(\sum_{i=1}^n x_i)(\sum_{i=1}^n y_i)}{n}\bigg)\\ &=\frac{1}{7-1}\big(4125.5 - \frac{5.05\times 5830}{7} \big)\\ &=-13.4048. \end{aligned} $$
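
The sample variances and covariance can also be computed directly from the definitions (deviations from the mean), which gives an independent check on the computational formulas used above. This is a minimal sketch. The variable names are illustrative, and small differences in the last digit against the hand calculation are possible because no intermediate value is rounded here.

```python
# Data from the table: x = carbon (%), y = hardening temperature (deg C).
carbon = [0.35, 0.45, 0.55, 0.70, 0.85, 1.00, 1.15]
temp = [890, 880, 860, 830, 810, 790, 770]

n = len(carbon)
mean_x = sum(carbon) / n
mean_y = sum(temp) / n

# Definitional forms with the n - 1 divisor (sample statistics).
var_x = sum((x - mean_x) ** 2 for x in carbon) / (n - 1)
var_y = sum((y - mean_y) ** 2 for y in temp) / (n - 1)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(carbon, temp)) / (n - 1)

print(f"s_x^2 = {var_x:.4f}")
print(f"s_y^2 = {var_y:.4f}")
print(f"s_xy  = {cov_xy:.4f}")
```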

The product moment correlation coefficient is

$$ \begin{aligned} r &= \frac{Cov(X,Y)}{\sqrt{V(X)\times V(Y)}} \\ &=\frac{s_{xy}}{\sqrt{s_x^2\times s_y^2}}\\ &= \frac{-13.4048}{\sqrt{0.0865\times 2090.4762}}\\ &= -0.9968. \end{aligned} $$

There is a strong negative linear relationship between hardening temperature and the percentage of carbon in steel: as the carbon content increases, the hardening temperature decreases.
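
As a cross-check, the correlation coefficient can be computed in Python from the sums of squared deviations. The $n-1$ divisors cancel between numerator and denominator, so they are omitted. This is only a sketch with illustrative variable names. Since it carries full precision rather than the rounded value $s_x^2 = 0.0865$, the result agrees with the hand calculation to three decimal places, and the fourth can differ.

```python
import math

# Data from the table: x = carbon (%), y = hardening temperature (deg C).
carbon = [0.35, 0.45, 0.55, 0.70, 0.85, 1.00, 1.15]
temp = [890, 880, 860, 830, 810, 790, 770]

n = len(carbon)
mean_x = sum(carbon) / n
mean_y = sum(temp) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - mean_x) ** 2 for x in carbon)
s_yy = sum((y - mean_y) ** 2 for y in temp)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(carbon, temp))

# Product moment (Pearson) correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"r = {r:.4f}")
```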

Further Reading