Suppose that you are interested in whether exposure to the organochlorine DDT, which has been used extensively as an insecticide for many years, is associated with breast cancer in women. As part of a study that investigated this issue, blood was drawn from a sample of women diagnosed with breast cancer over a six-year period and from a sample of healthy control subjects matched to the cancer patients on age, menopausal status, and date of blood donation. Each woman's blood level of DDE, an important byproduct of DDT in the human body, was measured, and the difference in levels between each patient and her matched control was calculated. A sample of 171 such differences has mean $\overline{d}=2.7$ ng/ml and standard deviation $s_d =15.9$ ng/ml.

a) Test the null hypothesis that the mean blood levels of DDE are identical for women with breast cancer and for healthy control subjects. What do you conclude?

b) Would you expect a 95% confidence interval for the true difference in population mean DDE levels to contain the value 0? Explain.

#### Solution

a) We are given the sample size $n = 171$, the sample mean difference $\overline{d}= 2.7$ ng/ml, and the sample standard deviation of the differences $s_d = 15.9$ ng/ml.

**Hypothesis**

Let $\mu_d$ denote the population mean of the patient-control differences in DDE level. The hypothesis testing problem is

$H_0 : \mu_d = 0$ against $H_1 : \mu_d \neq 0$ ($\textit{two-tailed}$)

**Test Statistic**

The test statistic is

` $$ \begin{aligned} t=\frac{\overline{d} -\mu_d}{s_d/\sqrt{n}} \end{aligned} $$ `

**Level of Significance**

The significance level is $\alpha = 0.05$.

**Critical Value(s)**

As the alternative hypothesis is $\textit{two-tailed}$, the critical values of $t$ for $n - 1 = 170$ degrees of freedom at the $\alpha = 0.05$ level of significance are $-1.974$ and $1.974$.

The rejection region (i.e. critical region) is $t < -1.974$ or $t > 1.974$.

**Computation**

Under the null hypothesis ($\mu_d = 0$), the value of the test statistic is

` $$ \begin{aligned} t&=\frac{\overline{d} -\mu_d}{s_d/\sqrt{n}}\\ &= \frac{2.7-0}{15.9/\sqrt{171}}\\ &= 2.2206 \end{aligned} $$ `
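The arithmetic above can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names (`d_bar`, `s_d`, `mu_0`) are illustrative choices, not part of the original problem.

```python
import math

# Summary statistics given in the problem statement
n = 171        # number of matched-pair differences
d_bar = 2.7    # sample mean difference (ng/ml)
s_d = 15.9     # sample standard deviation of the differences (ng/ml)
mu_0 = 0.0     # value of mu_d under the null hypothesis

# Standard error of the mean difference, then the paired t statistic
standard_error = s_d / math.sqrt(n)
t_stat = (d_bar - mu_0) / standard_error

print(f"standard error = {standard_error:.4f}")
print(f"t = {t_stat:.4f}")
```

The printed statistic should agree with the hand computation to four decimal places.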

**Decision: Traditional approach**

The test statistic is $t = 2.2206$, which falls $\textit{inside}$ the critical region ($2.2206 > 1.974$), so we $\textit{reject}$ the null hypothesis.

OR

**$p$-value approach**

The test is $\textit{two-tailed}$, so the p-value is the total area in both tails at least as $\textit{extreme}$ as the observed test statistic ($t=2.2206$). That is, p-value = $2 \times P(t\geq 2.2206 ) = 0.0277$, where $t$ has $170$ degrees of freedom.

The p-value is $0.0277$, which is $\textit{less than}$ the significance level of $\alpha = 0.05$, so we $\textit{reject}$ the null hypothesis.
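The tail area can be reproduced numerically. The sketch below is one possible way to do it without a statistics library: it integrates the Student's $t$ density with Simpson's rule. The helper names `t_pdf` and `two_sided_p_value` are hypothetical, and a statistics package's $t$ distribution function would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3      # P(0 <= T <= |t|)
    return 2 * (0.5 - central_area)   # area in both tails

p_value = two_sided_p_value(2.2206, df=170)
print(f"p-value = {p_value:.4f}")
print("reject H0 at alpha = 0.05:", p_value < 0.05)
```

By symmetry of the $t$ distribution, each tail area equals $0.5$ minus the area between $0$ and $|t|$, which is why only one side is integrated.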

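That is, we reject the null hypothesis that the mean difference in blood levels of DDE between women with breast cancer and women who do not have breast cancer is 0. At the 5% level of significance, the data provide evidence that mean DDE levels differ between the two groups.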

b) No, we would not expect a 95% confidence interval for the true difference in population mean DDE levels to contain 0, the value of the null hypothesis.

Confidence intervals are analogous to hypothesis tests: a two-tailed test at level $\alpha = 0.05$ rejects $H_0 : \mu_d = 0$ exactly when 0 lies outside the 95% confidence interval for $\mu_d$. Because the sample leads us to reject the null hypothesis that the true mean difference in blood levels of DDE between women with and without breast cancer is 0 at $\alpha = 0.05$, the range of plausible values for this mean difference given by the 95% confidence interval will not include 0.
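To make this concrete, the interval itself can be computed from the same summary statistics. This is a small sketch that reuses the critical value $1.974$ quoted in part (a) rather than deriving it; the variable names are illustrative.

```python
import math

# Summary statistics from the problem and the critical value from part (a)
n, d_bar, s_d = 171, 2.7, 15.9
t_crit = 1.974  # two-tailed critical value for 170 df at alpha = 0.05

# 95% confidence interval: d_bar +/- t_crit * s_d / sqrt(n)
margin = t_crit * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(f"95% CI: ({lower:.2f}, {upper:.2f}) ng/ml")
print("contains 0:", lower <= 0 <= upper)
```

Both endpoints come out positive, so the interval excludes 0, in agreement with the decision to reject $H_0$.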