A keyword search program lists the files that contain a given keyword. If it runs through 200 files, and each file contains the keyword with probability 0.36, independently of other files, compute the probability that

(a) more than 70 files
(b) fewer than 70 files
(c) exactly 70 files will be listed.

Solution

Let $X$ be the number of files listed, and let $p$ be the probability that any given file contains the keyword.

We are given $n = 200$ and $p = 0.36$, so $X\sim B(200, 0.36)$.

Since $n$ is large and $p$ is neither too small nor too large, the distribution of $X$ is approximately normal with mean $\mu = E(X) = np = 200 \times 0.36 = 72$ and standard deviation $\sigma = \sqrt{np(1-p)} = \sqrt{200 \times 0.36 \times (1- 0.36)} = \sqrt{46.08} \approx 6.79$.
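The mean and standard deviation can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the check that $np$ and $n(1-p)$ are both at least 5 is a common rule of thumb for the normal approximation, assumed here and not stated in the solution itself.

```python
import math

n, p = 200, 0.36  # number of files and per-file keyword probability

# Mean and standard deviation of the binomial B(n, p)
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

print(round(mu, 2), round(sigma, 2))

# Rule-of-thumb check (an assumption, not part of the original solution):
# both expected counts should be reasonably large.
approximation_reasonable = n * p >= 5 and n * (1 - p) >= 5
print(approximation_reasonable)
```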

(a) The (approximate) probability that more than 70 files will be listed is

$$ \begin{aligned} P(X> 70) & = P(X> 70.5) \quad (\text{using continuity correction})\\ & = 1-P(X< 70.5)\\ &= 1-P\bigg(\frac{X-\mu}{\sigma}< \frac{70.5- 72}{6.79}\bigg)\\ & =1-P(Z< -0.22)\\ & = 1-0.4126\\ & = 0.5874 \end{aligned} $$

(b) The (approximate) probability that fewer than 70 files will be listed is

$$ \begin{aligned} P(X< 70) & = P(X< 69.5) \quad (\text{using continuity correction})\\ &= P\bigg(\frac{X-\mu}{\sigma}< \frac{69.5- 72}{6.79}\bigg)\\ & =P(Z< -0.37)\\ & = 0.3564 \end{aligned} $$

(c) The (approximate) probability that exactly 70 files will be listed is

$$ \begin{aligned} P(X= 70) & = P(69.5 < X< 70.5) \quad (\text{using continuity correction})\\ & = P(X< 70.5)-P(X< 69.5)\\ &= P(Z< -0.22)-P(Z< -0.37)\\ & = 0.4126 - 0.3564\\ & = 0.0562 \end{aligned} $$
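As a sanity check, the three normal approximations can be reproduced numerically and compared with the exact binomial probabilities. The sketch below uses only the Python standard library; the helper names `normal_cdf` and `binom_pmf` are illustrative and not part of the original solution. The standard deviation is rounded to 6.79 so that the results line up with the worked values above.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(Y < x) for Y ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binom_pmf(k, n, p):
    """Exact P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 200, 0.36
mu = n * p
sigma = round(math.sqrt(n * p * (1 - p)), 2)  # rounded to 6.79, as in the text

# Normal approximation with continuity correction
approx_more = 1 - normal_cdf(70.5, mu, sigma)                             # (a) P(X > 70)
approx_fewer = normal_cdf(69.5, mu, sigma)                                # (b) P(X < 70)
approx_equal = normal_cdf(70.5, mu, sigma) - normal_cdf(69.5, mu, sigma)  # (c) P(X = 70)

# Exact binomial probabilities for comparison
exact_more = sum(binom_pmf(k, n, p) for k in range(71, n + 1))
exact_fewer = sum(binom_pmf(k, n, p) for k in range(0, 70))
exact_equal = binom_pmf(70, n, p)

for label, approx, exact in [
    ("more than 70", approx_more, exact_more),
    ("fewer than 70", approx_fewer, exact_fewer),
    ("exactly 70", approx_equal, exact_equal),
]:
    print(f"{label}: normal approx = {approx:.4f}, exact binomial = {exact:.4f}")
```

The three events are mutually exclusive and exhaustive, so in both columns the three probabilities should add up to 1, which is a quick way to catch an arithmetic slip.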

Further Reading