1 Population and Sample Proportion
Consider categorical data for a population of size N. If M individuals from the population belong to a certain
group, we say that the proportion of the population that belongs to this group is p = M/N.
Now suppose that a sample of size n is randomly selected and k individuals from the sample belong to the
group in question. We say that the proportion of the sample that belongs to this group is ¯p = k/n. The
sample proportion may or may not equal the population proportion.
Since ¯p was obtained through a random process, it is a random variable. Therefore, it has a set of possible
values, a probability distribution, an expected value or mean, a variance, and a standard deviation. Since ¯p
represents a proportion, its set of possible values is limited to the interval between 0 and 1. We let
$\mu_{\bar{p}}$ denote the mean of ¯p and we let $\sigma_{\bar{p}}$ denote the standard deviation of ¯p.
It turns out that the mean and standard deviation of the sample proportion are related to the population
proportion in the following way:
\[ \mu_{\bar{p}} = p \]
That is, the mean or expected value of the sample proportion is the same as the population proportion.
Notice that this does not depend on the sample size or the population size.
\[ \sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} \; \underbrace{\sqrt{\frac{N - n}{N - 1}}}_{\text{FPCF}} \]
The finite population correction factor (FPCF) appears again. We can ignore it in the same three cases that
we did when considering the sample mean. Observe that, as the sample size n increases, the standard deviation
of the sample proportion gets smaller. That is, as the sample size increases, the sample proportion becomes
more likely to be close to the population proportion.
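The formula for the standard deviation can be sketched in a few lines of Python. This is only an illustration: the function name sample_proportion_sd and the values p = 0.3 and N = 1000 are hypothetical and do not come from the lecture. Passing N=None stands for the cases in which the correction factor is ignored.

```python
import math

def sample_proportion_sd(p, n, N=None):
    """Standard deviation of the sample proportion.

    p: population proportion, n: sample size,
    N: population size (None means the FPCF is ignored).
    The function name and signature are illustrative assumptions.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if n < 1 or (N is not None and n > N):
        raise ValueError("n must satisfy 1 <= n <= N")
    sd = math.sqrt(p * (1 - p) / n)
    if N is not None:
        # Finite population correction factor (FPCF)
        sd *= math.sqrt((N - n) / (N - 1))
    return sd

# Hypothetical values: the standard deviation shrinks as n grows.
for n in (10, 40, 160):
    print(n, sample_proportion_sd(0.3, n, N=1000))
```

Running the loop shows the printed standard deviation decreasing as n increases, in line with the observation above.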
Notice that we have not said anything about the distribution of ¯p so far other than its mean and standard
deviation. For all we know at this point, it could follow a normal distribution, or a uniform distribution, or
any distribution really. We will give a more precise description of the distribution of ¯p later.
As an example, suppose that a family has five people, A, B, C, D, and E. A and D are women and B, C,
and E are men. This is our population data. The proportion of men in the population is p = 3/5.
Now suppose that we obtain a simple random sample of 2 people from the family, without replacement. That
is, the sample must consist of 2 different people. From Lecture 7, we know that there are
\[ {}_{5}C_{2} = \frac{5 \cdot 4}{2 \cdot 1} = 10 \]
possible ways of doing this. Each pair of people is equally likely to occur, with probability 1/10. For each
different sample, we will get a (perhaps) different value for ¯p, the proportion of men in the sample. For
example, if the sample consists of people A and B, then ¯p is 1/2. We can then fill in the rest of the table
below.
sample    ¯p
A,B       1/2
A,C       1/2
A,D       0/2
A,E       1/2
B,C       2/2
B,D       1/2
B,E       2/2
C,D       1/2
C,E       2/2
D,E       1/2
In the second column, we see all the possible values of ¯p. The probability distribution of ¯p is:
k         P(¯p = k)
0/2       1/10
1/2       6/10
2/2       3/10
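The family example can be checked by brute force. The following Python sketch enumerates all 10 samples, rebuilds the probability distribution of ¯p, and compares its mean and variance with p and with the square of the standard deviation formula (including the FPCF). The variable names are illustrative choices, and exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population from the example: A and D are women; B, C, and E are men.
population = {"A": "woman", "B": "man", "C": "man", "D": "woman", "E": "man"}
N = len(population)
n = 2
p = Fraction(sum(g == "man" for g in population.values()), N)

# Every simple random sample of size n without replacement is equally likely.
samples = list(combinations(sorted(population), n))
p_bars = [Fraction(sum(population[x] == "man" for x in s), n) for s in samples]

# Probability distribution of the sample proportion.
counts = Counter(p_bars)
dist = {value: Fraction(c, len(samples)) for value, c in counts.items()}

mean = sum(value * prob for value, prob in dist.items())
var = sum((value - mean) ** 2 * prob for value, prob in dist.items())

# Variance predicted by the formula: p(1 - p)/n times (N - n)/(N - 1).
formula_var = p * (1 - p) / n * Fraction(N - n, N - 1)

for value in sorted(dist):
    print(value, dist[value])
print("mean equals p:", mean == p)
print("variance matches formula:", var == formula_var)
print("standard deviation:", float(var) ** 0.5)
```

Both comparisons come out true for this population, so the table agrees with the formulas for the mean and the standard deviation.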