# Prashnam: The Story and the Science (Part 4)

## Sizing

There are two important terms to understand for determining sample size – confidence level and margin of error. Stat Trek explains both:

Statisticians use a confidence interval to express the degree of uncertainty associated with a sample statistic. A confidence interval is an interval estimate combined with a probability statement.

For example, suppose a statistician conducted a survey and computed an interval estimate, based on survey data. The statistician might use a confidence level to describe uncertainty associated with the interval estimate. He/she might describe the interval estimate as a “95% confidence interval”. This means that if we used the same sampling method to select different samples and computed an interval estimate for each sample, we would expect the true population parameter to fall within the interval estimates 95% of the time.

Confidence intervals are preferred to point estimates and to interval estimates, because only confidence intervals indicate (a) the precision of the estimate and (b) the uncertainty of the estimate.

The margin of error expresses the maximum expected difference between the true population parameter and a sample estimate of that parameter. To be meaningful, the margin of error should be qualified by a probability statement (often expressed in the form of a confidence level).

For example, a pollster might report that 50% of voters will choose the Democratic candidate. To indicate the quality of the survey result, the pollster might add that the margin of error is ±5%, with a confidence level of 90%. This means that if the survey were repeated many times with different samples, the true percentage of Democratic voters would fall within the margin of error 90% of the time.
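The "repeated many times" idea in the pollster example can be checked with a small simulation. This is only a sketch under stated assumptions: the function name `coverage`, the sample size of 271 and the fixed random seed are illustrative choices, not figures from Stat Trek, and each respondent is modelled as an independent coin flip with a 50% true share.

```python
import random

def coverage(true_share=0.5, sample_size=271, margin=0.05, repeats=2000, seed=42):
    """Fraction of repeated surveys whose estimate lands within `margin` of the truth.

    Assumption: simple random sampling, one yes/no answer per respondent.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repeats):
        # Draw one survey: count respondents who back the candidate.
        supporters = sum(rng.random() < true_share for _ in range(sample_size))
        estimate = supporters / sample_size
        if abs(estimate - true_share) <= margin:
            hits += 1
    return hits / repeats

if __name__ == "__main__":
    print(f"Share of surveys within the margin: {coverage():.1%}")
```

With these assumed inputs, the printed share should come out near the 90% confidence level in the example, which is what the probability statement attached to a margin of error is meant to convey.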

Most political surveys report their results with a 95% confidence level and a ±3% margin of error. For this, the sample they need for a heterogeneous population (irrespective of its size) is about 1000. This is the magic of sampling. To get a sense of what voters in Bihar (about 7 crore, or 70 million) think, all we need to do is sample 1000 randomly selected people. By choosing them across all assembly constituencies and ensuring proper representation across age, gender and geography, this sample of 1000 can give a view of what the general population thinks that is accurate to within about 3 percentage points.
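To see where a figure of about 1000 comes from, here is a minimal sketch using the standard textbook sample-size formula for a proportion, with a finite population correction. The post does not say which formula it relies on, so the formula, the function name `required_sample_size` and the worst-case 50/50 split are assumptions. It also models simple random sampling rather than the stratified design described above.

```python
import math
from statistics import NormalDist

def required_sample_size(margin_of_error, confidence_level=0.95,
                         population=None, proportion=0.5):
    """Sample size needed to estimate a proportion.

    Assumptions: simple random sampling; proportion=0.5 is the worst case
    (an evenly split, heterogeneous population).
    """
    # z-score for a two-sided interval, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    n = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: only matters for small populations.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

if __name__ == "__main__":
    for population in (10_000, 1_000_000, 70_000_000):
        print(population, required_sample_size(0.03, 0.95, population))
```

Under these assumptions the required sample stays close to 1000 whether the population is ten thousand or seven crore, because the population size enters only through the correction term, which shrinks towards nothing as the population grows.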

(If you don’t believe it, try this sample size calculator. Select a confidence level of 95% and a margin of error of 0.03, and play around with the population size.)

What we have learnt so far: a sample of about 1000 people chosen via stratified random sampling is good enough to mirror what people are thinking. This gives a 95% confidence level and a 3% margin of error. (If the population is more homogeneous, as in a village, a PIN code or even an assembly constituency, then the sample size can be reduced.)
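The remark about homogeneous populations can be illustrated with the same standard formula for a proportion. As a rough sketch, assume that "more homogeneous" means opinion is lopsided, so that a larger share of people give the same answer. The function name `sample_size_for_split` and the example shares are hypothetical, and simple random sampling is assumed.

```python
import math
from statistics import NormalDist

def sample_size_for_split(majority_share, margin_of_error=0.03, confidence_level=0.95):
    """Sample size for a yes/no question when `majority_share` of people agree.

    Assumption: homogeneity is modelled as a lopsided split; 0.5 is the
    most heterogeneous case and needs the largest sample.
    """
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    variance = majority_share * (1 - majority_share)  # largest at 0.5
    return math.ceil(z ** 2 * variance / margin_of_error ** 2)

if __name__ == "__main__":
    for share in (0.5, 0.7, 0.9):
        print(f"{share:.0%} agree -> sample of {sample_size_for_split(share)}")
```

The required sample falls as the split becomes more one-sided, since there is less variation left to measure. That is the sense in which a smaller sample can do for a village or a PIN code, provided opinion there really is more uniform.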

Tomorrow: Part 5 