Sampling is one of the most critical activities that occur in every project. The problem is that most of us don’t get it right. This article will attempt to explain the fundamentals of sampling, why it is essential to understand these concepts, and some practical considerations to think about.

Let’s start by understanding what we mean by a representative sample. A representative sample mirrors the larger population of interest on the characteristics that matter to us. Consider a group of 36 children, of which 18 are boys and 18 are girls. A representative sample of ten based on gender alone would be five boys and five girls. A non-representative sample would be ten boys. Finding a representative sample becomes harder if we add ethnicity to the mix, because the sample must then match the population on the combination of gender and ethnicity, not on each characteristic separately.
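The example of the 36 children can be sketched in a few lines of Python. This is a minimal illustration, not a full sampling tool: the function name `stratified_sample` and the tuple layout of each child are assumptions made for the example, and the simple rounding it uses is only guaranteed to add up to the requested sample size when the proportions divide evenly, as they do here.

```python
import random
from collections import Counter

def stratified_sample(population, key, n, seed=0):
    """Draw about n members, giving each group a share proportional to its size.

    Hypothetical helper for illustration. Rounding can make the total
    differ from n when the group proportions do not divide evenly.
    """
    rng = random.Random(seed)
    # Split the population into groups (strata) by the chosen characteristic.
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(population))
        # Simple random sample, without replacement, inside each group.
        sample.extend(rng.sample(members, share))
    return sample

# 18 boys and 18 girls, as in the example above.
children = [("boy", i) for i in range(18)] + [("girl", i) for i in range(18)]
sample = stratified_sample(children, key=lambda child: child[0], n=10)
counts = Counter(gender for gender, _ in sample)
print(counts["boy"], counts["girl"])  # 5 boys and 5 girls
```

Adding a second characteristic such as ethnicity would mean grouping on the pair of values, which quickly produces small groups that are hard to fill proportionally.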

From a statistical point of view, a representative sample is a random sample drawn from a population such that the observed values have approximately the same distribution as the population. When samples are not representative of the population, statistical inferences about that population are not valid.

Homogeneity describes how similar the members of the target population are with respect to specified characteristics. It is important when conducting stratified sampling, where the population is divided into groups whose members are relatively alike.

Bias is another concept that affects sampling. It arises when some members of the target population are less likely to be included than others. This typically happens when the process used to collect the sample is non-random, so that members of the population do not all have the same probability of being selected.

Other essential sampling concepts are:

- Accuracy refers to how close the sample statistic is to the population parameter and is measured by the margin of error.
- Precision refers to how close estimates from different samples are to each other and is inversely related to the standard error.
- The margin of error refers to the maximum expected difference between the sample estimate of the population parameter and the true population parameter. The margin of error is equal to one-half of the width of the confidence interval.
- The sampling frame is a complete list of the target population members from which a sample is drawn.
- A stratum (plural: strata) is a mutually exclusive segment of the target population that is defined by one or more characteristics.
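To make the link between the standard error, the margin of error, and the confidence interval concrete, here is a small sketch for the mean of a sample. It assumes a 95% confidence level with the normal approximation (z = 1.96), which is a simplification for small samples, and the measurements are made-up values used only for illustration. The function name `mean_confidence_interval` is hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (mean, standard error, margin of error, interval) for a sample mean.

    Assumes the normal approximation; z = 1.96 corresponds to roughly 95%.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin_of_error = z * standard_error
    interval = (mean - margin_of_error, mean + margin_of_error)
    return mean, standard_error, margin_of_error, interval

# Illustrative measurements, not real data.
measurements = [10, 12, 11, 13, 9, 12, 11, 10]
mean, se, moe, (low, high) = mean_confidence_interval(measurements)

# The margin of error is one-half of the width of the confidence interval.
print(math.isclose(moe, (high - low) / 2))
```

A smaller standard error gives a narrower interval, which is what higher precision means in practice.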

From a practical standpoint, here are some things to consider:

- Unless we can collect every observation the process produces, we are forced to sample the output of the process.
- The goal is a sample that is representative of the population.
- In determining a reasonable sampling plan, we need to consider:
  - Cost to collect the data
  - The variation in the process output
  - What is the purpose of this data collection (baseline, improvement comparison, screening variation sources)?
  - What issues or barriers could we run into?
  - What are some of the potential sources of variation and their frequency?
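One way to see how process variation and data collection cost pull against each other is the standard sample size formula for estimating a mean, n = (z × σ / E)², where σ is the process standard deviation and E is the margin of error we are willing to accept. The sketch below is a rough planning aid under stated assumptions: σ is taken as known from earlier data or a pilot study, the normal approximation holds, and the function name `required_sample_size` and the numbers are hypothetical.

```python
import math

def required_sample_size(sigma, margin_of_error, z=1.96):
    """Approximate number of observations needed to estimate a mean.

    sigma: assumed process standard deviation (for example from a pilot study).
    margin_of_error: largest acceptable gap between estimate and true mean.
    z: critical value; 1.96 corresponds to roughly 95% confidence.
    """
    if sigma <= 0 or margin_of_error <= 0:
        raise ValueError("sigma and margin_of_error must be positive")
    # Round up, since a fraction of an observation cannot be collected.
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Hypothetical planning numbers: standard deviation 2.0, margin of error 0.5.
n_needed = required_sample_size(sigma=2.0, margin_of_error=0.5)
print(n_needed)  # 62
```

Halving the acceptable margin of error roughly quadruples the number of observations, which is why the cost of collection and the purpose of the data belong in the same conversation as the variation.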