Statistical Sampling

Introduction

Sampling is the technique for selecting a sample from a statistical population.[1]

Sampling is a procedure in which a representative portion of a larger population is selected and examined in order to make inferences and generalizations about a specific group (for example, the consumers of a particular good or service). Broadly, two categories of sampling can be distinguished: probability and non-probability sampling. In probability sampling, determining the sample size precisely is essential. Among the several techniques available, simple random sampling stands out. Sampling plays a crucial role in obtaining meaningful data and in the reliability of the conclusions drawn from them, especially in market research.

Choosing a random sample rests on the hope that its properties can be extrapolated to the population. Sampling saves resources while producing results similar to those a study of the entire population would give. In business and medical research, sampling is used extensively to collect information about populations.[2]

For sampling to be valid, the sample must meet certain requirements. Only then can the study estimate population quantities and also the margins of error of those estimates. We can never be entirely sure that a sample is representative, but we can proceed in a way that makes this highly probable.

If the sample size is smaller than the population size, two or more different samples can be drawn from the same population. The set of all samples that can be obtained from the population is called the sample space. Assigning to each sample its probability of being drawn yields a probability distribution over the sample space. The sampling distribution of any statistic computed from the sample, such as the sample mean, follows from that distribution.
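The following is a minimal Python sketch of these ideas. It assumes samples are drawn without replacement and that every sample of size n is equally likely (simple random sampling). The function name `sampling_distribution_of_mean` and the population values are hypothetical. The sketch enumerates the sample space with `itertools.combinations` and derives the exact sampling distribution of the sample mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def sampling_distribution_of_mean(population, n):
    """Enumerate every sample of size n drawn without replacement (the sample
    space) and return the exact distribution of the sample mean as
    {mean: probability}, assuming all samples are equally likely."""
    samples = list(combinations(population, n))  # the sample space
    counts = Counter(Fraction(sum(s), n) for s in samples)
    total = len(samples)
    return {mean: Fraction(c, total) for mean, c in sorted(counts.items())}


# Hypothetical population of four values, samples of size 2.
population = [2, 4, 6, 8]
dist = sampling_distribution_of_mean(population, 2)
for mean, prob in dist.items():
    print(mean, prob)
```

For this population the sample space contains six samples. The expected value of the sample mean under this distribution equals the population mean, 5.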

Definition of population

The success of statistical practice rests on a precise definition of the problem. In sampling, this includes defining the “population” from which the sample is drawn. A population can be defined as the set of people or elements having the characteristics one wants to understand. There is rarely enough time or money to collect information from every member of a population, so the goal becomes finding a representative sample (or subset) of that population.

Sampling frame

In the simplest case, such as sampling a lot of production material (lot acceptance sampling), it would be most desirable to identify and measure every item in the population and to be able to include any of them in the sample. In the more general case, however, this is usually neither possible nor practical. There is no way to identify all the rats in the set of all rats. Where voting is not compulsory, there is no way to identify, before an election, which people will vote in it. Such imprecise populations cannot be sampled in any of the ways described below to which statistical theory could be applied.

As a remedy, we look for a sampling frame: a frame in which every element can be identified and included in the sample.[3][4][5][6] The simplest type of frame is a list of the elements of the population (preferably the entire population) together with appropriate contact information. For example, in an opinion poll, possible sampling frames include an electoral register and a telephone directory.

A probability sample is one in which every unit in the population has a probability greater than zero of being selected, and this probability can be determined precisely. Together, these properties make it possible to produce unbiased estimates of population totals by weighting the sampled units according to their probability of selection.
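This weighting is the idea behind the Horvitz-Thompson estimator, which divides each sampled value by the unit's probability of inclusion in the sample. The following is a minimal Python sketch. The incomes and probabilities are hypothetical, and the probabilities are assumed to be known and nonzero for every sampled unit.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total by weighting each sampled value by the
    inverse of its (known, nonzero) probability of inclusion."""
    if len(values) != len(inclusion_probs):
        raise ValueError("need one inclusion probability per sampled unit")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0 < pi <= 1:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += y / pi  # each unit "represents" 1 / pi population units
    return total


# Hypothetical sample: three households' incomes and their inclusion probabilities.
incomes = [30_000, 45_000, 20_000]
probs = [0.5, 0.25, 0.5]
print(horvitz_thompson_total(incomes, probs))
```

Consider an equal-probability design that draws n units from N. There every inclusion probability is n/N, so every weight equals N/n; this is why such designs are called self-weighting.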

Not every unit needs the same probability of selection. What makes a sample a probability sample is that each unit's probability is known. When every element of the population does have the same probability of selection, the design is called an equal probability of selection (EPS) design. Such designs are also called "self-weighting", because all sampled units receive the same weight.

Probability sampling includes simple random sampling, systematic sampling, stratified sampling, probability-proportional-to-size sampling, and cluster or multistage sampling. These forms of probability sampling have two things in common. First, every element has a known, nonzero probability of being sampled. Second, the selection involves random selection at some point.

Random sampling uses chance as a resource in the selection process. Suppose the selection ensures that every element of the population has some chance of being chosen, and the probability for each subject is known in advance. Such selection is called probability sampling. A sample selected by judgment sampling, by contrast, may be based on someone's experience with the population. A judgment sample is sometimes used as a guide or tentative sample for deciding how to take a random sample later.[7][8]

Each observation measures one or more properties (such as weight, location, color, or mass) of independent objects or individuals. In survey sampling, weights can be applied to the data to adjust for the sample design, particularly in stratified sampling.[9] Results from probability theory and statistical theory are used to guide practice. In business and medical research, sampling is widely used to gather information about a population.[10] Acceptance sampling is used to determine whether a production lot of material meets the governing specifications.

Simple random sampling

Probability sampling comprises all methods for which the probability of drawing any of the possible samples can be calculated. This family of techniques is the most advisable, although it is sometimes not possible to use it.

To carry out simple random sampling, it is often very useful to draw random numbers with computers, calculators, or tables built for this purpose.
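A minimal Python sketch of drawing a simple random sample with a computer follows. It assumes the population units have been numbered 1 to N. The function name and the values N = 500 and n = 20 are illustrative only. `random.Random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units without replacement; every subset of size n
    has the same probability of being chosen."""
    if not 0 <= n <= len(population):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, n)


# Number the population units 1..N and draw n of them.
N, n = 500, 20
units = list(range(1, N + 1))
sample = simple_random_sample(units, n, seed=42)
print(sorted(sample))
```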

Systematic sampling is used when the universe or population is large or extends over time. First, identify the units and relate them to the calendar (when applicable). Then calculate a constant, called the elevation coefficient (the sampling interval):

$$K = \frac{N}{n}$$

where N is the population size and n the sample size.

To determine when the first extraction takes place, choose a number between 1 and K at random. Thereafter, take one unit out of every K at regular intervals. It is sometimes advisable to take the periodicity of the phenomenon into account, because a cycle that coincides with K can bias the sample.

In other words, suppose the population has N people and we want to choose a smaller sample of n people. We divide N by n to obtain the interval K and choose a random number between 1 and K. Starting from that person, we then select every K-th person in order.
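The procedure can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name. When N is not a multiple of n, it assumes K is rounded down (integer division), which is one of several possible conventions.

```python
import random


def systematic_sample(population, n, seed=None):
    """Select n units: a random start between 1 and K, then every K-th unit,
    where K = N // n (rounded down when N is not a multiple of n)."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n                                   # elevation coefficient K
    start = random.Random(seed).randint(1, k)    # random start, 1 <= start <= K
    # 1-based positions start, start + K, start + 2K, ...; n * K <= N keeps them in range
    return [population[start - 1 + i * k] for i in range(n)]


# Population of N = 100 numbered units, sample of n = 10, so K = 10.
population = list(range(1, 101))
sample = systematic_sample(population, 10, seed=7)
print(sample)
```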

Stratified sampling consists of first dividing the study population into groups or classes (strata) that are assumed to be homogeneous with respect to some characteristic under study. Each stratum is assigned a quota that determines how many of its members will make up the sample. Within each stratum, systematic sampling is usually applied; it is one of the most widely used selection techniques in practice.

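Depending on the number of sample elements chosen from each stratum, there are two stratified sampling techniques. In proportional allocation, the sample size within each stratum is proportional to that stratum's share of the population. In optimal allocation, more units are taken from the strata whose members vary more with respect to the characteristic studied.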

For example, in an opinion study it may be useful to study the opinions of men and women separately, since each of these groups is expected to be fairly homogeneous. Suppose the population is 55% women and 45% men. Under proportional allocation, the sample would contain the same percentages of women and men. Under optimal allocation, if all men think alike but women's opinions are unpredictable, the sample would contain more than 55% women.
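The two allocation rules can be sketched in Python for this example. For optimal allocation, the sketch uses Neyman allocation, which takes n_i proportional to N_i S_i. That is the optimal rule when the cost of sampling a unit is the same in every stratum; other cost structures lead to other rules. The population of 1,000 people and the standard deviations (3.0 for women, 1.0 for men) are hypothetical. Results are left unrounded.

```python
def proportional_allocation(stratum_sizes, n):
    """Sample size per stratum proportional to the stratum size N_i."""
    N = sum(stratum_sizes)
    return [n * N_i / N for N_i in stratum_sizes]


def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Sample size per stratum proportional to N_i * S_i (optimal allocation
    under equal per-unit cost), so more variable strata get more units."""
    products = [N_i * S_i for N_i, S_i in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n * p / total for p in products]


# Hypothetical population: 550 women and 450 men; sample of n = 100.
sizes = [550, 450]
print(proportional_allocation(sizes, 100))          # women, men
print(neyman_allocation(sizes, [3.0, 1.0], 100))    # women's opinions vary more
```

With these assumed figures, optimal allocation assigns roughly 79 of the 100 interviews to women, well above their 55% share.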

For a general description of stratified sampling and its associated inference methods, assume that the population is divided into $h$ subpopulations, or strata, of known sizes $N_1, N_2, \ldots, N_h$. The units within each stratum are assumed to be homogeneous with respect to the characteristic in question. The unknown mean and variance of the $i$-th stratum are denoted by $\mu_i$ and $\sigma_i^2$, respectively.
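With this notation, a standard estimator of the population mean $\mu = \sum_{i=1}^{h} (N_i/N)\,\mu_i$, where $N = N_1 + \cdots + N_h$, is $\bar{y}_{st} = \sum_{i=1}^{h} (N_i/N)\,\bar{y}_i$. Here $\bar{y}_i$ is the sample mean in stratum $i$. The following minimal Python sketch computes it together with its estimated variance. It assumes an independent simple random sample of at least two units in each stratum. That is a simplifying assumption, since the text notes that systematic selection is often used within strata. The data are hypothetical.

```python
from statistics import mean, variance


def stratified_mean(samples_by_stratum, stratum_sizes):
    """Estimate the population mean as sum_i (N_i / N) * ybar_i and its variance
    as sum_i (N_i / N)^2 * (1 - n_i / N_i) * s_i^2 / n_i, assuming a simple
    random sample of at least two units within each stratum."""
    N = sum(stratum_sizes)
    estimate = 0.0
    var_estimate = 0.0
    for sample, N_i in zip(samples_by_stratum, stratum_sizes):
        n_i = len(sample)
        W_i = N_i / N                       # stratum weight N_i / N
        estimate += W_i * mean(sample)      # weighted stratum sample mean
        s2_i = variance(sample)             # sample variance s_i^2 (divisor n_i - 1)
        fpc = 1 - n_i / N_i                 # finite population correction
        var_estimate += W_i ** 2 * fpc * s2_i / n_i
    return estimate, var_estimate


# Hypothetical: two strata of sizes 600 and 400, four observations from each.
est, var = stratified_mean([[10, 12, 14, 12], [20, 22, 18, 20]], [600, 400])
print(est, var)
```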

Non-probability sampling

Non-probability sampling is sampling for which the probability of drawing a given sample cannot be calculated, because the subjects' probabilities of being chosen are not known. Instead of relying on chance, some non-probability designs deliberately select individuals who have in-depth knowledge of the topic under study. The rationale is that the information these people provide is vital for decision making.

Quota sampling is the most widespread technique, especially in market research and opinion polls. First, the reference population is divided into several strata defined by variables with a known distribution (such as gender or age). Next, the proportional weight of each stratum, that is, the share of the population it represents, is calculated. Finally, each weight is multiplied by the sample size n to obtain the exact quota for each stratum. Quota sampling differs from stratified sampling in one respect: once the quotas are set, the researcher is free to choose which subjects to include within each stratum.
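A minimal Python sketch of this procedure follows. The function names, the 55/45 split, and the stream of passers-by are all hypothetical. Quotas are computed as weight × n and rounded, so with other shares the rounded quotas may not add up exactly to n. Subjects are then accepted in the order they happen to be met until each quota is full. This non-random choice within strata is what separates quota sampling from stratified sampling.

```python
def quota_targets(stratum_shares, n):
    """Quota for each stratum: its proportional weight times the sample size n."""
    return {name: round(share * n) for name, share in stratum_shares.items()}


def fill_quotas(respondents, stratum_of, quotas):
    """Accept respondents in the order they are encountered until every quota
    is full. Who gets interviewed within a stratum is not randomized."""
    remaining = dict(quotas)
    chosen = []
    for person in respondents:
        group = stratum_of(person)
        if remaining.get(group, 0) > 0:
            chosen.append(person)
            remaining[group] -= 1
        if all(left == 0 for left in remaining.values()):
            break
    return chosen


# Hypothetical population shares and a stream of passers-by as (id, gender).
quotas = quota_targets({"women": 0.55, "men": 0.45}, 20)
passers_by = [(i, "women" if i % 3 else "men") for i in range(1, 61)]
sample = fill_quotas(passers_by, lambda p: p[1], quotas)
print(quotas, len(sample))
```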

Snowball sampling is indicated for studies of clandestine, minority, or widely dispersed populations whose members are in contact with one another. It consists of identifying the subjects to be included in the sample through the interviewees themselves. The process starts from a small number of individuals who meet the necessary requirements, and they act as locators of others with similar characteristics.
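The referral process can be sketched in Python as a breadth-first walk over a contact network. This is a minimal illustration. The contact lists, the seed subject, and the function name are hypothetical, and it assumes each interviewee names all of their contacts. Each wave of referrals is interviewed before the next.

```python
from collections import deque


def snowball_sample(contacts, seeds, target_size):
    """Grow a sample from initial seed subjects by following the contacts each
    interviewee names, wave by wave, until target_size subjects are reached
    or no new referrals remain."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)                 # interview this subject
        for contact in contacts.get(person, []):
            if contact not in seen:           # refer each person at most once
                seen.add(contact)
                queue.append(contact)
    return sample


# Hypothetical network: who each interviewee can put us in touch with.
contacts = {"A": ["B", "C"], "B": ["A", "D"], "C": ["E"], "D": ["F"], "E": [], "F": ["A"]}
print(snowball_sample(contacts, ["A"], 4))
```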

In purposive (intentional) sampling, the sample units are chosen deliberately on the basis of some of their characteristics rather than by chance. A variant of this technique is compensated, or balanced, sampling. In it, units are selected so that the sample mean of certain variables is close to the population mean. The selection works from references or recommendations and is then checked statistically.
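The idea of a balanced sample can be illustrated with a simple greedy heuristic in Python. The heuristic is a hypothetical illustration, not a standard balanced-sampling algorithm. At each step it adds the unit that brings the running sample mean of an auxiliary variable x closest to the known population mean. The unit labels and x values are invented, and the selection is deterministic, not random.

```python
def balanced_sample(units, x, n):
    """Greedily pick n units so that the sample mean of the auxiliary
    variable x stays as close as possible to its population mean."""
    if not 0 < n <= len(units):
        raise ValueError("sample size must be between 1 and the number of units")
    target = sum(x[u] for u in units) / len(units)   # population mean of x
    chosen, remaining, running_sum = [], list(units), 0.0
    for step in range(1, n + 1):
        # unit whose inclusion moves the running sample mean closest to the target
        best = min(remaining, key=lambda u: abs((running_sum + x[u]) / step - target))
        chosen.append(best)
        remaining.remove(best)
        running_sum += x[best]
    return chosen


# Hypothetical units with a known auxiliary variable (population mean 30).
x = {"a": 10, "b": 20, "c": 30, "d": 40, "e": 50}
sample = balanced_sample(list(x), x, 3)
print(sample)
```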
