Statistical Sampling
Introduction
The technique of selecting a sample from a statistical population is known as sampling.[1]
Sampling is a procedure that involves selecting and examining a representative portion of a larger population in order to make inferences and generalizations about a specific group (for example, consumers of a particular good or service). Broadly, two categories of sampling can be distinguished: probability and non-probability sampling. Within the first category, determining the sample size precisely is essential, and several techniques are available, of which simple random sampling is the most prominent. Sampling plays a crucial role in obtaining meaningful data and in ensuring the reliability of the conclusions drawn, especially in market research.
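As a minimal sketch of sample-size determination under simple random sampling, the Python below assumes the goal is to estimate a population proportion $p$ within a margin of error $e$ at roughly 95% confidence ($z = 1.96$). It uses the standard formula $n_0 = z^2 p(1-p)/e^2$ with an optional finite population correction. The function name `sample_size_proportion` and its defaults are illustrative assumptions, not taken from the source.

```python
import math

def sample_size_proportion(e, p=0.5, z=1.96, N=None):
    """Sample size needed to estimate a proportion within +/- e.

    p=0.5 is the conservative (worst-case) guess for the unknown proportion;
    z=1.96 corresponds to roughly 95% confidence.
    """
    n0 = z ** 2 * p * (1 - p) / e ** 2  # infinite-population sample size
    if N is not None:
        # Finite population correction: fewer units are needed when N is small.
        n0 = n0 / (1 + (n0 - 1) / N)
    return math.ceil(n0)  # round up so the precision target is still met

print(sample_size_proportion(0.05))          # 385 (very large population)
print(sample_size_proportion(0.05, N=1000))  # 278 (population of 1,000 units)
```

Rounding up rather than to the nearest integer is a deliberate choice: it guarantees the requested margin of error is not exceeded.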
When a random sample is chosen, the hope is that its properties can be extrapolated to the population. Sampling saves resources while obtaining results similar to those that a study of the entire population would yield. In business and medical research, sampling is used extensively to collect information about populations.[2]
For sampling to be valid, and for a study to support not only estimates of population quantities but also the margins of error of those estimates, the sample must meet certain requirements. We can never be entirely sure that a sample is representative, but we can design the procedure so that representativeness is achieved with high probability.
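The margin of error of an estimate can be illustrated with a short sketch. Assuming a simple random sample drawn without replacement from a population of known size $N$, and a normal approximation with $z = 1.96$, the usual estimate of the standard error of the sample mean is $\sqrt{(1 - n/N)\, s^2 / n}$. The function name and the data below are hypothetical.

```python
import math
import statistics

def mean_with_margin(sample, N, z=1.96):
    """Sample mean and its approximate margin of error under simple random sampling."""
    n = len(sample)
    mean = statistics.mean(sample)
    s2 = statistics.variance(sample)  # sample variance, divisor n - 1
    fpc = 1 - n / N                   # finite population correction
    se = math.sqrt(fpc * s2 / n)      # estimated standard error of the mean
    return mean, z * se

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
mean, moe = mean_with_margin(sample, N=100)
print(f"{mean:.2f} +/- {moe:.2f}")
```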
When the sample size is smaller than the population size, two or more different samples can be drawn from the same population. The set of all samples that can be obtained from the population is called the sample space. Because each possible sample has a probability of being drawn, a statistic computed from the sample (such as the sample mean) is a random variable, and its probability distribution is called the sampling distribution.
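For a tiny population the whole sample space can be enumerated. The sketch below assumes simple random sampling without replacement of $n = 2$ units from a hypothetical population of four values, lists every possible sample, and builds the exact sampling distribution of the sample mean using exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8]  # hypothetical population of N = 4 values
n = 2                      # sample size

# Sample space: every possible sample of size n drawn without replacement.
samples = list(combinations(population, n))
prob = Fraction(1, len(samples))  # simple random sampling: all samples equally likely

# Sampling distribution of the sample mean: P(mean = m) for each possible m.
dist = Counter()
for s in samples:
    dist[Fraction(sum(s), n)] += prob

for m in sorted(dist):
    print(m, dist[m])

# The expected value of the sample mean equals the population mean.
expected = sum(m * p for m, p in dist.items())
print(expected, Fraction(sum(population), len(population)))
```

The six possible samples give means 3, 4, 5, 5, 6 and 7, so a mean of 5 has probability 1/3, and the average over the whole distribution coincides with the population mean of 5.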
Definition of population
The success of statistical practice rests on a precise definition of the problem. In sampling, this includes defining the “population” from which the sample is drawn. A population can be defined as the set of people or elements with the characteristics one wants to understand. Since there is rarely enough time or money to collect information from every member of a population, the goal becomes finding a representative sample (or subset) of that population.
Sometimes what defines a population is obvious. For example, a manufacturer has to decide whether a batch of production material is of sufficient quality to be delivered to the customer or whether it should be scrapped or reworked due to poor quality. In this case, the lot is the population.
Although the population of interest usually consists of physical objects, it is sometimes necessary to sample over time, space, or some combination of these dimensions. For example, a study of supermarket staffing could examine the length of checkout queues at different times, and a study of endangered penguins could aim to understand their use of different hunting grounds over time. In the time dimension, the focus may be on specific periods or on discrete occasions.
In other cases, the "population" examined may be even less tangible. For example, Joseph Jagger studied the behavior of roulette wheels in a Monte Carlo casino and used it to identify a biased wheel. In this case, the "population" Jagger wanted to investigate was the overall behavior of the wheel (that is, the probability distribution of its outcomes over infinitely many spins), while his "sample" consisted of the observed outcomes of that wheel. Similar considerations arise when taking repeated measurements of a physical characteristic, such as the electrical conductivity of copper.
This situation often arises when seeking knowledge about the system of causes of which the observed population is a result. In such cases, sampling theory may treat the observed population as a sample from a larger "superpopulation". For example, a researcher might study the success rate of a new smoking cessation program in a test group of 100 patients in order to predict the program's effects if it were rolled out nationwide. In this case, the superpopulation is "everyone in the country who has access to this treatment", a group that does not yet exist because the program is not yet available to everyone.
The population from which the sample is drawn may not be the same as the population about which information is desired. Often there is large but incomplete overlap between the two groups because of frame issues and the like (see below). Sometimes they are entirely separate: for example, rats might be studied to better understand human health, or records of people born in 2008 might be studied to make predictions about people born in 2009.
The time spent specifying the sampled population and the population under study is often well spent, as it raises many issues, ambiguities and questions that would otherwise have been overlooked at this stage.
Sampling frame
In the simplest case, such as sampling a lot of production material (lot acceptance sampling), it would be most desirable to identify and measure every single item in the population and to be able to include any of them in our sample. In the more general case, however, this is usually neither possible nor practical. There is no way to identify all rats in the set of all rats. Where voting is not compulsory, there is no way to identify which people will vote in an upcoming election (before the election takes place). These imprecise populations are not amenable to sampling in any of the ways described below, to which statistical theory could be applied.
As a remedy, we seek a sampling frame with the property that every element can be identified and included in our sample.[3][4][5][6] The simplest type of frame is a list of the elements of the population (preferably the entire population) with appropriate contact information. For example, in an opinion poll, possible sampling frames include an electoral register and a telephone directory.
A probability sample is a sample in which every unit in the population has a chance (greater than zero) of being selected, and this probability can be accurately determined. The combination of these traits makes it possible to produce unbiased estimates of population totals by weighting the sampled units according to their probability of selection.
In the example above, not everyone has the same probability of selection; what makes it a probability sample is the fact that each person's probability is known. When every element in the population does have the same probability of selection, this is known as an "equal probability of selection" (EPS) design. Such designs are also called "self-weighting" because all sampled units are given the same weight.
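Weighting sampled units by their selection probability is what the Horvitz-Thompson estimator does: each sampled value is divided by its inclusion probability $\pi_i$. The sketch below assumes those probabilities are already known for the sampled units; the function name and the numbers are made up for illustration. It also shows that under an EPS design every weight is equal, so the estimate reduces to $N$ times the sample mean.

```python
import math

def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total by weighting each sampled value by 1 / pi_i."""
    if any(not 0 < p <= 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))

# Unequal probabilities: a unit sampled with probability 0.25 "stands for" 4 units.
print(horvitz_thompson_total([10, 20], [0.5, 0.25]))  # 10/0.5 + 20/0.25 = 100.0

# Equal probability of selection: n = 3 units drawn from N = 10, so pi = 3/10.
# Every weight is the same (self-weighting), so the estimate is N times the mean.
ys = [3, 5, 7]
est = horvitz_thompson_total(ys, [3 / 10] * 3)
print(math.isclose(est, 10 * sum(ys) / len(ys)))  # True
```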
Probability sampling includes simple random sampling, systematic sampling, stratified sampling, probability-proportional-to-size sampling, and cluster or multistage sampling. These various forms of probability sampling have two things in common: every element has a known, nonzero probability of being sampled, and the process involves random selection at some point.
Random sampling uses chance as a resource in the selection process. When the selection meets the condition that every element of the population has some chance of being chosen, and the probability for each subject of the population is known in advance, it is called probability sampling. By contrast, a sample selected by judgment sampling may be based on someone's experience with the population. Sometimes a judgment sample is used as a guide or pilot sample to decide how to take a random sample later.[7][8]
Each observation measures one or more properties (such as weight, location, color, or mass) of independent objects or individuals. In survey sampling, weights can be applied to the data to adjust for the sample design, particularly in stratified sampling.[9] Results from probability theory and statistical theory are employed to guide practice. In business and medical research, sampling is widely used to gather information about a population.[10] Acceptance sampling is used to determine whether a production lot of material meets the governing specifications.
Probability sampling
Probability sampling comprises all methods for which the probability of drawing any of the possible samples can be calculated. This family of techniques is the most advisable, although it is sometimes not possible to use it.
Simple random sampling
In simple random sampling, the units of the population are numbered and the required number of them is drawn at random, so that every possible sample of size n has the same probability of being selected. To do this, it is very useful to draw random numbers with computers, calculators, or tables built for this purpose.
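Drawing random numbers by computer can be sketched with Python's standard `random` module, whose `sample` method draws distinct items without replacement so that every subset of the requested size is equally likely. The frame of units numbered 1 to 100 and the seed value are arbitrary assumptions, used only to make the draw reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator, for reproducibility
    return rng.sample(population, n)

units = list(range(1, 101))  # units numbered 1..100 (hypothetical frame)
print(simple_random_sample(units, 10, seed=42))
```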
Systematic sampling
Systematic sampling is used when the universe or population is large or must be spread out over time. First, the units are identified and, where applicable, linked to a calendar. Then a constant $K$, called the sampling interval (or "elevation coefficient"), is calculated:
$$K = \frac{N}{n}$$
where $N$ is the population size and $n$ is the sample size.
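To determine where (or on what date) the first unit is drawn, choose a random number between 1 and $K$; thereafter, take every $K$-th unit at regular intervals. It is sometimes advisable to take the periodicity of the phenomenon into account: if the list has a cycle that coincides with $K$, every selected unit may fall at the same point of the cycle and the sample will be biased.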
In other words, to choose a sample of $n$ people from a population of $N$ people, divide $N$ by $n$ to obtain the interval. Then choose a random number between 1 and the interval as the starting point, and from there select the remaining units in order, one every interval.
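The systematic procedure above can be sketched in a few lines. This version assumes $K$ is taken as the integer part of $N/n$ (an assumption, since $N/n$ need not be a whole number) and uses a random start between 1 and $K$; the function name and data are illustrative.

```python
import random

def systematic_sample(population, n, seed=None):
    """Random start in 1..K, then every K-th unit, with K = N // n."""
    N = len(population)
    k = N // n  # sampling interval (integer part of N / n)
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    start = random.Random(seed).randint(1, k)  # 1-based random start
    # 1-based positions start, start + k, ..., converted to 0-based indices.
    return [population[start - 1 + j * k] for j in range(n)]

units = list(range(1, 101))  # N = 100 units, numbered 1..100
print(systematic_sample(units, 10, seed=7))
```

When $N$ is not a multiple of $n$, taking the integer part of $N/n$ means the last few units can never be selected; a circular variant, which wraps around the end of the list, avoids this.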
Stratified sampling
Stratified sampling divides the study population in advance into groups or classes (strata) that are assumed to be homogeneous with respect to some characteristic under study. Each stratum is assigned an allocation that determines how many of its members will be included in the sample. Within each stratum, units are usually selected by systematic sampling, one of the most widely used selection techniques in practice.
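Depending on how many sample elements are chosen from each stratum, there are two stratified sampling techniques: proportional allocation, in which the sample size in each stratum is proportional to the stratum's size in the population; and optimal allocation, in which more units are drawn from the strata with greater variability, which requires prior knowledge of the population.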
For example, in an opinion study it may be useful to study the opinions of men and women separately, since each of these groups is expected to be fairly homogeneous. Under proportional allocation, if the population is 55% women and 45% men, the sample would contain the same percentages of women and men. Under optimal allocation, if all men think alike but women's opinions vary widely, the sample would contain more than 55% women.
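Both allocations can be sketched in Python. Optimal allocation is implemented here as Neyman allocation ($n_h \propto N_h S_h$), which is the optimum when every unit costs the same to survey; the stratum standard deviations are hypothetical values standing in for the prior knowledge the text mentions. The largest-remainder rounding, which makes the stratum sizes add up to $n$, is a design choice of this sketch.

```python
import math

def _round_to_total(raw, n):
    """Round allocations down, then give leftover units to the largest remainders."""
    alloc = [math.floor(x) for x in raw]
    leftover = n - sum(alloc)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc

def proportional_allocation(sizes, n):
    """n_h proportional to the stratum size N_h."""
    N = sum(sizes)
    return _round_to_total([n * Nh / N for Nh in sizes], n)

def neyman_allocation(sizes, sds, n):
    """n_h proportional to N_h * S_h (optimal when every unit costs the same)."""
    weights = [Nh * Sh for Nh, Sh in zip(sizes, sds)]
    total = sum(weights)
    return _round_to_total([n * w / total for w in weights], n)

sizes = [550, 450]  # women, men (the 55% / 45% example above)
print(proportional_allocation(sizes, 100))        # [55, 45]
print(neyman_allocation(sizes, [2.0, 1.0], 100))  # [71, 29]: women vary more
```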
For a general description of stratified sampling and its associated inference methods, assume that the population is divided into $h$ subpopulations, or strata, of known sizes $N_1, N_2, \ldots, N_h$, such that the units within each stratum are homogeneous with respect to the characteristic in question. The unknown mean and variance of the $i$-th stratum are denoted by $\mu_i$ and $\sigma_i^2$, respectively.
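With $h$ strata of known sizes $N_1, \ldots, N_h$ (summing to $N$) and $n_i$ units sampled in stratum $i$, the usual textbook estimator of the population mean is the weighted average $\bar{y}_{st} = \sum_{i=1}^{h} (N_i/N)\,\bar{y}_i$, with estimated variance $\sum_{i=1}^{h} (N_i/N)^2 (1 - n_i/N_i)\, s_i^2 / n_i$. The sketch below assumes simple random sampling within each stratum and at least two sampled units per stratum; the data are made up.

```python
import statistics

def stratified_mean(strata):
    """strata: list of (N_i, sampled values) pairs, one per stratum.

    Returns the stratified estimate of the population mean and its
    estimated variance, assuming simple random sampling within strata.
    """
    N = sum(Ni for Ni, _ in strata)
    estimate = 0.0
    variance = 0.0
    for Ni, ys in strata:
        ni = len(ys)
        Wi = Ni / N                     # stratum weight N_i / N
        ybar_i = statistics.mean(ys)    # sample mean in stratum i
        s2_i = statistics.variance(ys)  # sample variance in stratum i (needs ni >= 2)
        estimate += Wi * ybar_i
        variance += Wi ** 2 * (1 - ni / Ni) * s2_i / ni
    return estimate, variance

est, var = stratified_mean([(60, [1, 2, 3]), (40, [10, 12])])
print(round(est, 3), round(var, 3))  # 5.6 0.266
```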
Non-probability sampling
Non-probability sampling is sampling for which the probability of drawing a given sample cannot be calculated, because the probability that each subject will be chosen is not known. In one common approach, the researcher selects individuals judged in advance to have in-depth knowledge of the topic under study, on the grounds that the information they provide is vital for decision making.
Quota sampling
This is the most widespread non-probability technique, especially in market research and opinion polls. First, the reference population is divided into several strata defined by variables whose distribution is known (such as gender or age). Next, the proportional weight of each stratum is calculated, that is, the share of the population it represents. Finally, each weight is multiplied by the sample size $n$ to obtain the exact quota for each stratum. Quota sampling differs from stratified sampling in that, once the quotas are set, the researcher is free to choose the sample subjects within each stratum.
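The quota calculation and the non-random filling step can be sketched as follows. The stratum shares, the order in which respondents "arrive", and the function names are all hypothetical; the point of the sketch is that who ends up in the sample depends on that arrival order, not on a random mechanism.

```python
def quotas(shares, n):
    """Quota per stratum: population share times sample size (rounded)."""
    return {stratum: round(share * n) for stratum, share in shares.items()}

def fill_quotas(respondents, quota):
    """Accept respondents in the order they arrive until every quota is full.

    The interviewer, not a random mechanism, decides who is approached,
    which is why quota sampling is non-probabilistic.
    """
    remaining = dict(quota)
    sample = []
    for person, stratum in respondents:
        if remaining.get(stratum, 0) > 0:
            sample.append(person)
            remaining[stratum] -= 1
        if not any(remaining.values()):
            break
    return sample

print(quotas({"female": 0.55, "male": 0.45}, 20))  # {'female': 11, 'male': 9}

arrivals = [("a", "female"), ("b", "female"), ("c", "male"), ("d", "female")]
print(fill_quotas(arrivals, {"female": 2, "male": 1}))  # ['a', 'b', 'c']
```

Rounding each quota independently may not always add up exactly to $n$; a largest-remainder adjustment can be used when that matters.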
Snowball sampling
Snowball sampling is suited to studies of clandestine, minority, or widely dispersed populations whose members are in contact with one another. Subjects to be included in the sample are identified through the interviewees themselves: starting from a small number of individuals who meet the necessary requirements, each participant serves as a locator for others with similar characteristics.
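Recruitment through referrals can be modelled as a breadth-first walk over a contact network. The sketch below assumes the referral network is known in advance as a dictionary, which in practice it never is; the network, the seed participant and the size cap are hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Breadth-first recruitment: each participant refers their contacts."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # never recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: who each participant can put us in touch with.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
print(snowball_sample(["A"], referrals, max_size=4))  # ['A', 'B', 'C', 'D']
```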
Purposive sampling
In purposive (judgment) sampling, the sample units are chosen rationally, rather than by chance, on the basis of some of their characteristics. A variant of this technique is compensated or balanced sampling, in which units are selected so that the sample mean of certain variables is close to the population mean. This selection relies on references or recommendations, and its adequacy is then checked statistically.
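The compensated (balanced) variant can be illustrated with a brute-force sketch: among all subsets of size $n$, choose the one whose mean of an auxiliary variable, assumed known for every unit, is closest to the population mean. This deterministic search is only an illustration for very small populations, not a standard balanced-sampling algorithm; the unit labels and values are hypothetical.

```python
from itertools import combinations
from statistics import mean

def balanced_selection(aux, n):
    """Pick the n units whose mean of an auxiliary variable is closest to the population mean.

    Brute force over all subsets, so only feasible for very small populations.
    """
    target = mean(aux.values())
    return min(
        combinations(sorted(aux), n),
        key=lambda units: abs(mean(aux[u] for u in units) - target),
    )

# Hypothetical auxiliary variable known for every unit (population mean = 5).
aux = {"A": 1, "B": 2, "C": 4, "D": 8, "E": 10}
print(balanced_selection(aux, 2))  # ('B', 'D'): mean (2 + 8) / 2 = 5
```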
[10] Salant, Priscilla, and Don A. Dillman. How to Conduct Your Own Survey. 1994.
Multistage sampling
This technique is the only practical option when a complete list of the reference population is not available, or when simple or stratified random sampling would yield units so scattered that they are difficult to reach. In multistage sampling, the population is subdivided into several hierarchical levels that are sampled successively in a funnel-like procedure, with one phase of extraction for each level.
For example, to build a sample of primary school teachers in a given country, the teachers can be grouped into primary units, represented by educational districts, and secondary units, which are the teachers themselves. First, a sample of primary units is drawn (which requires a complete list of these units); second, a random sample of secondary units is drawn from each primary unit selected in the first stage.
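The two stages of the teacher example can be sketched as follows, assuming simple random sampling at each stage. The districts, teacher labels, stage sizes and seed are hypothetical; note that teacher lists are only consulted for the districts actually selected.

```python
import random

def two_stage_sample(clusters, n_primary, m_secondary, seed=None):
    """Stage 1: draw primary units at random; stage 2: draw units within each."""
    rng = random.Random(seed)
    # Stage 1 needs the complete list of primary units (e.g. districts).
    primaries = rng.sample(sorted(clusters), n_primary)
    # Stage 2 only needs lists for the primary units actually selected.
    return {p: rng.sample(clusters[p], min(m_secondary, len(clusters[p])))
            for p in primaries}

# Hypothetical frame: 6 districts with 5 teachers each.
districts = {f"district-{i}": [f"teacher-{i}-{j}" for j in range(5)] for i in range(6)}
print(two_stage_sample(districts, n_primary=3, m_secondary=2, seed=1))
```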
Cluster sampling
Cluster sampling is used when the population is naturally divided into groups that are assumed to contain all of the population's variability, that is, groups that faithfully represent the population with respect to the characteristic being studied. Only some of these groups, or clusters, are selected for the study.
Within the selected groups, the elementary units (for example, the people to be surveyed) are located, and the measurement instrument may be applied to all units, that is, to every member of the group, or only to some of them, selected at random. This method has the advantage of simplifying the collection of sample information.
When only some of the individuals in each selected cluster are drawn for the sample, the design is called two-stage sampling.
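For the one-stage case, in which every member of each selected cluster is measured, a common unbiased estimator of the population total is $(M/m)$ times the sum of the sampled cluster totals, where $M$ is the number of clusters and $m$ the number selected by simple random sampling. A minimal sketch with hypothetical households:

```python
import random

def one_stage_cluster_total(clusters, m, seed=None):
    """Draw m of the M clusters at random, measure every unit in them, and
    estimate the population total as (M / m) * (sum of sampled cluster totals)."""
    rng = random.Random(seed)
    M = len(clusters)
    chosen = rng.sample(sorted(clusters), m)
    sampled_total = sum(sum(clusters[c]) for c in chosen)
    return chosen, M / m * sampled_total

# Hypothetical clusters (e.g. households) with a measured value per member.
clusters = {"h1": [3, 4], "h2": [5], "h3": [2, 2, 6], "h4": [7, 1]}
chosen, estimate = one_stage_cluster_total(clusters, m=2, seed=3)
print(chosen, estimate)
```

If every cluster were selected ($m = M$), the estimate would equal the true total of 30.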
The ideas of strata and clusters are, in a sense, opposite. Stratification works better the more homogeneous each stratum is internally and the more the strata differ from one another. With clusters the opposite holds: each cluster should contain all the variability of the population, and the clusters should be very similar to one another.
In the context of stratification, homogeneous means having little variability. Strata work better the more homogeneous each of them is with respect to the characteristic being measured. For example, when studying the height of a population, it is useful to distinguish between female and male strata, because variability within each is expected to be lower, that is, each is less heterogeneous. In other words, heights differ less within a stratum than across the population as a whole.
Conversely, heterogeneity makes division into strata useless: if the differences within a stratum are as large as those in the entire population, there is no reason to use this sampling method. When there are groups that each contain all the variability of the population, clusters are formed instead, which saves some of the work of analyzing the entire population. In summary, strata and clusters work on opposite principles: strata are better the more homogeneous each group is with respect to the characteristic studied, whereas clusters are better the more faithfully they represent the population, that is, the more of its variability they contain, which makes them heterogeneous.
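The height example can be made concrete by comparing the variance of the whole population with the pooled variance inside each stratum. The heights below are made-up numbers chosen only to illustrate the point; with these values the total variance is 67.25 cm² while the pooled within-stratum variance is only 3.25 cm², the difference being the variance between the two stratum means.

```python
from statistics import pvariance

# Hypothetical heights in cm, split by sex as in the example above.
strata = {
    "female": [158, 160, 161, 163],
    "male": [174, 176, 177, 179],
}
everyone = [h for heights in strata.values() for h in heights]

total_var = pvariance(everyone)
# Pooled within-stratum variance: stratum variances weighted by stratum size.
within_var = sum(len(h) * pvariance(h) for h in strata.values()) / len(everyone)

print(total_var, within_var)
```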
The sample space is the set of all possible outcomes of a random experiment. For example, when rolling a fair die, the sample space is {1, 2, 3, 4, 5, 6}.