Standard deviation: what is standard deviation and what is it used for?

What is standard deviation, and how is it used in statistics when it comes to research? Let's take a look.

The term standard deviation refers to a measure used to quantify the variation or dispersion of numerical data in a random variable, statistical population, data set or probability distribution.

The world of research and statistics may seem complex and foreign to the general population, as if mathematical calculations happened before our eyes without our being able to understand the mechanisms behind them. Nothing could be further from the truth.

Here we explain, simply but thoroughly, the context, foundation and application of a term as essential to statistics as the standard deviation.

What is standard deviation?

Statistics is the branch of mathematics that deals with recording variability, as well as the random processes that generate it, following the laws of probability. That is easily said, but statistical processes hold the answers to much of what we now take as "dogma" in the world of nature and physics.

For example, let's say that a coin is tossed three times and comes up heads twice and tails once. Simple coincidence, right? On the other hand, if we flip the same coin 700 times and it lands on heads 660 times, there is very probably a factor that favors this outcome beyond randomness (let us imagine, for example, that the coin only has time to spin a limited number of times in the air, so that it almost always lands the same way). Thus, observing patterns that go beyond mere coincidence prompts us to think about the underlying reasons for the trend.
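
To put rough numbers on this intuition, the following minimal Python sketch computes how likely each result would be if the coin were perfectly fair. The fair-coin assumption and the helper name `prob_at_least_heads` are ours for illustration; they are not part of the original example.

```python
from math import comb


def prob_at_least_heads(tosses: int, heads: int) -> float:
    """Probability of `heads` or more heads in `tosses` tosses of a fair coin."""
    # Count the favourable sequences, then divide by all 2**tosses sequences.
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favourable / 2**tosses


# Two or more heads in three tosses: entirely unremarkable.
p_small = prob_at_least_heads(3, 2)

# 660 or more heads in 700 tosses: vanishingly unlikely for a fair coin.
p_large = prob_at_least_heads(700, 660)

print(p_small)
print(p_large)
```

The first probability is one half, while the second is so small that chance alone stops being a reasonable explanation.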

What we want to make clear with this rather odd example is that statistics is an essential tool for any scientific process, because it allows us to distinguish outcomes that are the result of chance from events governed by natural laws.

Thus, we can offer a hasty definition of standard deviation and say that it is a statistical measure equal to the square root of the variance. This is like building the house from the roof down, because for someone who is not immersed in the world of numbers, hearing this definition is little different from knowing nothing about the term. Let us then take a moment to dissect the basics of statistical measures.

Measures of position and variability

Measures of position are indicators that show what percentage of the data in a frequency distribution lies above or below a given value. The two described here represent the value at the center of the distribution. Do not despair, because they are quick to define:

Mean: the numerical average of the sample.
Median: the value in the central position of an ordered data set.

In a rudimentary way, we could say that position measures are focused on dividing the data set into equal percentage parts, i.e., "getting to the middle".
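
As a small illustration of these two measures, here is a minimal Python sketch using the standard `statistics` module. The data set `measurements` is hypothetical and chosen only to show how the two values can differ.

```python
from statistics import mean, median

# Hypothetical data set with one unusually large value.
measurements = [2, 3, 3, 4, 4, 5, 14]

average = mean(measurements)    # sum of the values divided by how many there are
middle = median(measurements)   # central value once the data are sorted

print(average)
print(middle)
```

Here the mean is 5 and the median is 4: the single large value pulls the mean upwards, while the median stays at the middle of the ordered data.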

On the other hand, measures of variability determine how close to or far from the center of a distribution (i.e., its mean) the values lie. These are as follows:

Range: measures the amplitude of the data, i.e., from the minimum to the maximum value.
Variance: the expected value (the mean over the data series) of the squared deviation of the variable from its mean.
Standard deviation: numerical index of the dispersion of the data set.

Of course, these are relatively complex terms for someone who is not immersed in the world of mathematics. We do not want to go into other measures of variability; it is enough to know that the higher the values of these parameters, the less homogeneous the data set will be.
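
The following minimal Python sketch shows how these three measures grow as the data become less homogeneous. The three data sets are hypothetical and share the same mean of 10, and we assume the population versions of variance and standard deviation (`pvariance` and `pstdev`, which divide by the number of values).

```python
from statistics import pstdev, pvariance

# Three hypothetical data sets, all with a mean of 10.
samples = {
    "identical": [10, 10, 10, 10],
    "tight": [9, 10, 10, 11],
    "spread": [2, 6, 14, 18],
}

summary = {}
for name, values in samples.items():
    summary[name] = {
        "range": max(values) - min(values),   # maximum minus minimum
        "variance": pvariance(values),        # mean squared deviation from the mean
        "std_dev": pstdev(values),            # square root of the variance
    }
    print(name, summary[name])
```

All three measures are 0 for the identical values and largest for the most scattered set, even though the mean never changes.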

"The average of the atypical"

Once we have cemented our knowledge of measures of variability and their importance vis-à-vis data analysis, it is time to turn our attention back to standard deviation.

Without going into complex concepts (and perhaps erring on the side of oversimplifying things), we can say that this measure can be thought of as a kind of average of how far the values stray from the mean: the average of the "atypical". Let's use an example to clarify this definition:

We have a sample of six bitches of the same breed and age that have just given birth to their litters at the same time. Three of them have given birth to 2 pups each, while the other three have given birth to 4 pups each. Naturally, the mean number of offspring is 3 pups per female (the sum of all pups divided by the total number of females).

What would be the standard deviation in this example? First, we would have to subtract the mean from each value obtained and square the result (since we do not want negative numbers). For example: 4 - 3 = 1, which squared is 1, and 2 - 3 = -1, which squared is also 1.

The variance is then calculated as the average of these squared deviations from the mean value (in this case, 3), which here gives 1. Because the variance is expressed in squared units, we have to take its square root to bring it back to the same numerical scale as the mean. The result is the standard deviation.

So, what would be the standard deviation of our example? One puppy. The mean litter size is three pups, but it is within the normal range for a mother to give birth to one puppy more or one fewer per litter.
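
The calculation for the six litters can be sketched step by step in Python. This is a minimal illustration that divides by the number of litters, as the text does; the variable names are ours.

```python
from math import sqrt

# Litter sizes from the example: three litters of 2 pups and three of 4.
litters = [2, 2, 2, 4, 4, 4]

# Step 1: the mean.
mean_pups = sum(litters) / len(litters)

# Step 2: the squared deviation of each litter from the mean.
squared_deviations = [(x - mean_pups) ** 2 for x in litters]

# Step 3: the variance is the average of those squared deviations.
variance = sum(squared_deviations) / len(squared_deviations)

# Step 4: the standard deviation is the square root of the variance.
std_dev = sqrt(variance)

print(mean_pups, variance, std_dev)  # 3.0 1.0 1.0
```

Python's standard `statistics.pstdev` performs the same calculation in a single call. Its sibling `statistics.stdev` divides by one less than the number of values, a common convention when the data are a sample from a larger population, and would give a slightly larger figure.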

Perhaps this example is a little confusing as far as variance and deviation are concerned (since the square root of 1 is 1), but if the variance in this example were 4, the standard deviation would be 2 (remember, its square root).

What we wanted to show with this example is that the variance and the standard deviation are statistical measures that summarize how far, on average, the values lie from the mean. Let us remember: the greater the standard deviation, the greater the dispersion of the population.

Recalling the previous example, if all the bitches are of the same breed and have similar weights, it is normal that the deviation is one puppy per litter. But for example, if we take a mouse and an elephant, it is clear that the deviation in the number of offspring would reach values much larger than one. Again, the less the two sample groups have in common, the larger the deviations are expected to be.

Still, one thing is clear: with this parameter we are calculating the variability in the data of a sample, and this is by no means necessarily representative of an entire population. In this example we have taken six bitches, but what if we were monitoring seven and the seventh bitch had a litter of 9 puppies?

Of course, the pattern of deviation would change. For this reason, taking sample size into account is essential when interpreting any set of data. The more individual values are collected and the more times an experiment is repeated, the closer we are to postulating a general truth.
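
How much a single extra litter changes the picture can be checked with a minimal Python sketch. It reuses the six litter sizes from the example, adds the hypothetical seventh litter of 9 puppies, and again assumes the population standard deviation (`pstdev`).

```python
from statistics import mean, pstdev

six_litters = [2, 2, 2, 4, 4, 4]
seven_litters = six_litters + [9]   # the hypothetical seventh litter

for label, data in (("six", six_litters), ("seven", seven_litters)):
    print(label, round(mean(data), 2), round(pstdev(data), 2))
```

With six litters the mean is 3 and the standard deviation is 1. Adding the seventh litter raises the mean to about 3.86 and the standard deviation to about 2.29, so one observation more than doubles the measured dispersion in a sample this small.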

Conclusions

As we have seen, the standard deviation is a measure of data dispersion. The greater the dispersion, the higher this value will be. If we were dealing with a set of completely homogeneous results (i.e., all equal to the mean), this parameter would be equal to 0.

This value is of enormous importance in statistics, since not everything comes down to finding common ground between figures and events; it is also essential to record the variability between sample groups in order to ask more questions and gain more knowledge in the long term.

(Updated at Apr 14 / 2024)