Statistics – Part 2

In this post, we’ll cover mean, median and mode. We’ll keep things to the point and won’t go deep into any of them in this series.

There are three common types of mean:

  • Arithmetic mean
  • Geometric mean
  • Harmonic mean

In this series we’ll cover only the arithmetic mean, which I’ll simply call the mean.

In simple words, the mean is the average. But the average of what?
We’ll get to that, but first take a look at the following table of players’ scores:

Players Scores
15 35 85 55 95 65
45 20 25 60 80 20

These are the scores of 12 players in a game they played.

Now, if we want to know what score the players achieve on average, we can simply take the average of all the observed values. (Remember, a data set is a set of values.)

Average = sum of all observations / total number of observations 

Now, we have
total observations = 12, and
sum of all observations = 15 + 35 + 85 + … + 80 + 20 = 600

Therefore, Mean = 600 / 12 = 50

So the average score that players achieve in the game is 50.
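The calculation above can be sketched in a few lines of Python. This is only an illustration: the scores are copied from the table in this post, and the function name `arithmetic_mean` is our own choice, not something from a library.

```python
# Scores of the 12 players, copied from the table in this post.
scores = [15, 35, 85, 55, 95, 65, 45, 20, 25, 60, 80, 20]


def arithmetic_mean(values):
    """Return the sum of all observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


total = sum(scores)   # sum of all observations
count = len(scores)   # total number of observations
mean = arithmetic_mean(scores)

print(total, count, mean)  # prints: 600 12 50.0
```

Python's built-in `statistics.mean` does the same job, so in practice there is no need to write the helper yourself.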

But that only tells us something about the data as a whole. What if we want more detail? What if we want to know whether the game was difficult or easy? How can we deduce that from our data set?

This brings us to our next topic, the median.

The median is the midpoint of our data.

To find the midpoint, we need to sort our data from the lowest score to the highest. Take a look at the following table:

Ordered Players’ Scores
Player  Score    Player  Score
1       15       7       55
2       20       8       60
3       20       9       65
4       25       10      80
5       35       11      85
6       45       12      95

The position of the median depends on the total number of observations: the median is the value that lies in the middle of all the sorted values. We have 12 observations. What is the midpoint of these values? It is difficult to say, since no single value fulfills this criterion. If we had 13 observations, we could say that the 7th value is the median, and this would be our expression:

Median = value at position (n + 1) / 2
where n = total number of observations

But we have 12 values, which is a problem. The formula above gives position (12 + 1) / 2 = 6.5, which is not a whole number, so no single value sits exactly in the middle of the data.
How can we get the median in this case?

If we look at the table again, we can see that the 6th and 7th values lie in the middle, but neither of them is the true median. To solve this dilemma, just take the average of these two values.
6th value = 45
7th value = 55
Their average = (45+55)/2 = 100/2 = 50

So, for an even number of observations, our formula is:

Median = (value at position a + value at position b) / 2
where n = total number of observations,
a = n / 2,
b = (n / 2) + 1

For n = 12 this gives a = 6 and b = 7.
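Both cases, odd and even, can be put together in a short Python sketch. The function name `median` and the extra three-value example are our own, used only for illustration. Note that Python lists count from 0, so the 1-based positions used in this post shift down by one.

```python
# Scores of the 12 players, copied from the table in this post.
scores = [15, 35, 85, 55, 95, 65, 45, 20, 25, 60, 80, 20]


def median(values):
    """Return the middle value of the sorted data.

    For an even number of observations, return the average
    of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: 1-based positions n / 2 and n / 2 + 1
    # are 0-based indexes mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


median_score = median(scores)
print(median_score)       # prints: 50.0
print(median([3, 1, 2]))  # odd case, prints: 2
```

The standard library's `statistics.median` follows the same rule, so it should agree with this sketch on both examples.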

We get a median of 50, which means half of the players scored below 50 and half scored above it. This suggests that the game is neither too easy nor too difficult.

Most of the time when you look at a data set, you will notice that certain values repeat. This raises a question: which value is the most common?
This brings us to our last topic, the mode.

The mode is the data value that occurs with the highest frequency in the data set. Put simply, it is the value that appears most often.

Frequency of scores
Score  Frequency    Score  Frequency
15     1            60     1
20     2            65     1
25     1            80     1
35     1            85     1
45     1            95     1
55     1

We can see that 20 is the only score that occurs twice in our data set; every other score occurs once. So 20 is our mode.
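Counting frequencies by hand gets tedious for larger data sets. Here is a minimal Python sketch that builds the frequency count from the players' scores used in this post and picks the most frequent value. The variable names are our own; `collections.Counter` is part of the standard library.

```python
from collections import Counter

# Scores of the 12 players, copied from the table in this post.
scores = [15, 35, 85, 55, 95, 65, 45, 20, 25, 60, 80, 20]

# Map each distinct score to the number of times it occurs.
frequency = Counter(scores)

# Print the frequency of each score, lowest score first.
for score in sorted(frequency):
    print(score, frequency[score])

# most_common(1) returns a list holding the single (value, count)
# pair with the highest count.
mode_value, mode_count = frequency.most_common(1)[0]
print(mode_value, mode_count)  # prints: 20 2
```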

Easy, right?

Yes, but keep the following points in mind when finding the mode.

  • The mode does not have a minimum frequency.
    • Suppose we have one data set in which 5 occurs twice and another in which 5 occurs four times, and in each case no other value occurs more often. In both cases, the mode is 5.
      However, in the second case 5 is clearly the most common value, whereas in the first case it could be there just by chance. Always look at the frequency behind a mode before drawing conclusions from it.
  • A single data set can have multiple modes.
    • Suppose we have a data set in which 4 and 6 each occur three times, more often than any other value. Then 4 and 6 are both modes.
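To make both points concrete, here is a small Python sketch that returns every mode together with its frequency. Please note that the data set `sample` is made up purely for illustration and is not taken from this post, and the function name `all_modes` is our own.

```python
from collections import Counter


def all_modes(values):
    """Return (modes, frequency): every value that occurs most often,
    and how many times each of them occurs."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    modes = sorted(value for value, count in counts.items() if count == highest)
    return modes, highest


# Made-up data for illustration only: 4 and 6 each occur three times.
sample = [4, 4, 4, 6, 6, 6, 1, 2, 9]

modes, frequency_of_mode = all_modes(sample)
print(modes, frequency_of_mode)  # prints: [4, 6] 3
```

Returning the frequency alongside the modes is a deliberate choice: it lets us judge whether a mode really is common or only just ahead of the other values. If only the modes themselves are needed, `statistics.multimode` (Python 3.8 and later) also returns all of them.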

So we have seen how the mean, median and mode can help us evaluate our data in different contexts and answer questions that would otherwise be very difficult to answer. Add a good representation such as a histogram, and we get a good idea of what is going on with our data.

We’ll study some other concepts that are useful in statistics in future posts.

Statistics – Part 1

Statistics is the study of methods for collecting, analysing and interpreting data, and of the principles of experimental design.

What is Data?
Data is any set of values. It can be qualitative or quantitative. In simple words, data is a collection of values, numbers or anything else that conveys some information.

Is data good or bad?
To put it simply, data can be both. If we can get useful information from the data or deduce something by going through it, the data is generally considered good.
But is it really so?
Even good data can be further classified as good or bad. Confusing, isn’t it?

Think about these questions:
How much of that data is useful?
Is it valuable?
What is the quality of data?
Is the data biased? 

You get the gist, right? When we ask these questions, we slowly begin to understand whether the data is good or not.

How is data collected?
Data is collected through observation and measurement.

What types of data are there?
Data can be broadly divided into two types:

  • Primary data (collected by us)
  • Secondary data (collected by others)

What is the importance of organised data?
Organised data is easier to understand. It saves time and money, allows us to make good decisions, and helps us work with precision.

How do we represent data?
There are many ways to represent data. The most common are:

  • Tables
  • Charts
  • Graphs

There are several other ways to plot our data, such as dot plots, histograms and pie charts. We’ll go through these in upcoming posts.