Statistics – Part 2

In this post, we’ll cover the mean, median and mode. We’ll keep things to the point and won’t go deep into any of them in this series.

There are three common types of mean:

  • Arithmetic mean
  • Geometric mean
  • Harmonic mean.

For this series we’ll cover the arithmetic mean only. I’ll refer to the arithmetic mean simply as the mean.

In simple words, the mean is the average. But the average of what?
We’ll get to that, but first take a look at the following table of players’ scores:

Players Scores
15 35 85 55 95 65
45 20 25 60 80 20

These are the scores of 12 players in a game they played.

So, if we want to know what score the players achieved on average, we can simply take the average of all the observed values. (Remember, a data set is a set of values.)

Average = sum of all observations / total number of observations 

Now, we have
total observations = 12, and
sum of all observations = 15 + 35 + 85 + … + 80 + 20 = 600

Therefore, Mean = 600 / 12 = 50

So the average score that the players achieved in the game is 50.
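The calculation above can be sketched in a few lines of Python. This is only an illustration: the list name `scores` and the helper `arithmetic_mean` are names chosen here, not something from the original data source.

```python
# Scores of the 12 players, copied from the table above.
scores = [15, 35, 85, 55, 95, 65, 45, 20, 25, 60, 80, 20]


def arithmetic_mean(values):
    """Return the sum of all observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


total = sum(scores)   # sum of all observations
count = len(scores)   # total number of observations
mean_score = arithmetic_mean(scores)

print(total, count, mean_score)  # 600 12 50.0
```

Python's standard library also offers `statistics.mean`, which should give the same result for this data.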

But that only tells us about the data as a whole. What if we want to know more detail? What if we want to know whether the game was difficult or easy? How can we deduce that from our data set?

This brings us to our next topic, the median.

The median is the midpoint of our data.

To find the midpoint, we need to order our data from the lowest score to the highest. Take a look at the following table:

Ordered Players’ Scores
Player  Score    Player  Score
1       15       7       55
2       20       8       60
3       20       9       65
4       25       10      80
5       35       11      85
6       45       12      95

The median is the value that lies in the middle of all the ordered values, so its position depends on the total number of observations. We have 12 observations. What is the midpoint of these values? It is difficult to say, since no single value meets this criterion. Had there been 13 observations, we could have said that the 7th value is the median, using this expression:

Median = value at position (n + 1) / 2
where n = Total number of observations

But we have 12 values, which is a problem. The above formula gives (12 + 1) / 2 = 6.5, and there is no value at position 6.5: the middle falls between the 6th and 7th values.
How can we get the median in this case?

If we look at the table again, we can see that the 6th and 7th values lie in the middle, but neither of them is the true median. To solve this dilemma, just take the average of these two values.
6th value = 45
7th value = 55
Their average = (45+55)/2 = 100/2 = 50

So, for an even number of observations, our formula is:

Median = (value at position a + value at position b) / 2
where n = Total number of observations,
a = n/2,
b = n/2 + 1
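Both cases (odd and even number of observations) can be sketched as a small Python function. It sorts the data first, then either picks the single middle value or averages the two middle values, which are the 6th and 7th for our 12 scores. The function name `median` and the list name `scores` are chosen here for illustration; note that Python lists count from 0, so the 6th and 7th values sit at indexes 5 and 6.

```python
# Scores of the 12 players, in the order they were recorded.
scores = [15, 35, 85, 55, 95, 65, 45, 20, 25, 60, 80, 20]


def median(values):
    """Return the middle value of the ordered data.

    For an even count there is no single middle value,
    so the two middle values are averaged instead.
    """
    ordered = sorted(values)  # the median needs ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: 1-based position (n + 1) / 2 is index mid.
        return ordered[mid]
    # Even count: average the two values around the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


median_score = median(scores)
print(median_score)  # 50.0
```

For everyday use, `statistics.median` from the standard library handles both cases in the same way.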

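We got a median of 50, so half of the players scored below 50 and half scored above it. The median also matches the mean, which suggests that the game was neither particularly easy nor particularly difficult.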

Most of the time when you look at a data set, you will notice that certain values repeat. This raises a question: which value is the most common?
This brings us to our last topic, the mode.

The mode is the data point that occurs with the highest frequency in the data set. Put simply, the value that appears most often in the data set is the mode.

Frequency of scores
Score  Frequency    Score  Frequency
15     1            60     1
20     2            65     1
25     1            80     1
35     1            85     1
45     1            95     1
55     1

We can see that 20 is the only score that occurs twice in our data set; every other score occurs once. So 20 is our mode.
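Counting frequencies by hand gets tedious for larger data sets. As a rough sketch, Python's `collections.Counter` can build the frequency table from the original scores and report the most frequent value. The variable names below are chosen here for illustration.

```python
from collections import Counter

# Scores of the 12 players from the first table.
scores = [15, 35, 85, 55, 95, 65, 45, 20, 25, 60, 80, 20]

# Map each distinct score to the number of times it occurs.
frequency = Counter(scores)

# most_common(1) returns a list holding the single (value, count)
# pair with the highest count.
mode_value, mode_count = frequency.most_common(1)[0]

print(mode_value, mode_count)  # 20 2
```

Note that `most_common(1)` reports only one value, so on its own it would hide a tie between several equally frequent values.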

Easy. Right?

Yes, but keep the following points in mind while finding the mode.

  • The mode does not have a minimum frequency;
    • Suppose we have two data sets: in the first, 5 occurs twice, and in the second, 5 occurs four times, in each case more often than any other value. In both cases, the mode is 5.
      However, in the second case 5 is clearly the most common value, whereas in the first case it could be a ‘by-chance’ situation. Always look at the frequency behind a mode before drawing conclusions from it.
  • A single data set can have multiple modes;
    • Suppose we have a data set in which 4 and 6 each occur three times, more often than any other value. Then 4 and 6 are both modes.
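Both points can be checked in code. The sketch below returns every value that shares the highest frequency, together with that frequency, so we can see how strong the mode actually is. The data set `example` is made up here purely for illustration (it is not the data set from the original post), and `modes_with_count` is a hypothetical helper name.

```python
from collections import Counter


def modes_with_count(values):
    """Return (sorted list of all modes, their shared frequency)."""
    counts = Counter(values)
    if not counts:
        return [], 0
    highest = max(counts.values())
    # Keep every value whose frequency equals the highest one.
    modes = sorted(v for v, c in counts.items() if c == highest)
    return modes, highest


# Hypothetical data set in which 4 and 6 each occur three times.
example = [1, 4, 4, 4, 5, 6, 6, 6, 9]

print(modes_with_count(example))  # ([4, 6], 3)
```

Returning the frequency alongside the modes makes it easier to judge whether a mode is meaningful or just a ‘by-chance’ repeat. On Python 3.8 and later, `statistics.multimode` also returns all modes, though without the frequency.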

So we have seen how the mean, median and mode can help us evaluate our data in different contexts and answer questions that would otherwise be difficult to answer. Add a good visual representation, such as a histogram, and we get a good idea of what is going on with our data.

We’ll study some other concepts that are useful in statistics in future posts.

