Clouds distributed around the mean, err, horizon.


Measures of central tendency attempt to define the center of a set of data. They are important for interpreting benchmarks and when single rates are used in contracts.  There are many ways to measure central tendency; however, the three most popular are the mean, the median and the mode.  Each provides interesting information, and each is more or less useful in different circumstances.  Let’s explore the following simple data set.

[Figure: the sample data set of 13 observations]

Mean

The mean is the most popular and best-known measure of central tendency.  It is calculated by summing the values in the sample (or population) and dividing by the total number of observations.  In the example, the mean is 231.43 / 13, or 17.80.  The mean is most useful when the data points are distributed evenly around the mean or are normally distributed. A mean is highly influenced by outliers.
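The calculation described above can be sketched in a few lines of Python. The observations below are hypothetical, chosen only for illustration; they are not the article's data set.

```python
from statistics import mean

# Hypothetical observations (NOT the article's data set).
observations = [12.0, 15.5, 18.0, 20.5, 24.0]

# Mean = sum of the values divided by the number of observations.
total = sum(observations)
count = len(observations)
manual_mean = total / count

print(manual_mean)         # 18.0
print(mean(observations))  # the standard library gives the same result
```

The manual version mirrors the by-hand arithmetic; `statistics.mean` is the usual shortcut.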

Advantages include:

  • Most common measure of central tendency used and therefore most quickly understood.
  • The answer is unique.

Disadvantages include:

  • Influenced by extremes (skewed data and outliers).

Median

The median is the middle observation in a set of data.  It is affected less by outliers or skewed data than the mean is.  To find the median by hand, you need to arrange the data in numerical order.  Using the same data set:

[Figure: the same data set arranged in numerical order]

The median is 18.64 (six observations above and six observations below).  Since the median is positional, it is less affected by extreme values. Therefore, the median is a better reflection of central tendency for data that has outliers or is skewed.  Most project metrics include outliers and tend to be skewed; therefore, the median is very valuable when evaluating software measures.
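As a rough sketch of the by-hand procedure (sort, then take the middle observation), the helper below is hypothetical and uses made-up values rather than the article's data. For an even number of observations it averages the two middle values, which is a common convention the article does not cover.

```python
from statistics import median

def median_by_hand(values):
    """Sort the values and return the middle observation.

    Assumption: for an even count, average the two middle values.
    """
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical observations, including one extreme value (95.0).
observations = [24.0, 12.0, 20.5, 15.5, 18.0, 95.0, 16.0]

print(median_by_hand(observations))          # 18.0 (three above, three below)
print(median_by_hand([1.0, 3.0, 5.0, 7.0]))  # 4.0 (even count)
print(median(observations))                  # standard library equivalent
```

Note that the extreme value 95.0 only occupies the top position in the sorted list; it does not pull the median toward it.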

Advantages:

  • Extreme values (outliers) do not affect the median as strongly as they do the mean.
  • The answer is unique.

Disadvantages:

  • Not as popular as the mean.

Mode

The mode is the most frequent observation in the set of data.  The mode may not be the best measure of central tendency and may not be unique. Worse, the set may not have a mode.  The mode is most useful when the data is non-numeric or when you are attempting to find the most popular item in a data set. Determine the mode by counting the number of occurrences of each unique observation. In our example data set:

[Figure: count of each unique observation in the data set]

The mode in this data set is 26.43; it has two observations.  
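Counting each unique observation can be sketched with `collections.Counter`. The `modes` helper and all of the sample values below are hypothetical illustrations, not the article's data. The helper returns every value that ties for the highest count, so it also shows what happens when the mode is not unique.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest count, sorted."""
    counts = Counter(values)          # value -> number of occurrences
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical numeric data: one value appears twice.
print(modes([12.0, 15.5, 15.5, 18.0, 24.0]))             # [15.5]

# Non-numeric data with a tie: two modes.
print(modes(["red", "blue", "red", "blue", "green"]))    # ['blue', 'red']

# Every value unique: every value comes back as a "mode".
print(modes([1, 2, 3]))                                  # [1, 2, 3]
```

In Python 3.8 and later, `statistics.multimode` offers similar behavior out of the box.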

Advantages:

  • Extreme values (outliers) do not affect the mode.

Disadvantages:

  • May be more than one answer.
  • If every value is unique the mode is useless (every value is the mode).
  • May be difficult to interpret.

Based on our test data set the three measures of central tendency return the following values:

  • Mean: 17.8
  • Median: 18.64
  • Mode: 26.43

Each statistic returns a different value.  The mean and median are relatively similar, so it is important to understand whether the data set represents a sample or the whole population.  If the data is from a sample or could become more skewed by extreme values, the median is probably a better representation of the central tendency in this case.  If the population is evenly distributed about the mean (or is normally distributed), the mean is a better representation of central tendency. In the sample data set the mode provides little explanatory power. Understanding which measure of central tendency to use allows change agents to better target changes, and if your contract uses metrics to determine performance, the measure of central tendency you choose can have an impact.  Changing or arguing over which to use smacks of poor contracting or gaming the measure.
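To illustrate why the median holds up better when extreme values appear, here is a small sketch using two hypothetical data sets (not the article's data) that differ only in their largest observation.

```python
from statistics import mean, median

# Hypothetical data: identical except for the last observation.
baseline = [10.0, 12.0, 14.0, 16.0, 18.0]
with_outlier = [10.0, 12.0, 14.0, 16.0, 98.0]

print(mean(baseline), median(baseline))          # 14.0 14.0
print(mean(with_outlier), median(with_outlier))  # 30.0 14.0
```

One extreme value more than doubles the mean, while the median does not move. That is the kind of difference that matters when a single number drives a contract.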
