Base Rate Fallacy Is Like An Illusion

I took a lot of statistics, quantitative analysis, and math courses in university. I think it was a function of going to three schools during every available semester and building up a boatload of credits before LSU gave me a diploma and made me leave the state. I still remember the day I learned partial differential equations (I could finally understand the footnotes in my economics texts). With all of that, I was not exposed to the idea of the base rate fallacy (also known as base rate bias) until several years later, when I was working in the garment industry. Twice this week I have run into scenarios that are examples of the base rate fallacy, which suggests that many people either don’t understand the concept or are blinded by raw numbers (a shiny object phenomenon).
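For anyone who has not run into the term, a quick worked example shows the trap. The numbers below are invented purely for illustration; they are not from either of this week’s scenarios:

```python
# Invented numbers illustrating the base rate fallacy.
# A defect detector flags 99% of truly defective modules (sensitivity)
# but also flags 5% of clean modules (false positive rate).
# Only 1% of modules are actually defective -- the base rate.
base_rate = 0.01        # P(defective)
sensitivity = 0.99      # P(flagged | defective)
false_positive = 0.05   # P(flagged | clean)

# Bayes' theorem: P(defective | flagged)
p_flagged = sensitivity * base_rate + false_positive * (1 - base_rate)
p_defective_given_flag = (sensitivity * base_rate) / p_flagged

print(f"P(defective | flagged) = {p_defective_given_flag:.1%}")  # roughly 17%
```

The intuitive answer ("the detector is 99% accurate, so a flagged module is almost certainly defective") ignores the 1% base rate; most of the flags are actually false positives.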


Sushi Rice and Tofu Bowl

Don’t assume no meat means no taste!

Monte Carlo analysis provides a way to answer questions with significant uncertainty in the inputs that influence the outcome of the work, so you can have the difficult “when, what, and how much” conversations with sponsors, stakeholders, and marketing people. That definition is an explicit admission that almost ALL of the hard questions asked about projects cannot be answered using simple a + b = c formulas or arguments, which leads us to tools like Monte Carlo analysis. There are four common assumptions that are often overlooked or misunderstood when using Monte Carlo methods. (more…)
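As a minimal illustration of why a + b = c thinking breaks down under uncertainty (the durations are invented, and this is not one of the four assumptions discussed in the full article):

```python
import random

# Two tasks with uncertain, right-skewed durations (invented numbers).
# random.triangular(low, high, mode) -- the mode is the "most likely" value.
def task_a():
    return random.triangular(5, 12, 6)   # most likely 6 days

def task_b():
    return random.triangular(3, 10, 4)   # most likely 4 days

trials = 100_000
totals = [task_a() + task_b() for _ in range(trials)]

# "a + b = c" planning adds the two most-likely durations: 6 + 4 = 10 days.
naive_plan = 10
p_hit = sum(t <= naive_plan for t in totals) / trials
print(f"Chance of finishing within the naive 10-day plan: {p_hit:.0%}")  # well under 50%
```

Adding the most likely values produces a date, but not the probability of hitting it; the simulation returns a distribution of outcomes rather than a single number.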

Space Sign

What is the possibility

One of the most often used sayings in agile is that yesterday’s weather is a good predictor of tomorrow’s performance. I have lived in Louisiana, where if you blink the weather will change. I currently live near Cleveland (and I like it), and in 2014 the temperature went from 39 F to -11 F in less than 24 hours. I went running on both days; they were very different. Even if I grant that yesterday can be an important indicator of performance tomorrow, a sample size of one does not capture the degree of variability that might be present in the environment. Why does anyone care about variability in performance? As suggested in Agile Metrics: An Interlude, leaders have not stopped asking what they can get, when they can get it, and how much it will cost. Even if they don’t ask all of those questions, there will be questions about budgets. These are not evil, unagile people; they are business people who need to plan things like cash flow and making payroll. When product can be delivered to the market, or when a change to the HR portal will be made, are important questions to answer. Just relying on yesterday’s weather is not always sufficient, and there is no Oracle of Delphi that provides a single, unambiguous answer to any of those questions. The answers are always a range, and in most circumstances each possible outcome is more or less probable than the next. Uncertainty makes it difficult to have a conversation about when, what, and how much. Monte Carlo analysis provides a way to answer questions with significant uncertainty in the inputs that influence the outcome of the work, so you can have the difficult when, what, and how much conversations. (more…)
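One common way teams apply Monte Carlo analysis to exactly this question is to resample more than one day of “weather.” A minimal sketch, with made-up throughput history and a made-up backlog size:

```python
import random

# Hypothetical history: stories completed in each of the last 12 weeks.
# More than one sample of "yesterday's weather" captures the variability.
weekly_throughput = [4, 7, 2, 6, 5, 3, 8, 5, 4, 6, 1, 7]
backlog = 60          # stories still to deliver (assumed)
trials = 10_000

weeks_needed = []
for _ in range(trials):
    remaining, weeks = backlog, 0
    while remaining > 0:
        remaining -= random.choice(weekly_throughput)  # resample history
        weeks += 1
    weeks_needed.append(weeks)

weeks_needed.sort()
# Report a range rather than a single answer.
print("50th percentile:", weeks_needed[trials // 2], "weeks")
print("85th percentile:", weeks_needed[int(trials * 0.85)], "weeks")
```

The output is deliberately a range (for example, a 50th and an 85th percentile) rather than a single date, which is the honest answer to the when question.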

A New Copy!

Chapter 10A of Daniel S. Vacanti’s Actionable Agile Metrics for Predictability: An Introduction (buy a copy today) is a very short chapter (I should have planned to include it with chapter 10B, but hindsight is 20/20) covering histograms; the brevity gives us the time and space to spend on an example. The chapter is titled Cycle Time Histograms. (more…)
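Cycle time histograms are simple to build from raw completion data. A small sketch (the cycle times below are invented; they are not Vacanti’s example data):

```python
from collections import Counter

# Invented cycle times (in days) for completed work items.
cycle_times = [3, 5, 2, 8, 5, 13, 4, 5, 6, 2, 9, 5, 7, 4, 21, 6, 5, 3]

# A cycle time histogram counts how many items finished in each number of days.
histogram = Counter(cycle_times)
for days in range(min(cycle_times), max(cycle_times) + 1):
    print(f"{days:>2} days | {'#' * histogram[days]}")
```

Each bar counts how many items finished in exactly that number of days.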

How far is it?

Just because you can measure it, should you?

Over the last two weeks, we published three articles on vanity metrics:

Vanity Metrics in Software Organizations

Vanity Metrics? Maybe Not!

Vanity Metrics Have a Downside!

One of those articles elicited an interesting discussion on Twitter.  The discussion was predominantly between @EtharUK (Ethar Alali) and @GregerWikstrand.  The discussion started with the blog entry 4 Big Myths of Profile Pictures from OKCupid to illustrate vanity metrics, and then shifted to whether vanity metrics can be recognized based on statistical validity.  The nice thing about Twitter is that responses are forced to be succinct.  See the whole conversation here. (more…)

How to Measure Anything: Finding the Value of “Intangibles” in Business, Third Edition

Chapter 10 of How to Measure Anything: Finding the Value of “Intangibles” in Business, Third Edition is titled Bayes: Adding to What You Know Now.  Here is a summary of the chapter in a few bullet points, followed by a small worked example:

  • Prior knowledge influences how we process new information.
  • Bayesian statistics help us move from what we know to what we don’t know.
  • We are all Bayesian to some extent, but maybe not enough.
  • Many myths about using information are just plain wrong, and Bayes proves it.
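A small sketch of the basic move, a prior belief updated by a new observation, with numbers invented for illustration (this is not an example from the chapter):

```python
# Prior knowledge: based on history, 30% of our releases slip (invented figure).
prior_slip = 0.30

# New information: the integration build is red two days before the release.
# Assume (for illustration) a red build precedes 80% of releases that slip,
# but also 25% of releases that ship on time.
p_red_given_slip = 0.80
p_red_given_on_time = 0.25

# Bayes' theorem: P(slip | red build)
p_red = p_red_given_slip * prior_slip + p_red_given_on_time * (1 - prior_slip)
posterior_slip = p_red_given_slip * prior_slip / p_red

print(f"Prior P(slip)     = {prior_slip:.0%}")
print(f"Posterior P(slip) = {posterior_slip:.0%}")
```

The red build does not prove the release will slip; it moves the estimate from the 30% prior to a roughly 58% posterior, which is what adding to what you know now looks like in practice.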


How to Measure Anything: Finding the Value of “Intangibles” in Business, Third Edition

Chapter 9 of How to Measure Anything: Finding the Value of “Intangibles” in Business, Third Edition is titled Sampling Reality: How Observing Some Things Tells Us about All Things.  Here is a summary of the chapter in four bullet points:

  1. You do not have to measure the whole population to reduce uncertainty (see the sampling sketch after this list).
  2. Whether a result is statistically significant is not always the most important question to ask when collecting data.
  3. Experimentation is useful to reduce uncertainty.
  4. Regression is a powerful, but oft misunderstood, mechanism to understand what data is telling you!
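The first point is easy to demonstrate: a modest random sample narrows uncertainty about a quantity without measuring everything. A sketch with a fabricated “population” and the standard normal-approximation interval, not anything specific to Hubbard’s chapter:

```python
import random
import statistics

random.seed(7)

# A fabricated "population": task durations (hours) we could never measure in full.
population = [random.lognormvariate(2.0, 0.6) for _ in range(100_000)]

# Measure only a small random sample.
sample = random.sample(population, 30)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / (len(sample) ** 0.5)   # standard error of the mean

# Roughly 95% interval (normal approximation; fine for illustration).
low, high = mean - 1.96 * sem, mean + 1.96 * sem
print(f"Sample of 30 estimates the mean as {mean:.1f} "
      f"(~95% interval {low:.1f} to {high:.1f})")
print(f"True population mean: {statistics.mean(population):.1f}")
```

Thirty observations will not pin the mean down exactly, but they shrink the range of plausible values dramatically compared to knowing nothing.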



How to Measure Anything: Finding the Value of “Intangibles” in Business, Third Edition

Chapter 6 of How to Measure Anything: Finding the Value of “Intangibles” in Business, Third Edition is titled Quantifying Risk Through Modeling. Chapter 6 builds on the basics described in Chapter 4 (define the decision and the data that will be needed) and Chapter 5 (determine what is known). Hubbard addresses the process of quantifying risk through two overarching themes: the first is the quantification of risk, and the second is using Monte Carlo analysis to model outcomes. (more…)
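Hubbard’s approach starts from calibrated 90% confidence intervals and turns them into distributions a Monte Carlo model can sample. A compressed sketch of that idea (the cost figures and the loss threshold are invented; the 3.29 divisor is simply the width of a 90% interval on a normal curve):

```python
import random

# A subject matter expert gives a 90% confidence interval for the cost
# overrun of a risky component (invented figures): between $20K and $140K.
lower, upper = 20_000, 140_000

# For a normal distribution, a 90% interval spans about 3.29 standard deviations.
mean = (lower + upper) / 2
std_dev = (upper - lower) / 3.29

trials = 100_000
overruns = [random.gauss(mean, std_dev) for _ in range(trials)]

# Quantified risk: the probability the overrun blows past a $120K threshold.
threshold = 120_000
p_exceed = sum(x > threshold for x in overruns) / trials
print(f"P(overrun > ${threshold:,}) = {p_exceed:.1%}")
```

The point is that risk comes out as a probability of a defined loss, not as a “high/medium/low” label.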

Clouds distributed around the mean, err, horizon.

Measures of central tendency attempt to define the center of a set of data. They are important for interpreting benchmarks and when single rates are used in contracts.  There are many ways to measure central tendency; however, the three most popular are the mean, median, and mode.  Each provides interesting information, and each is more or less useful in different circumstances.  Let’s explore the following simple data set.



The mean is the most popular and well-known measure of central tendency.  The mean is calculated by summing the values in the sample (or population) and dividing by the total number of observations.  In the example, the mean is calculated as 231.43 / 13, or 17.80.  The mean is most useful when the data points are distributed evenly around the mean or are normally distributed. The mean is highly influenced by outliers.

Advantages include:

  • Most common measure of central tendency used and therefore most quickly understood.
  • The answer is unique.

Disadvantages include:

  • Influenced by extremes (skewed data and outliers).


The median is the middle observation in a set of data and is affected less by outliers or skewed data.  To find the median (by hand), you need to arrange the data in numerical order.  Using the same data set:


The median is 18.64 (six observations above and six observations below).  Since the median is positional, it is less affected by extreme values. Therefore, the median is a better reflection of central tendency for data that has outliers or is skewed.  Most project metrics include outliers and tend to be skewed, so the median is very valuable when evaluating software measures.

Advantages include:

  • Extreme values (outliers) do not affect the median as strongly as they do the mean.
  • The answer is unique.

Disadvantages include:

  • Not as popular as the mean.


The mode is the most frequent observation in the set of data.  The mode may not be the best measure of central tendency, may not be unique, and, worse, a data set may not have a mode at all.  The mode is most useful when the data is non-numeric or when you are attempting to find the most popular item in a data set. Determine the mode by counting the occurrences of each unique observation. In our example data set:


The mode in this data set is 26.43; it occurs twice.

Advantages include:

  • Extreme values (outliers) do not affect the mode.

Disadvantages include:

  • There may be more than one answer.
  • If every value is unique, the mode is useless (every value is the mode).
  • May be difficult to interpret.

Based on our test data set the three measures of central tendency return the following values:

  • Mean: 17.8
  • Median: 18.64
  • Mode: 26.43

Each statistic returns a different value.  The mean and median provide relatively similar values; it is therefore important to understand whether the data set represents a sample or the whole population.  If the data is a sample, or could become more skewed by extreme values, the median is probably a better representation of the central tendency in this case.  If the population is evenly distributed about the mean (or is normally distributed), the mean is a better representation of central tendency. In this sample data set the mode provides little explanatory power. Understanding which measure of central tendency is being used allows change agents to better target changes, and if your contract uses metrics to determine performance, which measure is used can have a real impact.  Changing or arguing over which one to use after the fact smacks of poor contracting or gaming the measure.
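The original data set appears in the tables above, but the same comparison is easy to reproduce with Python’s statistics module. A sketch using hypothetical values (not the article’s data) that shows why the median shrugs off an outlier while the mean does not:

```python
import statistics

# Hypothetical effort data (hours) with one outlier; not the article's data set.
data = [12, 14, 15, 15, 16, 17, 18, 19, 21, 22, 60]

print("Mean:  ", round(statistics.mean(data), 2))    # pulled upward by the 60
print("Median:", statistics.median(data))            # barely notices the outlier
print("Mode:  ", statistics.mode(data))              # most frequent value (15)

# Drop the outlier and the mean moves a lot; the median barely changes.
trimmed = data[:-1]
print("Mean without outlier:  ", round(statistics.mean(trimmed), 2))
print("Median without outlier:", statistics.median(trimmed))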