Chapter 10 of How to Measure Anything: Finding the Value of “Intangibles” in Business, Third Edition, is titled “Bayes: Adding to What You Know Now.” Here is a summary of the chapter in a few bullet points:
- Prior knowledge influences how we process new information.
- Bayesian statistics help us move from what we know to what we don’t know.
- We are all Bayesian to some extent, but maybe not enough.
- Many myths about using information are just plain wrong, and Bayes proves it.
Conventional statistics make some simplifying assumptions. Two of the big assumptions are:
- The observer has no prior information about the range of possible values, and
- The observer does not have prior knowledge of the distribution of the population.
Often, both are bad assumptions. Enter the concepts of Bayesian statistics. Bayesian statistics deal with how we update prior knowledge with new information. Hubbard uses the example of determining the probability that it is raining given that you have an accident, based on the known probability of having an accident when it is raining. This is called a Bayesian inversion. In Figure 10.1, Hubbard lists and reviews a number of basic probability concepts that allow us to logically flip the question being asked and then determine the probability of the flipped question. A Bayesian inversion gives you a path from things that seem easier to quantify to something you believe to be harder to quantify.
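To make the inversion concrete, here is a minimal Python sketch of the rain and accident example. The function name `bayes_invert` and all three input probabilities are my own illustrative assumptions; they are not Hubbard's figures and do not come from the book.

```python
def bayes_invert(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) given P(B | A), P(A), and P(B | not A)."""
    # Total probability of B across both cases (A and not A).
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    # Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
    return p_b_given_a * p_a / p_b


# Hypothetical inputs, chosen only for illustration.
p_rain = 0.10                  # P(rain)
p_accident_given_rain = 0.02   # P(accident | rain)
p_accident_given_dry = 0.01    # P(accident | no rain)

p_rain_given_accident = bayes_invert(
    p_accident_given_rain, p_rain, p_accident_given_dry
)
print(f"P(rain | accident) = {p_rain_given_accident:.3f}")
```

With these made-up inputs, learning that an accident happened raises the probability of rain from 10% to roughly 18%. The easier-to-estimate quantity (accidents given rain) is what lets us reach the harder one (rain given an accident).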
I spent a semester in undergrad and several classes during grad school learning Bayesian statistics. Bayesian statistics is powerful, but difficult to learn (at least in my case). Hubbard makes the point that people are intuitively Bayesian. It is in our nature to begin with an estimate, gather information and then update the estimates (this is of course unless you are a fan of the Cleveland Browns). Being intuitively Bayesian means we understand that we begin with prior knowledge and that we can update that knowledge with new information. The process happens in our heads without Microsoft Excel or even math.
A deeper problem is the tendency to ignore how the data is distributed when presented with new information. This phenomenon is called base rate neglect. Wikipedia presents two excellent examples. Boiling the concept down, when presented with specific information related to a broader pool of answers, we tend to focus on the specific information and ignore the broader pool. For example, let’s say my Cleveland Browns win the first game of the next football season. I might immediately jump to the conclusion that they will win the Super Bowl that year, even though they have not won more than 50% of their games in YEARS. I am neglecting what is known about the distribution of success for the Cleveland Browns based on the most immediate observation. Hubbard suggests that one simple defense against base rate neglect is to be aware that the whole set of observations must be taken into account. Secondly, calibrated estimators are better at leveraging Bayesian concepts than un-calibrated estimators.
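A rough sketch of why the base rate matters, using the Browns example. Every number below is invented for illustration (I have not looked up real win rates), and the helper name `posterior` is hypothetical.

```python
def posterior(prior: float, p_obs_given_true: float, p_obs_given_false: float) -> float:
    """Update a prior probability after one observation using Bayes' theorem."""
    evidence = p_obs_given_true * prior + p_obs_given_false * (1.0 - prior)
    return p_obs_given_true * prior / evidence


# Invented inputs, for illustration only.
base_rate = 0.01                        # prior P(team wins the Super Bowl)
p_win_opener_given_champion = 0.80      # P(win first game | eventual champion)
p_win_opener_given_not_champion = 0.50  # P(win first game | not champion)

updated = posterior(
    base_rate, p_win_opener_given_champion, p_win_opener_given_not_champion
)
print(f"Before the opener: {base_rate:.3f}, after the win: {updated:.3f}")
```

Under these assumptions, the opening win moves the estimate only from 1% to about 1.6%. Base rate neglect is what happens when the fan skips the prior and reacts to the win alone.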
Hubbard summarizes Bayes by pointing out that a measurement question is often more approachable if we begin with a proposition that we understand and then invert the question. This is the heart of the Bayesian inversion. Hubbard uses Bayesian statistics to debunk four myths.
- Myth: Absence of evidence is not evidence of absence. The absence of evidence is data that, when inverted, provides information that reduces uncertainty. In the simple example used in the chapter, Hubbard began with the known probability of accidents occurring in the rain. Does a lack of accidents at a particular time tell us anything about whether it was raining? Through a Bayesian inversion, the answer is yes: if we know there was no accident at a particular time, we know something about the probability that it was raining. We have reframed the question to use the absence of something to tell us something about the probability of something else.
- Myth: Correlation is not evidence of causation. The logical argument against this myth follows a path similar to the one for the absence of evidence discussed above. The classic example used to show that correlation does not establish causation is the relationship between the sun setting and crickets chirping. The sun goes down and, almost all of the time, crickets begin to chirp (a strong positive correlation). As with the absence of evidence, correlation can constitute evidence: it does not prove causation, but it does increase the probability of causation.
- Myth: Ambiguous results tell us nothing. Assuming we are looking for evidence, the fact that an observation is ambiguous or that we don’t see what we are looking for does not mean we have not learned anything. The lack of evidence provides information that is useful for understanding the probability that what we are looking for exists. Using Bayesian statistics, an ambiguous or missing result still changes the original estimated probability that what we are looking for exists.
- Myth: Each observation alone tells us nothing. Bayesian Statistics drives the point home that every observation provides information that changes what we knew before. Debunking this myth is important to organizations that are investigating the concept of software development productivity. Software development productivity is a complex concept and is affected by a myriad of factors. However, if knowing something about a single variable helps reduce uncertainty when considered among many other variables, then it is useful even in isolation.
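The first and last myths can be illustrated together with a short sketch: each accident-free observation is an "absence of evidence," yet each one nudges the probability of rain. The probabilities and the helper name `update` are assumptions made up for this example, and the sketch further assumes the observations are independent given the weather.

```python
def update(prior: float, p_obs_given_rain: float, p_obs_given_dry: float) -> float:
    """Return P(rain | observation) from the prior P(rain) and the two likelihoods."""
    evidence = p_obs_given_rain * prior + p_obs_given_dry * (1.0 - prior)
    return p_obs_given_rain * prior / evidence


# Hypothetical inputs, for illustration only.
p_rain = 0.10                  # prior P(rain)
p_accident_given_rain = 0.02   # P(accident | rain)
p_accident_given_dry = 0.01    # P(accident | no rain)

# Observe "no accident" five times, treating each as independent given the weather.
history = [p_rain]
for _ in range(5):
    history.append(
        update(history[-1], 1.0 - p_accident_given_rain, 1.0 - p_accident_given_dry)
    )

for step, p in enumerate(history):
    print(f"after {step} accident-free observations: P(rain) = {p:.4f}")
```

Each step lowers the probability of rain only slightly, but it never leaves it unchanged. A single observation carries a small amount of information, and the absence of an accident is itself evidence.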
Understanding Bayes is important, even if we can be instinctively Bayesian. Many estimation problems, ranging from story points to portfolio-level valuations, use analogies. The use of analogies in estimation is an example of the use of Bayes’ Theorem. Analogies are a set of known observations combined with an understanding of the distribution of those observations. We choose an analogy from a set of observations and then use what we know about the present to determine how that affects the analogy. Chapter 10 re-jumpstarted my knowledge of Bayes, but I had to crack open a couple of my university textbooks and my copy of Schaum’s Outline of Business Statistics in order to re-baseline my knowledge of Bayes’ Theorem and Bayesian statistics.
A parting comment . . . if I publish Re-Read Saturday on 90% of the Saturdays in a year, what is the probability that, if I publish a blog entry on a Saturday, it will be part of Re-Read Saturday? If you need help, check out the downloads available at http://www.howtomeasureanything.com/
Add your answer to comments or to the Software Process and Measurement Cast Facebook page.
Chapter 1: The Challenge of Intangibles
Chapter 2: An Intuitive Measurement Habit: Eratosthenes, Enrico, and Emily
Chapter 3: The Illusions of Intangibles: Why Immeasurables Aren’t