In the seventh essay of The Mythical Man-Month, Fred P. Brooks begins to tackle the concept of estimating. While there are many estimating techniques, Brooks’ approach is a history/data-based approach, which we would understand as today as parametric estimation. Parametric estimation is generally a technique that generates a prediction of the effort needed to deliver a project based on historical data of productivity, staffing and quality. Estimating is not a straightforward extrapolation of has happened in the past to what will happen in the future, and mistaking it as such is fraught with potential issues. Brooks identified two potentially significant estimating errors that can occur when you use the past to predict the future without interpretation.
Often the only data available is the information about one part of the project’s life cycle. The first issue Brooks identified was that you cannot estimate the entire job or project by just estimating the coding and inferring the rest. There are many variables that might affect the relationship between development and testing. For example, some changes can impact more of the code than others, requiring more or less regression testing. The link between the effort required to deliver different types of work is not linear. The ability to estimate based on history requires a knowledge of project specific practices and attributes including competency, complexity and technical constraints.
Not all projects are the same. The second issue Brooks identified was that one type of project is not applicable for predicting another. Brooks used the differences between small projects and programming systems products to illustrate his point. Each type of work requires different activities, not just scaled up versions of the same tasks. Similarly, consider the differences in the tasks and activities required for building a smart phone app compared to building a large data warehouse application. Simply put, they are radically different. Brooks drove the point home using the analogy of extrapolating the record time for 100-yard dash (9.07 seconds according to Wikipedia) to the time to run a mile. The linear extrapolation would mean that a mile could be run in 2.40 (ish) minutes (a mile is 1760 yards) the current record is 3.43.13.
A significant portion of this essay is a review of a number of studies that illustrated the relationship between work done and the estimate. Brooks used these studies to highlight different factors that could impact the ability to extrapolate what has happened in the past to an estimate of the future (note: I infer from the descriptions that these studies dealt with the issue of completeness and relevance. The surveys, listed by the person that generated the data, and the conclusions we can draw from an understanding of the data included:
- Charles Portman’s Data – Slippages occurred primarily because only half the time available was productive. Unrelated jobs meetings, paperwork, downtime, vacations and other non-productive tasks used the remainder.
- Joel Aron’s Data – Productivity was negatively related to the number of interactions among programmers. As the number of interactions goes up, productivity goes down.
- John Harr’s Data- The variation between estimates and actuals tend to be affected by the size of workgroups, length of time and number of modules. Complexity of the program being worked on could also be a contributor.
- OS/360 Data- Confirmed the striking differences in productivity driven by the complexity and difficulty of the task.
- Corbatoó’s Data – Programming languages affect productivity. Higher-level languages are more productive. Said a little differently, writing a function in Ruby on Rails requires less time than writing the same function in macro assembler language.
I believe that the surveys and data discussed are less important that the statistical recognition that there are many factors that must be addressed when trying to predict the future. In the end, estimation requires relevant historical data regardless of method, but the data must be relevant. Relevance is short hand for accounting for the factors that affect the type work you are doing. In homogeneous environments, complexity and language may not be as big a determinant of productivity as the number of interactions driven by team size or the amount of non-productive time teams have to absorb. The problem with historical data is that gathering the data requires effort, time and/or money. The need to expend resources to generate, collect or purchase historical data is often used as a bugaboo to resist collecting the data and as a tool to avoid using parametric or historical estimating techniques.
Recognize that the the term historical data should not scare you away. Historical data can be as simple as a Scrum team collecting their velocity or productivity every sprint and using it to calculate an average for planning and estimating. Historical data can be as complex as a pallet of information including project effort, size, duration, team capabilities and project context.
Previous installments of the Re-read of The Mythical Man-Month