Historical data doesn't come from historical ruins.

Historical data is needed for any form of consistent estimation.  The problem with historical data is that gathering the data requires effort, time or money.  The need to expend resources to generate, collect or purchase historical data is often used as a bugaboo to resist collecting the data and as a tool to avoid using parametric or historical estimating techniques.

Historical data can be as simple as a Scrum team collecting its velocity or productivity every sprint and using it to calculate an average for planning and estimating. It can also be as complex as the data set collected by teams using parametric estimation, which includes a more robust palette of data: project effort, size, duration, team capabilities and project context. In both cases the data collected needs to fit the method you are using and the level of granularity at which you are going to estimate or plan. For instance, if you are estimating at the project level you need data at the project level. If you are estimating at the task level you need to collect historical data at the task level.
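The simple end of that range, a Scrum team averaging its velocity, can be sketched in a few lines of Python. This is only an illustration: the function names, the optional rolling window and the sample numbers are assumptions made for the example, not part of any Scrum standard.

```python
from math import ceil
from statistics import mean


def average_velocity(velocities, window=None):
    """Mean of the most recent `window` sprint velocities (all sprints if window is None)."""
    if not velocities:
        raise ValueError("need at least one completed sprint")
    recent = velocities if window is None else velocities[-window:]
    return mean(recent)


def sprints_needed(backlog_points, velocities, window=None):
    """Forecast the number of sprints for a backlog, rounding up to whole sprints."""
    velocity = average_velocity(velocities, window)
    if velocity <= 0:
        raise ValueError("average velocity must be positive to forecast")
    return ceil(backlog_points / velocity)


# Hypothetical history: story points completed in each of the last five sprints.
history = [21, 18, 24, 20, 22]

print(average_velocity(history))            # average over all five sprints
print(average_velocity(history, window=3))  # average over the last three sprints
print(sprints_needed(120, history))         # sprints to finish a 120-point backlog
```

A rolling window is one way to let recent sprints count for more than old ones. Whether that suits a given team is a judgment call.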

Here is my recommended palette of historical data for estimating at the project level:

Original Estimate (effort, duration, staffing)

Actual Outcome (effort, duration, staffing)

Cost (estimated and actual) – Cost data can be broken down based on the source.  Examples of further levels of granularity include hardware costs, software purchase or license costs and contractor versus internal personnel costs.

Capabilities (predicted and actual) – Capabilities describe the level of competency of the team.  Examples of capabilities include team skill set, experience level, roles and control structures.

Size (predicted and actual) – Size is a measure of the end product delivered by the project.  In a software project, size is a measure of the functionality that will be delivered by the project (IFPUG Function Points is an example of a measure of software functionality).

Context – Context is the story of the project, including whether anything out of the norm happened. For example, knowing that half the project team was temporarily reassigned during the project may be important when analyzing the data.

Project Demographics (who was the customer, what were the product(s) affected, what methods were used, what was the primary technology, were any of the technologies new to the team, what were the primary languages, were any of the languages new to the team)
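One way to picture the list above is as a single record per project. The following Python sketch is a hypothetical layout, not a prescribed schema: the class names, field names, units (hours, days, function points) and sample values are all assumptions chosen for illustration. Pairing each predicted value with its actual value makes it easy to see how far an estimate missed.

```python
from dataclasses import dataclass, field


@dataclass
class Measure:
    """A predicted (or estimated) value paired with what actually happened."""
    predicted: float
    actual: float

    def relative_error(self) -> float:
        """(actual - predicted) / predicted; a positive result means an overrun."""
        if self.predicted == 0:
            raise ValueError("predicted value must be non-zero")
        return (self.actual - self.predicted) / self.predicted


@dataclass
class ProjectRecord:
    """Hypothetical project-level record; units are assumptions for this sketch."""
    name: str
    effort_hours: Measure
    duration_days: Measure
    staffing: Measure
    cost: Measure
    size_function_points: Measure
    capabilities_predicted: dict = field(default_factory=dict)
    capabilities_actual: dict = field(default_factory=dict)
    context: str = ""
    demographics: dict = field(default_factory=dict)


# Made-up sample values, purely for illustration.
record = ProjectRecord(
    name="example-project",
    effort_hours=Measure(predicted=1000, actual=1250),
    duration_days=Measure(predicted=90, actual=120),
    staffing=Measure(predicted=5, actual=5),
    cost=Measure(predicted=100_000, actual=130_000),
    size_function_points=Measure(predicted=200, actual=220),
    context="Half the team was temporarily reassigned mid-project.",
    demographics={"customer": "internal", "new_technology": False},
)

print(record.effort_hours.relative_error())  # effort overrun as a fraction of the estimate
```

Keeping context as free text next to the numbers preserves the story behind an outlier, which matters when the record is analyzed later.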

If we needed to estimate (not plan) at a phase, release or sprint level, then the data would need to be collected at that level.

Historical data is a requirement for effective budgeting and estimation. The best data is data from your own organization's projects. This means that you have to define the information you want, collect the data and analyze the data. Collecting data also implies that someone needs to record the data as it happens (time accounting and project-level accounting).

Only collect the information you need, and only at the level you are going to use. Remember, each additional measure or level of detail requires more effort from those analyzing the data, from those collecting the data and, perhaps more importantly, from those who have to record the data. Balance the level of measurement overhead with the benefit you can extract in the near term. Collecting data that you might need, or that will pay off only in a few years, will usually end up costing more than it returns and may well disenchant the people you are asking to collect and record the data. When they become disenchanted, your data quality will suffer (or the data may stop being reported altogether).

When beginning an estimation program, immediately start collecting your own data, BUT also consider reaching out to external sources of data to jump-start the program. External data lets you begin estimating while you collect your own.