Baseline, not base line...

Baseline, not base line…

Measuring a process generates a baseline.  By contrast, a benchmark is a comparison of a baseline to another baseline.  Benchmarks can compare baselines to other internal baselines or external baselines.  I am often asked whether it possible to externally benchmark measures and metrics that have no industry definition or occasionally are team specific. Without developing a common definition of the measure or metric so that data is comparable, the answer is no.  A valid baseline line and benchmark requires that the measure or metric being collected is defined and consistently collected by all parties using the benchmark.

Measures or metrics used in external benchmarks need to be based on published or agreed upon standards between the parties involved in the benchmark.  Most examples of standards are obvious.  For example, in the software field there are a myriad of standards that can be leveraged to define software metrics.  Examples of standards groups include: IEEE, ISO, IFPUG, COSMIC and OMG. Metrics that are defined by these standards can be externally benchmarked and there are numerous sources of data.  Measures without international standards require all parties to specifically define what is being measured.  I recently ran across a simple example. The definition of a month caused a lot of discussion.  An organization compared function points per month (a simple throughput metric) to benchmark data they purchased.  The organization’s data was remarkably below the baseline.  The problem was that the benchmark used the common definition of a month (12 in a year) while their data used an internal definition of a 13 period year. The benchmark data or their data should have been modified to be comparable.

Applying the defined metric consistently is also critical and not always a given.  For example, when discussing the cost of an IT project understanding what is included is important for consistency.  Project costs could include hardware, software development and changes, purchased software, management costs, project management costs, business participation costs, and the list could go on ad-infinitum.  Another example might be the use of story points (a relative measure based on team perception), while a team may well be able to apply the measure consistently because it is based on perception comparisons, outside of the team would be at best valueless and at worst dangerous.

The data needed to create a baseline and for a benchmark comparison must be based on a common definition that is understood by all parties, or the results will generate misunderstandings.  A common definition is only a step along the route to a valuable baseline or benchmark, the data collection must be done on a consistent basis.  It is one thing to agree upon a definition and then have that definition consistently applied during data collection. Even metrics like IFPUG Function Points, which have a standard definition and rigorous training, can show up to a five percent variance between counters.  Less rigorously defined and trained metrics are unknowns that require due diligence by anyone that use them.