Selecting a software size metric sets you on a specific track.

Deciding which software size metric to use is a fairly momentous decision. Much like choosing a development platform, the choice of size measure commits an organization to particular tools, processes and techniques. For example, the processes and tools needed to count lines of code are different from those needed to support story points as a sizing technique. The goals of the measurement program will be instrumental in determining which type of size metric will be the most useful. Measurement goals will help you choose among four macro attributes: organization-specific versus industry-defined metrics, and physical versus logical metrics. For example, if benchmarking against other firms or industry data is required to attain your measurement goal, organizationally defined metrics would be less viable. Similarly, if you have a heterogeneous software environment, selecting a functional metric makes more sense than using a physical metric (logical metrics normalize across varied technologies).
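The two macro dimensions can be treated as a simple filter over candidate metrics. Below is a minimal sketch in Python; the catalog and its attribute tags are illustrative assumptions for the sake of the example, not authoritative classifications of these metrics.

```python
# Hypothetical catalog: candidate size metrics tagged along the two macro
# dimensions discussed above. The tags are illustrative assumptions.
CANDIDATES = {
    "IFPUG Function Points":  {"definition": "industry", "type": "logical"},
    "COSMIC Function Points": {"definition": "industry", "type": "logical"},
    "Lines of Code":          {"definition": "industry", "type": "physical"},
    "Story Points":           {"definition": "organization", "type": "logical"},
}

def macro_filter(candidates, needs_benchmarking, heterogeneous_environment):
    """Apply the two macro filters: industry definition when benchmarking is
    required, logical (functional) metrics for mixed-technology shops."""
    selected = {}
    for name, tags in candidates.items():
        if needs_benchmarking and tags["definition"] != "industry":
            continue  # organization-specific metrics can't be benchmarked externally
        if heterogeneous_environment and tags["type"] != "logical":
            continue  # physical metrics do not normalize across technologies
        selected[name] = tags
    return selected

# A firm that benchmarks externally and runs many technologies:
print(sorted(macro_filter(CANDIDATES, True, True)))
# -> ['COSMIC Function Points', 'IFPUG Function Points']

# A single-technology shop with no benchmarking requirement keeps all four:
print(len(macro_filter(CANDIDATES, False, False)))  # -> 4
```

The point of the sketch is only that the macro filters are cheap to apply and dramatically narrow the field before any detailed evaluation begins.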

Figure 1: Balancing Organizational Perspective Versus Organizational Environment


The second checkbox is whether the measure has an externally defined and documented methodology. Why is definition important? Definition is the precursor to repeatability and consistency, which allow comparability. Consistency and repeatability are prerequisites for generating the data needed by scientific-method-based approaches such as Six Sigma and the tools used to support Kaizen. Finally, an external definition reduces the effort required to construct and implement a measurement program.

Even where a definition exists, a wide range of nuances is possible. Definitions range from the most defined, the functional precision of the ISO functional metrics, to the less defined methodology of Use Case Points, which began with a single academic definition and has evolved into many functional variants. The variants seen in UCP reflect the absence of a central control point to govern the method's evolution, which we will explore later in this model. The range of formality of definition is captured in Figure 2.


Figure 3 consolidates the view of formality of definition with the delineation between logical and physical metrics. Each measure has strengths and weaknesses. The first two items in our checklist are macro filters.


Each measure of size fits a specific combination of organizational goals, environmental constraints and needs; however, the field of potential software sizing metrics is wide and varied. Once the macro filter is applied, each subsequent step in the checklist will narrow the field of potential size measures.

Small, Medium and Large or Low, Average and High?

Lots of ways to measure

Measurement proliferation occurs when organizations decide that everything can and should be measured, producing a rapid increase in measures and metrics. There are at least two measurement proliferation scenarios, and both have as great a chance of destroying your measurement program as helping it. The two scenarios can be summarized as proliferation of breadth (measuring everything) and proliferation of depth (measuring the same thing many ways).

There are many items that are very important to measure, and it’s difficult to restrain yourself once you’ve started.  Because it seems important to measure many activities within an IT organization, many measurement teams think measuring everything is important.  Unfortunately, measuring what is really important is rarely easy or straightforward. When organizations slip into “measure everything” mode, often what gets measured is not related to the organization’s target behavior (the real needs).  When measures are not related to the target behavior, it is easy to breed unexpected behaviors (not indeterminate or unpredictable, just not what was expected).  For example, one organization determined that personal capability was a key metric: more capability would translate into higher productivity and quality.  During research into the topic, it was determined that capability was too difficult or “touchy-feely” to measure directly. The organization decided that counting requirements was a rough proxy for systems capability, and that if systems capability went up, it must be a reflection of personal capability.  So, of course, they measured requirements.  One unanticipated behavior was that the requirements became more granular (actually more consistent), which created the appearance of increased capability that could not be sustained (or easily explained) after the initial baseline of the measure.

The explosion of pre-defined measures drives the second proliferation scenario: having too many measures for the same concept. Capers Jones mentioned a number of examples in my interview with him for SPaMCAST.  Capers caught my imagination with the statement that many functional metrics are currently in use, ranging from IFPUG function points to COSMIC, with use case points, NESMA function points and others in between.  This is in addition to counting lines of code and objects, among others.  The fracturing of the world of functional metrics has occurred for many reasons, ranging from the natural maturation of the measurement category to the explosion of information sharing on the web. Regardless of the reason for the proliferation, using multiple measures for the same concept just because you can will have unintended consequences. First, multiple measures for the same concept can concentrate focus, making the concept seem more important than it is. Second, multiple measures may send a message that no one is quite sure how to measure the concept, which can confuse the casual observer.  Generally there is no reason to use multiple methods to measure the same concept within any organization. Even if each measure were understood, proliferating multiple measures for the same concept will waste time and money. An organization I recently observed had implemented IFPUG Function Points, COSMIC Function Points, Use Case Points and Story Points to measure software size. This organization had spent the time and effort to find a conversion mechanism so that each measure could be combined for reporting. In this case the proliferation of metrics for the same concept had become an ‘effort eater.’ Unfortunately it is not uncommon to see organizations trying to compare the productivity of projects based on very different yardsticks rather than adopting a single measure for size. The value of measurement tends to get lost when there is no common basis for discussion.
A single measure will provide that common basis.
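The ‘effort eater’ above can be made concrete. Here is a sketch of the conversion machinery such an organization ends up maintaining; every factor in the table is invented for illustration (real factors vary by organization and are themselves costly to calibrate, which is exactly the point).

```python
# Hypothetical conversion factors to a single reporting unit
# ("IFPUG-equivalent"). These numbers are invented for illustration;
# calibrating real ones is a significant, ongoing expense.
TO_IFPUG_EQUIV = {
    "ifpug_fp":        1.0,
    "cosmic_cfp":      0.9,   # assumed factor
    "use_case_points": 2.5,   # assumed factor
    "story_points":    1.5,   # assumed factor, and team-dependent in practice
}

def normalized_size(measure, value):
    """Convert a size expressed in one of four metrics to the reporting unit."""
    try:
        return value * TO_IFPUG_EQUIV[measure]
    except KeyError:
        raise ValueError(f"no conversion factor for {measure!r}")

portfolio = [("ifpug_fp", 120), ("story_points", 80), ("use_case_points", 40)]
total = sum(normalized_size(m, v) for m, v in portfolio)
print(total)  # -> 340.0, a single number that hides three incompatible yardsticks
```

Every row in that table is a maintenance liability that must be recalibrated as teams and technologies change; adopting one size measure deletes the table entirely.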

Both the proliferation of breadth and of depth have upsides (everybody gets to collect, report and use their favorite measure) and downsides (which sound very similar: everybody gets to collect, report and use their favorite measures).  Extra choices come at a cost: the cost of effort, communication and compatibility.  The selection of measures and metrics must be approached with the end in mind – your organization’s business goals.  Allowing the proliferation of measures and metrics, whether in depth or breadth, must be approached with great thought, or it will cost you dearly in information and credibility.


Producing now, so we can consume later.

A food chain is the sequence between production and consumption. In software development, the development team generates the functionality that is delivered to a user. In the measurement food chain, the measures that the team generates and collects mark the beginning of a process that is consumed to create analysis and reports.  As the team develops, enhances or maintains functionality, they consume raw materials, such as effort, ideas and time, to produce an output to be measured.  Managers and administrators monitor the consumption of inputs, the process of transformation and the outputs.  Each of these components can be analyzed and measured, transformed into a number that equates to value or cost.  The comparison of value to cost can be evaluated against the trials and tribulations of production, adding a significant component to the overall value equation.

The question that begs to be asked is ‘who needs this data?’  Who can and does leverage the output of measurement?  Does the audience for measurement include the project and support personnel who create and maintain the functionality?  Or is measurement merely a tool to control the work and workers?   In order to maximize the value of your metrics program, all constituencies must derive value: development teams, administrators, project managers and organizational managers.  Design measures with this end in mind.


The measurement/performance feedback loop causes an addiction to a single metric. The addict will exclude what is really important.

There is a famous adage: you get what you measure. When an organization measures a specific activity or process, people tend to execute so that they maximize their performance against that measure. Managers and change agents often create measures to incentivize teams or individuals to perform work in a specific way, then use the results to generate a feedback loop. That measurement/performance feedback loop can cause an addiction to a single metric, and the addict will exclude what is really important. Chasing the endorphins that the feedback generates is the sin of lust in the measurement world. Lust, like wrath, is a loss of control that affects your ability to think clearly. Balanced goals and a medium- to long-term focus are tools to defeat the worst side effects of measurement lust. The ultimate solution is a focus on the long-term goals of the organization.

How does this type of unbalanced behavior occur?  Usually measurement lust is generated by either an unbalanced measurement program or a performance compensation program.   Both cases can generate the same types of unintended consequences. I call this the “one number syndrome.” An example of the “one number syndrome” is when outsourcing contracts include penalty and bonus clauses based on a single measure, such as productivity improvement.  Productivity is a simple metric that can be affected by a wide range of project and organizational attributes. Therefore focusing on measuring just productivity can have all sorts of outcomes as teams tweak the attributes affecting productivity and then review performance based on the feedback.  For example, one common tactic used to influence productivity is changing the level of quality that a project targets; generally higher quality generates lower productivity and vice versa. Another typical way organizations or teams maximize productivity is to throttle the work entering the organization: reducing the work entering an organization or team generally increases measured productivity. In both examples the feedback loop created by fixating on improving productivity may have unintended consequences.
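Both tactics are easy to demonstrate numerically. A sketch, assuming the common definition productivity = size delivered / effort; the units and figures are invented for illustration:

```python
def productivity(size_delivered, effort_hours):
    """Common definition: output size per unit of effort."""
    return size_delivered / effort_hours

# Baseline: 100 function points delivered with full quality work included.
baseline = productivity(100, 1000)      # 0.10 FP/hour

# Tactic 1: cut quality effort (less testing and review). Effort drops,
# measured productivity rises, and defects are deferred to later periods.
lower_quality = productivity(100, 800)  # 0.125 FP/hour

# Tactic 2: throttle intake, taking only the easy, well-understood work.
cherry_picked = productivity(60, 450)   # ~0.133 FP/hour

assert lower_quality > baseline and cherry_picked > baseline
```

In both cases the single number improves while the organization actually delivers either riskier output or less total output, which is why a lone productivity clause in a contract invites gaming.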

A critical shortcoming caused by measurement lust is a shift toward short-term thinking as teams attempt to maximize the factors that will be used to judge their performance. We have all seen the type of short-term thinking that occurs when a manager (or an organization) does everything in their power to make some monthly goal. At the time the choices are made they seem perfectly rational. Short-term thinking has the ability to convert the choices made today into the boat anchors of the next quarter. For example, right after I left university I worked for a now-defunct garment manufacturer. On occasion salespeople would rush a client into an order at the end of a sales cycle to make their quota. All sorts of shenanigans typically ensued, including returns and sales rebates, but the behavior always caught up with them one or two sales periods later. In a cycle of chasing short-term goals with short-term thinking, a major failure is merely a matter of time. I’m convinced from reading accounts of the Enron debacle that the cycle of short-term thinking generated by the lust to meet their numbers made it less and less likely that anyone could perceive just how irrational their decisions were becoming.

The fix is easy (at least conceptually). Recognize that measurement is a behavioral tool and create a balanced set of measures (frameworks like the Balanced Scorecard are very helpful) that encourage balanced behavior.  I strongly suggest that as you define measures and metrics, you take the time to forecast the behaviors each measure could generate.  Ask yourself whether these are the behaviors you want and whether other measures will be needed to avoid negative excesses.

Lust rarely occurs without a feedback loop that enables the behavior. Measures like productivity or velocity, when used purely for process improvement or planning rather than to judge performance (or for bonuses), don’t create measurement lust. Balanced goals, balanced metrics, balanced feedback and balanced compensation are all part of a plan to generate balanced behavior. An imbalance in any of these layers will generate imbalances in behavior. Rebalancing can change behavior; just make sure it is the behavior you anticipate and that it doesn’t cause unintended consequences by shifting measurement lust to another target.

Gluttony is over-indulgence to the point of waste.  Gluttony brings to mind pictures of someone consuming food at a rate well beyond simple need.  In measurement, gluttony can be exemplified by programs that collect data that has no near-term need or purpose.  When asked why the data was collected, the most common answer boils down to ‘we might need it someday…’

Why is collecting data just in case, for future use, or just because it can be done a problem?  The problems caused by measurement gluttony fall into two basic categories: first, it wastes the effort of the measurement team; second, it wastes the team's credibility.

Wasting effort dilutes the measurement team’s resources, which should be focused on collecting and analyzing data that can make a difference.  Unless the measurement program has unlimited resources, over-collection can obscure important trends and events by reducing the time available for analysis and interpretation.  Any program that scrimps on analysis and interpretation is asking for trouble, much like a person with clogged arteries.  Measures without analysis and interpretation are dangerous because people see what they like in the data due to the clustering illusion (a cognitive bias). The clustering illusion (or clustering bias) is the tendency to see patterns in streaks or clusters within a small sample drawn from a larger data set. Once a pattern is seen, it becomes difficult to stop people from believing in it, even when it does not exist.
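The clustering illusion is easy to reproduce: purely random data reliably contains streaks that look like patterns. A minimal simulation (the seed is arbitrary, chosen only for repeatability):

```python
import random

def longest_streak(flips):
    """Length of the longest run of identical outcomes in a sequence."""
    best = run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

random.seed(42)  # arbitrary seed so the demo is repeatable
flips = [random.choice("HT") for _ in range(200)]
print(longest_streak(flips))
```

A streak of six or seven identical outcomes in 200 fair coin flips is expected, not evidence of a pattern; an unanalyzed metrics warehouse invites exactly this misreading of routine noise as a trend.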

The second problem of measurement gluttony is that it wastes the credibility of the measurement team.  Collecting data that is warehoused just in case it might be important causes those who provide the measures and metrics to wonder what is being done with the data. Collecting data that you are not using will create an atmosphere of mystery and fear.  Add other typical organizational problems, such as a lack of transparency and openness in communicating measurement results, and fear will turn into resistance.   A sure sign of problems is when you begin hearing consistent questions about what you are doing, such as “just what is it that you do with this data?” All measures should have a feedback loop to those being measured so they understand what you are doing, how the data is being used and what the analysis means.  Telling people that you are not doing anything with the data doesn’t count as feedback. Simply put, don’t collect the data if you are not going to use it, and make sure you are using the data you are collecting to make improvements!

A simple rule is to collect only the measurement data that you need and CAN use.  Make sure all stakeholders understand what you are going to do with the data.  If you feel that you are over-collecting, go on a quick data diet.  One strategy for cutting back is to begin in the areas where cutting feels safest. For example, start with a measure that has not driven a positive action in the last six months. Gluttony in measurement gums up the works just as it does in the human body; measurement gluttony slows down reactions and creates resistance, which can lead to a fatal event for your program.


Greed is taking all the food and not leaving some for everyone else.


Greed, in metrics programs, means allowing metrics to be used as a tool to game the system to gain more resources than one needs or deserves.  At that point measurement programs start down the path to abandonment. The literature shows that greed, like envy, is affected by a combination of personal and organizational attributes.   Whether the root of the problem is nature or nurture, organizational culture can make the incidence of greed worse, and that is something we can address.

One of the critical cultural drivers that create a platform for greed is fear.  W. Edwards Deming addressed fear in his famous 14 Points: “Drive out fear, so that everyone may work effectively for the company.” Fear is its own disease; however, combined with an extremely competitive culture that stresses win/lose transactions, it creates an atmosphere in which greed becomes an economically rational behavior.  Accumulating and hoarding resources reduces your internal competitors’ ability to compete and reduces the possibility of losing because of a lack of resources.  Fear-driven greed creates its own insidious cycle of ever-increasing fear, as the person infected with greed fears that their resource hoard is at risk and requires defense (an idea attributed to Sun Tzu in The Art of War). An example of the negative behaviors caused by fear that I recently heard about was a company that had announced at the beginning of last year that it would cull the bottom ten percent of the organization annually.  Their thought was that competition would help them identify the best and the brightest.  In a recent management meeting, the person telling the story indicated that the CIO had expressed exasperation with projects that hadn’t shared resources, and that there were cases in which personnel managers had actively redirected resources to less critical projects.

Creating an atmosphere that fosters greed can generate a whole host of bad behaviors including:

  1. Disloyalty
  2. Betrayal
  3. Hoarding
  4. Cliques/silos
  5. Manipulation of authority

Coupling goals, objectives and bonuses to the measures in your metrics program can induce greed and have a dramatic effect on many of the other Seven Deadly Sins. For example, programs that have wrestled with implementing a common measure of project size and then focused on measuring effectiveness and efficiency will be able to highlight how resources are used.  Organizations that then set goals based on comparing team effectiveness and efficiency will create an environment in which hoarding resources generates a higher economic burden on the hoarder, because it reduces the efficiency of other teams.  That potential places a burden on the measurement program to help create an environment where greed is less likely to occur.

Measurement programs can help create an atmosphere that defuses greed by providing transparency and accountability for results. Alternately as we have seen in earlier parts of this essay, poor measurement programs can and do foster a wide range of poor behaviors.

Sloth plagues many measurement programs as they age.  As time goes by, it is easy for practitioners to drift away from the passionate pursuit of transforming data into knowledge. Sloth in measurement programs is typically not caused by laziness. Leaders of measurement groups begin as true believers, full of energy. Over time, however, many programs fall prey to wandering relevance. When relevance is allowed to waver, it is very difficult to maintain the same level of energy as when the program was new and shiny. Relevance can slip away if measurement goals are not periodically challenged and validated. An overall reduction in energy can occur even when goals are synchronized, if there is conflict between any of the stakeholder classes (the measurement team, management or the measured) over how the data will be used and analyzed. Your energy will wane if your work results in public floggings or fire drills (at the very least it will make you unpopular).

The drift into sloth may be a reflection of a metrics palette that is not relevant to the organization’s business, and therefore not likely to produce revelations that create excitement and interest.  This can cause a cascade of further issues.  Few metrics programs begin life by selecting irrelevant metrics, except by mistake; however, over time relevance can wander as goals and organizational needs change.  Without consistent review, relevance will wane and it will be easy for metrics personnel to lose interest and become indifferent and disengaged.

To avoid sloth due to drifting goals, or to reclaim your program from it, periodically synchronize measurement goals with the organization's goals.  I suggest mapping each measurement goal and measure to the organization's goals.  If a direct link can’t be traced, replace the measure.  Note: measurement goals should be reviewed and validated any time a significant management change occurs.
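The goal-mapping exercise can be run as a simple traceability audit. A sketch with entirely hypothetical goal and measure names; any measure that intersects no current organizational goal is flagged for replacement:

```python
# Hypothetical organizational goals and the measures claimed to support them.
ORG_GOALS = {"reduce_time_to_market", "improve_delivered_quality", "cut_unit_cost"}

MEASURE_TO_GOALS = {
    "cycle_time":        {"reduce_time_to_market"},
    "delivered_defects": {"improve_delivered_quality"},
    "cost_per_fp":       {"cut_unit_cost"},
    "lines_of_code":     set(),  # no traceable link: candidate for replacement
}

def audit(measure_map, org_goals):
    """Return the measures with no link to any current organizational goal."""
    return sorted(m for m, goals in measure_map.items()
                  if not (goals & org_goals))

print(audit(MEASURE_TO_GOALS, ORG_GOALS))  # -> ['lines_of_code']
```

Re-running the same audit after a management change also catches measures that trace only to retired goals, which is when relevance most often slips away unnoticed.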

When usage is the culprit, your job is to counsel all stakeholders on proper usage. However, if management wants to use measurement as a stick, it is their prerogative. Your prerogative is to change fields or to act out and accept the consequences. If the usage is a driver for lack of energy, you probably failed much earlier in the measurement program and turning the ship will be very difficult. Remember that it pays to spend time counseling the organization about how to use measurement data from day one rather than getting trapped in a reactionary mode.

The same symptoms occur when management is either disinterested (not engaged and not disposed positively or negatively toward the topic) or has become uninterested (disengaged). The distinction between disinterested and uninterested is important because the solutions are different. Disinterest requires marketing to find a reason to care, to be connected.  A stakeholder that has become uninterested needs to be re-engaged by providing information that makes their decisions matter.  Whatever the reason for actively disengaging or losing interest, losing passion for metrics will sap the vitality of your program and begin a death spiral.  Keep your metrics relevant, and that relevance will provide protection against waning interest. Metrics professionals should ensure there is an explicit linkage between the metrics palette and the business goals of the organization.  Periodically audit your metrics program, and as part of the audit map the linkages between each metric and the organization's business goals.  Make sure you are passionate about what you do.  Sharing your passion for developing knowledge and illustrating truth will help generate a community of need and support.

Synchronizing goals, making metrics relevant and instilling passion may not immunize your metrics program against failure, but they will certainly stave off the deadly sin of sloth. If you can’t generate passion, or can't generate the information and knowledge from the metrics program that creates relevance, consider a new position, because in the long run not making the change isn’t really an option.

