Listen to the Software Process and Measurement Cast (Here)

The Software Process and Measurement Cast features our interview with Charley Tichenor and Talmon Ben-Cnaan on the Software Non-Functional Assessment Process (SNAP).  SNAP is a standard process for measuring non-functional size.  Both Talmon and Charley are playing an instrumental role in developing and evolving the SNAP process and metric.  SNAP helps developers and leaders to shine a light on non-functional work required for software development and is useful for analyzing, planning and estimating work.

Talmon’s Bio:

Talmon Ben-Cnaan is the chairperson of the International Function Point Users Group (IFPUG) committee for Non-Functional Software Sizing (NFSSC) and a Quality Manager at Amdocs. He led the quality measurement program in his company, was responsible for collecting and analyzing measurements of software development projects, and provided reports to senior management based on those measurements. Talmon was also responsible for implementing Function Points in his organization.

Currently he manages quality operations and test methodology in the Amdocs Testing division. The Amdocs Testing division includes more than 2,200 experts, located at more than 30 sites worldwide, specializing in testing for telecommunications service providers.

Amdocs is the market leader in telecommunications, with over 22,000 employees, delivering the most advanced business support systems (BSS), operational support systems (OSS), and service delivery to communications service providers in more than 50 countries around the world.

Charley’s Bio:

Charley Tichenor has been a member of the International Function Point Users Group since 1991, and twice certified as a Certified Function Point Specialist.  He is currently a member of the IFPUG Non-functional Sizing Standards Committee, providing data collection and analysis support.  He recently retired from the US government with 32 years’ experience as an Operations Research Analyst, and is currently an Adjunct Professor with Marymount University in Washington, DC, teaching business analytics courses.  He has a BSBA degree from The Ohio State University, an MBA from Virginia Tech, and a Ph.D. in Business from Berne University.

 

Note: Charley begins the interview with a disclaimer required by his work, but then we SNAP to it … so to speak.

Next

In the next Software Process and Measurement Cast we will feature our essay on product owners. The role of the product owner is one of the hardest to implement when embracing Agile. However, how the product owner role is implemented is often a clear determinant of success with Agile. The ideas in our essay can help you get it right.

We will also have new columns from the Software Sensei, Kim Pries and Jo Ann Sweeney with her Explaining Communication series.

Call to action!

We are in the middle of a re-read of John Kotter’s classic Leading Change on the Software Process and Measurement Blog.  Are you participating in the re-read? Please feel free to jump in and add your thoughts and comments!

After we finish the current re-read, we will need to decide which book will be next. We are building a list of the books that have had the most influence on readers of the blog and listeners to the podcast. Can you answer the question?

What are the two books that have most influenced your career (business, technical or philosophical)? Send the titles to spamcastinfo@gmail.com.

First, we will compile a list and publish it on the blog. Second, we will use the list to drive future “Re-read” Saturdays. Re-read Saturday is an exciting new feature that began on the Software Process and Measurement blog on November 8th. Feel free to choose your platform: send an email, leave a message on the blog or Facebook, or just tweet the list (use hashtag #SPaMCAST)!

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques, co-authored by Murali Chemuturi and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese.

The fourth step in our checklist for selecting a size metric is an evaluation of the temporal component. This step focuses your evaluation on answering the question, “Is the metric available when you need it?” When you need to know how big a project is depends on what you intend to do with the data (that goal thing again). The majority of goals can be viewed as either estimation related (a forward view) or measurement related (a historical view). Different sizing metrics can first be applied at different times during a project’s life. For example, Use Case Points can’t be developed until use cases are written, and lines of code can’t be counted until you are deep into construction, or at the very earliest, in technical design.

Figure 4

The major dichotomy is between estimation needs and measurement needs. As Figure 4 suggests, determining size from requirements (or earlier) will require focusing on functional metrics. Functional metrics can be applied earlier in the process (regardless of methodology) because they are based on a higher level of abstraction that is more closely aligned with the business description of the project. Developing estimates or sizing later in the development process opens the possibility of using more physical metrics, which are more closely aligned with how developers view their work.


Selecting a software size metric sets you down a specific track.

Deciding which software size metric you should use is a fairly momentous decision. Much like deciding on a development platform, the choice of size measure will commit an organization to different types of tools, processes and techniques. For example, the processes and tools needed to count lines of code are different than those needed to support story points as a sizing technique. The goals of the measurement program will be instrumental in determining which type of size metric will be the most useful. Measurement goals will help you choose between four macro attributes: organization-specific versus industry-defined metrics, and physical versus logical metrics. For example, if benchmarking against other firms or industry data is required to attain your measurement goal, using organizationally defined metrics would be less viable. Similarly, if you have a heterogeneous software environment, then selecting a functional metric would make more sense than using a physical metric (logical metrics normalize across varied technologies).

Figure 1: Balancing Organizational Perspective Versus Organizational Environment


The second checkbox is whether the measure has an externally defined and documented methodology. Why is definition important? Definition is the precursor to repeatability and consistency, which allow comparability. Consistency and repeatability are prerequisites for generating the data needed by scientific-method-based approaches such as Six Sigma and the tools used to support Kaizen. Finally, an external definition reduces the amount of effort required to construct and implement measurement programs.

Even where a definition exists, a wide range of nuances is possible. The range runs from the most defined, the functional precision of the ISO functional metrics, to the less defined methodology of Use Case Points, which began with a single academic definition and has evolved into many variants. The variants seen in UCP reflect the absence of a central control point to govern the method’s evolution, which we will explore later in this model. The range of formality of definition is captured in Figure 2.

Figure 2

Figure 3 consolidates the view of formality of definition with the delineation between logical and physical metrics. Each measure has strengths and weaknesses. The first two items in our checklist are macro filters.

Figure 3

Each measure of size fits a specific combination of organizational goals, environmental constraints and needs; however, the field of potential software sizing metrics is wide and varied. Once the macro filter is applied, each subsequent step in the checklist will narrow the field of potential size measures.


Size matters.

All jokes aside, size matters. Size matters because, at least intellectually, we all recognize that there is a relationship between the size of a product and the effort required to build it. We might argue over the degree of the relationship or whether other attributes are required to define the relationship, but the point is that size and effort are related. Size is important for estimating project effort, cost and duration. Size also provides a platform for topics as varied as scope management (defining scope creep and churn) and benchmarking. In a nutshell, size matters both as an input into the planning and controlling of development processes and as a denominator to enable comparison between projects.
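
To make the size/effort relationship concrete, here is a minimal sketch of a parametric estimate. It assumes a generic power-law model (effort = a × size^b); the coefficients shown are hypothetical and would need to be calibrated from your own project history before the numbers mean anything.

```python
def estimate_effort(size_fp, a=0.05, b=1.1):
    """Illustrative parametric estimate of effort (person-months) from size.

    size_fp: functional size, e.g., IFPUG function points
    a, b: hypothetical coefficients -- calibrate from your own history
    """
    return a * (size_fp ** b)


if __name__ == "__main__":
    for fp in (100, 250, 500):
        print(f"{fp} FP -> ~{estimate_effort(fp):.0f} person-months")
```

The same relationship works in reverse as a denominator: dividing delivered size by actual effort yields a productivity rate that can be compared across projects.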

Finding the specific measure of software size for your organization is part art and part science. The size measure you select must deliver the data needed to meet the measurement goal and must fit within the corporate culture (culture includes both people and the methodologies the organization uses). A framework for evaluation would include the following categories (a small, hypothetical scoring sketch follows the list):

  • Supports measurement goal
  • Industry recognized
  • Published methodology
  • Useable when needed
  • Accurate
  • Easy enough
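
One way to apply the framework is a simple weighted scoring exercise. The sketch below is purely illustrative: the weights, candidate measures and scores are assumptions, not recommendations, and should be replaced with your own measurement goals and judgments.

```python
# Hypothetical weighted scoring of candidate size measures against the
# checklist above. Weights and scores are illustrative only.
CRITERIA_WEIGHTS = {
    "supports_measurement_goal": 0.30,
    "industry_recognized": 0.15,
    "published_methodology": 0.15,
    "usable_when_needed": 0.20,
    "accurate": 0.10,
    "easy_enough": 0.10,
}

# Scores on a 1-5 scale (made up for illustration, not an endorsement).
CANDIDATES = {
    "IFPUG Function Points": {
        "supports_measurement_goal": 4, "industry_recognized": 5,
        "published_methodology": 5, "usable_when_needed": 4,
        "accurate": 4, "easy_enough": 2,
    },
    "Story Points": {
        "supports_measurement_goal": 3, "industry_recognized": 3,
        "published_methodology": 2, "usable_when_needed": 5,
        "accurate": 2, "easy_enough": 5,
    },
}


def weighted_score(scores):
    """Combine criterion scores using the weights above."""
    return sum(CRITERIA_WEIGHTS[c] * s for c, s in scores.items())


for name, scores in sorted(CANDIDATES.items(),
                           key=lambda kv: weighted_score(kv[1]),
                           reverse=True):
    print(f"{name}: {weighted_score(scores):.2f}")
```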

Careful, you might come up short.

Using a single metric to represent the performance of an entire team or organization is like picking a single point in space. The team can move in an infinite number of directions from that point in their next sprint or project. If the measure is important to the team, we would assume that human nature would push the team to maximize their performance against it. The opposite would be true if it were not important to the team. Gaming, positive or negative, often occurs at the expense of other critical measures. An example I observed (more than once) was a contract that specified payment based on productivity (output per unit of input) without mechanisms to temper human nature. In most cases time-to-market and quality were measured, but were not tied to payment. In each case, productivity was maximized at the expense of quality or time-to-market. These were unintended consequences of poorly constructed contracts; in my opinion, neither side of the contractual equation actually wanted to compromise quality or time-to-market.

While developing a single metric is an admirable goal, the process of constructing this type of metric will require substantial thought and effort. Metrics programs that are still in their development period typically cannot afford the time or effort required to develop a single metric (or the loss of organizational capital if they fail). Regardless of where the metrics program is in its development, I would suggest that a more expeditious approach is to develop an index of individual metrics or to use a balanced scorecard (a group of metrics that shows a balanced view of organizational performance, developed by Kaplan and Norton in the 1990s – we will tackle this in detail in the future). Using a palette of well-known measures and metrics leverages the basic knowledge and understanding that measurement programs and their stakeholders have developed during the development and implementation of individual measures and metrics.
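
For illustration, here is a minimal sketch of how such an index could be assembled from normalized component metrics. The component names, baselines, targets and weights are all assumptions chosen for the example; a real index or scorecard would derive them from the organization's goals.

```python
# Minimal sketch of a composite index built from normalized component metrics.
# Component names, baselines, targets and weights are hypothetical.
COMPONENTS = {
    # name: (weight, baseline, target) -- baseline maps to 0, target maps to 1
    "productivity_fp_per_pm": (0.30, 10.0, 15.0),
    "defect_density_per_kloc": (0.25, 5.0, 2.0),   # lower is better
    "time_to_market_weeks":   (0.25, 26.0, 18.0),  # lower is better
    "customer_satisfaction":  (0.20, 3.5, 4.5),
}


def normalize(value, baseline, target):
    """Map a raw value to 0..1, where baseline -> 0 and target -> 1."""
    score = (value - baseline) / (target - baseline)
    return max(0.0, min(1.0, score))


def composite_index(observed):
    """Weighted sum of normalized components; 0 = all baselines, 1 = all targets."""
    return sum(weight * normalize(observed[name], baseline, target)
               for name, (weight, baseline, target) in COMPONENTS.items())


observed = {
    "productivity_fp_per_pm": 12.0,
    "defect_density_per_kloc": 3.5,
    "time_to_market_weeks": 22.0,
    "customer_satisfaction": 4.0,
}
print(f"Composite index: {composite_index(observed):.2f}")
```

The design choice worth noting is that each component keeps its own baseline and target, so a single number can be reported without discarding the underlying measures when someone asks why the index moved.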


Will a single metric make communication easier?

Measuring software development (inclusive of development, enhancement and support activities) generally requires a palette of specific measures. Measures typically include productivity, time-to-market, quality, customer satisfaction and budget (the list can go on and on). Making sense of measures that might be predictive (forecasting the future) or reflective (telling us about the past), and that may send seemingly conflicting or contradictory messages, is difficult. Proponents of a single metric suggest simplifying the process by developing or adopting one metric that they believe embodies performance toward the organization's goals and predicts whether that performance will continue. Can adopting a single metric as a core principle in a metrics program enhance communication and therefore the value of a measurement program?

The primary goal of any metrics program in IT, whether stated or not, is to generate and communicate information. A metrics program acts as a platform to connect metrics users and data providers. This connection is made by collecting, analyzing and communicating information to all of the stakeholders. The IT environment in general, and the software development environment specifically, is complex. That complexity is often translated into a wide variety of measures and metrics that are difficult to understand and consume unless you spend your career analyzing the data. Unless you are working for a think tank, that level of analysis is generally out of reach, which is why managers and measurement professionals have sought, and continue to seek, a single number to communicate progress and predict the future of their departments.

Development of a single metric that can be easily explained holds great promise as a means of simplifying communication. A single metric will simplify communication if (and it is a big if) a metric can be developed that is easily explainable and is as useful in predicting performance as most metrics are in reflecting it. Among the many elements of good communication, a simple message with few moving parts that is relevant to the receiver is critical. A simple metric, by definition, has few moving parts. The starting point for developing a single metric is therefore the design requirements of simplicity and relevance, which can (hopefully) be controlled and tuned by the measurement group as business needs change.

Developing a single metric is a tall order for a metrics program, which is why most approaches to this problem use indexes (such as Larry Putnam's PI). Indices are generally more difficult for wider audiences to understand (albeit with exceptions, such as the Dow Jones Industrial Average), or they fall into the overly academic trap of requiring a trained cadre to generate and interpret them. Regardless of what has been pursued, a single metric done correctly would foster communication, and communication is instrumental for generating value and success from a measurement program.


A good number for a birthday but not for a metric!

In The Lord of the Rings, J.R.R. Tolkien wrote that rings of power were created, and a single ring was then fashioned to bind them all. The goal of many metrics programs is to find the “one ring,” to create a single metric that will accurately reflect the past, predict the future and track changes. The creation of a single, easily understood metric that can satisfy all of these needs is the holy grail of all metrics programs. To date the quest for the one metric has been fruitless. While the quest should continue, adopting a single metric before the research and testing have been done can be dangerous.

A single, understandable metric would have substantial benefits, ranging from an improved communications platform to a tool for focusing process improvement activities on areas of the organization where change can make a difference in the metric. An example of a single metric is the Dow Jones Industrial Average (DJIA), which summarizes a large number of individual measures (individual stock prices) into a single, easily explainable index. Whether you like or dislike the DJIA, almost everyone can interpret changes in the index and trends over time. Every daily business program, such as Marketplace (American Public Media, heard on National Public Radio), reports the performance of the DJIA. The problem begins when the DJIA becomes the only number reported, bereft of context. The simplicity often becomes a narcotic.

Anyone attempting to find a one-metric solution (or to use the one-metric solutions currently marketed) has a tough hill to climb. There are issues with a one-metric solution that must be addressed when designing and developing it. The first of these issues is context. What is important to one organization is different from what is important to another, and what is important today may not be important tomorrow. How would a single metric morph to reflect these complexities? The Lord of the Rings had fewer changes in goals than a typical IT department. A second category of issues is environmental complexity. Complexity ranges from the interactions between the metric and its human users to the basic mathematical complexity of creating a metric with both the historical and predictive power required. In my opinion, the most intricate issues swirl around the metric/human interaction. In general, people will use any measure for wildly divergent purposes, ranging from providing status to identifying process improvements. Each different use triggers a different behavior.

When seeking a single metric we need to answer the bottom-line question: is the effort worth the cost? Stated in a less black-and-white manner, will any single metric be more valuable as a communication tool than the information and transparency that would be lost?

 

Small, Medium and Large or Low, Average and High?

Lots of ways to measure

Measurement proliferation is what happens when organizations decide that everything can and should be measured, and there is therefore a rapid increase in measures and metrics. There are at least two measurement proliferation scenarios, and both have as great a chance of destroying your measurement program as helping it. The two scenarios can be summarized as proliferation of breadth (measuring everything), followed by proliferation of depth (measuring the same thing many ways).

There are many items that are very important to measure, and it’s difficult to restrain yourself once you’ve started. Because it seems important to measure many activities within an IT organization, many measurement teams think measuring everything is important. Unfortunately, measuring what is really important is rarely easy or straightforward. When organizations slip into “measure everything” mode, oftentimes what gets measured is not related to the organization’s target behavior (the real needs). When measures are not related to the target behavior, it is easy to breed unexpected behaviors (not indeterminate or unpredictable, just not what was expected). For example, one organization determined that personal capability was a key metric. More capability would translate into higher productivity and quality. During research into the topic, it was determined that capability was too difficult or “touchy-feely” to measure directly. The organization decided that counting requirements was a rough proxy for system capability, and if system capability went up, it must be a reflection of personal capability. So, of course, they measured requirements. One unanticipated behavior was that the requirements became more granular (actually more consistent), which created the appearance of increased capability that could not be sustained (or easily proven) after the initial baseline of the measure.

The explosion of pre-defined measures drives the second proliferation scenario: having too many measures for the same concept. Capers Jones mentioned a number of examples in my interview with him for SPaMCAST. Capers caught my imagination with the statement that many functional metrics are currently in use, ranging from IFPUG Function Points to COSMIC, with Use Case Points, NESMA function points and others in between. This is in addition to counting lines of code, objects and the like. The fracturing of the world of functional metrics has occurred for many reasons, ranging from the natural maturation of the measurement category to the explosion of information sharing on the web. Regardless of the reason for the proliferation, using multiple measures for the same concept just because you can, can have unintended consequences. Having multiple measures for the same concept can create a level of focus that makes the concept seem more important than it is. Secondly, having multiple measures may send a message that no one is quite sure how to measure the concept, which can confuse the casual observer. Generally there is no reason to use multiple methods to measure the same concept within any organization. Even if each measure were understood, the proliferation of multiple measures for the same concept will waste time and money. An organization I recently observed had implemented IFPUG Function Points, COSMIC Function Points, Use Case Points and Story Points to measure software size. This organization had spent the time and effort to find a conversion mechanism so that each measure could be combined for reporting. In this case the proliferation of metrics for the same concept had become an “effort eater.” Unfortunately it is not uncommon to see organizations trying to compare the productivity of projects based on very different yardsticks rather than adopting a single measure for size. The value of measurement tends to get lost when there is no common basis for discussion. A single measure will provide that common basis.

Both the proliferation of breadth and the proliferation of depth have upsides (everybody gets to collect, report and use their favorite measure) and downsides (which sound very similar: everybody gets to collect, report and use their favorite measures). Extra choices come at a cost: the cost of effort, communication and compatibility. The selection of measures and metrics must be approached with the end in mind – your organization’s business goals. Allowing the proliferation of measures and metrics, whether in depth or breadth, must be approached with great thought, or it will cost you dearly in information and credibility.


The measurement/performance feedback loop causes an addiction to a single metric. The addict will exclude what is really important.

There is a famous adage: you get what you measure. When an organization measures a specific activity or process, people tend to execute so that they maximize their performance against that measure. Managers and change agents often create measures to incentivize teams or individuals to perform work in a specific manner and then to generate a feedback loop. The measurement/performance feedback loop causes an addiction to a single metric. The addict will exclude what is really important. Chasing the endorphins that the feedback generates is the sin of lust in the measurement world. Lust, like wrath, is a loss of control which affects your ability to think clearly. Balanced goals and a medium- to long-term focus are tools to defeat the worst side effects of measurement lust. The ultimate solution is a focus on the long-term goals of the organization.

How does this type of unbalanced behavior occur? Usually measurement lust is generated by either an unbalanced measurement program or a performance compensation program. Both cases can generate the same types of unintended consequences. I call this the “one number syndrome.” An example of the “one number syndrome” is when outsourcing contracts include penalty and bonus clauses based on a single measure, such as productivity improvement. Productivity is a simple metric that can be affected by a wide range of project and organizational attributes. Therefore, focusing on measuring just productivity can have all sorts of outcomes as teams tweak the attributes affecting productivity and then review performance based on feedback. For example, one common tactic used to influence productivity is changing the level of quality that a project is targeting; generally higher quality generates lower productivity and vice versa. Another typical way organizations or teams maximize productivity is to throttle the work entering the organization. Reducing the work entering an organization or team generally increases productivity. In our examples, the feedback loop created by fixating on improving productivity may have unintended consequences.
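
A small, hypothetical calculation makes the gaming visible. The numbers below are made up purely to illustrate how the headline productivity figure (output per unit of input) improves under both tactics even though nothing about the underlying capability has changed.

```python
# Hypothetical illustration of gaming a productivity measure
# (output per unit of input). All numbers are invented for the example.
def productivity(output_fp, effort_pm):
    """Productivity as function points delivered per person-month."""
    return output_fp / effort_pm


baseline = productivity(output_fp=500, effort_pm=50)       # 10.0 FP/PM

# Tactic 1: lower the quality target -- skip some test and review effort.
lower_quality = productivity(output_fp=500, effort_pm=42)  # ~11.9 FP/PM

# Tactic 2: throttle intake -- accept only the "easy" work.
throttled = productivity(output_fp=300, effort_pm=24)      # 12.5 FP/PM

print(f"baseline:       {baseline:.1f} FP/PM")
print(f"lower quality:  {lower_quality:.1f} FP/PM")
print(f"throttled:      {throttled:.1f} FP/PM")
# The measure improves in both cases, while quality or time-to-market
# may quietly suffer -- exactly the unintended consequence described above.
```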

A critical shortcoming caused by measurement lust is a shift toward short-term thinking as teams attempt to maximize the factors that will be used to judge their performance. We have all seen the type of short-term thinking that occurs when a manager (or an organization) does everything in their power to make some monthly goal. At the time the choices are made they seem perfectly rational. Short-term thinking has the ability to convert the choices made today into the boat anchors of the next quarter. For example, right after I left university I worked for a now-defunct garment manufacturer. On occasion salespeople would rush a client into an order at the end of a sales cycle to make their quota. All sorts of shenanigans typically ensued, including returns and sales rebates, but the behavior always caught up with them one or two sales periods later. In a cycle of chasing short-term goals with short-term thinking, a major failure is merely a matter of time. I’m convinced from reading the accounts of the Enron debacle that the cycle of short-term thinking generated by the lust to meet their numbers made it less and less likely that anyone could perceive just how irrational their decisions were becoming.

The fix is easy (at least conceptually). You need to recognize that measurement is a behavioral tool and create a balanced set of measures (frameworks like the Balanced Scorecard are very helpful) that encourage balanced behavior. I strongly suggest that as you define measures and metrics, you take the time to forecast the behaviors each measure could generate. Ask yourself whether these are the behaviors you want and whether other measures will be needed to avoid negative excesses.

Lust rarely occurs without a negative feedback loop that enables the behavior. Measures like productivity or velocity, when used purely for process improvement or planning rather than to judge performance (or for bonuses), don’t create measurement lust. Balanced goals, balanced metrics, balanced feedback and balanced compensation are all part of a plan to generate balanced behavior. An imbalance in any of these layers will generate imbalances in behavior. Rebalancing can change behavior, but make sure it is the behavior you anticipate and that it doesn’t cause unintended consequences by shifting measurement lust to another target.

Gluttony is over-indulgence to the point of waste. Gluttony brings to mind pictures of someone consuming food at a rate well beyond simple need. In measurement, gluttony can be exemplified by programs that collect data that has no near-term need or purpose. When asked why the data is collected, the most common answer boils down to “we might need it someday…”

Why is collecting data just in case, for future use, or just because it can be done, a problem? The problems caused by measurement gluttony fall into two basic categories. The first is that it wastes the effort of the measurement team; the second is that it wastes the team’s credibility.

Wasting effort dilutes the measurement team’s resources, which should be focused on collecting and analyzing data that can make a difference. Unless the measurement program has unlimited resources, over-collection can obscure important trends and events by reducing the time available for analysis and interpretation. Any program that scrimps on analysis and interpretation is asking for trouble, much like a person with clogged arteries. Measures without analysis and interpretation are dangerous because people see what they like in the data due to the clustering illusion (a cognitive bias). The clustering illusion (or clustering bias) is the tendency to see patterns in clusters or streaks in small samples of data inside larger data sets. Once a pattern is seen, it becomes difficult to stop people from believing in it, even when it does not exist.

The second problem of measurement gluttony is that it wastes the credibility of the measurement team. Collecting data that is warehoused just in case it might be important causes those who provide the measures and metrics to wonder what is being done with the data. Collecting data that you are not using will create an atmosphere of mystery and fear. Add other typical organizational problems, such as not being transparent and open about communicating measurement results, and fear will turn into resistance. A sure sign of problems is when you begin hearing consistent questions about what you are doing, such as “just what is it that you do with this data?” All measures should have a feedback loop to those being measured so they understand what you are doing, how the data is being used and what the analysis means. Telling people that you are not doing anything with the data doesn’t count as feedback. Simply put, don’t collect the data if you are not going to use it, and make sure you are using the data you are collecting to make improvements!

A simple rule is to collect only the measurement data that you need and CAN use. Make sure all stakeholders understand what you are going to do with the data. If you feel that you are over-collecting, go on a quick data diet. One strategy for cutting back is to begin in the areas where cutting feels safest; for example, start with a measure that you have not based a positive action on in the last six months. Gluttony in measurement gums up the works just like it does in the human body; measurement gluttony slows down reactions and creates resistance, which can lead to a fatal event for your program.