Listen to the Software Process and Measurement Cast (Here)

The Software Process and Measurement Cast features our interview with Charley Tichenor and Talmon Ben-Cnaan on the Software Non-Functional Assessment Process (SNAP).  SNAP is a standard process for measuring non-functional size.  Both Talmon and Charley are playing an instrumental role in developing and evolving the SNAP process and metric.  SNAP helps developers and leaders to shine a light on non-functional work required for software development and is useful for analyzing, planning and estimating work.

Talmon’s Bio:

Talmon Ben-Cnaan is the chairperson of the International Function Point User Group (IFPUG) committee for Non-Functional Software Sizing (NFSSC) and a Quality Manager at Amdocs. He led the Quality Measurements in his company, was responsible for collecting and analyzing measurements of software development projects and provided reports to senior management, based on those measurements. Talmon was also responsible for implementing Function Points in his organization.

Currently he manages quality operations and test methodology in Amdocs Testing division. The Amdocs Testing division includes more than 2,200 experts, located at more than 30 sites worldwide, and specializing in testing for the Telecommunication Service Providers.

Amdocs is the market leader in the Telecommunications market, with over 22,000 employees, delivering the most advanced business support systems (BSS), operational support systems (OSS), and service delivery to Communications Service Providers in more than 50 countries around the world.

Charley’s Bio:

Charley Tichenor has been a member of the International Function Point Users Group since 1991, and twice certified as a Certified Function Point Specialist.  He is currently a member of the IFPUG Non-functional Sizing Standards Committee, providing data collection and analysis support.  He recently retired from the US government with 32 years’ experience as an Operations Research Analyst, and is currently an Adjunct Professor with Marymount University in Washington, DC, teaching business analytics courses.  He has a BSBA degree from The Ohio State University, an MBA from Virginia Tech, and a Ph.D. in Business from Berne University.

 

Note:  Charley begins the interview with a work required disclaimer but then we SNAP to it … so to speak.

Next

In the next Software Process and Measurement Cast we will feature our essay on product owners.  The role of the product owner is one of the hardest to implement when embracing Agile. However how the role of the product owner is implemented is often a clear determinant of success with Agile.  The ideas in our essay can help you get it right.

We will also have new columns from the Software Sensei, Kim Pries and Jo Ann Sweeney with her Explaining Communication series.

Call to action!

We are in the middle of a re-read of John Kotter’s classic Leading Change on the Software Process and Measurement Blog.  Are you participating in the re-read? Please feel free to jump in and add your thoughts and comments!

After we finish the current re-read will need to decide which book will be next.  We are building a list of the books that have had the most influence on readers of the blog and listeners to the podcast.  Can you answer the question?

What are the two books that have most influenced you career (business, technical or philosophical)?  Send the titles to spamcastinfo@gmail.com.

First, we will compile a list and publish it on the blog.  Second, we will use the list to drive future  “Re-read” Saturdays. Re-read Saturday is an exciting new feature that began on the Software Process and Measurement blog on November 8th.  Feel free to choose you platform; send an email, leave a message on the blog, Facebook or just tweet the list (use hashtag #SPaMCAST)!

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques co-authored by Murali Chematuri and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese.

Productivity is number of clothes sewed per hour.

Productivity is number of clothes sewed per hour.

Definition:

Labor productivity measures the efficiency of the transformation of labor into something of higher value. It is the amount of output per unit of input: an example in manufacturing terms can be expressed as  X widgets for every hour worked. Labor productivity (typically just called productivity) is a fairly simple manufacturing concept that is useful in IT. It is a powerful metric, made even more powerful by it’s simplicity. At its heart, productivity is a measure of the efficiency of a process (or group of processes). That knowledge can be tool to target and facilitate change. The problem with using productivity in a software environment is the lack of a universally agreed upon output unit of measure.

The lack of a universally agreed upon, tangible unit of output (for example cars in an automobile factory or steel from a steel mill) means that software processes often struggle to define and measure productivity because they’re forced to use esoteric size measures. IT has gone down three paths to solve this problem. The three basic camps to size software include relative measures (e.g. story points), physical measures (e.g. lines of code) and functional measures (e.g. function points). In all cases these measures of software size seek to measure the output of the processes and are defined independently of the input (effort or labor cost).

Formula:
The standard formula for labor productivity is:

Productivity = output / input

If you were using lines of code for productivity, the equation would be as follows:

Productivity = Lines of Code / Hours to Code the Lines of Code

Uses:
There are numerous factors that can influence productivity like skills, architecture, tools, time compression, programming language and level of quality. Organizations want to determine the impact of these factors on the development environment.

The measurement of productivity has two macro purposes. The first purpose is to determine efficiency. When productivity is known a baseline can be produced (line in the sand) then compared to external benchmarks. Comparisons between projects can indicate whether one process is better than another. The ability to make a comparison allows you to use efficiency as a tool in a decision-making process. The number and types of decisions that can be made using this tool are bounded only by your imagination and the granularity of the measurement.

The second macro rational for measuring productivity is as a basis for estimation. In its simplest form a parametric estimate can be calculated by multiplying size by a productivity rate.

Issues:
1. The lack of a consistent size measure is the biggest barrier for measuring productivity.
2. Poor time accounting runs a close second. Time account issues range from misallocation of time to equating billing time to effort time.
3. Productivity is not a single number and is most accurately described as a curve which makes it appear complicated.

Variants or Related Measures:
1. Cost per unit of work
2. Delivery rate
3. Velocity (agile)
4. Efficiency

Criticisms:
There are several criticisms of the using productivity in the software development and maintenance environment. The most prevalent is an argument that all software projects are different and therefore are better measured by metrics focusing on terminal value rather than by metrics focused on process efficiency (artesian versus manufacturing discussion). I would suggest that while the result of a software project tends to be different most of the steps taken are the same which makes the measure valid but that productivity should never be confused with value.

A second criticism of the use of productivity is a result of improper deployment. Numerous organizations and consultants promote the use of a single number for productivity. The use of a single number to describe the productivity the typical IT organization does match reality at the shop floor level when the metrics is used to make comparisons or for estimation. For example, would you expect a web project to have the same productivity rate of a macro assembler project? Would you expect a small project and a very large project to have the same productivity? In either case the projects would take different steps along their life cycles therefore we would expect their productivity to be different. I suggest that an organization analyze their data to look for clusters of performance. Typical clusters can include: client server projects, technology specific projects, package implementations and many others. Each will have a statistically different signature. An example of a productivity signature expressed as an equation is shown below:

Labor Productivity=46.719177-(0.0935884*Size)+(0.0001578*((Size-269.857)^2))

(Note this is an example of a very specialized productivity equation for a set of client server projects tailored for a design, code and unit testing. The results would not representative a typical organization.)

A third criticism is that labor productivity is an overly simple metric that does not reflect quality, value or speed. I would suggest that two out three of these criticisms are correct. Labor productivity is does not measure speed (although speed and effort are related) and does not address value (although value and effort may be related). Quality may be a red herring if rework due to defects is incorporated into productivity equation. In any case productivity should not be evaluated in a vacuum. Measurement programs should incorporate a palette of metrics to develop a holistic picture of a project or organization.

398982625_e475db57c5_bThe fourth step in our checklist for selecting a size metric is an evaluation of the temporal component. This step focuses your evaluation on answering the question, “Is the metric available when you need it?” When do you need to know how big a project is depends on what you intend to do with the data (that goal thing again). The majority of goals can be viewed as either estimation related (forward view) or measurement related (historical view). Different sizing metrics can be initially applied at different times during a project’s life. For example, Use Case Points can’t be developed until Use Cases are developed, lines of code can’t be counted until you are deep into construction, or at the very earliest, in technical design.

1Untitled

The major dichotomy is between estimation needs and measurement needs. As Figure 4 suggests, determining size from requirements (or earlier) will require focusing on functional metrics. Functional metrics can be applied earlier in the process (regardless of methodology) because they are based on a higher-level of abstraction that is more closely aligned with the business description of the project. Developing estimates or sizing later in the in the development process opens the possibility of more physical metrics which are more closely aligned with how developers view their work.

3466780657_aec63156b8_b

Selecting a software size metric sets you down a specific track.

Deciding on which software size metric you should use is a fairly momentous decision. Much like deciding on a development platform the decision on which size measure will commit an organization to different types of tools, process and techniques. For example the processes and tools needed to count lines of code would be different than those needed to support story points as a sizing technique. The goals of the measurement program will be instrumental in the determining which type of size metrics will be the most useful. Measurement goals will help you choose between four macro attributes of organization specific and industry defined metrics and between physical and logical metrics. For example, if benchmarking against other firms or industry data is required to attain your measurement goal using organizationally defined metrics would be less viable. Similarly if you have a heterogeneous software environment then selecting a functional metric would make more sense than using a physical metric (logical metrics normalizes varied technology).

Figure 1:Balancing Organizational Perspective Versus Organizational Environment

Untitled

The second checkbox is whether the measure has an externally defined and documented methodology. Why is definition important? Definition is the precursor to repeatability and consistency, which allows comparability. Consistency and repeatability are prerequisites for the ability to generate data needed to use the scientific method such as Six Sigma and tools used to support Kiazen. Finally, an external definition reduces the amount of effort that is required to construct and implement measurement programs.

Even where a definition exists a wide range of nuances are possible. Examples of the range of definitions begin with the most defined, the functional precision of ISO functional metrics to the less defined methodology of Use Case Points which began with a single academic definition and has evolved into many functional variants. The variants seen in UCP are a reflection of having no central control point to control methods evolution, which we will explore later in this model. The range of formality of definition is captured in Figure 2.

Untitled2

Figure 3 consolidates the view of formality of definition with the delineation between logical and physical metrics. Each measure has strengths and weaknesses. The first two items in our checklist are macro filters.

Untitled3

Each measure of size fits a specific combination of organizational goals, environmental constraints and needs however the field of potential software sizing metrics is wide and varied. Once the macro filter is applied each subsequent step in the checklist will narrow the field of potential size measures.

Small, Medium and Large or Low, Average  and High?

Lots of ways to measure

Measurement proliferation is when organizations decide that everything can should be measured therefore there is a rapid increase in measures and metrics. There are at least two measurement proliferation scenarios, and they both have as great of a chance of destroying your measurement program as helping it.  The two scenarios can be summarized as proliferation of breadth (measuring everything), followed by proliferation of depth (measuring the same thing many ways).

There are many items that are very important to measure, and it’s difficult to restrain yourself once you’ve started.  Because it seems important to measure many activities within an IT organization, many measurement teams think measuring everything is important.  Unfortunately, measuring what is really important is rarely easy or straightforward. When organizations slip into the “measure everything” mode, often times what gets measured is not related to the organization’s target behavior (the real needs).  When measures are not related to the target behavior, it will be easy to breed unexpected behaviors (not indeterminate or unpredictable, just not what was expected).  For example, one organization determined that the personal capability was a key metric.  More capability would translate into higher productivity and quality.  During the research into the topic, it was determined that capability was too difficult or “touchy-feely” to measure directly. The organization decided that counting requirements were a rough proxy for systems capability, and if systems capability went up, it must be a reflection of personal capability.  So, of course, they measured requirements.  One unanticipated behavior was that the requirements became more granular (actually more consistent), which caused the appearance that increased capability that could not be sustained (or easily approved) after the initial baseline of the measure.

The explosion of pre-defined measures drives the second proliferation scenario, having too many measures for the same concept. Capers Jones mentioned a number of examples in my interview with him for SPaMCAST.  Capers caught my imagination with the statement that there are many functional metrics are currently in use, ranging from IFPUG function points to cosmic; with use case points, NESMA function points and others in between.  This is in addition to counting lines of code, object and ants.  The fracturing in the world of functional metrics has occurred for many reasons, ranging from a natural maturation of the measurement category to the explosion of information sharing on the web. Regardless of the reason for the proliferation, using multiple measures for the same concept just because you can, can have unintended consequences. Having multiple measures for the same concept can cause focus making the concept seem more important than it is. Secondly having multiple measures may send a message that no one is quite sure how to measure the concept which can lead to confusion by the casual observer.  Generally this no reason to use multiple methods to measure the same concept within any organization. Even if each measure was understood, proliferation of multiple measures to measure the same concept will waste time and money. An organization I recently observed had implemented IFPUG Function Points, Cosmic Function Points, Use Case Points and Story Points to measure software size. This organization had spent the time and effort to find a conversion mechanism so that each measure could be combined for reporting. In this case the proliferation metrics for the same concept had become an ‘effort eater.’ Unfortunately it is not uncommon to see organizations trying to compare the productivity of projects based on very different yardsticks rather than adopting a single measure for size. The value of measurement tends to get lost when there is no common basis for discussion. A single measure will provide that common basis.

Both the proliferation of breadth and of depth have upsides, everybody gets to collect, report and use their favorite measure, and downsides, (which sound very similar) everybody gets to collect, report and use their favorite measures.  Extra choices come at a cost: the cost of effort, communication and compatibility.  The selection of measures and metrics must be approached with the end in mind – your organization’s business goals.  Allowing the proliferation of measures and metrics, whether in depth or breadth, must be approached with great thought, or it will cost you dearly in information and credibility.

1123084506_c69acb0424_b

The measurement/performance feedback loop causes an addiction to a single metric. The addict will exclude what is really important.

There is a famous adage: you get what you measure. When an organization measures a specific activity or process, people tend to execute so they maximize their performance against that measure. Managers and change agents often create measures to incentivize teams or individuals to perform work in a specific then to generate a feedback loop. The measurement/performance feedback loop causes an addiction to a single metric. The addict will exclude what is really important. Chasing the endorphins that the feedback will generate is the sin of lust in the measurement world. Lust, like wrath, is a loss of control which affects your ability to think clearly. Balanced goals and medium to long-term focus are tools to defeat the worst side effects of measurement lust. The ultimate solution is a focus on the long-term goals of the organization.

How does this type of unbalanced behavior occur?  Usually measurement lust is generated by either an unbalanced measurement programs or performance compensation programs.   Both cases can generate the same types of unintended consequences. I call this the “one number syndrome”. An example of the “one number syndrome” is when outsourcing contracts include penalty and bonus clauses based on a single measure, such as productivity improvements.  Productivity is a simple metric that can be affected by a wide range of project and organizational attributes. Therefore just focusing on measuring just productivity can have all sorts of outcomes as teams tweak the attributes affecting productivity and then review performance based on feedback.  For example, one common tactic used to influence productivity is by changing the level of quality that a project is targeting; generally higher quality generates lower productivity and vice versa. Another typical example of organizations or teams maximize productivity is to throttle the work entering the organization. Reducing the work entering an organization or team generally increases productivity. In our examples the feedback loop created by fixating on improving productivity may have the unintended consequence.

A critical shortcoming caused by measurement lust is a shift toward short-term thinking as teams attempt to maximize the factors that will use to just their performance. We have all seen the type of short-term thinking that occurs when a manager (or an organization) does everything in their power to make some monthly goal. At the time the choices are made they seem to be perfectly rational. Short-term thinking has the ability to convert the choices made today into the boat anchors of the next quarter. For example, right after I left university I worked for a now defunct garment manufacturer. On occasion salespeople would rush a client into an order at the end of a sales cycle to make their quota. All sorts of shenanigans typically ensued including returns, sale rebates but the behavior always caught up one or two sales periods later. In a cycle of chasing short-term goals with short-term thinking, a major failure is merely a matter of time. I’m convinced from reading the accounts of the Enron debacle that the cycle of short-term thinking generated by the lust to meet their numbers made it less and less likely that anyone could perceive just how irrational their decisions were becoming.

The fix is easy (at least conceptually). You need to recognize that measurement is a behavioral tool and create a balanced set of measures (frameworks like the Balanced Scorecard are very helpful) that therefore encourage balanced behavior.  I strongly suggest that as you are defining measures and metrics, take the time to forecast the behaviors each measure could generate.  Ask yourself whether these are the behaviors you want and whether other measures will be needed to avoid negative excesses.

Lust rarely occurs without a negative feedback loop that enables the behavior. Measures like productivity or velocity when used for purely process improvement or planning rather than to judge performance (or for bonuses) don’t create measurement lust. Balanced goals, balanced metrics, balanced feedback and balanced compensation are all a part of plan to generate balanced behavior. Imbalances of any of these layers will generate imbalances in behavior. Rebalancing can change behavior but just make sure it is the behavior you anticipate and it doesn’t cause unintended consequences by shifting measurement lust to another target.

3068483640_328b020efa_bGluttony is over-indulgence to the point of waste.  Gluttony brings to mind pictures of someone consuming food at a rate well beyond simple need.  In measurement, gluttony then can be exemplified by programs that collect data that has no near-term need or purpose.  When asked why the data was collected, the most common answer boils down to ‘we might need it someday…’

Why is the collection of data just in case, for future use or just because it can be done a problem?  The problems caused by measurement gluttony fall into two basic categories.  The first is that it wastes the effort of the measurement team, and second because it wastes credibility.

Wasting effort dilutes the measurement team’s resources that should be focused on collecting and analyzing data that can make a difference.  Unless the measurement program has unlimited resources, over collection can obscure important trends and events by reducing time for analysis and interpretation.  Any program that scrimps on analysis and interpretation is asking trouble, much as a person with clogged arteries.  Measures without analysis and interpretation are dangerous because people see what they like in the data due to clustering illusion (cognitive bias). Clustering illusion (or clustering bias) is the tendency to see patterns in clusters or streaks of in a smaller sample of data inside larger data sets. Once a pattern is seen it becomes difficult to stop people from believing that the does not exist.

The second problem of measurement gluttony occurs because it wastes the credibility of the measurement team.  Collecting data that is warehoused just in case it might be important causes those who provide the measures and metrics to wonder what is being done the data. Collecting data that you are not using will create an atmosphere of mystery and fear.  Add other typical organizational problems, such as not being transparent and open about communication of measurement results, and fear will turn into resistance.   A sure sign of problems is when you  begin hearing consistent questions about what you are doing, such as “just what is it that you do with this data?” All measures should have a feedback loop to those being measured so they understand what you are doing, how the data is being used and what the analysis means.  Telling people that you are not doing anything with the data doesn’t count as feedback. Simply put, don’t collect the data if you are not going to use it and make sure you are using the data you are collecting to make improvements!

A simple rule is to collect only the measurement data that you need and CAN use.  Make sure all stakeholders understand what you are going to do with the data.  If you feel that you are over-collecting, go on a quick data diet.  One strategy for cutting back is to begin in the areas you feel safest [SAFEST HOW?]. For example, start with a measure that you have not based a positive action on in the last 6 months. Gluttony in measurement gums up the works just like it does in a human body; the result of measurement gluttony slows down reactions and creates resistance, which can lead to a fatal event for your program.