Metrics Minute Entry:  Automated Test Cases Passed

Audio Version:  SPaMCAST 217

Definition:

Automated Test Cases Passed (ATCP) is primarily a progress measure represented as a ratio that compares the number of automated test cases that have passed to the total number of automated test cases that will be executed (TATC). Progress metrics are collected iteratively over time; for example, by day, sprint, release or phase (waterfall). Progress measures are generally presented graphically to show the trend in a process's output. This measure is easy to collect (counting physical items) and to interpret (simple graphs or a percentage). The metric can be used to support organizational goals for test automation (it is combinable across many projects for trending) and therefore tends to be adopted fairly early in a metrics program. The simplicity of the metric limits its predictive power because it does not reflect the overall complexity of the development environment; however, the metric does provide a simple snapshot of activity.

Formula:

Automated Test Cases Passed is defined as the number of automated test cases that pass divided by the total number of automated test cases, expressed as a percentage. This is represented by the following equation:

PATCP (%) = (ATCP ÷ TATC) × 100

PATCP = Percent of automated test cases passed

ATCP = Number of automated test cases passed

TATC = Total number of automated test cases
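For readers who want to automate the calculation, here is a minimal Python sketch of the formula above; the function name and the example counts are illustrative, not part of the metric definition.

    def percent_automated_test_cases_passed(passed: int, total: int) -> float:
        """PATCP: automated test cases passed as a percentage of the total
        automated test cases planned (TATC)."""
        if total == 0:
            raise ValueError("TATC must be greater than zero")
        return 100.0 * passed / total

    # Example: 42 of 60 planned automated test cases have passed so far.
    print(f"PATCP = {percent_automated_test_cases_passed(42, 60):.1f}%")  # 70.0%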

S-Curve Tracking Graph Example:

[Figure: S-curve tracking graph]

Uses:

During a Sprint:  Tracking the number of passed cases against the total automated cases should follow the classic S-curve shape, which makes it valuable as part of a tracking regimen. This tracking technique is especially useful if automated testing begins as coding begins (as in Test-Driven Development). Some organizations use the number of automated test cases passed, as a percentage of the total automated test cases planned, as a tool to define when testing is complete.  This is a BAD idea.

Conclusion of Sprint:  The number of automated test cases passed compared to the total number of automated test cases can highlight potential issues if the percentage does not meet expectations.

Organizational:  Counting the number of automated test cases passed puts a focus on automation of testing. The old adage “you get what you measure” comes to mind.

Issues:

  • There are multiple points at which test cases are developed in a test-driven environment, which makes construction of the classic S-curve graph problematic.
  • At the conclusion of the iteration, rationalization is required to explain any automated test cases that have not passed.  Un-passed tests, unless rationalized, will be viewed as a black mark on the project’s performance.  Note: un-passed tests can occur for many reasons, such as bad tests, defects or tests that have been created for functionality that has not been developed yet.
  • The use and granularity of terms like test case and test plan can vary. This makes the use of individual project count data across more than one team possibly less useful at the organizational level than the trend in the number of automated test cases passed.  Using a standard tool set can minimize definition and usage variability.

Related or Variant Measures and Metrics:

  • Total Automated Test Cases
  • Percentage of Test Cases Automated
  • Test Coverage (measure of the percentage of code covered by tests)
  • Test Cases Passed

Criticisms:

  • Counting test cases obscures the complexities of testing. Not every test case is of equal value in terms of explanatory or predictive power; therefore, the passing of any individual case or specific group of cases may not precisely indicate quality or when development will be done. This criticism is true but incomplete. Simply put, if a group of tests is expected to pass at a specific point, then knowing how many have not passed gives the developer a rough idea of what is in front of her.
  • Not all tests will ever be automated, and not all types of testing are measurable by counting the number of test cases. True. Most automated testing is contextual; however, not all types of tests are. Exploratory testing, described as simultaneous learning, test design and test execution, is an important tool for ensuring that testing is not biased totally toward the captured requirements.
  • More automated test cases do not equate to better testing.  In black-and-white terms, this criticism is true.  The premise is that counting and reporting the number of automated test cases passed (or the total number of anything) will cause the number to inflate over time, based on the belief that more is better.  I would suggest that rather than abandoning the metric, it would be better to remind the team that they should only add test cases that add value and that, like code, they should refactor test cases often (automated or not).  An interesting measure to add in order to address the inflationary potential might be the number of refactored or removed test cases.
  • Just because all of the automated test cases have passed does not mean you have achieved defect-free code. The criticism is true IF you fall prey to the bias that once all of the automated test cases have passed (see the first two criticisms) you are finished.

Thoughts and comments?  Contact the Software Process and Measurement Cast by email:  spamcastinfo@gmail.com or voicemail:  +1-206-888-6111.

Metrics Minute: Return on Investment (ROI)
Thomas M. Cagley Jr.

Audio Version:  SPaMCAST 163

Definition

ROI is a standard measure of project profitability: the discounted profits over the life of the project expressed as a percentage of the initial investment.  ROI is a classic financial metric that, when applied to projects, is typically used as a technique for project acceptance or, in retrospect, as a tool to evaluate overall performance. When applied in agile projects, ROI can be used to determine the priority of epics, themes or even stories.

Formula

ROI = (Average Net Benefits) ÷ (Initial Costs)

Average Net Benefits is the residual value that results from a project after adding total benefits for a specific period and subtracting all expenses for the project for that period.

Initial Costs include any expense related to the initial development of the project.
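A minimal Python sketch of the formula above, assuming benefits and expenses are totaled over a number of periods before averaging; the function name and the example figures are purely illustrative.

    def return_on_investment(total_benefits: float, total_expenses: float,
                             periods: int, initial_costs: float) -> float:
        """ROI = Average Net Benefits / Initial Costs, expressed as a percentage."""
        average_net_benefits = (total_benefits - total_expenses) / periods
        return 100.0 * average_net_benefits / initial_costs

    # Example: 300k total benefits, 120k ongoing expenses over 3 years, 150k to build.
    print(f"ROI = {return_on_investment(300_000, 120_000, 3, 150_000):.1f}%")  # 40.0%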

Usage

The primary use of ROI is to compare a potential project to other possible projects in order to determine which makes the most sense to pursue (like Return on Assets – ROA).  Projects with a higher ROI are more likely to be included in the portfolio of projects.  When applying ROI to agile projects I suggest at the very least using ROI as a tool to prioritize epics and features into a release plan.

Alternately, ROI can be used as a tool to evaluate continued investment in a product or a portfolio of products.  In this case the metric would be periodically recalculated to determine whether an overall project portfolio represents an efficient use of assets or whether segments of the portfolio offer a higher return.

Issues

There are several issues with the calculation and use of ROI as a decision tool, including poor math, failing to make an apples-to-apples comparison or, in extreme cases, a combination of the two.  The most significant issues are:

Return on Investment is a financial tool designed to help build a business case to support decision making by evaluating the forecasted impact on your organization’s bottom line.  ROI, or any other financial analysis, should not be your only evaluation criterion.  High ROIs may be an artifact of wishful thinking or creative accounting for benefits.  Rigorous follow-up helps stop this type of behavior.

A second issue is failing to account for the time value of money.  The time value of money suggests that a benefit or cost today has a higher absolute value than the same benefit or cost in the future.  Techniques like Net Present Value normalize future benefits and costs based on a discount rate (such as the presumed interest rate over the period or the organization’s internal rate of return).
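To illustrate the point, here is a hedged Python sketch that discounts a stream of future net benefits back to present value before comparing it to the initial cost; the discount rate, the cash flows and the way the result is folded into an ROI-style figure are illustrative assumptions, not a prescribed method.

    def net_present_value(cash_flows, discount_rate):
        """Discount a series of future net cash flows (benefit - cost per period)
        back to today's value; period 1 is assumed to be one year out."""
        return sum(cf / (1 + discount_rate) ** (t + 1) for t, cf in enumerate(cash_flows))

    # Example: the same 60k net benefit per year is worth less once discounted at 10%.
    yearly_net_benefits = [60_000, 60_000, 60_000]
    npv = net_present_value(yearly_net_benefits, 0.10)
    print(f"NPV of benefits = {npv:,.0f}")                   # ~149,211
    print(f"Discounted ROI  = {100 * npv / 150_000:.1f}%")   # ~99.5% over the whole period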

Comparisons of factors like ROI for two projects must be made across similar timeframes so that an apples-to-apples comparison can be made.  In environments that are rapidly evolving, shorten the comparison period to favor projects that deliver benefits earlier rather than later.  The comparison problem can be exacerbated when ROI is used for the smaller work packages usually seen in agile projects; gross value may be more easily determined for work packages like stories.

A final issue is that intangible benefits may affect the rationale for project selection, which forces organizations to address valuation of intangible benefits. This issue is a relative of the “numbers don’t tell the whole story” argument. As in the discussion of comparing work items above, as work packages become more granular it may become more difficult to accurately evaluate intangible or non-business value in order to calculate ROI. Create rules for how you will allocate or value intangible benefits and costs. Valuation of intangible benefits requires a judgment call, and unless that judgment is based on rules or guidelines those judgment calls will be open to question or manipulation.

Related Metrics

  • Return on Equity (ROE)
  • Return on Assets (ROA)
  • Payback Period
  • Internal Rate of Return (IRR)
  • Net Present Value
  • Total cost of ownership

Criticisms

Criticisms of ROI, like the issues noted earlier, revolve around how the metric is used.

The first criticism is that comparisons across different types of projects are difficult due to differing cost and benefit structures that affect ROI.  An example of the problems that occur when comparing differing types of projects is making an investment decision about an infrastructure project versus a project that generates revenue. It is usually more difficult to identify and monetize the benefits of an infrastructure project, which makes these types of projects more difficult to fund even though they may be required to enable projects with a more classic benefit stream. This criticism is true; however, I would suggest this is a failure to enforce a linkage between infrastructure projects and their impact on revenue streams.

A second criticism: valuing intangible costs and benefits is at best an inexact science and is open to internal political games. This criticism is the most troublesome. There isn’t general agreement on how to value the intangible. I would suggest not using a one-size-fits-all solution; rather, ensure that valuations of intangible costs or benefits are consistently evaluated based on a risk-adjusted discount rate. The higher the risk, the higher the discount rate would be set. A formal approach to measuring or estimating risk will be required to avoid manipulation of the analysis.

A third criticism of ROI centers on using a single point in time to qualify projects. The calculation of ROI generally reflects a snapshot of a particular moment in time. Unless you live under a rock, it is difficult to forget that the business environment is dynamic. The idea that an ROI calculation can be done and evaluated once to ensure that the organization’s portfolio of projects continually maximizes value is ludicrous. To deal with the issue I suggest periodic re-evaluations of the portfolio to ensure that the business environment has not changed enough to require changing the composition of the project portfolio.  Note: projects in the gray area between acceptance and rejection need to be re-evaluated more often.

 

Audio Version:  SPaMCAST 161

When talking about metrics to measure agile processes, one should begin by defining why you want to measure, followed immediately by what you will do with the data once you have it. As I have noted before, there are only two reasons to measure: the first is to generate specific behaviors and the second is to predict the future.  Organizational goals provide the rationale for what to measure, and the type of measure determines whether it drives behavior or provides direction.

Change only occurs when three conditions are satisfied: there needs to be a trigger (1), and those being asked to change must have the ability (2) and the motivation (3) to change.  Measurement as a tool to predict the future can provide the trigger and motivation for change.  Goals are a mechanism for ordering how we interpret the data generated as we measure, but they pass through many filters before they arrive at the individual where action happens.  The hierarchy begins with strategy, which is then interpreted organizationally and operationally until individuals place their stamp on it and act.  Transparency and balance are required to help ensure that each interpretation stays true to the organizational goal.

Effective measurement is a balance.  On one side is the collection of measures, which are then synthesized into metrics.  Collection signals what is important to the organization, and it is balanced by the insights, actions, change and transformations that are generated.  Balance in measurement is much akin to Newton’s third law of motion.  To paraphrase, for every measure there must be an equal and opposite reaction.   A simple metrics model to enforce balance includes metrics in four categories:

      1. Productivity
      2. Quality
      3. Predictability
      4. Value

The model helps focus measurement in order to deliver value based on actions taken to make the business better. 

Agile measurement requires embracing seven philosophies as they relate to measurement.  Agile measurement must:

  • Reinforce desired AGILE behavior
  • Focus on results
  • Measure trends
  • Be easy to collect
  • Include context
  • Create real conversation
  • INCLUDE ONLY WHAT IS ABSOLUTELY NEEDED

If measurement incents behavior, incent the right behavior!  As an example, developing a task-level schedule for the whole project and then measuring tasks completed against that schedule flies in the face of the daily planning and re-planning required to support the effectiveness of self-managed and self-organized teams.

Interim steps are far less important than results.  Does the code work?  What did we finish?  Can it be delivered? By focusing on results we are far less likely to miss the big picture.  I recently read a status report that proudly announced that the team had met 95% of the dates during the current release.  The one they missed was the final delivery date . . .

In most cases any individual observation is a reflection of the process.  Deming used the term “common cause”.   The only way to address a “common cause” problem is to change the process.  Trends are useful in describing the capacity of the process (predicting the future).  I use a rule that individual observations are studied only when they are three standard deviations away from the mean. In other words, study individual observations when they are different enough to matter; otherwise, reflect on the trend of the observations.
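As an illustration of the three-standard-deviation rule, the following Python sketch compares a new observation against a historical baseline; the function name, the example data and the control-chart-style framing are my own illustrative assumptions.

    from statistics import mean, stdev

    def is_special_cause(observation, history, sigmas=3.0):
        """Compare a new observation against the historical mean; anything more than
        `sigmas` standard deviations away is studied individually, otherwise it is
        treated as common-cause variation and read only as part of the trend."""
        mu, sd = mean(history), stdev(history)
        return abs(observation - mu) > sigmas * sd

    history = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.1, 4.3]  # e.g. daily observations of a process measure
    print(is_special_cause(4.6, history))  # False: noise, reflect on the trend instead
    print(is_special_cause(9.7, history))  # True: different enough to matter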

Collecting information, any information, takes time and effort.  First, any effort that you take away from delivering functionality reduces IT’s value to the company.  Second, there is always resistance to work that is perceived as overhead.  Automate as much as possible to reduce the impact, and make sure the information collected can be consumed at the point of collection as well as in the hallowed halls of the PMO.

What does a number really mean?  It has been said that a set of metrics has never solved a problem, and while I don’t think in absolutes, I would suggest that numbers without context are far less valuable and more open to highly variable interpretations that will require even more effort to contain and manage.

Generating a conversation is a corollary to the need for context.  Conversation causes data to be synthesized and contextualized so that common organizational knowledge is generated.  Knowledge is the real value. 

Only measure what is absolutely needed.  For every metric I would ask you to think about what you will do with the information you are requesting and then how perceived overhead will affect acceptance before you act.

Principles keep us honest and focused on delivering value to the organization.  Principles that an organization develops, adapts and adopts are a reflection of what the organization values.

A subset of all of the metrics in the world makes sense in the agile environment.  The palette the Metrics Minute suggests is as follows.  The term palette is chosen deliberately: a palette suggests selecting only those colors that are really needed to paint the picture that has value. The measures in the palette are as follows: 

      • ROI (Value)
      • Customer Satisfaction (Value)
      • Team Satisfaction (Value)
      • Business Value – Burn-up (Value)
      • Velocity – Burn-down (Predictability)
      • Sprint Escape Rate (Predictability)
      • Test Case Pass Rate – Automated (Quality)
      • Technical Debt (Quality)
      • Work-In-Process (Productivity)
      • Cycle Time (Productivity)

Each of these metrics will be (or has been) covered in detail in other Metrics Minute entries so that you can determine whether they are useful in answering the questions you have about your agile development program. Rarely will any individual project, team or organization need to use all of the metrics in the agile metrics palette.   The palette is just . . .  a palette. Pick only those that you need for a balanced view of your development process.  Create your own color by blending metrics.

Metrics Minute:  IFPUG Function Points

Audio Version on SPaMCAST 145

Description:

IFPUG Function Points are a measure of the functionality delivered by the project or application being counted, based on a set of rules documented in the IFPUG Counting Practices Manual. The measure of delivered functionality is a proxy for size, which can be used in estimating and measuring work. An analogy for function points is the measure of the number of square feet (or square meters) of a house. Knowing the number of square feet provides one view of the house but not other attributes, such as price or number of bedrooms. Knowledge of functional size is a step toward understanding development and/or maintenance effort.

Uses:

Denominator: Size is a descriptor that is generally used to add interpretive information to other attributes or as a tool to normalize other attributes. When used to normalize other measures or attributes, size is usually used as a denominator. Effort per function point is an example of using function points as denominator.

Estimation: Size is a partial predictor of effort or duration. Estimating projects is an important use of software size. Effort can be thought of as a function of size, behavior and technical complexity.

Reporting: Many measures and metrics are collected and used in most organizations to paint a picture of project performance, progress or success. Organizational report cards may also be leveraged, again with many individual metrics, any one of which may be difficult to compare individually.  Use function points as a denominator to synchronize many disparate measures so that they may be compared and reported.

Control: Understanding performance allows project managers, team leaders and project team members to understand where they are in an overall project or piece of work and therefore take action to change the trajectory of the work. Knowledge allows the organization to control the flow of work in order to influence the delivery of functionality and value in a predictable and controlled manner.

IFPUG Function Point Overview:

IFPUG Function Points are determined by classifying the functionality found in an application into five categories: three that relate to transactions and two that relate to data. All applications, whether they perform a standard business function (like human resources or accounting) or are embedded systems, have the same five components.

IFPUG function points classify whole transactions (elementary processes) into three categories. The first is an external input.  An external input brings information into the application and then stores that information in logical groups of data called internal logical files. External outputs and external inquiries are the second and third types of transactions. Both of these types of transactions transport information out of the application. The difference between the two is that an external output requires more processing logic than the direct retrieval that characterizes an external inquiry. Each transaction has a prescribed weight based on a low, average, high scale. Placement of an individual transaction on the scale is determined by counting the number of fields and logical files required to generate the transaction.

Data is classified into two categories, internal logical files and external interface files. Both groupings of data are user-recognizable entities that represent holistic business concepts, such as customer and order. The primary difference between the two categories is that an internal logical file is maintained within the application being counted and an external interface file is maintained elsewhere. Each holistic grouping of data can be composed of more than one subgroup. For example, in the logical group customer, customer name and demographics might be one subgroup and customer address a second logical subgroup. The number of subgroups and fields is used to determine the weighting, based on a low, average, high scale similar to the one mentioned earlier.

A function point count for a project identifies and sizes all logical files and transactions that have been added, changed or deleted, plus any required conversion functionality.  A count of an application identifies and sizes all functionality that existed within the boundary of the application when the count was done. Determining the unadjusted function point count begins by summing the components, categorized by component type into buckets using the low, average, high scale noted earlier.  The IFPUG methodology provides weights for each of the buckets, which are then summed to produce the unadjusted function point count. Optionally, the count can be adjusted by applying a correction factor called the value adjustment factor. The value adjustment factor is determined by rating 14 general system characteristics on a scale of zero to five. The general system characteristics range from data communications to the ability of users to change the system. This is sometimes wrongly thought of as complexity; I would suggest that the value adjustment factor is a reflection of the nonfunctional requirements for the application. Reviewing the original documentation defining function points reveals that the general system characteristics and value adjustment factor were originally created to adjust between online and batch applications.  This portion of the methodology is currently defined as optional. Once calculated, the value adjustment factor is multiplied by the unadjusted function point count to create the adjusted function point count.  The value adjustment factor can adjust a function point count by plus or minus 35%, ranging from 0.65 to 1.35 (general business applications are around 1).
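The arithmetic described above can be sketched in a few lines of Python. The component weights and the VAF formula below follow the commonly published IFPUG values, but the authoritative tables are in the Counting Practices Manual; the component tallies in the example are invented for illustration.

    # Commonly published IFPUG component weights by complexity (low, average, high).
    WEIGHTS = {
        "EI":  (3, 4, 6),    # external inputs
        "EO":  (4, 5, 7),    # external outputs
        "EQ":  (3, 4, 6),    # external inquiries
        "ILF": (7, 10, 15),  # internal logical files
        "EIF": (5, 7, 10),   # external interface files
    }

    def unadjusted_fp(counts):
        """counts maps a component type to a (low, average, high) tally of components."""
        return sum(
            n * w
            for comp, tallies in counts.items()
            for n, w in zip(tallies, WEIGHTS[comp])
        )

    def value_adjustment_factor(gsc_ratings):
        """14 general system characteristics, each rated 0-5; VAF ranges 0.65-1.35."""
        assert len(gsc_ratings) == 14 and all(0 <= r <= 5 for r in gsc_ratings)
        return 0.65 + 0.01 * sum(gsc_ratings)

    counts = {"EI": (3, 2, 1), "EO": (1, 2, 0), "EQ": (2, 1, 0), "ILF": (1, 1, 0), "EIF": (0, 1, 0)}
    ufp = unadjusted_fp(counts)
    vaf = value_adjustment_factor([3] * 14)
    print(ufp, round(ufp * vaf, 1))  # unadjusted count and (optionally) adjusted count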

The IFPUG methodology provides a set of formulas for an application count, an enhancement project count and a development project count.  The formulas are used to determine the count; the type of count does not change the rules for determining the components or their weights.

Issues:

Counting function points takes effort. Function point counting requires some degree of expertise and effort, both from the counter and from the subject matter experts who describe the application or project. The effort is inversely proportional to the quality of the project or application documentation. Integrating the counting process into project activities like analysis or design reviews minimizes the extra effort and maximizes the value of the process by shining a light on what is being built and how the work is being done.

Function points count functionality delivered, not complexity. Size is related to complexity; however, technical complexity is driven by many factors other than size. Factors influencing technical complexity can include algorithms, math, complexity of the data, reuse and other constraints. The lack of a direct and complete linkage between size and complexity causes complaints such as “size and effort seem disconnected.” Adjusting for complexity in estimation or measurement analysis requires a separate step to measure technical complexity.

Related Measures:

  • COSMIC Function Points
  • Mark II Function Points
  • NESMA Function Points
  • Use Case Points

Criticisms:

Function points do not measure everything required to understand the effort needed to maintain or build software. IFPUG Function Points measure functional size and nothing else. Projects and applications can involve significant amounts of non-functional and technical requirements, which do not register on the function point scale. The same criticism can be leveled in terms of understanding performance and analyzing results. This criticism is valid. In order to create a full measurement profile of any project or application a larger palette of metrics is needed; size is the foundation. Estimation tools generally provide a mechanism for adjusting other attributes such as nonfunctional requirements or technical complexity. Measurement programs must include a palette of metrics to fully describe performance because size is only a single attribute, albeit an important one.

Function points were defined in 1979, and some of the words used to describe the methodology have not aged well. This criticism is true but somewhat disingenuous because all methods that have longevity will have this problem. Terms like “file” are not as ubiquitous as they once were; however, the concept of a logical group of data is just as valid today as it was in 1979. Finding the translation path between the words in the methodology and any specific platform or technology is a task that all counters must perform (it is also why the Counting Practices Manual is much larger than the 38 pages of rules that define IFPUG Function Points). The process of translating the methodology for new platforms and technologies is a step that will be required for any sizing technique that stands the test of time and is used for more than a few years outside of a single technology area.

A final criticism is that training and expertise are required for function point analysis, beyond the normal training required for analysis, design, coding or testing.  Again, this criticism is true, but again not outside of the ordinary specialization found in information technology. Training and expertise are required to ensure consistency. IFPUG provides a test-based certification program (Certified Function Point Specialist, or CFPS) as a marker to ensure consistency with the IFPUG Function Point rules.

Financial Metrics Overview:  Return Metrics

Audio Version (SPaMCAST 139)

The Metrics Minute begins an exploration of financial metrics.  The first class of financial metrics comprises those metrics that are used to decide which projects should be done.  I call these return metrics; they ask the questions: should we do this project? And, more broadly, are there better uses for our assets? Generally these questions are asked at the beginning of a project but can be equally powerful at different times during the project life cycle. Return metrics are focused on the financial aspect of the project, but less tangible assets can be incorporated.  Attributes such as risk and strategy are examples of attributes that are often quantified and incorporated.

Examples of this class of metric include Return on Assets (ROA), Return on Investment (ROI), payback period and Internal Rate of Return (IRR). In all cases these metrics account for income, a hurdle rate (interest rate or expected rate of return) and a comparison cost based on assets, income or equity. While these ratios are a rich source of decision-making information, they are not tools with which to manage a project or program.

Metrics Minute: Return on Assets (ROA)
Thomas M. Cagley Jr.

Definition

Return on Assets (ROA) is a ratio of the earnings generated from a project compared to the assets used to generate that revenue. ROA is a classic financial metric that, when applied to projects, is typically used as a technique for project acceptance or, in retrospect, as a tool to evaluate overall performance.

Formula

ROA = (Estimated Net Income + Estimated Interest Expense) ÷ (Average Assets during the period)

Estimated Net Income is the residual income that results from a project after adding total revenue and subtracting all expenses for the project for a specific period of time.

Estimated Interest Expense is any interest expense related to the project; it is added back into net income so that the cost of funding the assets does not distort the return.

Average Assets represents either the statistical average or mode of the asset valuation for the assets involved in generating the project’s income.  An average is used to at least partially negate the variance in value across the period of time being studied (for example, a year).
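A minimal Python sketch of the ROA formula above; the figures in the example are invented for illustration.

    def return_on_assets(net_income: float, interest_expense: float,
                         average_assets: float) -> float:
        """ROA = (Estimated Net Income + Estimated Interest Expense) / Average Assets,
        expressed as a percentage."""
        return 100.0 * (net_income + interest_expense) / average_assets

    # Example: 80k estimated net income, 5k interest added back, 500k average assets.
    print(f"ROA = {return_on_assets(80_000, 5_000, 500_000):.1f}%")  # 17.0%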

Usage

The primary use of ROA is to compare a potential project to other possible projects in order to determine which makes the most sense to pursue.  Projects with a higher ROA are more likely to be included in the portfolio of projects.

An alternative use of ROA is as a tool to evaluate projects or a portfolio of completed projects.  In this case the metric can be periodically recalculated using completed projects to determine whether an overall project portfolio represents an efficient use of assets or whether segments of the portfolio offer a higher return on assets.

Issues

There are several issues with the calculation and use of ROA as a decision tool.  Most of these center either on making apples-to-apples comparisons or using the metric without regard to other factors.  The most significant issues are:

Return on assets is not useful for comparisons between projects in different industries or product segments because factors such as scale and peculiar capital requirements can differ. Stratifying projects into comparable groups is one method that can be used to ensure an apples-to-apples comparison. Similar methods to ensure good comparisons are required when organizational goals call for a spread of projects across a diverse product portfolio with differing return profiles.

A second issue is that intangible assets may affect the rationale for project selection, which forces organizations to address valuation of intangible assets. This issue is a relative of the “numbers don’t tell the whole story” argument. Valuation of intangible assets requires a judgment call, and unless that judgment is based on rules or guidelines those judgment calls will be open to question or manipulation.

Related Metrics

  • Return on Equity (ROE)
  • Return on Investment (ROI)
  • Payback Period
  • Internal Rate of Return (IRR)
  • Net Present Value
  • Total cost of ownership

Criticisms

Criticisms of ROA, like the issues noted earlier, revolve around how the metric is used.

The first criticism is that comparisons across companies and industries are difficult due to differing asset structures that are not easy to discern.  Asset structures may also differ between product lines within the same company, which makes comparisons difficult. This criticism is true; however, unless there is an organizational imperative for project and product diversity, the organization should be maximizing return on assets to ensure it is meeting its fiduciary responsibility.

A second criticism:  While most users of the ROA metric would agree that other factors are important to consider, most organizations have not determined a set of standard factors to consider. For example, risk or risk weighting of the ROA results can provide a richer decision-making tool; however, there isn’t general agreement on when and how to use this additional factor.   This criticism is the most troublesome. I would suggest not using a one-size-fits-all set of factors or discount rate; rather I would discount the net income based on a risk adjusted discount rate. The higher the risk, the higher the discount rate would be set. A formal approach to measuring or estimating risk will be required to avoid manipulation of the analysis. Other financial ratios or metrics are useful in expanding the factors being weighed when comparing potential projects (or performance). As an example, ROI (return on investment) may present a different perspective if cash outflow needs to be more highly weighted. In some scenarios specific assets other than cash may be underutilized; therefore, a lower hurdle rate might be in order (or liquidation of the assets) to ensure more intensive use of the asset class.

A third criticism of ROA centers on the fact that the valuation of Total Assets reflects a snapshot of a particular moment in time. The idea is that asset valuation can and does vary, and that variance could change how any specific project is evaluated.  To deal with the issue I suggest periodic re-evaluations of the portfolio to ensure that the business environment has not changed enough to affect the composition of the project portfolio.  Note: projects that are on the line between acceptance and rejection need to be re-evaluated more often.

I am building a mind map for the project I have titled Metrics Minute. I am currently focusing on financial metrics. The relevant portion of the mind map is shown below.

[Mind map image: 20110609-085454.jpg]

Are there other financial metrics you use or recommend?

Metrics Minute:  Value at Risk
Thomas M Cagley Jr.

 Audio Version (SPaMCAST 135)

Definition:

Value at Risk represents the potential impact of risk on the value of a project or portfolio of projects.  Risk is monitored at specific points of the project life cycle (a topic for further discussion in the near future).  Monitoring includes an evaluation of the potential cost impact of risks that have not been fully remediated, weighted by the probability of occurrence.  Where the cost impact of risk is above the program’s risk tolerance, specific remediation plans will be established to reduce the estimated risk impact.  The Value at Risk metric provides the team with a tool for prioritizing risks and risk management activities. 

Formula:

In its simplest form the equation for Value at Risk (for IT projects) is:

Value at Risk = Probability of Risk * Estimated Cost Impact of Un-remediated Risk

A more precise view of Value at Risk would reflect the time value of money using the following formula:

Value at Risk = Probability * Net Present Value of Estimated Cost Impact of Un-remediated Risk

The formula could be made to reflect the variability of the cost impact over time; however, on projects of less than a year this is not usually necessary.
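A short Python sketch of both forms of the formula; the risk names, probabilities and cost impacts are invented for illustration, and the discounting follows the simple Net Present Value form described above.

    def value_at_risk(probability: float, cost_impact: float,
                      discount_rate: float = 0.0, years_out: float = 0.0) -> float:
        """Probability-weighted cost impact of an un-remediated risk; optionally
        discounted to present value when the impact is more than a year away."""
        present_value = cost_impact / (1 + discount_rate) ** years_out
        return probability * present_value

    risks = [
        ("Key vendor API slips a release", 0.30, 120_000),
        ("Performance rework after load test", 0.10, 400_000),
    ]
    for name, probability, impact in risks:
        print(f"{name}: value at risk = {value_at_risk(probability, impact):,.0f}")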

Uses:

The Value at Risk metric has three primary uses.  All uses fall in the category of risk management. 

The first use of Value at Risk is perhaps the most important: quantifying risks and linking them to the value of the project makes the potential impact of each risk, and of the overall portfolio of risks, easily understandable.   Expressing the impact of risks in dollars, euros, rupees, pounds or any other currency, and then maintaining that analysis as the project evolves, creates a language everyone on the project can understand.

The second use of this technique is as a tool in prioritizing risks so that resources can be targeted on the risks that have the greatest weighted potential to affect value.

The third use of Value at Risk is as a monitoring tool and, when combined with risk tolerance guidelines, as a tool to precipitate action. For example, when used as a project or program metric, the value at risk can be monitored and reviewed at specific points of the life cycle.  Where the value at risk is above the program’s or organization’s risk tolerance, specific remediation plans can be established to reduce the estimated impact.

Criticisms:

There are several criticisms of Value at Risk; many of the criticisms of this method focus on the step of quantifying risks.  

The first criticism is that numbers do not cover everything.  The criticism argues that not all risks can be easily quantified.  Risks driven by factors such as morale, economic and political volatility, emerging technologies and external market innovation are typically considered intangible risks.  I would suggest that almost anything can be measured (see the book How to Measure Anything).  Intangible risks might be difficult to define in concrete, monetary terms, but rather than giving up I would suggest recognizing that quantification is possible; it simply requires a greater degree of subjectivity, intuition and monitoring vigilance.

The second criticism of Value at Risk (or any risk quantification technique) is that it is hard to predict the unpredictable.  From Wikipedia, “The Black Swan Theory or Theory of Black Swan Events is a metaphor that encapsulates the concept that, the event is a surprise (to the observer) and has a major impact. After the fact, the event is rationalized by hindsight.”   The criticism is valid; Value at Risk can only quantify that which is knowable.  Risk management techniques are not one-time events.  Everyone involved or interested in the project must be constantly aware of the world in and around the project; when new risks begin to emerge they need to be identified and evaluated. 

A third criticism is that new risks are difficult to quantify because we do not have any historical data or experience with which to evaluate them.   The corollary to this criticism is that the analysis puts too much weight on the past because there is an assumption that the past is an accurate predictor of the future.  Again, both are valid criticisms, but rather than an argument for not trying to quantify risk, I would suggest that both are arguments for quantification, collection of current history and constant monitoring.

A final criticism is based on the mathematics of the Value at Risk formula.  A risk with a high probability and a low loss could have a similar valuation to a risk with a low probability and a high potential loss.   This criticism is valid and related to the “numbers don’t cover everything” criticism noted earlier.  Even though the criticism is valid, it is manageable; analysis of individual risks should never reflect a simple ranking. 

Related or Variants Measures:

  • Risk matrix
  • Risk impact evaluations

Issues:

As with the criticisms, there are a few potential issues with measuring risks. Most of these issues are driven by human nature; therefore, forewarned is forearmed. 

The first issue is that when leveraging quantitative risk analysis there can be a temptation to over-interpret the data. Psychologists have long known that humans have a great facility for recognizing patterns even when they do not exist.  Combining the use of a diverse team with transparency across the entire risk analysis life cycle will reduce the potential for succumbing to over-analysis and visits to blind alleys.

The second issue is that risk measurement and risk management are not a one-time affair; risks need to be monitored, assessed and avoided (and remediated when it makes sense) over the entire life of a project or program.  Reassessing the impact of a risk on the anticipated delivered value of the project requires time and effort, time and effort that some would argue is overhead and not focused on delivering functionality.  The problem is that if risks transform themselves into issues, it might be better if the functionality wasn’t delivered (an extreme case that hopefully never occurs).  Rather than an issue, I would suggest that this is a statement of fact.  Evolving risks require continual adjustment of your risk analysis because we live in a dynamic environment; Value at Risk is a tool that makes the output of the monitoring process visible and easily understandable.

The final issue is that estimating risk probabilities is difficult.  Human nature tends to drive perception into a binary on/off position.  Hans Peter Duerr, a successor to Werner Heisenberg (the discoverer of the famous Uncertainty Principle, a foundation of quantum science), once opined, “We want it to be either yes or no. But the truth is always somewhere on the way between yes and no.”  Group consensus techniques like Delphi are useful to ensure multiple points of view are involved in estimating probabilities, which will help keep binary perceptions at bay.

 

Audio Version:  SPaMCAST 133

Definition:

A burn-up chart is a graph that tracks progress over time by accumulating functionality as it is completed. The accumulated functionality can be compared to a goal, such as a budget or release plan, to provide the team and others with feedback. Graphically the X axis is time and the Y axis is accumulated functionality completed over that period of time. The burn-up chart, like its cousin the burn-down chart, provides a simple yet powerful tool to provide visibility into the sprint or program.

The burn-up chart can be thought of as the mirror image of the burn-down chart, but it is generally extended over multiple sprints to show the strategy being followed as the project builds toward release and product delivery.

Formula

As with its close cousin, the burn-down chart, there is not really a formula for a burn-up chart as much as there are instructions on how to graphically represent progress against a goal. In its simplest form, the X axis represents time (days, sprints, or releases) and the Y axis represents the accumulated functionality completed over that period of time (stories, value or cost).

Using the basic form of the graph as a base, other data can be integrated. The planned data for multiple sprints and the release schedule can be overlaid on the chart to give visibility into the anticipated flow of the project.  The project budget can be added to the chart to provide feedback on budget utilization; the budget line can be raised or lowered to show how much work remains as changes to scope are incorporated. Value can be tracked on a second Y axis to show the team the relationship between work and value. The burn-up chart is one chart covering the big three topics of on-time, on-budget and on-scope.
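As an illustration of how such a chart might be assembled, here is a hedged Python sketch using matplotlib; the sprint data, the moving scope line and the chart styling are invented for illustration rather than taken from any specific project.

    import matplotlib.pyplot as plt

    sprints = list(range(1, 9))
    completed_per_sprint = [8, 12, 15, 18, 16, 14, 10, 7]   # story points finished each sprint
    total_scope = [95, 95, 100, 100, 110, 110, 110, 110]    # scope line moves as scope changes

    # Accumulate completed work sprint by sprint to form the burn-up line.
    burn_up = []
    running_total = 0
    for done in completed_per_sprint:
        running_total += done
        burn_up.append(running_total)

    plt.plot(sprints, burn_up, marker="o", label="Completed (cumulative)")
    plt.step(sprints, total_scope, where="post", linestyle="--", label="Total scope")
    plt.xlabel("Sprint")
    plt.ylabel("Story points")
    plt.title("Burn-up chart")
    plt.legend()
    plt.show()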

Uses

The burn-up and burn-down charts have many similar uses, such as planning and monitoring. Rather than repeat the discussion already published in the Metrics Minute: Burn-Down Charts, I will focus on the major distinctions between the charts. The unique power of the burn-up chart can be found in its ability to provide a holistic view of the project. The view is holistic because the chart can be used to show progress over sprints and releases. To use a photographic analogy, the burn-up chart provides a panoramic view rather than a normal picture, which is a narrow slice of real life.

James Shore suggests developing a risk-adjusted burn-up chart to make and meet commitments.  It tracks commitments alongside progress. Adding a continually refined risk-adjustment to the burn-up chart accounts for Steve McConnell’s cone of uncertainty.

Issues

One issue is that a burn-up chart provides visibility at a high level, rather than progress at a story level, which causes some people concern. Burn-up charts, like burn-down charts, are not designed as a precise report on where each story is on a day-to-day basis, but rather as a view of how functionality or value has been delivered. The burn-down chart and the detail found in a card wall (or its proxy) are better tools for driving a project on a day-to-day basis.

The second issue occurs because value is recognized only when a piece of work is complete.  In software projects this equates to functional code that has been integrated and tested; value is not recognized until work in process is complete.   Alistair Cockburn suggests this strategy yields more reliable information about the project’s actual state than the classic plan-based mode. The problem is that if the increment of work is too large, the graph will resemble a J, with the majority of value accumulating late in the sprint, leaving room for negative surprises. The way to avoid this issue is to increase the granularity of the units of work.  This will allow the team to recognize value more quickly and therefore smooth out the curve.

The third issue is also related to the choice of unit of measure for the vertical axis. The unit needs to be predictable early in the process. Most physical measures, such as lines of code, can’t be known until you are very near the end of the sprint or project. Leveraging a metric that can expand even though scope has not changed muddles the information provided by the chart, thus reducing its usefulness. This issue can be avoided by choosing a unit of measure that can’t expand unless additional work is accepted into the sprint or project.  Functional measures, such as Quick and Early Function Points, that translate stories or requirements into numbers are a disciplined mechanism for addressing the unit-of-measure issue without adding overhead.

Related or Variant Measures

  • Functional Size
  • Velocity
  • Burn-down Charts
  • Earned Value
  • Productivity
  • Cumulative Flow Diagrams

Criticisms

The most significant criticism of the burn-up chart is that, in its basic form, it does not reflect the flow of work.  Work in some sprints does not lead to completed functionality, and therefore value is not accrued.  Integrating work flow by using techniques like Kanban and cumulative flow diagrams is a way of accruing value on a more granular basis if small, complete packages can’t be used.

The final criticism is that burn-up charts do not predict surprises.  This is a false criticism as a surprise by definition is not predictable.  Use the value at risk metric as a risk management tool to help avoid surprises, but vigilance should never be supplanted by measurement.

 

Metrics Minute:  Burn Down Charts
Thomas M. Cagley Jr.

Audio Version

Definition:

Burn down charts are a graphical representation of the work left to be done and of the progress that has been made. The chart is typically drawn to show progress against predictions. The analogy of a glide path has been used to paint a picture of the slope and the ultimate destination of a burn down chart, which is targeted at completion. One of the most powerful attractions of the burn down chart is that it engages psychology by emotionally tying the metric to completion through the visual representation of a path counting down to zero. 

Formula

There is not really a formula for a burn down chart as much as there are instructions on how to represent progress against a goal graphically. Burn down charts are represented as a standard x/y chart. The x-axis represents time; typically days are used, although I have seen charts that span multiple sprints, so units such as months, milestones or sprints are occasionally used. The y-axis is denominated in units of work (for example, story points or hours of effort). The line that connects the total amount of work to be done in the sprint with the end of the sprint is called the ideal line; together with the axes it forms a right triangle. In most burn down charts the amount of work remaining is also drawn between the two axes and compared to the ideal line. The difference between the two plots represents whether the project is ahead of or behind schedule.
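The construction can be sketched in a few lines of Python; the sprint length, total story points and the end-of-day observations below are invented for illustration.

    def ideal_line(total_work: float, sprint_days: int):
        """Straight line from the sprint's total work down to zero on the last day."""
        return [total_work * (1 - day / sprint_days) for day in range(sprint_days + 1)]

    total_points = 40
    days = 10
    ideal = ideal_line(total_points, days)
    actual_remaining = [40, 40, 36, 33, 33, 28, 22, 21, 15, 9, 2]  # end-of-day observations

    # Compare the remaining work to the ideal line day by day.
    for day, (i, a) in enumerate(zip(ideal, actual_remaining)):
        status = "behind" if a > i else "ahead or on track"
        print(f"Day {day:2}: ideal {i:5.1f}, remaining {a:2} -> {status}")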

 

 Uses

There are two primary uses of a burn down chart: planning and monitoring. The burn down chart represents either what is intended to be delivered in a sprint in terms of stories or the amount of effort that is intended to be expended to deliver the agreed-upon functionality. Through the quantification of what a sprint intends to deliver, based on the estimates of the team and historical velocity, the chart represents a plan (a weakness for some that we will discuss later). The power of visual planning is even more evident when burn down charts are used for sprint and release planning for large programs.

The second primary use of a burn down chart (and maybe the most important) is for monitoring and control based on the visual representations of the plan and progress against that plan. At a glance, the chart can tell whether you are ahead or behind schedule which provides the team with the impetus for action. For example, if progress is not being made fast enough additional effort can be brought to bear or scope can be reduced. Alternately, if progress is racing ahead of the ideal line, additional work can be accepted into the sprint.

Issues

A burn down chart can look like a roller coaster ride rather than the smooth slope that a glide path evokes. This saw-toothed pattern reflects the fact that tasks and stories are not completed at the rate shown by the ideal slope. This rougher pattern means it is important to know when a gap between ideal and actual performance is a signal that action needs to be taken rather than the noise of the normal flow of work.  One solution I have seen used is to create a set of control limits (as if the burn down chart were a control chart) to create signals. Limits require making many assumptions about process discipline and capability that might not make sense. I would suggest that mechanistic guides be used only as a measure of last resort, or perhaps as a set of training wheels. Rather, I would suggest relying on the judgment of the scrum master and team to provide the guidance needed to direct the sprint, based on a strong definition of done.

A second issue is that a burn down chart provides visibility at a high level rather than progress at a story level, which causes some people concern. Burn down charts are not designed as a precise report on where each story is on a day-to-day basis, but rather as a view of how much work is left to be done and whether the team is on track to meet its sprint goals. Those who expect the chart to have the same precision as a detailed schedule will not have their needs met. For those who need precise knowledge of individual stories, I would suggest that a visit to the card wall or its proxy is a better source than the burn down chart.

An effective burn down chart used for monitoring and tracking effort needs to reflect the amount of effort required to complete the work defined by the sprint at a specific point in time. In order to deliver this type of information, the actual line can’t be a pure reflection of the time originally planned; rather, the actual line must reflect the work remaining, which is closer in spirit to earned value. The tracking of work remaining for any open task must become part of the daily rituals of the sprint team. I suggest that work left on a task be collected during the stand-up meeting and captured in the tool being used to track stories and activities. In one organization I was involved with, a user-defined field in TFS was created to track this data. Using this method, the actual line will reflect what needs to be done rather than just what was planned to be done.

A final issue with burn down charts is that it is not always easy to reflect changes in the number of stories while in flight. One simple alternative is the burn UP chart. Another option is to increase (or decrease) the ideal line at the point where the scope changed, or to extend the ideal line below zero to reflect additions to scope. The former option is the one I suggest. This option can also be used when specific events consume a significant amount of effort that is not spread evenly across a sprint.

Related or Variant Measures

  • Functional Size
  • Velocity
  • Burn Up Charts
  • Earned Value
  • Productivity

 

Criticisms

The most frequently heard criticism is that a burn down chart does not provide as much status information as a project schedule or detailed status report (definitions vary). I would argue that the data presented is equivalent and, if produced daily, actually provides more actionable information and far fewer excuses. The real issue is that it is different, and because it is different, training and education are required. I do not suggest phased transitions, as it is too easy to hold on to the past.

The false-signal issue is a fair criticism. The saw-tooth pattern seen in both effort- and story-denominated burn down charts requires a conversation with the scrum master or team lead before reacting. Building an understanding of how to interpret the burn down chart, and building trust in self-directed teams to see the trends and take action themselves, is critical for making a transition to this form of progress and status reporting.

Audio Version:  Software Process and Measurement Cast 119.

Definition:

The simple definition of velocity, as it is currently used, is the amount of work that is completed in a period of time (typically a sprint). The definition is related to productivity, which is the amount of effort required to complete a unit of work, and to delivery rate, which measures the speed at which work is completed.  The inclusion of a time box (the sprint) creates a fixed duration, which transforms velocity into more of a productivity metric than a speed metric (how much work can be done in a specific timescale by a specific team). Therefore, to truly measure velocity you need to estimate the units of work completed, have a definition of complete and have a time box.

The definition of complete in agile is typically functional code; however, I think the definition can be stretched to reflect the terminal deliverable the sprint team has committed to create, based on the definition of done that the team has established (for example, requirements for a sprint team working on requirements, or completed test cases in a test sprint).

Many agile projects use the concept of story points as a metaphor for the size of functional code; note that other functional size measures can be used just as easily. Examples in this paper will use story points as the unit of measure.  What is not size, however, is effort or duration. Effort is an input that is consumed while transforming ideas into functional code. The amount of effort required for the transformation is a reflection of size, complexity and other factors. Duration, like effort, is consumed by a sprint rather than created, and therefore does not measure what is delivered.

Formula

To calculate velocity, simply add up the size estimates of the features (user stories, requirements, backlog items, etc.) successfully delivered in an iteration.  The use of the size estimates allows the team to distinguish between items of differing levels of granularity.  Successfully delivered should equate to the definition of done.

Velocity = Story Points Completed Per Sprint

And:

Average velocity = Average Number of Story Points Per Sprint

The formula becomes more complex if staffing varies between sprints (and potentially less valuable as a predictive measure).  In order to account for variable staffing the velocity formula would have to be modified as follows:

velocity per person = sum (size of completed features in a sprint / number of people) / number of sprints or observations

To be really precise (though not necessarily more accurate) we would have to understand the variability of the data, as variability helps define the level of confidence.  Variability generated by differences in team member capabilities is one of the reasons that predictability is enhanced by team stability. As you can see, the more complex the environmental scenario becomes, the less simple the math must be to describe the scenario.
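A minimal Python sketch of the basic and per-person forms of the calculation; the story point totals and team sizes are invented for illustration.

    def average_velocity(completed_story_points):
        """Average velocity: mean story points completed per sprint."""
        return sum(completed_story_points) / len(completed_story_points)

    def velocity_per_person(completed_story_points, team_sizes):
        """Normalize each sprint by team size when staffing varies, then average."""
        per_person = [points / size for points, size in zip(completed_story_points, team_sizes)]
        return sum(per_person) / len(per_person)

    points = [21, 25, 18, 24]   # story points that met the definition of done each sprint
    sizes = [6, 6, 5, 7]        # team size in each sprint
    print(average_velocity(points))                       # 22.0
    print(round(velocity_per_person(points, sizes), 2))   # ~3.67 story points per person per sprint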

Uses:

Velocity is used as a tool in project planning and reporting. Velocity is used in planning to predict how much work will be completed in a sprint and in reporting to communicate what has been done.

When used for planning and estimation the team’s velocity is used along with a prioritized set of granular features (e.g., user stories, backlog items, requirements, etc.) that have been sized or estimated.  The team uses these factors to select what can be done in the upcoming sprint. When the sprint is complete the results are used to update velocity for the next sprint. This is a top down estimation process using historical data.

Over a number of sprints velocity can be used both as a macro planning tool (when will the project be done?) and as a reporting tool (we planned at this velocity and are delivering at this velocity).

Velocity can be used in all methodologies and because it is team specific, it is agnostic in terms of units of size.

Issues

As with all metrics, velocity has its share of issues.

The first is that there is an expectation of team stability inherent in the metric. Velocity is impacted by team size and composition, and without collecting additional attributes and correlating them to performance, change is not predictable (except by gut feel or Ouija board). Notes should always be kept on team size and capability so that you can understand your data over time.

Similarly, team dynamics change over time, sometimes radically. Radical changes in team dynamics will affect velocity. Note that shocks to any system of work are apt to create the same issue. Measurement personnel, scrum masters and team leaders need to be aware of people’s personalities and how they change over time.

The first-time application of velocity requires either historical data from other similar teams and projects or an estimate. In a perfect world, a few sprints would be executed and data gathered before expectations are set; however, clients generally want an idea of whether a project will be completed, when it will be completed and the functions that will be delivered along the way.  Estimates of velocity based on the team’s knowledge of the past, or other crowd-sourcing techniques, are relatively safe starting points assuming continuous recalibration.

The final issue is the requirement for a good definition of done. Done is a concept that has been driven home in the agile community. To quote Mayank Gupta (http://www.scrumalliance.org/articles/106-definition-of-done-a-reference), “An explicit and concrete definition of done may seem small but it can be the most critical checkpoint of an agile project.”  A concrete definition of done provides the basis for estimating velocity by reducing the variability caused by features in different states of completion.  Done also focuses the team by providing a goal to pursue. Make sure you have a crisp definition of done and recognize how that definition can change from sprint to sprint.

Related Metrics:

Productivity (size / effort)

Delivery Rate (duration / size)

Criticisms:

The first criticism of velocity is that the metric is not comparable between teams and, by inference, is not useful as a benchmark. Velocity was conceived as a tool for scrum masters and team leads to manage and plan individual sprints. There is no overarching set of rules for the metric to enforce standardization, so one team’s velocity is apt to reflect something different than the next. The criticism is correct but perhaps off the mark. As a team-level tool velocity works because it is very easy to use and can be consistent; adding the complexity of standards and rules to make it more organizational will by definition reduce the simplicity and therefore the usefulness at the team level.

A second criticism is that estimates and budgets are typically set early in a project’s life, while team-level velocity may well be unknown until later.  The dichotomy between estimating and planning (or budgeting and estimating, for that matter) is often overlooked.  Estimates developed early in a project, or in projects with multiple teams, require different techniques. In large projects, applying team-level velocities requires techniques more akin to portfolio management, which add significant levels of overhead. I would suggest that velocity is more valuable as a team planning tool than as a budgeting or estimation tool at a macro level.

A final criticism is that backlog items may not be defined at a consistent level of granularity, so velocity may deliver inconsistent results when applied. I tend to dismiss this criticism, as it is true for any mechanism that relies on relative sizing. Team consistency will help reduce the variability in sizing; however, all teams should strive to break backlog items into stories that are as atomic as possible.