Story Points Are A Fence!

A recent discussion with a Scrum Master colleague reminded me that conversations are filled with metaphors.  Metaphors simplify and represent abstract concepts so that they can be highlighted or compared.  According to James Geary in his TED talk from July 15, 2010, we use, on average, six metaphors a minute in conversation.  We use metaphors because they are useful.

Story points are a metaphor. Story points represent a piece of work. In software, a story point is an abstraction used to talk about a piece of functional code that is not perfectly understood.  Some pieces of code are harder, bigger, messier, less well understood or take longer to complete; the list can go on. That is why story points come in different sizes. Historically two scales have been used, both based on the Fibonacci sequence. Every person and every team has a different perspective of what a story point means because it is a metaphor. However, the understanding generated by the abstraction is enough to allow team members to talk about the functionality or to get a rough approximation of what the team can do in a sprint or iteration.  Inside the team, the metaphor allows a conversation. Unfortunately, all useful metaphors are used and extended until their marginal utility to facilitate a conversation is reduced to zero (otherwise known as the rule that all good metaphors will be used until they are kicked to death). Story points are no different.

The language they understand is months and dollars (or any other type of currency).

Clients, stakeholders and pointy-haired bosses really do care about how long a project will take, how much the project will cost and, by extension, the effort required to deliver the project. What clients, stakeholders and bosses don’t care about is how much the team needs to think or the complexity of the stories or features, except as those attributes affect the duration, cost and effort.  The language they understand is months and dollars (or any other type of currency). Teams, however, need to speak in terms of complexity and code (programming languages). Story points are an attempt to create a common understanding.

When a team uses story points, t-shirt sizes or other relative sizing techniques, they hash a myriad of factors together.  When a team decomposes a problem, they have to assess complexity, capability and capacity in order to determine how long a story, feature or task will take (and therefore cost).  The number of moving parts in this mental algebra makes the outcome variable.  That variability generates debates about how rational it is to estimate at this level, which we will not tackle in this essay.  When the team translates their individual perceptions (which include complexity, capacity and capability) into story points or other relative sizes, they are attempting to share an understanding with stakeholders of how long and at what price (with a pinch of variability).  For example, if a team using t-shirt sizing and two-week sprints indicates that, based on past performance, they can deliver 1 large story and 2 medium stories, or 1 medium and 5 small stories, per sprint, it would be fairly easy to determine when the items on the backlog will be delivered and to get a fair approximation of the number of sprints required (aka effort, which equates to cost).
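The delivery arithmetic in that example can be sketched in a few lines of Python. The weights (small = 1, medium = 2, large = 3) are an assumption, chosen only because they make both sprint mixes in the example (1 large plus 2 medium, or 1 medium plus 5 small) add up to the same capacity of 7. The backlog and the function name are hypothetical.

```python
import math

# Assumed relative weights; chosen so that 1 large + 2 medium and
# 1 medium + 5 small both equal the same per-sprint capacity.
WEIGHTS = {"S": 1, "M": 2, "L": 3}
CAPACITY_PER_SPRINT = WEIGHTS["L"] + 2 * WEIGHTS["M"]  # 7

def sprints_needed(backlog, capacity=CAPACITY_PER_SPRINT):
    """Rough number of sprints needed to deliver a backlog of t-shirt sizes."""
    total = sum(WEIGHTS[size] for size in backlog)
    return math.ceil(total / capacity)

# Hypothetical backlog: 2 large, 3 medium and 4 small stories.
backlog = ["L", "L", "M", "M", "M", "S", "S", "S", "S"]
print(sprints_needed(backlog))  # total weight 16 at 7 per sprint -> 3
```

This ignores the fact that a story cannot be split across sprints, so it is a rough approximation rather than a plan. With two-week sprints, three sprints is roughly six weeks, which is the kind of answer stakeholders can work with.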

Clients, stakeholders and bosses are not interested in the t-shirt sizes or the number of story points, but they do care about whether a feature will take a long time to build or cost a lot. The process of sizing helps technical teams translate how hard a story or a project is into words that clients, stakeholders and bosses can understand intimately.

Trail length is an estimate of size, while the time needed to hike it is another story!

More than occasionally I am asked, “Why should we size as part of estimation?”  In many cases the actual question is, “why can’t we just estimate hours?”  It is a good idea to size for many reasons, such as generating an estimate in a quantitative, repeatable process, but in the long run, sizing is all about the conversation it generates.

It is well established that size provides a major contribution to the cost of an engineering project.  In houses, bridges, planes, trains and automobiles the use of size as part of estimating cost and effort is a mature behavior. The common belief is that size can and does play a similar role in software. Estimation based on size (also known as parametric estimation) can be expressed as a function of size, complexity and capabilities.

E = f(size, complexity, capabilities)

In a parametric estimate these three factors are used to develop a set of equations that include a productivity rate, which is used to translate size into effort.
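As a minimal sketch of what such an equation might look like, the Python below treats effort as size divided by a productivity rate, scaled by multipliers for complexity and capabilities. The multiplicative form, the function name and every number are illustrative assumptions, not a calibrated model; real parametric tools derive their equations from historical data.

```python
def parametric_effort(size, productivity_rate, complexity_factor=1.0,
                      capability_factor=1.0):
    """Illustrative E = f(size, complexity, capabilities).

    size              -- functional size (e.g. function points)
    productivity_rate -- size units delivered per person-month (assumed)
    complexity_factor -- > 1.0 for harder-than-typical work (assumed)
    capability_factor -- < 1.0 for a stronger-than-typical team (assumed)
    """
    if productivity_rate <= 0:
        raise ValueError("productivity_rate must be positive")
    baseline = size / productivity_rate
    return baseline * complexity_factor * capability_factor

# Hypothetical project: 400 function points at 10 FP per person-month,
# somewhat complex (x1.2), delivered by an experienced team (x0.9).
effort = parametric_effort(400, 10, complexity_factor=1.2, capability_factor=0.9)
print(round(effort, 1))  # 43.2 person-months
```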

Size is a measure of the functionality that will be delivered by the project.  The bar for any project-level size measure is whether it can be known early in the project, whether it is predictive and whether the team can apply the metric consistently.  Lines of code is a popular physical measure, function points are the most popular functional measure, and story points are the most common relative measure of size.

Complexity refers to the technical complexity of the work being done and includes numerous properties of a project (examples of complexity could include code structure, math and logic structure).  Business problems with increased complexity generally require increased levels of effort to satisfy them.

Capabilities include the dimensions of skills, experience, processes, team structure and tools (estimation tools include a much broader list).  Variation in each capability influences the level of effort the project will require.

Parametric estimation is a top-down approach to generating a project estimate.  Planning exercises are then used to convert the effort estimate into a schedule and duration.  Planning is generally a bottom-up process driven by the identification of tasks, order of execution and specific staffing assignments.  Bottom-up planning can be fairly accurate and precise over short time horizons. Top-down estimation is generally easier than bottom-up estimation early in a project, while task-based planning makes sense in tactical, short-term scenarios. Examples of estimation and planning in an Agile project include iteration/sprint planning, which includes planning poker (sizing) and task planning (bottom-up plan).  A detailed schedule built from tasks in a waterfall project would be an example of a bottom-up plan.  As most of us know, plans become less accurate as we push them further into the future, even if they are done to the same level of precision. Size-based estimation provides a mechanism to predict the rough course of the project before release planning can be performed, and then serves again as a tool to support and triangulate release planning.

The act of building a logical case for a function point count or participating in a planning poker session helps those that are doing an estimate to collect, organize and investigate the information that is known about a need or requirement.  As the data is collected, questions can be asked and conversations had which enrich understanding and knowledge.  The process of developing the understanding needed to estimate size provides a wide range of benefits ranging from simply a better understanding of requirements to a crisper understanding of risks.

A second reason for estimating size as a separate step in the process is that separating it out allows a discussion of velocity or productivity as a separate entity.  By fixing one part of the size, complexity and capability equation, we gain greater focus on the other parts, such as team capabilities, processes, risks or changes that will affect velocity.  Greater focus leads to greater understanding, which leads to a better estimate.

A third reason for estimating the size of the software project as part of the overall estimation process is that isolating the size of the work makes the estimate easier to re-scale when capabilities change or knowledge about the project increases. In most projects that exist for more than a few months, understanding of the business problem, how to solve that problem and the capabilities of the team increase, while at the same time the perceived complexity[1] of the solution decreases. If a team has jumped from requirements or stories directly to an effort estimate, it will require more effort to re-estimate the remaining work, because they will not be able to reuse the previous estimate once the original rationale has changed. When you have captured size, re-estimation becomes a re-scaling exercise. Re-scaling is much closer to a math exercise (productivity x size), which saves time and energy.  At best, re-estimation from scratch is more time consuming and yields the same value.  The ability to re-scale will aid in sprint planning and in release planning. Why waste time when we should be focusing on delivering value?
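The re-scaling arithmetic can be illustrated with a short sketch. It assumes productivity is expressed as hours per unit of size, so that effort is productivity x size as in the text; the function name and the figures are hypothetical.

```python
def rescale_effort(remaining_size, hours_per_size_unit):
    """Effort for the remaining work: productivity x size."""
    return remaining_size * hours_per_size_unit

remaining_size = 120  # size units still on the backlog (hypothetical)

# Original estimate, made with the productivity assumed at project start.
original = rescale_effort(remaining_size, hours_per_size_unit=8)

# The team has learned the domain and now needs fewer hours per unit.
# Size is unchanged, so only the productivity term is updated.
rescaled = rescale_effort(remaining_size, hours_per_size_unit=6)

print(original, rescaled)  # 960 720
```

Because the size of the remaining work was captured separately, nothing has to be re-sized; only one term of the multiplication changes.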

Finally, why size?  In the words of David Herron, author and Vice President of Solution Services at the David Consulting Group, “Sizing is all about the conversation that it generates.”  Conversations create a crisper, deeper understanding of the requirements and the steps needed to satisfy the business need.  Determining the size of the project is a tool with which to focus a discussion as to whether requirements are understood.  If a requirement can’t be sized, you don’t know enough to actually fulfill it.  Planning poker is an example of a sizing conversation. I am always amazed at the richness of the information that is exposed during a group planning poker session (please remember to take notes).  The conversation provides many of the nuances a written story or requirement just can’t provide.

Estimates, by definition, are wrong.  The question is just how wrong.   The search for knowledge generated by the conversations needed to size a project provides the best platform for starting a project well.  That same knowledge provides the additional inputs needed to complete the size, complexity, capability equation in order to yield a project estimate.  If you are asked, “Why size?” it might be tempting to fire off the answer “Why not?” but in the end, I think you will change more minds by suggesting that it is all about the conversation after you have made the more quantitative arguments.

Check out an audio version of this essay as part of SPaMCAST 201.


[1] Perceived complexity is more important than actual complexity because perception drives behavior more directly than actual complexity does.

Size is a matter of a frame of reference.

In this entry, we continue to address the questions I received during a recent webinar on User Stories. I felt that it was important to share and expand on the answers that I included in the webinar with a broader audience.  Today’s question:

“As an oversight person to a DoD contractor, I’m interested in what to do to increase consistency of sizing user stories.  Tips and tricks that seem to help would be great to hear.” 

There are two basic strategies for increasing sizing consistency:

  1. Use industry-defined size metrics, like IFPUG function points, and
  2. Use a frame of reference.

The best answer to this question is to use industry-standard software measurement techniques, such as IFPUG Function Points.  There are versions of this measure that can be applied to user stories to consistently derive a value for functional size.  For example, Quick and Early Function Points is a technique that identifies function points based on the subject and verb in a user story.  The accuracy is high when compared to a detailed count after the code is developed. Function points, whether IFPUG, COSMIC, Mark II or NESMA, are all based on a set of published rules that assure consistency.

Making story points more consistent is problematic.  As I have noted in the past, consistency is challenging in environments where team composition is dynamic. However, there are steps that can improve consistency.  Begin by training the entire team on story points.  The training should be done as a group and include everyone on the team, regardless of their role or whether they have had the training before. Exercises during the training should be designed to ensure that team members gain more insight into each other’s thought processes. Note: the training needs to be more than a PowerPoint presentation and must include hands-on application of story point sizing techniques. Build story point refresher training into team activities every few months. Training and consistency are linked. Second, select two or three well-understood features and, as a team, develop a story point number for each feature in the sample.  These features will be used to establish a frame of reference when story pointing backlog items.  My experience has shown that having two or three well-understood reference points increases the perception of consistency.  A final note on improving story point consistency: remember to involve the whole team (product owner, Scrum Master and development team) in developing the story point number for the user stories, and consider an outside coach to ensure the process of setting story points is done in a collaborative manner.  All of these suggestions can be used to support large programs with multiple teams; however, consistency will be reduced with every added team.

Two cautionary notes:  First, in order to apply any of these methods you must learn the rules or techniques and you must have well-formed user stories (the adage garbage in, garbage out holds). Second, recognize that in Agile, user stories are actively being interpreted, groomed, split and added to during the project.  The size of a story can and will change as it is interpreted and accepted into a sprint.

Consistency requires rules such as those published for IFPUG Function Points (an ISO Standard) that everyone involved in the process can learn and apply.  When using relative measures, i.e. story points, consistency can only come from training, stable teams and a good set of reference points.  In both cases, consistency must be tempered with an understanding that the team will gather more knowledge about what they are creating as time goes by, which might affect the size of what is being delivered.

Story points?

Recently I did a webinar on User stories for my day job. During my preparation for the webinar I asked everyone that was registered to provide the questions they wanted to be addressed. I received a number of fantastic questions. I felt that it was important to share the answers with a broader audience. 

One of the questions I was asked, from Grigory Kolesnikov, was indicative of a second group of questions:

“Which is better to use as a metric for project planning:

  1. User stories,
  2. Local self-made proxies,
  3. Functional points, or
  4. Any other options?”

Given the topic of the webinar the answer focused on whether story points were the best metric for project planning.

Size is one of the predictors of how much work will be required to deliver a project. Assuming all project attributes, with the exception of size, stay the same, a larger project will require more effort to complete than a smaller project. Therefore, knowing size is an important factor in answering questions like “how long will this take” or “how much will this project cost”.  While these are questions fraught with dangers, they are always asked. If you have to compete for work, they are generally difficult not to answer. While not a perfect analogy, I do not know anyone who builds or is involved in building a home who can’t answer those questions (on either side of the transaction). Which metric you should use to plan the project depends on the type of project or program and whether you are an internal or external provider (i.e. whether you have to compete for work).  Said a different way, as all good consultants know, the answer is: it depends.

User stories are very useful, both for release planning and iteration planning, in projects that are being done with one or a small number of stable teams. The stability of the teams is important for the team to be able to develop a common frame of reference for applying story points. When teams are unable to develop a common frame of reference (or need to redevelop the frame of reference due to changes in the team), their application of story points will vary widely.  A feature that in sprint 1 might have been 5 story points might be 11 in sprint 3.  While this might not seem to be a big shift, the variability in how the team perceives size will also be exhibited in the team’s velocity.  Velocity is used in release planning and iteration planning.  The higher the degree of variability in the team’s performance from sprint to sprint, the less predictive velocity becomes. If performance measured in story points (velocity) is highly variable, it will be less useful for project planning.  Simply put, if you struggle to remember who is on your team on a day-to-day basis, story points are not going to be very valuable.
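One way to put a number on that variability is the coefficient of variation of velocity (standard deviation divided by the mean). The sketch below is illustrative only: the velocity histories are invented, and the choice of this particular statistic is an assumption rather than anything prescribed by Scrum.

```python
from statistics import mean, stdev

def velocity_variability(velocities):
    """Return (mean, coefficient of variation) for per-sprint velocities."""
    avg = mean(velocities)
    return avg, stdev(velocities) / avg

# Hypothetical velocities (story points completed per sprint).
stable_team = [20, 22, 19, 21, 20]
churning_team = [12, 30, 8, 25, 27]

for name, history in (("stable", stable_team), ("churning", churning_team)):
    avg, cv = velocity_variability(history)
    print(f"{name}: mean velocity {avg:.1f}, variation {cv:.0%}")
```

Both invented teams average the same velocity, yet the second team's history swings far more from sprint to sprint, so its average says much less about what the next sprint will deliver.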

External providers generally have strong contractual incentives to deliver based on a set of requirements in a statement of work, RFP or some other binding document.  While contracts can (and should) be tailored to address how Agile manages the flow of work through a dynamic backlog, most are not, and until accounting, purchasing and legal are brought into the world of Agile, contracts will be difficult.  For example, outsourcing contracts often include performance expectations.  These expectations need to be observable, understandable and independently measurable in order to be binding and to build trust.  Relative measures like story points fail on this point.  Story points, as noted in other posts, are also not useful for benchmarking.

Story points are not the equivalent of duct tape. You can do most anything with duct tape. Story points are a team-based mechanism for planning sprints and releases. Teams with a revolving door for membership or projects that have specific contractual performance stipulations need to use more formal sizing tools for planning.

Story points make a poor organizational measure of software size.

Recently I did a webinar on User Stories for my day job as Vice President of Consulting at the David Consulting Group. During my preparation for the webinar I asked everyone who was registered to provide the questions they wanted to be addressed.  I received quite a few responses.  I did my best to answer the questions during the webinar; however, I thought it would be a good idea to circle back and address a number of them more formally. A number of the questions concerned using story points.

The first set of questions focused on using story points to compare teams and to benchmark against other organizations.

Question Set 1: Story Points as an Organizational Measure of Software Size

Story points make a poor organizational measure of software size because they represent an individual team’s perspective and can’t be used to benchmark performance between teams or organizations.

Story points (vs function points) are a relative measure based on the team’s perception of the size of the work.  The determination of size is based on the team’s level of understanding, how complex the work is and how much work is required compared to other units of work. Every team will have a different perception of the size of work. For example, one team thinks that adding a backup to their order entry system is fairly easy and calls the work five story points, while a second team might size the same work as eight story points.  Does the difference mean that the second team thinks the work is nearly twice as difficult, or does it represent a different frame of reference?  Story points do not provide that level of explanatory power and should not be used in this fashion. Inferring the degree of real difficulty or the length of time required to deliver the function based on an outsider’s perception of the reported story point size will lead to wrong answers.

There are many published and commercially available benchmarks for function points, including the IFPUG, COSMIC, NESMA and Mark II varieties (all of which are ISO Standards).  These benchmarks represent data collected or reported using a set of internationally published standards for sizing software. Given that story points are by definition a measure based on a specific team’s perception and not on a set of published rules, there are no industry standards for story point performance.

In order to benchmark and compare performance between groups, an organization needs to adopt a measure or metric based on a set of published and industry-accepted rules. Story points, while valuable at a team level, by definition fail on this point. Story points, as they are currently defined, can’t be used to compare between teams or organizations. Any organization that is publishing industry performance standards based on story points has either redefined story points OR just does not understand what story points represent.

Barnacles attach to ships and add drag

In earlier entries of the Daily Process Thoughts, I believe we have established that the basic Scrum process can be defined using three roles, five events and three artifacts. This is the Scrum canon, much like the canon of stories that make up the Sherlock Holmes series. Many organizations and consultants have added practices to Scrum in order to meet specific needs (whether those needs are real or perceived is an open question).  At the Scrum Gathering in Las Vegas in 2013, Dr. Alistair Cockburn called these additions barnacles; Ken Schwaber has written extensively about “Scrum buts. . .” and “Scrum ands. . .”. Barnacles grow on the hulls of ships and create drag, slowing the ship or requiring greater energy to drive the same distance. However, for all of the downsides of barnacles, they serve a purpose or deliver a benefit, or they would not exist in nature.  The additions to Scrum must compete and deliver value or they will be swept aside.  Several of the more common barnacles are related to defining and estimating work. They include:

  • User Stories, phrased in the now classic “persona, goal, benefit” format, are an addition to the canon of Scrum.  User stories provide a disciplined framework for describing units of work, which improves communication.
  • Story Cards, which generally include the user story, acceptance criteria, size and other ancillary pieces of information, provide a means to organize the units of work a team will be working on.  Organization of information provides a means of visualizing gaps and keeping track of work items so they can be communicated (and in my case, so they don’t get lost).
  • Story Points are a representation of the size of a user story or any unit of work based on the collective understanding of the team that is being tasked to deliver the work.  The use of story points provides a team with a consistent scale that helps team members communicate about their perception of size.
  • Planning poker, a variant of the Delphi estimation process, acts as a mechanism to structure the discussion of size and estimation within Agile teams to increase communication, ensure all relevant voices are heard and control paralysis by filibuster.

Add to the potential additions technical practices like Test Driven Development, Behavior Driven Development and Continuous Builds, and hybrids like Scrumban, and the number of potential barnacles can grow quite large.  That is the nature of a framework.  Techniques, practices and processes are bolted on to the framework first to see if they improve performance.  As practitioners and methodologists we must ensure that only those that provide tangible, demonstrable value are allowed to stay bolted on.  Remember that each organization and team may require more or fewer barnacles to be effective.  As with the Sherlock Holmes stories, others have extended the canon with their own stories, practices and processes.  Some are valuable and find traction, while others are experiments that have run their course.