Measurement for Scrum – What are Appropriate Measures?

We’ve seen the risks of assuming that everything follows a normal distribution, and the problem with reporting a single number. What else do we need to be aware of? What can we usefully measure?

Risks in Measurement

Many important things can’t be measured using formal measurement systems. As a result, not enough attention is paid to them. Take Technical Debt: there are some attempts to measure code complexity, but they don’t work well, so organizations sweep the problem under the rug until a code base becomes too complex and expensive to maintain. One of Nassim Taleb’s key points is that Black Swan events couldn’t have been predicted by mathematics before the event, yet they look obvious in hindsight. Only better questions will help spot a risk in advance.

Always consider how your metric will be gamed. “Tell me how you measure me, and I will tell you how I will behave” - Eliyahu Goldratt[1]. People will do what it takes to meet what they believe they’re being measured on, so we should only use metrics that, if gamed, we would be happy with the result. Example: if we measure Velocity (number of Story Points per Sprint), teams will rapidly find ways of inflating their Velocity (estimate higher, reduce effort on testing, etc.) in ways that produce a weaker product.

Focus on a change or trend, not the raw number itself. A change in a metric is just a hint to go look and see. It doesn’t, in and of itself, tell us anything. For example, suppose Unit Test code coverage (expressed as a percentage) goes down over the course of one Sprint. This may be bad – perhaps new code was introduced without accompanying Unit tests. Or it may be good – well tested code that was redundant was eliminated. So a metric is just a hint to have a conversation.
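
To make the coverage example concrete, here is a small arithmetic sketch. The line counts are invented for illustration and `coverage_percent` is a hypothetical helper, not part of any coverage tool. It shows how deleting redundant, well tested code can push the percentage down even though nothing got worse.

```python
def coverage_percent(covered_lines: int, total_lines: int) -> float:
    """Line coverage as a percentage of all lines."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return 100.0 * covered_lines / total_lines


# Hypothetical code base: 1,000 lines, 800 of them visited by tests.
before = coverage_percent(800, 1000)

# The team deletes 200 redundant lines, all of which were covered by tests.
after = coverage_percent(800 - 200, 1000 - 200)

print(f"before: {before:.1f}%  after: {after:.1f}%")
```

Coverage falls from 80% to 75% while the number of untested lines stays at 200. The number alone can't tell this apart from new untested code; only the conversation can.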

The Six Sigma[2] manufacturing process makes significant use of metrics to reduce defect rates by eliminating variability. But applying that same approach to software and product development would have a disastrous effect. Knowledge work, e.g. the work of building products (software or otherwise), is hard to measure because the nature of the work is inherently variable. What’s the standard time to implement a feature? There is none. How long does it take to fix an average bug? There are no average bugs. When measuring, we need to be careful not to use the measure to eliminate the variability inherent to the work.

And then there are Vanity Metrics. Be careful of metrics that measure things that the customer doesn’t value. LeanKit shows that focusing on 5 ‘9s’ (99.999%) uptime might cause us to put energy into solving the wrong problem. In the LeanKit example, 5 ‘9s’ is the goal, and when customers complain about downtime, the DevOps person notices that 75% of the downtime occurs when the nightly maintenance scripts are run. So they put time into improving the scripts to eliminate the problems, but the customer complaints don’t go away. Yes, the complaints were about the percentage of downtime… but about downtime during the day! Focusing on the evening downtime misses the point.

In another case, American Airlines has promised a response to customer complaints within 20 minutes. However, many of the responses are clearly canned, e.g. if you mention power supply problems on a particular plane, you’ll get a response back that suggests that other people in your row might have been using the power, along with a link to the airline’s FAQ. Instead of a useless response within 20 minutes, what the recipient really wanted was to know that maintenance had been sent a ticket for the problem.
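
For a sense of scale in the uptime example, the sketch below works out how much downtime a 99.999% target allows per year, and splits a set of outages by when they happened. The outage minutes are made up to match the 75% figure in the example; `downtime_budget_minutes` and `share_by_window` are hypothetical helpers, not LeanKit's tooling.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1.0 - availability)


def share_by_window(outages):
    """Map each time window to its fraction of total downtime.

    outages: list of (window_name, minutes) pairs.
    """
    total = sum(minutes for _, minutes in outages)
    if total <= 0:
        raise ValueError("no downtime recorded")
    shares = {}
    for window, minutes in outages:
        shares[window] = shares.get(window, 0.0) + minutes / total
    return shares


# Assumed data: 45 minutes of downtime at night, 15 minutes during the day.
outages = [("night", 45), ("day", 15)]
shares = share_by_window(outages)

print(f"5 nines budget: {downtime_budget_minutes(0.99999):.2f} minutes per year")
print(f"night share: {shares['night']:.0%}, day share: {shares['day']:.0%}")
```

The larger share is not automatically the one that matters. In the example the customers only cared about the daytime quarter, which the aggregate percentage hides.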

Recheck your metrics frequently (at a minimum, every 3 months) and ask if they’re still relevant. Ask if they’re still helping to drive change. Drop metrics that are no longer helping.

Effective Measurements in Agile Teams

When considering effective metrics, ask:

  • How will this measure help our customer?
  • Does this measure have a “best before” date?
  • How will a smart team game this metric? (And if they game it, will we be happy?)
| Potential Measure | What is it trying to measure? | How does it help? | How will it be gamed? Other concerns? |
| --- | --- | --- | --- |
| Defects that escape the Sprint | Quality | It sends a strong signal that zero defects is desirable. | People might argue about whether an item is a defect or just a new User Story. |
| Static Code Analysis (e.g. tools like Sonar, FindBugs and PMD) | Quality by doing static analysis | Static analysis tools can spot small mistakes in code and warn developers of potential bugs. | Static analysis can give people a false sense of security since the tools only check for bugs that can be proven by examining the code – they don’t check for logic, implementation or intent errors. Tools are often used to summarize health with one or two numbers. As usual, the number is useful if it tells you to roll back the covers, but risky if you assume it means an area of your code base is safe. |
| Test Coverage | How much of the code is visited by automated tests (unit or acceptance tests) | It can help spot untested or even unused code. | People often assume that code visited by a test has been tested. All test coverage tells us is that the code was used by a test. It shows nothing about whether it was tested. |
| Customer Satisfaction measured via NPS (Net Promoter Score) | Whether the customer would be happy to recommend your product to their friends, on a scale of 0-10. They’re then asked why they chose that number. | Are you delighting your customers? | It can only be measured infrequently, so information arrives a long time after your work has been done. It doesn’t correlate well to the product work your organization has done. The power is in the answer to the “why” question. |
| Team Member Happiness | Each team member rates their happiness with the Team or Organization on a scale of 1-5. | Happier people are more productive and focused. Jeff Sutherland has correlated Happiness with an increase in Productivity and Revenue. When a team member’s happiness dips it can be a warning sign that something we’re doing is harming our organization. | Happiness can sometimes become a trap – a team’s productivity plateaus and they’re happy. |
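
As a rough sketch of how the NPS row is usually turned into a number: the standard Net Promoter calculation uses a 0-10 scale, counts ratings of 9-10 as promoters and 0-6 as detractors, then subtracts the percentage of detractors from the percentage of promoters. The responses and reasons below are invented for illustration.

```python
def net_promoter_score(ratings):
    """Standard NPS: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses: (rating, answer to the "why" question).
responses = [
    (10, "releases arrive when promised"),
    (9, "support answers quickly"),
    (9, "easy to get started"),
    (8, "does the job"),
    (4, "the app was down during the day"),
]

score = net_promoter_score([rating for rating, _ in responses])
print(f"NPS: {score:.0f}")

# The score is only the hint; the reasons are where the information is.
for rating, reason in responses:
    if rating <= 6:
        print(f"detractor ({rating}): {reason}")
```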
#### Cycle Time

| Potential Measure | What is it trying to measure? | How does it help? | How will it be gamed? Other concerns? |
| --- | --- | --- | --- |
| Cycle Time | How long it takes an item to reach production from the moment the team starts working on it. | Work in Progress – whether it’s in development, test, documentation or some other state – delivers no value to the customer. Work only delivers value when the customer has it. | |
| Lead Time | Length of time from the moment the idea is received to when it is delivered to the customer. | Also takes into account the time an item sits waiting in a queue (Product Backlog or otherwise) before the team works on it. | |
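
The difference between Cycle Time and Lead Time is easiest to see with dates. The sketch below assumes each item records three dates (idea received, work started, delivered). The `WorkItem` class, its field names and the sample items are hypothetical, not taken from any tracking tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WorkItem:
    name: str
    requested: date  # the idea is received
    started: date    # the team starts working on it
    delivered: date  # the customer has it

    @property
    def cycle_time_days(self) -> int:
        return (self.delivered - self.started).days

    @property
    def lead_time_days(self) -> int:
        return (self.delivered - self.requested).days

    @property
    def queue_time_days(self) -> int:
        # Time spent waiting before anyone touched the item.
        return (self.started - self.requested).days


# Invented sample data.
items = [
    WorkItem("export report", date(2024, 3, 1), date(2024, 3, 18), date(2024, 3, 22)),
    WorkItem("password reset", date(2024, 3, 4), date(2024, 3, 6), date(2024, 3, 15)),
]

for item in items:
    print(f"{item.name}: lead {item.lead_time_days}d, "
          f"cycle {item.cycle_time_days}d, queued {item.queue_time_days}d")
```

Printing each item, rather than one average, keeps the spread visible. Here the first item spent most of its lead time waiting in a queue, while the second spent most of it in progress, and those call for different conversations.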
#### Interruptions

| Potential Measure | What is it trying to measure? | How does it help? | How will it be gamed? Other concerns? |
| --- | --- | --- | --- |
| Interruptions | How many times during the Sprint the team(s) have to deal with… | Creates awareness around the rate of interruption for the team. | |
| Support Issues | …a support issue | Highlights either quality (product bugs) or usability issues (feature isn’t used as the team intended). | |
| Non-Product Backlog work requested | …work items that didn’t come from the Product Backlog | Highlights quality work that is bypassing Product Ownership and Portfolio Management. | |
| Other Interruptions | …interruptions that are not part of the previous categories | Highlights Dark Work and other things that are distracting the team. | |
#### Dangerous Measures

| Potential Measure | What is it trying to measure? | How does it help? | How will it be gamed? Other concerns? |
| --- | --- | --- | --- |
| Velocity | Average number of Stories or Story Points completed per Sprint | Can be used to forecast when a feature will be released. It can also help a team spot long-term trends in their own work. | It is often misused to compare teams or measure productivity, which leads to a culture of velocity above all else (especially quality). If used, it should be used only by the team itself. |
| Bug/Defect Count | Number of bugs logged in the bug tracking system | A measure of quality | At best, it measures defects in quality on the surface of the system. Missing: it doesn’t measure the internal quality of the code; it is never complete, since we will never find all the bugs; stale bugs that can’t be reproduced, or that are in parts of the system that are no longer used, cloud the measure. |
| Return on Investment | Revenue per Feature divided by Cost | | Can’t realistically be measured. |
| Productivity | | | Can’t be measured, and if it were, it’s unclear how it would help guide teams. |
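
The Velocity row says it can be used to forecast when a feature will be released. One hedged way to do that without collapsing everything into a single number is to forecast a range from the team's own best and worst recent Sprints. The function name and the figures below are assumptions for illustration, and the result is for the team's own planning, not for comparing teams.

```python
import math


def sprints_to_finish(remaining_points, recent_velocities):
    """Return (best_case, worst_case) Sprints needed for the remaining work."""
    if not recent_velocities or min(recent_velocities) <= 0:
        raise ValueError("need positive velocities from recent Sprints")
    best_case = math.ceil(remaining_points / max(recent_velocities))
    worst_case = math.ceil(remaining_points / min(recent_velocities))
    return best_case, worst_case


# Invented numbers: 120 Story Points left, last five Sprints' velocities.
best, worst = sprints_to_finish(120, [18, 25, 21, 30, 24])
print(f"Forecast: between {best} and {worst} Sprints")
```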
#### More Potential Measures

| Potential Measure | What is it trying to measure? | How does it help? | How will it be gamed? Other concerns? |
| --- | --- | --- | --- |
| Ratio of Fixing Work to Feature Work | Measure very roughly the amount of time spent on fixing bugs vs new feature work. This shouldn’t require tracking hours; we want a rough ratio. | Helps us understand where effort is being spent. Helps a PO make better decisions about where they’re spending their money. | |
| How long does it take to test end to end? | How much testing would it require to have confidence in a release? | Reminds us of the overhead/tax we need to pay before releasing our product on the world. Since it’s part of cycle time, we should feel pressure to reduce this from days to hours. | |
| Release Frequency | How much time between releases? | Reducing time between releases can help us get paid more often. | |
| Backlog change rate | How often are Product Backlog items added, removed, reprioritized, etc.? | Too little and we should be concerned that we’re not building the product the customer wants. Too much and we should be concerned about our ability to build a stable product. | |
| Skills Matrix/Team Learning | A skills matrix is a tool to help the team understand where it has a lot of knowledge and where it is weak. To measure, just look for where we have changed in the last period of time. | It can help nudge people to spend some of their learning time in a specific area where the team is weak. | If team members perceive they’re personally being measured on this and this measure ties to their performance review/pay, they inflate their self-rating. So rate of change in Skills Matrix or Team Learning is only useful if it is used by the team for their own needs. |
| Cross Team Dependencies | The number of things that one team needs from another team. | The more things we need from other teams, the less freedom we have in prioritizing work (lead time is increased, etc.). The measurement allows us to spot the kinds of dependencies we suffer from. If we haven’t already, this measure should force us to discover Feature Teams. And if we have, it should force us to invest in skills where we still depend on other teams. | |
| Number of recent experiments run | Count the number of experiments run since the last measurement. | Agile Organizations are continually challenging the status quo. Counting the number of experiments can act as a reminder to run them and measure the results. | Experiments for the sake of experiments are also bad, so the trend doesn’t matter as much as being reminded to run experiments. |
| Unfinished Stories, aka Adherence to Definition of Done | How many Stories at the end of the Sprint didn’t meet the Definition of Done. | Helps by forcing team members to focus on getting fewer stories to truly done every Sprint. | By splitting stories into smaller chunks, teams are less likely to have unfinished work. This usually results in greater flexibility for the Product Owner and better throughput – so gaming this measure is a good idea. |
| Stories that have been Deployed | How many Stories have been deployed? | It forces team members to focus on getting small chunks of valuable work to production where we make money. | |

As Nassim Taleb points out, the best measures still won’t measure your biggest risks. Risk registers are popular, but since their construction isn’t scientific, they suffer from cognitive biases[3][4].

Some guidelines to help with measurement:

  • Assume all measures will be gamed.
  • Measure the simplest things that might help.
  • Fewer measures create simpler systems and simpler systems create less complex solutions.
  • Usually the real story is hidden behind the measure, and the measure was just a hint about where to look. Use measures to hint about what questions to ask.
  • Don’t measure performance, measure outcomes.
  • Assume all measures eventually stop delivering value and should be dropped.
  • Don’t burden your team members to collect your measurements.

In general, Scrum favours qualitative over quantitative measures. Measurement doesn’t replace watching and listening.

Mark Levison


