One question people tend to ask about story point estimation is how many hours are in a story point. Some people say they estimate their stories in hours and then work out the story points from the hours estimate. I don't have a problem with that, as long as they really understand how story points relate to hours. When they don't understand the relationship, they tend to think a simple conversion such as 5 hours per story point is sufficient for estimating in points this way. I disagree.

I came across Mike Cohn's blog entry titled How Do Story Points Relate to Hours?, where he discusses the relationship between story points and hours. To summarize his thoughts, the bucket for x-story-point estimates represents a distribution of hours. For example, the five-story-point stories estimated by a particular team might fall in a range between 15 and 30 hours. The hours distribution for one-story-point stories is going to be different from the distribution for eight-story-point stories.

This also means that the distribution for the smaller story point estimates should be much tighter than the distribution for the larger ones. A story that asks for a new label and a new entry box on a web page should be much easier to estimate than a story that asks for single-sign-on support across multiple web applications. To illustrate this concept to my team, I decided to collect data from our last 6 iterations and graph the different story point estimates against both the estimated hours and the actual hours.



If you were to take a collection of estimated stories from a team and come up with a number like 5 hours per story point, you would be assuming that all stories are estimated with the same level of accuracy. In practice, the larger stories contain more unknowns and are therefore estimated less accurately than the smaller stories. The graphs above demonstrate this and tell me that each story point level (1, 2, 3, 5, etc.) has its own level of accuracy, so using a single conversion for estimation is not a good idea. Other than helping me illustrate the hours relationship to my team, how else can this information be used? It can show product owners how important it is for us to break stories down into smaller (and hopefully easier to estimate) stories.
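For readers who would rather script this than use a spreadsheet, here is a minimal Python sketch of the same per-bucket view. The `stories` list is made-up illustration data, not the numbers from my team's iterations, and the function names `hours_by_points` and `summarize` are hypothetical helpers introduced only for this example. It groups actual hours by story point value and reports the range, mean, and sample standard deviation for each bucket, alongside the naive hours-per-point figure.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (story points, actual hours) pairs, for illustration only.
stories = [
    (1, 3), (1, 4), (1, 5),
    (2, 7), (2, 9), (2, 11),
    (5, 15), (5, 22), (5, 30),
    (8, 30), (8, 45), (8, 60),
]


def hours_by_points(stories):
    """Group actual hours into buckets keyed by story point value."""
    buckets = defaultdict(list)
    for points, hours in stories:
        buckets[points].append(hours)
    return dict(buckets)


def summarize(stories):
    """Return range, mean, spread, and hours-per-point for each bucket."""
    summary = {}
    for points, hours in sorted(hours_by_points(stories).items()):
        summary[points] = {
            "min": min(hours),
            "max": max(hours),
            "mean": mean(hours),
            # Sample standard deviation needs at least two data points.
            "stdev": stdev(hours) if len(hours) > 1 else 0.0,
            "hours_per_point": mean(hours) / points,
        }
    return summary


if __name__ == "__main__":
    for points, s in summarize(stories).items():
        print(
            f"{points} pt: {s['min']}-{s['max']} h, "
            f"mean {s['mean']:.1f} h, stdev {s['stdev']:.1f} h, "
            f"{s['hours_per_point']:.1f} h/pt"
        )
```

With real data, the thing to look at is how the range and standard deviation change from one bucket to the next. If they widen as the point values grow, a single hours-per-point conversion hides exactly the information you care about.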

You can download my spreadsheet here. I've also provided it in the legacy xls format here. Just plug in your team's numbers and see how it looks.

14 comments:

  1. Excellent post, Mike. I love that you gathered the data on this. I've just tweeted about this as I think this is important for people to see. Thanks for sharing it.

  2. Very cool and helps put into context the reason why hours is not a good reporting metric.

    It is also interesting to see that you found some value in tracking the actual time spent which is somewhat contrarian to some of the purists out there. Understanding where you are and what you are doing helps you to find ways to improve your view of the future.

  3. Not to interrupt or hijack the discussion but would you mind mentioning what you used to generate the graphs? I'm a new scrummaster and I'm in need of sw apps for producing docs for work and for training.
    Thank you

  4. Please disregard post above. My bad, I missed your last paragraph...

  5. Mike, this is good work. It is always a pleasure to see data in support of assumptions.

    With your data, could you extend your assertion even further and state that the total variability of two stories (2 and 3 points, respectively) is less than the variability of one 5 point story?

  6. I like it. When rolling up many, many User Stories in a large project, what is the net effect of the distribution of hours? Can we just use averages (ignoring the distributions) since the variances will cancel each other out in aggregate?

  7. Hi Alex, you've raised an interesting question about how additive story points are. My initial thought is that the variability of one five-story-point story is going to be greater than the combined variability of a two-story-point story and a three-story-point story.

    I'll put the data together and see. I've often wondered if that was the case.

  8. I would actually question this whole approach, mainly due to two things:

    1) I don't think this data is of any benefit other than comparing estimates to actuals which is mainly used as a bad metric to evaluate people and projects

    2) It stops or significantly hinders and then hides the hunt for process improvement and the demonstrated increase in velocity which motivates teams.

    Bob Schatz, CST
    Agile Infusion

  9. thanks for the spreadsheet !

  10. Very nice blog post and well done!

    Would it be possible to offer the spreadsheet in the legacy xls-format? Or maybe even ods?

    I'd like to know if the data set was large enough. And maybe you can publish your thoughts on variability, as promised above ;-)

  11. I added a link for the xls format. Good idea!

    Thanks for the reminder on the variability. I've gathered the data on the subject and will post it soon.

  12. Great article with real data... thanks. I think it also articulates and demonstrates the risks of leaving large stories or epics in your backlog which haven't been broken down, but which may have been committed to a release. A single 5-pointer is much riskier than five 1-pointers, of course!

  13. Hi, a nice article and spreadsheet you have here. I was just wondering if you also did a test for normality of your data? In other words: were the estimated and actual hours for each story point category normally distributed? Thanks!
