Disciplined Continuous Improvement

Three techniques to constantly keep moving forward

Most cheesy picture I could find Googling for “continuous improvement”

The best teams I’ve worked with in my career share one thing in common: they’re very disciplined about improving themselves (as people and as a team), how they work (process), and their environment (the other parts of the organization they work with).

For the last nine months at Egnyte, I’ve primarily worked with the partner integrations team. This team runs pretty well, and as a result the question arose whether there are any best practices to share with other teams — this got me to reflect a little.

My first reaction was: sure, there are specific things other teams could borrow. For instance:

  1. Our technical-debt day: one day in between sprints where developers work on things they feel are important to address (often technical debt).
  2. Our specific definition of ready and definition of done.
  3. Our specific JIRA workflow and Scrum board layout.
  4. Communication policies we’ve set up between other departments and ourselves.
  5. Code review guidelines (expectations from pull request issuer, reviewer, what it means to “approve” a pull request etc.)
  6. The dashboards we use to monitor our services.

But then I realized: there may be some value there, but every team is different. Even for this particular team these practices aren’t static; they evolve over time as challenges, pressures and context change.

However, there is a more “meta level” practice that I believe will benefit every team, now and a decade from now, whether they’re dealing with a greenfield project or years’ worth of technical debt.

None of our “best practices” fell out of the sky; they are all results of the mechanisms we put in place for disciplined continuous improvement.

What do I mean by disciplined continuous improvement?

  • Disciplined: definition: “having or exhibiting discipline; rigorous.” You don’t do things when it’s convenient; you don’t do things when you have time; you do them always, without exception.
  • Continuous: not in bursts, not during breaks, but: all the time (for some granularity of “time”), in small steps.
  • Improvement: looking for things that are not (yet) awesome, and taking (small) steps to get them there.

There are many ways to implement disciplined continuous improvement.

Here are just three ways that I’ve applied (and have seen applied by others) successfully:

  1. Retrospectives
  2. Root-cause analysis
  3. 1:1 with team members

1. Retrospectives

Technically, this is a scrum thing, but even if you’re not a scrum fanboy (or fangirl) — even if the word makes you puke, please adopt just one thing: retrospectives.

A retrospective is a recurring team meeting (in scrum at the end of every sprint, so e.g. every two weeks), in which you look back and evaluate the last iteration of work: what went well, what didn’t go well, and most importantly: what can we do to do better next time?

There are many formats for running retrospectives, and there’s a certain art to facilitating a successful one. The way I usually run retrospectives is pretty straightforward; it consists of five phases:

  1. Follow-up on the action points from the last retrospective to verify we implemented all successfully (there’s no point in coming up with actions without follow-through).
  2. Brainstorming: everybody individually writes potential topics of discussion on sticky notes.
  3. Sharing and organizing: one by one, people put up their stickies (grouping them on the fly) on one of two sides of a whiteboard (or wall): “Awesome” (space for praise and sharing happiness) and “Not yet awesome” (things we can still do better, or suck at).
  4. Dot voting — everybody gets to vote on things they would like to discuss further, by putting a big dot on the specific sticky.
  5. Discussion of top-voted items, writing down specific action points and assigning them to specific people during the discussion.
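
If your team runs retrospectives remotely, or you want to keep a record of them, the dot-voting phase is easy to tally in a few lines. Here is a minimal sketch, assuming each dot is recorded as the name of the sticky it was placed on; the function name `top_voted` and the sample topics are made up for illustration.

```python
from collections import Counter


def top_voted(votes, limit=3):
    """Return the `limit` most-voted sticky notes, most votes first.

    `votes` is a flat list with one entry per dot placed on a sticky.
    """
    # most_common sorts by count, highest first; ties keep first-seen order.
    return Counter(votes).most_common(limit)


# Hypothetical dots from one retrospective (topic names are made up).
dots = [
    "too much email", "tickets stuck in verification", "too much email",
    "technical debt", "tickets stuck in verification", "too much email",
    "flaky build",
]

for topic, count in top_voted(dots, limit=2):
    print(f"{count} votes: {topic}")
```

The `limit` is a judgment call: pick as many topics as you can realistically discuss and turn into action points in one session.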

An essential thing to note here is that action points don’t need to be wholesale solutions. The goal is not to solve the problem completely — that would be great, but it’s not required — the goal is to make progress, to take a step forward.

Taking a small step is better than nothing at all. Sometimes a small step is enough, sometimes the same issue bubbles to the top again in the next retrospective, and you come up with another solution. Experimentation is encouraged.

To give you some sense of what can come out of such sessions, some issues raised during retrospectives and steps towards solving them:

  • “I get too much email and it takes me too much time to process!” Step: specific rules on what to cc the whole team, and what not to (to reduce the amount of irrelevant email), as well as markers in the email body like “FYI” or “Action required” (to make emails more skimmable).
  • “Tickets get stuck in the ‘verification’ step and as a result we don’t deliver anything!” Step: adopt the idea of “constraints” from Kanban: if more than x tickets are waiting for verification, everybody (including developers) jumps in to help verify tickets.
  • “Too much technical debt is accumulating!” Step: introduce a tech debt day during which everybody works to reduce technical debt.

The last case is a good example of “sometimes a small step is good enough.” Honestly, when we came up with the tech-debt day concept, my assumption was it wouldn’t make a big difference. But after about 3–4 months of using this idea, we have a lot of improvement to show for it. Small steps are awesome.
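
The verification rule from the second example above can also be written down explicitly, which helps if you want a dashboard or chat bot to flag it. This is only a sketch under assumptions: the ticket dictionaries, the `status` values and the `should_swarm` name are hypothetical and not tied to any JIRA API.

```python
def should_swarm(tickets, limit, column="verification"):
    """Return True when more than `limit` tickets are waiting in `column`."""
    waiting = sum(1 for ticket in tickets if ticket["status"] == column)
    return waiting > limit


# Hypothetical board snapshot; keys and statuses are made up.
board = [
    {"key": "INT-1", "status": "verification"},
    {"key": "INT-2", "status": "in progress"},
    {"key": "INT-3", "status": "verification"},
    {"key": "INT-4", "status": "verification"},
]

if should_swarm(board, limit=2):
    print("Verification is over its limit: everybody helps verify.")
```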

2. Root-cause analyses (5 Whys)

Sometimes disaster strikes. Disaster can take many forms, ranging from a significant production outage to (for pure development teams) not delivering an important feature.

In my career thus far I’ve applied (and have seen others apply) two approaches to handling disaster:

  1. Fix the problem, then keep your fingers crossed it won’t happen again.
  2. Fix the problem, use it as a learning experience, and take all possible measures to ensure a similar disaster cannot happen again.

I will admit that early in my career I opted for #1. It may sound ridiculous, and of course nobody explicitly chooses the “keeping fingers crossed” part. However, fixing a production outage can be an achievement in itself, and often we dismiss the chances of it recurring: surely the same thing won’t happen again, will it? Guess what: it definitely will, and sometimes sooner than you think. And when it does happen, how dumb will you feel?

So, here’s my perspective on disasters today: yes, they suck — they’re stressful, they’re not fun. But, if you don’t take them as a learning experience, they are a pure waste of time. If you do take them as a learning experience, you may actually, one day, be happy they occurred.

So, how do you learn from disaster?

  1. Find the root cause through the 5 Whys.
  2. Create action points to address the root cause in such a way it won’t be an issue again. Extra brownie points for also separately addressing all intermediate causes as fallback mechanisms.

In principle, conducting a “5 whys” session is simple — in a sense, the name says it all: ask “why?” 5 times, and find out the root cause. Boom.

What can be simpler?

I’ve conducted about two dozen such sessions, and here’s what I have found so far:

  1. It is sometimes surprisingly hard to steer the analysis in a productive direction. It’s quite common that the analysis branches out in too many directions, and it takes some skill to predict which ones lead to helpful results.
  2. More often than not, the “5 whys” leads to unexpected root causes. Everybody has their assumptions before: “we don’t have to do 5 whys this time, I know the source of the issue!” However, more often than not, we end up with some interesting findings.

And this is where the discipline comes in. Don’t do these analyses only if you have time, or when it’s convenient, or if it’s a new issue. The discipline aspect is to do it every single time, especially if an issue happened before and you already did the 5 whys on this topic. In that case ask yourself: why the hell did it happen again? Clearly, your last analysis didn’t find the root cause, or you didn’t execute on your action points from last time (or they weren’t very good ones). Even this can be turned into a learning experience.
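
Writing each analysis down in a consistent shape makes it much easier to notice repeats. Below is a minimal sketch of such a record, assuming you simply keep a list of past analyses; the class `FiveWhys`, the helper `earlier_analyses` and the sample incident are all hypothetical, not a tool we actually use.

```python
from dataclasses import dataclass, field


@dataclass
class FiveWhys:
    incident: str
    whys: list = field(default_factory=list)     # successive answers to "why?"
    actions: list = field(default_factory=list)  # (action point, owner) pairs

    def ask_why(self, answer):
        """Record the answer to the next "why?" in the chain."""
        self.whys.append(answer)

    @property
    def root_cause(self):
        # The last answer in the chain is treated as the root cause.
        return self.whys[-1] if self.whys else None


def earlier_analyses(incident, history):
    """Return past analyses of the same incident (a sign to dig deeper)."""
    return [old for old in history if old.incident == incident]


# Made-up example chain for illustration only.
analysis = FiveWhys("API outage")
for answer in [
    "The service ran out of database connections.",
    "A new endpoint opened a connection per request.",
    "The connection pool was bypassed in the new code.",
    "The reviewer did not know the pool was mandatory.",
    "The code review guidelines do not mention it.",
]:
    analysis.ask_why(answer)
analysis.actions.append(("Add connection handling to the review guidelines", "alice"))

print(analysis.root_cause)
```

If `earlier_analyses` returns anything for a new incident, start the session by reviewing the old action points before asking the first “why?”.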

3. Regular 1:1 meetings

The previous two techniques can be introduced into any team by anybody. They’re obviously great ideas, so who would object? The third technique, the regular 1:1 (“one-on-one”) meeting, is more of a “manager thing.” If some random team member started setting up weekly 30+ minute individual meetings with all other members, this might not be received well. This is one of the perks of the “manager” title. You get to organize whatever meeting you want, because that’s what managers do, right?

The way I conduct 1:1 meetings is primarily inspired by “The Update, The Vent, and The Disaster” by Michael “Rands” Lopp — currently VP Engineering at Slack. Full disclosure:

Everybody has their own way of conducting 1:1 meetings. I always start mine with the same question:

“How are you doing?”

As a result we may talk about how they sleep badly because of their newborn; about their attempts to study machine learning, so they can realize their billion-dollar idea; about how specific people are super aggressive in their code reviews and this is demotivating; about how they think the product is taking a completely wrong direction; about how certain people are undisciplined and never arrive at work on time; about how they have never felt better than now that they work only on front-end stuff, and now they know this is their calling in life; about how technical debt is killing us; about how he said this and that, and how stupid that is; about how the whole world is going to hell; about how they just broke up with their girlfriend.

People often ask me how I became so smart (true story).

The answer is simple: I massively parallelized and outsourced my intelligence. Of course, I will not deny my own amazing intellect (they don’t hand out PhD titles with cartons of milk, at least not in my convenience store), but the real secret is that I listen to all people in my teams (and outside, for that matter). They see things I don’t; they think in ways that I don’t; they have concerns that I don’t; they have ideas that I don’t. Diversity, bitches! All this input combined surfaces yet more things to improve.

Wouldn’t you get the same thing from a retrospective? That’s where everybody shares their concerns and ideas for improvement as well, right?

Theoretically: yes, but in practice: no. For two reasons:

  1. Not everybody has the same level of “presence” in meetings. The fact that people don’t speak up doesn’t mean they don’t have something to say. I often hear new ideas or concerns in 1:1 meetings. Often we agree to bring them up in a retrospective to have the whole team discuss them; many of these ideas are eventually implemented.
  2. 1:1 meetings are a better place to discuss half-baked ideas and pre-escalation issues (problems before they become real problems). Bouncing ideas off each other is often more fruitful in a 1:1 setting than in a group setting.

As with retrospectives and 5 Whys, discipline is essential here too. 1:1s shouldn’t happen only when we have nothing better to do, or be scheduled for every other week and then canceled half of the time. They should happen every single time. Until it becomes unsustainable, I will have 1:1 meetings with everybody in my teams every single week, and only cancel on rare occasions (but to be honest, I can be more disciplined here).

So there you have it: three techniques to implement disciplined continuous improvement. Let me know what you think — do you know of more techniques to continuously improve?