The Things I Learnt about DevOps When My Car Was Engulfed by Flames


This is a true story, based on a talk from DevOps Days London 2016:

It was a gorgeous sunny spring day, my family and I were driving through my home town of Bristol, ready for another weekend adventure. We were cruising along when my wife said quietly: “I can smell smoke”. Now, I’m a good mechanic, I used to restore classic cars. The car we were driving was modern and recently serviced. I checked the instruments, everything normal. “It must be outside”, I declared confidently. Two minutes later tentacles of smoke were curling around my ankles and my shins were getting remarkably warm.

We can learn something relevant to DevOps, and many other disciplines, from this experience… read the rest on InfoQ


The Entirely Random DevOps Days London Write Up

There are some great posts out there on the superlative DevOpsDays London 2016 Conference. Of particular note are the posts by Manuel Pais and DevOpsGuys, and Helen Beal’s DevOpsDays London: Making Happy. These are well structured, balanced pieces which neatly represent each talk along with insight from experienced authors. I hate to disappoint, but this post is not one of those. It’s a random collection of the things that piqued my interest at DevOps Days London.


What is Legacy?
There was lots of talk of legacy, and the BiModal debate rolled on. Much like the sporadic agile wars, many of the detractors use outdated definitions as convenient ammunition; Gartner’s understanding of the challenges seems to have evolved away from its original strategy towards an exploit and explore approach for recent and legacy systems respectively. Defining legacy is challenging, with considerations beyond just the age of the software or system. There were a few definitions I really liked:
“Legacy is code you can’t iterate on as quickly as you need to” – Casey West
“Legacy is code you don’t have automated tests for” – Michael Feathers
“Legacy is where your customer money lives” –  Bridget Kromhout


The merits of really reading
I keep having the same conversation, like being in an endless loop. It goes like this:

Person: “Yeah, I know all about Conway’s law, it is super insightful”
Me: “Exactly, that example about teams building compilers, that’s a light bulb moment”
Person: “….the compilers?”
Me: (Thinks) “Have you really taken time to understand what you’re advising people to do, or are you just reciting tweets?”
Me: (Says) “Check out the article, there’s lots more in it”

It is like this with so many topics, REST, OODA, Learning. That’s why I was so pleased to hear the ever-eloquent Gareth Rushgrove call out the value of reading academic papers in his talk.  These days we are so prone to snacking on sound bites we seldom get the satisfaction of a full reading meal, yet our brains cry out for this kind of nourishment. I believe papers, and source material in general, are the best way to gain a firm understanding of a topic, particularly because they build a picture of what motivated the author, not just what they did.  Much like software patterns, understanding the intent and motivation is key to successful application.


A couple of talks touched on burnout, as did a well attended and lively open space. What surprised me was how many people had direct experience of it; it remains an issue in the industry despite raised awareness and talk of sustainable, humane ways of working. Oliver Wood talked about his experiences working so many hours that he slipped a disc. Keen that others may avoid the same fate, he created GoodCoderBadPosture. During his ignite he reminded us “you are ephemeral, you are not highly available”. My talk (Things I Learnt About DevOps When My Car Caught Fire) used the analogy of looking through the windscreen of my burnt out car: all the instruments you normally rely upon to sense the world are warped and confused, your view is fogged and distorted. If only we could see mental strain as readily as we can bad posture.

I noticed much of the burnout open space was concerned with what management and organisations can do to prevent burnout, and recognise its occurrence. This is a reasonable standpoint given that our behaviours are generally shaped by the systems we work within. However, in the spirit of DevOps we should also note that it isn’t a problem for affected individuals and their managers to tackle alone. While organisations take action, we might also ask ourselves:

1. “How would I know if one of my colleagues was suffering from burnout?”
2. “How would I help someone I thought on the verge of burnout?”
3. “How do my own behaviours affect the likelihood of burnout in my colleagues?”

There was a nice note in Jeromy Carriere’s talk, and a potential answer to question 3: “Work hard to make every alert exciting”. This has implications for burnout: exciting alerts imply only being disturbed for hard problems, not simple switch flicking exercises.

There was plenty of talk of change, particularly the danger of not evolving and experimenting.  Change strategies were discussed, including the value of heading into conversations well armed with data.  It was clear from their talk that Microsoft are changing in places, for instance setting up open team rooms or neighborhoods.  I rate this approach, it appears to balance team privacy, open communications and the distractions of full open plan.

“It is not necessary to change, survival is not mandatory” – Deming (Who wasn’t present!)
“The riskiest thing is not to change” – Joanne Molesky

The change theme included the importance of investing time in the most valuable activities, and how to discover them. It highlighted that many of those valuable activities are operational features, not just shiny new toys to please users. If you’re in the mood for self reflection you might give some thought to this quote from Bridget Kromhout:
“When evaluating yourself don’t forget to look at the value you are adding”

A conference, with a culture
The thing I love about DevOpsDays, and the way it’s organised, is that it still feels like a community event. Sure it’s scaled, but the level of friendliness, inclusion and support are almost the same as the first time I spoke in Goteborg 2011. The story of this scaling and the principles behind it were told by Kris Buytaert; it’s surprising how many of the early adopters are still active. The conference manages to short circuit a lot of anchoring and group think by giving almost 50% of the time to open spaces. Taking responsibility for the schedule out of the committee’s hands and into the delegates’ ensures that topics are relevant to attendees, right then and there. The willingness of speakers to stay and participate in these sessions is key to their success and makes for some great learning. Not bad for a movement that still can’t agree what it’s about.

The Final Countdown; Adjourning Agile Teams


When introducing agile I’m sometimes asked to assist with the creation of teams. I’ll be asked questions like: how many testers do we need? Is ten people enough? Who manages performance? Should they wear shorts? These are predominantly valuable questions, luckily covered at length elsewhere. Something that’s very seldom planned is what happens when these teams disband. And yet, without due care team members may end up demotivated, disappointed or feeling unappreciated. In an age where everyone is fighting to retain smart people, and to transform their organisations, it seems a high risk to take for the sake of a little forethought.

New teams are recognised  to move through similar phases, regardless of their domain.  Tuckman’s stages of group development is the ubiquitous cycle, suggesting that teams move through the following stages as they gel and become productive:
Forming – When a team gets together there is a buzz, there is expectation, and caution as they figure out their mission and their colleagues.
Storming – The team realise what they are up against, both from their mission, and each other, there is vying for position, conflict and resistance.
Norming – The team start to behave more collaboratively, making progress towards goals, and developing team relationships, they are increasingly effective as a group.
Performing – The team is stable and performing its best, there is respect, understanding and a strong sense of shared purpose.

The sequence provides a useful heuristic for what to expect as a new team gets together. Of course progress isn’t strictly linear, and teams iterate through these stages. Something as simple as a desk swap may prompt a little storming; with care though the overall trend remains towards performing.

Having studied and developed the concept, a decade later Tuckman added a fifth stage.  It is sometimes referred to as ‘mourning’ although I suspect that’s largely due to rhyme compatibility with the other stages, Tuckman named it ‘adjourning’.

Imagine this: you go on a sunny holiday with a bunch of new friends, sure the journey was a bit fraught, it took a while to get used to the new place, pace and lifestyle, there were some heated words along the way, in the end everyone is getting along, doing their thing and generally having a good time. The holiday peaks when everyone gets together, sharing experiences and ideas, the energy is tangible. Then one day you notice someone has left the group; “needed elsewhere” apparently. Next day the hotel barman disappears, then you notice no-one is organising activities anymore. The swimming pool is turning a curious green hue. Two people wander off because ‘there’s a more interesting holiday going on over there’. The kitchen runs out of food – the chef is only there once a week, mumbling about “other priorities”. The manager who used to stroll over enthusiastically and ask, “how can I help today?” now seems afraid to look you in the eye. Slowly people drift off until the sense of fun and bonhomie are lost. Triumphs forgotten, you kill time until the holiday reaches its end date. A sense of loss and a slow fade to boredom becomes the overriding memory.

The adjourning phase in teams is important because it sets up the attitude, enthusiasm and levels of energy taken to the next team. What is carried forward is largely based on the emotional response someone has looking back at the project. The final weeks are particularly important because we are creatures of recency: more recent negative experiences will displace older positive feelings.

Sensitivity to the adjourning phase is especially important during an agile transformation, and when introducing change in general.  In those early days a change initiative needs allies and evangelists to support and promote it.  Peer to peer recommendations are particularly respected, and people who have enjoyed a project help form a cohort of change agents.  The opposite is also true, word of a poorly handled team will soon spread, and be seen as part of the ‘new way’.

So you need to make sure that a team has a positive emotional response when they think about the project; otherwise, regardless of what ‘facts’ or ‘reasons’ are given, it’s the emotional side that will determine whether a similar initiative is supported, or resisted. I’d suggest the following:

Mark The Occasion – Lunch, cakes, a flaming aquatic Viking burial for the team board, anything that underscores that the project is done. No need for speeches, but be sure to say thanks, recognise achievements, and just mark the last time the team exists in this form. Often it is more useful to do this when the whole team can attend, rather than on the calendar close date.

Close it down – Agree what work should be completed, rather than allowing that nagging unfinished, lost opportunity feeling. Consider the tasks that will make it possible for other teams to work with the product when handed over.

Retrospect – There are two motivations for this, firstly to gain learning and insight for future projects, secondly so team members feel like they’ve been heard and that things will improve in future projects. A good option is to hold a ‘futureospective’: focus the retro on the future, asking each participant to choose a couple of initiatives they’d introduce to their new team.

Communicate – Often team changes are requested by outside influences. It is good to soften the feeling that team changes are being ‘done to them’, especially in an agile environment which encourages self organisation and team responsibility. If the team faces a slow wind down with people moving over a few weeks, explain why, and ask for input into how the team’s remaining commitments and assets should be managed. Again you’ll uncover solid ideas and increase engagement.

When it comes to adjourning, agile teams are no different to any other, except perhaps that they are expected to move through the stages of group development more often and more rapidly than their counterparts. We should be mindful that our search for agility does not lead to disenfranchised groups and teams that never truly form due to prior poor experiences when disbanding.

Compared to the effort we put into forming a team the effort required for a successful adjournment is small, and the rewards are high: raised enthusiasm, engagement and even increased support for transformation. So let’s not short-change our teams, let’s facilitate the closing phase of team life with as much thought and attention as the beginning.

An introduction to ChatOps

I first heard of something resembling ChatOps about five years ago when I had the good fortune to share a beer with Scott Chacon, one of GitHub’s co-founders. While I ranted about Deming, he talked enthusiastically about their fledgling organization. Surprisingly, one of the things he talked about with most passion was Hubot, a sort of robot butler who hung around in GitHub chat rooms serving useful data and whimsical content with equal aplomb. It seemed a great concept, simple and powerful: it improved operability whilst increasing knowledge sharing and encouraging collaboration.

I often wonder why ChatOps doesn’t garner much attention, especially as it appears to have played an important part in GitHub’s success. Perhaps that’s because everyone is gazing adoringly at Docker, or perhaps because ChatOps sits discreetly and indistinctly on the boundary between Culture and Tools.

By way of introduction ChatOps combines three key technologies: Asynchronous Chat, Robot Assistants and Automation; let’s spend a moment looking into each. (The pictorially minded may prefer to spin through my spring DevOps Summit talk where ChatOps was one of my ‘Collaboration Catalysts’.)

Asynchronous Chat
Asynchronous Chat needs no introduction, it allows people to congregate in a virtual space to view and post messages and media. These apps are a good way for a distributed team to collaborate, but there are more subtle advantages – chats can be saved, enabling searching and reference. Chats allow broadcast, without the publisher having to manage their audience. You’ll understand the value of this if you have ever been trying to chase down a gnarly production issue with your manager over one shoulder and Project Manager on the other asking for updates. Oddly, the speed of work does not increase with the frequency of update requests, quite the opposite in fact.
In this situation chat could be used to broadcast progress, without having to manage a distribution list. When people monitor chat, the originator doesn’t get distracted, and may even get proactive support.

In the context of ChatOps it is chat apps which can be readily extended that really matter. That’s because many of the operations performed will be specific to an organisation and its systems, processes and integration requirements. HipChat, Flowdock, Slack and Campfire are popular options, and choices are often driven by the lingua franca of the development team.

Robot Assistants
Robot assistants lurk in chat rooms waiting to do the user’s bidding.  They may wait to be summoned by a specific command, or step in when they think it’s needed.  Assistants may grab things, like logs from production, or find out who is on call.  This reduces the interruption cost for a user, who is already thinking and collaborating in chat.
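To make the idea concrete, here is a minimal sketch of a bot that waits to be summoned by name and answers an "who is on call" question. It is not how Hubot or any other real bot is implemented; the `ChatBot` class, the bot name `butler` and the `ON_CALL` rota are all hypothetical, invented purely for illustration.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

Handler = Callable[[re.Match], str]


class ChatBot:
    """Hypothetical bot: matches chat messages against registered patterns."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._handlers: List[Tuple[Pattern[str], Handler]] = []

    def respond(self, pattern: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for a command pattern."""
        compiled = re.compile(pattern, re.IGNORECASE)

        def decorator(func: Handler) -> Handler:
            self._handlers.append((compiled, func))
            return func

        return decorator

    def receive(self, message: str) -> Optional[str]:
        """Return a reply if the message is addressed to the bot, else None."""
        prefix = self.name + " "
        if not message.lower().startswith(prefix.lower()):
            return None  # ordinary chatter, stay quiet
        command = message[len(prefix):].strip()
        for pattern, handler in self._handlers:
            match = pattern.fullmatch(command)
            if match:
                return handler(match)
        return f"Sorry, I don't know how to '{command}'."


bot = ChatBot("butler")

# Made-up rota; a real bot would query a paging or scheduling system.
ON_CALL = {"payments": "alice", "search": "bob"}


@bot.respond(r"who is on call for (\w+)")
def who_is_on_call(match: re.Match) -> str:
    team = match.group(1).lower()
    person = ON_CALL.get(team)
    if person is None:
        return f"I have no rota for '{team}'."
    return f"{person} is on call for {team}."
```

Typing `butler who is on call for payments` in the room would get an answer without anyone leaving the conversation, while messages not addressed to the bot are ignored.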

A good bot also recognizes the value of play; amusing features are almost mandatory, from adding a mustache to a photo and meme generation to playing tunes. A useful side effect of this is that it encourages folks to hang out in chat rooms: humor keeps people engaged, and generally engaged people are more productive and ready to innovate. Notable bots include Lita, Hubot, Err and StackStorm. Iron Man’s J.A.R.V.I.S. is similar in concept, but somewhat less likely to inundate you with pictures of small miserable faced dogs.

By way of an aside, Terry Winograd, who later went on to mentor one Larry Page, pondered the utility of robot assistants as early as 1970. Perhaps he had a premonition of Clippy when he wrote:

“I should reiterate that good programming systems do not demand a super-intelligent program. We can get by with a moderately stupid assistant as long as he doesn’t make mistakes. The degree of AI needed is much less than that needed for a full-fledged natural language or vision system”

Automate, mate.
The third component is Automation. Hooking the bot up to automation, and other deployment and operations tools, is where things get really interesting. If a bot can integrate with search engines and meme generators, why not link it to development environments, perhaps even production? Then, if people are discussing a thorny deployment problem they can call in logs, graphs and pertinent data. The chat room becomes the war-room: distributed, observable and documented for later learning.

Perhaps the pinnacle of ChatOps is allowing deployment orchestration through chat. As Jesse Newland describes it succinctly in his highly recommended ChatOps at GitHub talk, “Chat becomes the primary control surface for ops”. Not only is it convenient, but a chat client is more portable. Chat can also serve as a layer of abstraction over the underlying tech, enabling it to change and evolve independently of the commands driving it. This abstraction opens an opportunity for training, enabling production commands to be executed against a sandbox. Of course, there is some risk to be considered, and it is possible to restrict commands to people or rooms.
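As a rough sketch of those last two points, the example below restricts a deploy command to named people and rooms, and lets the room decide which environment the same command is aimed at, so a training room exercises the production command against a sandbox. Everything here is an assumption made for illustration: `ChatDeployer`, the room names and the user names are hypothetical, and no real deployment tool is called.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ChatDeployer:
    """Hypothetical deploy command guarded by user and room checks."""

    allowed_users: Set[str]
    room_targets: Dict[str, str]  # chat room -> environment it deploys to
    audit_log: List[str] = field(default_factory=list)

    def deploy(self, user: str, room: str, app: str, version: str) -> str:
        # Restrict by room: the command only works where a target is configured.
        if room not in self.room_targets:
            return f"Sorry {user}, deploys are not available in #{room}."
        # Restrict by person.
        if user not in self.allowed_users:
            return f"Sorry {user}, you are not allowed to deploy."
        environment = self.room_targets[room]
        # A real bot would invoke deployment tooling here; this sketch
        # only records the request so the chat keeps a history.
        self.audit_log.append(f"{user} deployed {app} {version} to {environment}")
        return f"Deploying {app} {version} to {environment} for {user}."


deployer = ChatDeployer(
    allowed_users={"alice", "bob"},
    room_targets={"ops": "production", "training": "sandbox"},
)
```

The person typing the command never names the environment, which is what allows the tooling behind the chat command to change without retraining anyone.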

Still not impressed? In the same talk Jesse outlines a scenario where he makes a deployment, observes a problem, orchestrates load balancers, fixes and redeploys. Impressively, it all happens in chat, all while keeping his team updated, and leaving a record, with minimum extra effort.

More than just tools?
Looking beyond tools, ChatOps brings more to teams than mere efficiencies. ChatOps liberates institutionalized knowledge once locked in the heads of key, time-challenged individuals. Once in the open, ways of doing things can be inspected and built upon. This isn’t necessarily a threat to those people; it often frees them up to tackle more challenging problems.

ChatOps can be an excellent training tool. Like the gallery trainee doctors use to observe a surgeon at work, chats can be reviewed and replayed for education. Need to know how something is done? Check the archives, and look at the commands used last time, or ask in chat: someone can demonstrate directly, and show everyone else at the same time.

Having written this, I realize I have to some extent answered my own question: Why don’t we hear more about ChatOps?

Effective ChatOps requires maturity of culture and tools.  Even small things, like knowing more senior or experienced people are able to see, and potentially respond to, every comment, takes some getting used to for both parties.  The organisation’s culture must encourage the openness which allows productivity to thrive in the chat.  As such, striving towards ChatOps may provide a useful mechanism to highlight organisational and cultural impediments.  To make operational features available in chat requires not just trust, but investment in tech, safely connecting all those moving parts is not trivial.  To the many organizations who struggle to deploy once a month, ChatOps must seem like a distant Nirvana.

Despite the necessary investment ChatOps can bring many benefits, and can do it unobtrusively, at a pace of change that suits the community. Using chat as a gateway to operations, adding capabilities when it is considered safe to do so, is an excellent way to introduce and observe new ideas. ChatOps invites collaboration, and not just because it’s novel. If all the engineers, regardless of title, hang out and work in the same space it helps build an appreciation of others’ challenges and responsibilities, not to mention attitude and sense of humour.


Five Gators Of The Apocalypse?

Wacky gators arcade machine

I generally dislike war and military metaphors for team and making activities.   Admittedly IT has a lot to learn from the military in terms of teams and scale, but in the wrong hands these metaphors seem to encourage unproductive conflict and counter-collaborative behaviours. This strikes me as odd because although prepared for conflict, the military spend much of their time avoiding or minimising it.  However, I do need to call upon a slightly violent metaphor to describe the relationship between constraints encountered when building a continuous delivery capability in an organisation.

The process of change reminds me of the nineties arcade game Whacky Gators, where a number of cheeky gators poke their heads out of caves, and you biff them with a large hammer, hands, or other appendage depending on personal preference. You never know which gator will appear, or when, and more than one might show up at once.

When encouraging continuous delivery (and by extension DevOps) those gators might be named: Culture, Tools, Organisation, Process, and Architecture.

These five are interdependent constraints, each affecting the other. However, while inside Whacky Gators is a fairly simple randomiser determining which gator will surface, behind the scenes our organisations look more like a hideous H.R. Giger meets Heath Robinson mash-up. We can’t readily inspect them to determine what to change.

My theory is that when one constraint is eased it will reveal a new constraint in a different area. This is a tenet of most agile and learning methods – surface a significant issue, deal with it, see what surfaces next. Often a method, and our expertise, focuses on just a couple of areas; we’re well versed at solving problems with technical solutions, or just improving our own team’s capability in isolation.

A good continuous delivery capability involves the whole engineering organisation (a great one involves the entire organisation). This means it is crucial to consider all five constraints, and when there is a problem be ready to shift focus and look for the solution in one of the other areas. In fact, this simple shifting may lead to the root of a problem. Do reams of process indicate a risk-averse culture? The solution may not be more process, but a different attitude. Are those tools covering up or compensating for some thorny, unspoken issue no one dared to face? When trying to improve delivery capability there may be a temptation to replace an old tool with an improved version, but maybe the need for that tool (and associated overheads) can disappear with an ounce of process and a healthy dollop of collaboration?

Returning to our Whacky Gators metaphor, the big question is how are you playing? Do you simply wait for that same familiar gator to return? The one you’re most comfortable dealing with? Do you hover where you are comfortable while other opportunities pass by, or are you nimble and brave enough to tackle each constraint as it appears?

While I was looking up Whacky Gators, I couldn’t resist a peek in the machine service manual. There I found this uncanny quote on success, as applicable in the game as it is in change:
“The player does not score extra points by hitting harder; a light touch will activate the circuits and will lead to higher scores.”


Measure for Measure – exploring DevOps adoption metrics.

Confession: I find measuring stuff a fascinating challenge. Sometimes measuring is straightforward, like the fuel gauge on your car, but oftentimes it’s more complex. The volt meter in your car quietly drains the battery while measuring its health. The motivation survey in your inbox will quietly change your motivation.

It’s termed the observer effect, where the act of measuring affects the thing you’re trying to measure. Measuring, or even just assessing, the output of groups is similarly taxing; even the act of posing a question can project your own biases. Last year I got interested in measuring the progress of steps towards DevOps culture. At Nokia Entertainment’s MixRadio development emporium we’d had good continuous delivery tools in place for months, but weren’t certain our culture continued to improve. Complacency was a risk, but we couldn’t tell how large. It seems one of the hardest parts of change is keeping things going in the period between that initial burst of enthusiasm and when practices truly take root as habit.

So, I shared my thoughts at DevOps Days London, and received some really useful feedback from the crowd there. I’ll let you into a secret though: I wasn’t happy, it just wasn’t rigorous enough. Paul Swartout and I created metrics focused on adoption; we wanted simple, no cost methods that anyone could use, without needing a big budget or corporate sponsorship. We called them ‘Vital Signs’ and they comprised: Cycle Time, Shared Purpose, Motivation, Collaboration and Effectiveness.

The main aim was to benchmark, ready to see which way our desired ways of working were trending. However, I wanted to capture elusive things: just how ready was the team to ignore organizational setup and work together? We also didn’t want to bias for DevOps, Scrum, Kanban or any of our other preferred methods; if someone found a better way, we wanted to learn.

The art was to find metrics in which these desirable behaviours surface, and of them only cycle time was measurable with any consistency.  We learnt an awful lot from the other metrics, particularly free form comments.  The problem was all that prose was impossible to graph, impossible to track.

Frustratingly those things that are hard to measure are amongst the most critical. They often indicate how long you can sustain a pace or practice.  It is very easy to focus exclusively on productivity, but you might be slowly killing your workforce, as Amazon recently discovered.  In general engineering teams aren’t a temporary construct, they need to be looked after for longer than the holiday season.  Engagement and well-being over time are going to drive quality and productivity as much as anything else. (Pseudo science here).

So why raise this now? Well, I was enjoying a coffee at the DevOps Café, and was interested to hear a side remark about metrics by the ever eloquent Damon Edwards and John Willis. They described the following as their ideal set of metrics:

  • Cycle Time – From customer report to change in production.
  • Mean Time To Detect (an issue)
  • Mean Time To Repair (or make a change)
  • Quality at source – or escape distance, how far do errors get before they are noticed?  Worst case: customer.
  • Repetition Rate – Does the same issue keep happening, or are we learning?
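As an illustration only, three of these can be derived from a simple incident log. The sketch below makes assumptions the list does not spell out: time to detect is measured from when an issue started to when it was noticed, time to repair from detection to fix, and repetition rate is the share of incidents whose cause had already been seen. The `Incident` record and the sample data are invented.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    """Hypothetical incident record with the three timestamps we need."""

    cause: str
    started: datetime
    detected: datetime
    repaired: datetime


def mean_delta(deltas: List[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mean_time_to_detect(incidents: List[Incident]) -> timedelta:
    return mean_delta([i.detected - i.started for i in incidents])


def mean_time_to_repair(incidents: List[Incident]) -> timedelta:
    return mean_delta([i.repaired - i.detected for i in incidents])


def repetition_rate(incidents: List[Incident]) -> float:
    """Fraction of incidents that repeat a cause already seen."""
    counts = Counter(i.cause for i in incidents)
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(incidents)


# Made-up sample data: the "disk full" problem happens twice.
INCIDENTS = [
    Incident("disk full", datetime(2015, 1, 1, 9, 0),
             datetime(2015, 1, 1, 9, 10), datetime(2015, 1, 1, 9, 40)),
    Incident("bad config", datetime(2015, 1, 1, 12, 0),
             datetime(2015, 1, 1, 12, 30), datetime(2015, 1, 1, 13, 30)),
    Incident("disk full", datetime(2015, 1, 2, 9, 0),
             datetime(2015, 1, 2, 9, 20), datetime(2015, 1, 2, 9, 50)),
]
```

With this sample the mean time to detect is 20 minutes, the mean time to repair is 40 minutes, and one incident in three is a repeat. Cycle time and escape distance would need data from the delivery pipeline as well, so they are left out of the sketch.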

Used together, these are just genius, because it’s very hard to achieve good results without a healthy productive relationship between teams. Furthermore it doesn’t matter how you describe what you’re doing: DevOps, OpsDev, Agile, Scrum or one great big group hug – those metrics don’t test adherence to a methodology. I guess mavens like these will often drive common practice, and these metrics are very much evident in Puppet Labs’ excellent surveys (some words on the magic here) (for DevOps archaeologists* early John Allspaw thoughts here).

But there is one metric which went unmentioned, my old measurement nemesis: engagement. I suspect that you could be proudly watching all the above metrics trending positive, and be rudely awoken by burnout or a rash of exit interviews. To avoid surprises shouldn’t the impact of change on key people be monitored too? Retention is a favourite for this. A good indicator, but actually people leave for a lot of other reasons. If someone departs for a role closer to the best trails it should not be seen as the first sign of DevOps culture crumbling into nothingness.

So while it seems DevOps operational metrics are mature, there is more work to be done to understand if we’re getting results and simultaneously creating a healthy, sustainable, culture.  That suggests three dimensions to measure for DevOps, and other flavours of adoption.

  • Efficiency – Our key measures like cycle time, and mean time to XYZ, are they improving?
  • Effectiveness – Is the right kind of work being done, and steering the team towards success in their organization’s field?
  • Engagement – Have we created an environment for people to be at their best?  Are we making the most of our talent?

Of course, these three need to be balanced – focus on one could easily be to the detriment of the others.  Measuring engagement, culture change and people things will always be hard, and methods flawed, but we should avoid measurement inversion, and strive to measure things not just because they are simple, but because they are valuable.

* Note to recruitment agents:  DevOps Archaeologist is not a real role, don’t go there.