Deployments should be non-events

The problem with “traditional” release management

I say “traditional” release management for the lack of a better categorisation. What I mean by that is having separate phases (and teams) for development, system test, user acceptance test (UAT) and finally production. The problem is that it just takes too long after the developer commits his code until it gets eventually deployed to production. In my experience, this is mostly due to delays in hand offs from one phase to another. And as we all know, thanks to Lean Thinking this is waste.

Why you (as a developer) should care

The deployments to these different environments are usually stressful, because they bare some amount of risk of taking down the system you are deploying to. And the closer you get to production, the higher the penalty for doing so becomes. Because of this risk, deployments usually take place when potential problems would have the least impact on the users of these environments, which in general happens to be out of business hours. And last but not least, deployments tend to be boring and tedious tasks. Or did you ever get excited about doing a release? Yeah, that’s what I thought.

Why you (as a manager) should care

Of course, you also care about the factors mentioned above, because you want your developers to be as happy as possible, right? But furthermore, you especially dislike the risks associated with deployments, because you have to factor them into all sorts of calculations. Additionally, all these delays between phases mean that your time to market for new features and bug fixes is way higher than it needs to be, which means you are loosing money. Constantly.

The obvious solution

It’s easy, right? The answer is continuous deployment. You fully automate your testing and deployment and you deploy straight to production after every commit. Sounds great. The problem is: Most organisations are not ready for full test automation or just not willing to give up manual testing. This may be for security or legal reasons, potential loss of revenue and/or reputation or something else. Bottom line is: Pure continuous deployment into production without any manual testing is not widely accepted. Period. I am not saying this goal is not worth aiming for. Trust me, I am all for it. I just don’t see it happening in the near future in most companies. In my opinion, the only way to practice continuous deployment into production is by having sophisticated real-time alerting combined with a system with built- in resilience. This would allow you to automatically detect and back out a failed deployment without an interruption of the system as a whole. Have a read of this blog post about how the guys at kaChing managed to achieve this.

A step in the right direction

Don’t worry, not all is lost. Even if continuous deployment is not achievable for you right now, that doesn’t mean you can’t improve your situation at all. You can take the first steps towards continuous deployment by decreasing the batch size of deployments and automating the deployment and hand-offs between the environments.

  • Decreasing the batch size should be easily achieved by just allowing automatic deployments into your first test environment. Only, of course, after a successful build that runs all your automated tests.
  • Automating the deployment may require some tailored scripts to suit your system. However, if you are using Hudson and your deployment artifact is a war or ear file you can use the deploy plugin, which is based on Cargo.
  • Automating the hand-offs could be achieved by using a central release management dashboard that is used by developers, testers and managers.

So far I haven’t found any Hudson plugin that would provide this. What I have in mind is something like the following:

  • Deployments can be configured to happen automatically or to require manual approval.
  • When deployment to an environment finishes, a configured set of people gets notified
  • The different environments where the application is running can be accessed directly from the dashboard
  • When approving a build, and all previous builds have been approved as well (there might be dependencies), deploy to next environment.
  • When rejecting a build
    1. A tester raises one or more issues for the build. The issue tracking system is tightly integrated in the dashboard and can be reached via a link, which pre-fills relevant fields about the current build and project. This automatically rejects the build.
    2. The developer that committed the change that triggered the rejected build gets notified.
    3. The developer fixes the issue (after creating a failing test to expose it, of course).
    4. The developer commits the fix with reference to the issue in the commit comment. This triggers a new automatic build, plus an automatic update of the issue and a notification to the tester who raised it (like the Trac SVN Policies Plugin).
    5. When the tester approves the new build the old build gets automatically approved as well, if this was the last outstanding issue for the old build.

In my mind it looks somewhat like this (click image to enlarge):

Release Management Dashboard

In order to do pure continuous deployment you could then configure the dashboard with only one environment (=production) with automatic deployment. You can, however, choose to set up as many intermediate environments as you like and also require manual approval of deployments to any of them.

Does anyone know if a Hudson plugin with this functionality exists or is in the making?

With this in place I believe deployments can be non-events, in the meaning of being effortless and stress free. This does not mean you should stop celebrating your releases, though.

5 comments

  1. Hi Manuel,

    this software life-cycle process demand sounds familiar to me. That is a common wish of customers.
    In a SOA landscape this topic is known under ‘Governance’. Life-cycle management is part of it. But its not only good for SOA. If your software landscape consist of many artifacts, you need a solution to ‘govern’ it.
    Governance tools allow do design a life-cycle process for every artifact. For every process step you can add design time polices. That can be approval policies as you mentioned or notification jobs.
    From my experience that is still a very theoretical topic. Although there are software solutions for this, I have never seen any in production. And unfortunately I don’t know an open source solution for it.
    In the near future I have the challenge to build up such a life-cycle management for a customer. Maybe I can contribute some hands on experience then.

  2. Hello,

    I’m currently using Gerrit (http://gerrit.googlecode.com/) and Hudson for something like this. Current workflow is something like this:

    1. Change is pushed into Gerrit
    2a. Hudson notices the change and runs a build reporting results to gerrit (http://blog.hudson-ci.org/content/pre-tested-commits-git).
    2b. Change is manually reviewed and tested. Reviewers/testers can pull the change for gerrit for testing.
    3. If change is good, ie. Hudson job is success and reviewers show green light, change is submitted to Gerrits main repository.
    4. Hudson notices a new change in Gerrits main repository and deploys a new version into beta server.

    I tried to explain this with flow chart in little blog post http://softwarefromnorth.blogspot.com/2010/04/almost-continuous-deployment-with.html

    I’m using beta server because our customers didn’t want to have this on live server. Beta and live server are swapped biweekly, though, meaning that beta comes live one and live server is cleaned.

  3. Hi!

    Interesting Idea, but how to automate user acceptance? How to automate security requirements? How to automate interfaces between companies? How to automate updated user documentation? Or production docuemtnation? How to guarantee that release units spanning multiple applications / dev groups with different policies work together well? How about the requirement that no developer ever touch the life systems? If it is automated, this is equivalent to direct access? Who tests the tests? And how?

    I don’t mind having a simple & lean release process, but aiming at a fully auztomated process where a developer commit updates the production environment after X automated tests is not a quility target.

    • Thank you Marc for your interesting pointers. I would like to say a few words to each of them.

      • automated user acceptance: Currently continuous deployment is mostly (only?) used with online off-the-shelf applications that don’t have a customer (=user) acceptance test phase. So, with bespoke software development for a specific customer you might only be able to automatically deploy as far as a UAT environment.
      • security requirements: You could write a test for every possible group of attacks you can think of and run them against your running application.
      • interfaces between companies: These interfaces should hopefully be clarified before testing an application. In any case issues regarding these interfaces can be detected by automated end-to-end tests.
      • user documentation: OK. You got me. I don’t have an answer for this one. However, I don’t really see user documentation as part of the deployed piece of software. Probably because I am a developer and I like to write code, not documentation. Don’t get me wrong, I understand that user documentation is an important part of any application, but it doesn’t influence if the application itself is correct or not. It is an accessory that helps the user understand and use the application, but it is not part of the application itself.
      • releases spanning multiple applications: Again, any issues here can be detected by automated end-to-end tests against all participating applications.
      • no developer touches life system: I don’t know why you would have a requirement like this in the first place, but let’s say you do. As I see it there are two givens here, (a) your code is developed by developers and (b) your code should eventually end up in your production system. Let’s for a moment assume that you can automate all testing, then you would want to eliminate all manual steps between the developer committing the code and the code being deployed to production. And that is exactly the idea behind continuous deployment, to get rid of unnecessary manual steps. So, technically when doing continuous deployment the developer is not touching the life system any more so as he did before, it’s just that the manual steps in between him and the life system have been automated.
      • testing the tests: There are different categories of tests, unit tests for your code units, end-to-end tests to test your running application and anything in between. These should hopefully cover for each other to some extent. But you are right, if your tests are crap, then they don’t say anything about your application. In my opinion, continuous deployment can only work together with test driven development and a very disciplined team that is experienced with developing this way.

      I would like to make it clear once more: When talking about automated tests I am not just talking about unit tests, but also about functional tests that test your deployed and running application. Furthermore, doing continuous deployment requires a high level of discipline from a development team that knows what they are doing.

Leave a comment