
Thoughts on deploying with Ansible

July 27th, 2014

In order to simplify our deployment procedure, we wrote a role in Ansible (we were using Capistrano before). The role is reasonably complete now and we've begun using it for projects in production. But during the creation there was a bit of discussion on certain points, and I'd like to share some of the insights we've had with you.

What is a deploy?

First of all, I'll clarify what our definition of a "deploy" is. We assume that the user that runs the deploy has already been created during 'provisioning', and that the proper privileges are in place.

We've divided the process into 5 steps:

  1. Update the codebase + configuration
  2. Install dependencies
  3. Preserve shared resources
  4. Build tasks
  5. Finalize

And we follow the Capistrano directory structure with a "current" symlink pointing to a release:

 .
 ├── releases
 |   ├── 20140415234508
 |   └── 20140415235146
 ├── shared
 |   ├── sessions
 |   ├── source
 |   └── uploads
 └── current -> releases/20140415235146

The role

To write a role that performs those tasks, modeled after Capistrano, is not hard. If you map available Ansible modules to these steps it looks like this:

  1. git or synchronize + copy or template
  2. command or shell
  3. file
  4. command or shell
  5. file

There are a couple of things left to do that are not really part of the deploy, but are easy enough to build:

  • create a timestamp
  • cleanup old releases

The timestamp can be created with a command that you register into a local variable. The value will be in your_registered_variable.stdout (so anything that prints out the date would do really).

tasks:
  - command: date '+%Y%m%d%H%M%S'
    register: date_output

  - debug: msg={{ date_output.stdout }}

In fact, we use a timestamp but that's not really a requirement either - as long as you can identify a specific release without overlap (commit hashes could do for example, and prevent you from deploying the same version twice).
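For comparison, here is a rough sketch of producing the same kind of identifier in Python. This is not the code from our deploy module; the function name new_release_name and the choice of UTC are assumptions made for the illustration.

```python
from datetime import datetime, timezone


def new_release_name(now=None):
    """Return a 14-digit release identifier such as '20140415235146'."""
    # Hypothetical helper: UTC is an assumption, the `date` command
    # in the task above uses the server's local time instead.
    if now is None:
        now = datetime.now(timezone.utc)
    # Year, month, day, hour, minute, second: names sort chronologically.
    return now.strftime("%Y%m%d%H%M%S")
```

Because the fields run from most to least significant, sorting the folder names alphabetically also sorts them by age, which comes in handy for the cleanup.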

The cleanup command would require a bit more work. You'd need to list the contents of the remote releases folder - again using register - and loop over the contents to keep n releases and remove the rest.

tasks:
  - command: ls releases
    register: ls_output

  - debug: msg={{ item }}
    with_items: ls_output.stdout_lines

I know I can debug with 'var=ls_output.stdout_lines', but the point here is iterating over the list.

Because of the multiple-task nature of these jobs, and because they're a lot easier to write in Python, we've added them in a module called "deploy" inside our role. This means we can do this to ensure the directory structure is in order, and receive the timestamp fact back from the module:

- name: Initialize
  deploy: "path={{ project_root }} state=present"

- debug: msg={{ deploy.new_release }}

And this to remove any release over the n count:

- name: Remove old releases
  deploy: "path={{ project_root }} state=clean"

It looks so much cleaner that way :-)
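To give an idea of what a "clean" state could do behind the scenes, here is a minimal sketch in Python. It is not the actual module code: the function name clean_releases and the keep parameter are hypothetical, and it assumes release folders are named by timestamp so that a plain sort puts the oldest first.

```python
import os
import shutil


def clean_releases(releases_dir, keep=5):
    """Remove all but the `keep` newest release folders; return removed names."""
    # Assumption: folder names are timestamps, so sorted order is oldest first.
    releases = sorted(
        name for name in os.listdir(releases_dir)
        if os.path.isdir(os.path.join(releases_dir, name))
    )
    # releases[:-0] would be empty, so handle keep=0 explicitly.
    stale = releases[:-keep] if keep > 0 else releases
    for name in stale:
        shutil.rmtree(os.path.join(releases_dir, name))
    return stale
```

The sketch does not check where the "current" symlink points, so it relies on the active release being among the newest ones.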

When the problems start

The problems arise when you also start trying to copy concepts from Capistrano from a re-usability perspective. For example, Capistrano allows you to write callbacks. So things like 'before_X' or 'after_Y'. Also, Capistrano allows you to write code for a rollback (when things go wrong). These are meant for you to alter the deployment procedure, when re-using things between different projects. Finally, in Capistrano you can create interaction at any point in the process - it's just Ruby after all. Asking for user input with a calculated default and manipulating before returning is trivial. But not so in Ansible.

So why are these concepts problematic? Because Ansible is NOT a programming language. Repeat that. Three times. Out loud. You're not in Kansas anymore *grin*

We did think long and hard on what problems this would bring if we could not implement them in the same form. The rollback was the first to go.

What does the rollback do?

Well, when something goes wrong, you can return to the previous working state:

active release: "A-OK" ➙ failure deploying "BORKED" ➙ rollback ➙ active release: "A-OK"

But when exactly does the failure during the deploy become an issue? Well... only when you do destructive tasks, like DB schema updates (I'm guessing that's the most important reason why people would need a rollback). So if you already have a good DB up- and downgrade system in place then adding that in Ansible won't be a problem.

In that case, the rollback concept is only part of the DB update task, so you can use ignore_errors: True with register:

tasks:
  - command: sh -c 'exit 1'
    register: task_output
    ignore_errors: True

  - name: Rollback
    command: echo 'rollback'
    when: task_output|failed
    failed_when: True

The final failed_when: True is meant to stop the deploy after the rollback is finished. It's ugly, but it will get the job done.

As an alternative, you could check the return value from Ansible itself (but you won't know at what point Ansible failed so this solution would require more complexity).

ansible-playbook deploy.yml || ansible-playbook rollback.yml

What we concluded though, is that having a proper rollback in place - like DB downgrades - is major overkill in most cases. We use Doctrine migrations for example, and try to avoid destructive DB upgrades (like renaming columns). We create smaller steps that can be performed safely (like adding a new column and migrating the data), and do destructive tasks (like dropping the old column) one iteration later when the code is no longer 'attached' to that part of the DB.

In any case, we decided against having a global rollback concept. The rollback is a lie!

What we did need, is a way to determine a failed deploy. So now we have a task that adds a file called "BUILD_UNFINISHED" to the release in progress. While this file exists, that release is considered unfinished. If a release exists in the releases folder that still contains the file on start of a new build, that release is considered failed and the folder is removed (automatically, by the deploy module).

This has the added benefit over Capistrano that a failed release will still be available for inspection.

active release: "A-OK" ➙ deploying "BORKED"  ➙ fail
(the symlink is never replaced, so A-OK is still active)

We can see the "BORKED" folder in releases, and find any problems.

active release: "A-OK" ➙ deploying "FIXED"
(before we start, "BORKED" is removed from releases folder)
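The marker-file idea can be sketched in a few lines of Python. Again, this is an illustration and not the code of our module; the function names start_release and remove_failed_releases are hypothetical.

```python
import os
import shutil

MARKER = "BUILD_UNFINISHED"


def start_release(releases_dir, name):
    """Create a release folder and flag it as unfinished."""
    path = os.path.join(releases_dir, name)
    os.makedirs(path)
    # An empty file is enough, only its existence matters.
    open(os.path.join(path, MARKER), "w").close()
    return path


def remove_failed_releases(releases_dir):
    """Delete every release that still carries the marker; return their names."""
    failed = sorted(
        name for name in os.listdir(releases_dir)
        if os.path.isfile(os.path.join(releases_dir, name, MARKER))
    )
    for name in failed:
        shutil.rmtree(os.path.join(releases_dir, name))
    return failed
```

At the start of a new build you would call remove_failed_releases first and start_release second, otherwise the fresh release would be removed along with the failed ones.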

I can haz callback?

In order to make a deploy role re-usable between projects you either have extremely similar projects, or you have a flexible mechanism. Without the option to "inject" your own commands, you would need to copy the entire role for each change. The challenge here was to find that sweet spot with enough flexibility without creating a monster.

The solution for us was to add a variable containing a list of commands. The default setting for this variable is an empty list: project_post_build_commands: []

The task that runs this inside the role is just:

- name: Run post_build_commands
  command: "{{ item }} chdir={{ deploy.new_release_path }}"
  with_items: project_post_build_commands

This means you can now set up any commands you want, to be run after the dependency manager has run, and the shared symlinks have been created. It's not really a callback, but it does put the power back in your hands with regards to the flow of operations. For symmetry, we also added the option to run commands before the dependency manager has run with project_pre_build_commands: [].

UPDATE The following will *not* work in Ansible 1.6.8 and up for security reasons, and we have dropped the idea

We will be exploring the option of adding a task that uses the older action: notation of Ansible. This means you could run any Ansible module (pseudo code):

project_post_build_actions:
  - { module: '[some_module]', parameters: 'some_param=true some_other_param=some/value' }
  - { module: '[some_module]', parameters: 'some_param=false some_other_param=some/other/value' }

And the task would turn into something like:

- name: Run post_build_actions
  action: "{{ item.module }} {{ item.parameters }}"
  with_items: project_post_build_actions

This turns into a bit of meta-Ansible, but it's mighty useful if the rest of the role is a perfect fit. This technique could save you a lot of duplication.

UPDATE The previous will *not* work in Ansible 1.6.8 and up for security reasons, and we have dropped the idea

How to get the right version for deploy

For a deploy, we like to have an option to deviate from the release (tag) that is to be deployed. It's not always the latest tag, but at the same time it usually is so we'd like to suggest that as default.

This was surprisingly hard in Ansible. When you use vars_prompt, Ansible will ask the user for input. There is also the option of setting a default. But when you use a variable or a lookup as the default value, it is not rendered when the question is asked. So this:

  vars_prompt:
    - name: "release_version"
      prompt: "Product release version"
      default: "{{ lookup('pipe','latest_git_tag.sh') }}"

Will come out looking like this:

$ ansible-playbook playbook.yml
Product release version [{{ lookup('pipe','latest_git_tag.sh') }}]:

While the actual value in "release_version" will be set to the output of latest_git_tag.sh. This is not acceptable, as you can't notify the user about which version they are about to deploy. This problem proved quite stubborn. Even when you have multiple plays in your playbook, the vars_prompt part of the second play is called before the first play is run.

So, for this problem we just decided to wrap Ansible in a shell script. In certain projects we now have a bin/deploy script that asks the proper questions and calls Ansible with --extra-vars on the commandline. It looks something like this when used:

$ bin/deploy
Product release version [v1.2.3]:
Project environment (Staging/Production) [staging]:

Running: ansible-playbook deploy.yml --limit staging --extra-vars "environment=staging project_version=v1.2.3"
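Our wrapper is a shell script, but the idea fits in a few lines of any language. Below is a sketch in Python, under a few assumptions: the function names are hypothetical, and `git describe --tags --abbrev=0` stands in for whatever latest_git_tag.sh does (its contents are not shown in this article).

```python
import shlex
import subprocess


def latest_git_tag():
    """Return the most recent tag reachable from HEAD (assumed default)."""
    output = subprocess.check_output(
        ["git", "describe", "--tags", "--abbrev=0"], text=True
    )
    return output.strip()


def ask(question, default, read=input):
    """Prompt with the default shown; an empty answer selects the default."""
    answer = read("%s [%s]: " % (question, default)).strip()
    return answer or default


def build_command(version, environment, playbook="deploy.yml"):
    """Assemble the ansible-playbook call as an argument list."""
    extra_vars = "environment=%s project_version=%s" % (environment, version)
    return ["ansible-playbook", playbook,
            "--limit", environment,
            "--extra-vars", extra_vars]


def main():
    version = ask("Product release version", latest_git_tag())
    environment = ask("Project environment (Staging/Production)", "staging")
    command = build_command(version, environment)
    print("Running: " + shlex.join(command))
    return subprocess.call(command)
```

The default is computed before the question is asked, which is exactly the part vars_prompt could not do for us. Passing the command as a list avoids quoting problems with the --extra-vars value.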

Where is my Maintenance mode!

One final thing we did not incorporate in the deploy role is a "maintenance mode" concept. Obviously, you would need one if you start doing destructive tasks but there is no universally accepted way of applying one. We just check for the existence of a file with Nginx, so the file module or even a command: touch maintenance would do just fine.

What usually happens, though, is that the potentially dangerous parts of a deploy are done in a separate role. This happens after step 4 (build tasks), but before step 5 (finalize).

We created our role to be open-ended, with a variable called project_finalize (default true) that determines if the BUILD_UNFINISHED file should be deleted, and the current symlink should be replaced. If you set it to false, projects can add their own additional role to the deploy procedure, and setting/removing maintenance mode would be the responsibility of that role.
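For those wondering what the finalize step boils down to, here is a sketch in Python. It is an illustration only, not the role's implementation (the role uses the file module for this step): the function name finalize is hypothetical, and building a temporary link and renaming it over "current" is one common way to swap a symlink without a moment where it is missing.

```python
import os


def finalize(project_root, release_name, marker="BUILD_UNFINISHED"):
    """Mark the release as finished and point 'current' at it."""
    release_path = os.path.join(project_root, "releases", release_name)
    marker_path = os.path.join(release_path, marker)
    if os.path.exists(marker_path):
        os.remove(marker_path)

    current = os.path.join(project_root, "current")
    tmp = current + ".tmp"
    # Clean up a leftover temporary link from an interrupted run.
    if os.path.lexists(tmp):
        os.remove(tmp)
    # Relative target, matching 'current -> releases/<name>' in the layout.
    os.symlink(os.path.join("releases", release_name), tmp)
    # Renaming over the old link replaces it in one step on POSIX systems.
    os.replace(tmp, current)
    return current
```

If anything fails before the rename, the old symlink is untouched and the previous release stays active.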

If you want to start a deploy by setting maintenance mode, you could add a pre_tasks entry to your deploy playbook; these tasks run before the deploy role starts. If you always want to remove maintenance mode, you could use the 'exit-code' technique described earlier.

Example usage of Ansible.project_deploy

I'll leave you with an example deploy playbook for SweetlakePHP, our user-group. It's a Symfony2 project and we use Assetic. Nothing really special there. All the work is done inside the role, and we use the exact same role to deploy multiple projects (not all Symfony2). The only thing that is different from other projects is the list of vars:

 ---
 - name: Deploy the application
   hosts: production
   remote_user: "{{ production_deploy_user }}"
   sudo: no

   vars:
     project_root: "{{ sweetlakephp_root }}"
     project_git_repo: "{{ sweetlakephp_github_repo }}"
     project_deploy_strategy: git

     project_environment:
       SYMFONY_ENV: "prod"

     project_shared_children:
       - path: "/app/sessions"
         src: "sessions"
       - path: "/web/uploads"
         src: "uploads"

     project_templates:
       - name: parameters.yml
         src: "templates/parameters_prod.yml.j2"
         dest: "/app/config/parameters_prod.yml"

     project_has_composer: yes

     project_post_build_commands:
       - "php vendor/sensio/distribution-bundle/Sensio/Bundle/DistributionBundle/Resources/bin/build_bootstrap.php"
       - "app/console cache:clear"
       - "app/console doctrine:migrations:migrate --no-interaction"
       - "app/console assets:install"
       - "app/console assetic:dump"

   roles:
     - f500.project_deploy

   post_tasks:
     - name: Remove old releases
       deploy: "path={{ project_root }} state=clean"

Ramon de la Fuente

Pointy haired boss