Thoughts on deploying with Ansible
July 27th, 2014
In order to simplify our deployment procedure, we wrote a role in Ansible (we were using Capistrano before). The role is reasonably complete now and we've begun using it for projects in production. During its creation there was a bit of discussion on certain points, and I'd like to share some of the insights we gained with you.
What is a deploy?
First of all, I'll clarify a little bit what our definition of a "deploy" is. We assume that the user that performs the deploy was already created during 'provisioning', and that the proper privileges are in place.
We've divided the process into 5 steps:
- Update the codebase + configuration
- Install dependencies
- Preserve shared resources
- Build tasks
- Finalize
And we follow the Capistrano directory structure with a "current" symlink pointing to a release:
.
├── releases
| ├── 20140415234508
| └── 20140415235146
├── shared
| ├── sessions
| ├── source
| └── uploads
└── current -> releases/20140415235146
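Ensuring that base structure exists only takes a couple of file tasks (a sketch; further down you'll see our deploy module do this for us):
tasks:
  - file: path={{ project_root }}/releases state=directory
  - file: path={{ project_root }}/shared state=directory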
The role
Writing a role that performs those tasks, modeled after Capistrano, is not hard. Mapping available Ansible modules to these steps looks like this (a bare-bones sketch follows the list):
- git or synchronize + copy or template
- command or shell
- file
- command or shell
- file
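Strung together, a minimal version could look something like this. This is a sketch only: the variables (release_path, shared_path) and the Symfony-flavored commands are assumptions, not part of our actual role:
tasks:
  # 1. update the codebase + configuration
  - git: repo={{ project_git_repo }} dest={{ release_path }}
  - template: src=templates/parameters.yml.j2 dest={{ release_path }}/app/config/parameters.yml
  # 2. install dependencies
  - command: composer install chdir={{ release_path }}
  # 3. preserve shared resources
  - file: src={{ shared_path }}/uploads dest={{ release_path }}/web/uploads state=link
  # 4. build tasks
  - command: app/console assets:install chdir={{ release_path }}
  # 5. finalize
  - file: src={{ release_path }} dest={{ project_root }}/current state=link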
There are a couple of things left to do that are not really part of the deploy, but are easy enough to build:
- create a timestamp
- cleanup old releases
The timestamp can be created with a command that you register into a local variable. The value will be in your_registered_variable.stdout (so anything that prints out the date would do, really).
tasks:
  - command: date '+%Y%m%d%H%M%S'
    register: date_output
  - debug: msg={{ date_output.stdout }}
In fact, we use a timestamp but that's not really a requirement either - as long as you can identify a specific release without overlap (commit hashes would do, for example, and would even prevent you from deploying the same version twice).
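A commit hash can be captured the same way (a sketch; it assumes the task runs where a checkout of the repository exists, hence the local_action):
tasks:
  - local_action: command git rev-parse --short HEAD
    register: commit_output
  - debug: msg={{ commit_output.stdout }}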
The cleanup command would require a bit more work. You'd need to list the contents of the remote releases folder - again using register - and loop over the contents to keep n releases and remove the rest.
tasks:
  - command: ls releases
    register: ls_output
  - debug: msg={{ item }}
    with_items: ls_output.stdout_lines
(I know I can debug with var=ls_output.stdout_lines, but the point here is iterating over the list.)
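To complete the thought, the actual removal could look like this (a sketch; it assumes timestamped release names, so that a lexical sort is also a chronological one):
tasks:
  - command: ls releases
    register: ls_output
  # remove everything except the newest five releases
  - file: path=releases/{{ item }} state=absent
    with_items: "{{ ls_output.stdout_lines[:-5] }}"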
Because of the multiple-task nature of these jobs, and because they're a lot easier to write in Python, we've added them in a module called "deploy" inside our role. This means we can do this to ensure the directory structure is in order, and receive the timestamp fact back from the module:
- name: Initialize
  deploy: "path={{ project_root }} state=present"
- debug: msg={{ deploy.new_release }}
And this to remove any release over the n count:
- name: Remove old releases
  deploy: "path={{ project_root }} state=clean"
It looks so much cleaner that way :-)
When the problems start
The problems arise when you also start trying to copy concepts from Capistrano from a re-usability perspective. For example, Capistrano allows you to write callbacks: things like 'before_X' or 'after_Y'. It also allows you to write code for a rollback (when things go wrong). These are meant to let you alter the deployment procedure when re-using things between different projects. Finally, in Capistrano you can create interaction at any point in the process - it's just Ruby after all. Asking for user input with a calculated default and manipulating it before returning is trivial. But not so in Ansible.
So why are these concepts problematic? Because Ansible is NOT a programming language. Repeat that. Three times. Out loud. You're not in Kansas anymore *grin*
We did think long and hard about what problems it would bring if we could not implement them in the same form. The rollback was the first to go.
What does the rollback do?
Well, when something goes wrong, you can return to the previous working state:
active release: "A-OK" ➙ failure deploying "BORKED" ➙ rollback ➙ active release: "A-OK"
But when exactly does a failure during the deploy become an issue? Well... only when you do destructive tasks, like DB schema updates (I'm guessing that's the most important reason why people would need a rollback). So if you already have a good DB up- and downgrade system in place, then adding that in Ansible won't be a problem.
In that case, the rollback concept is only part of the DB update task, so you can use ignore_errors: True with register:
tasks:
  - command: sh -c 'exit 1'
    register: task_output
    ignore_errors: True
  - name: Rollback
    command: echo 'rollback'
    when: task_output|failed
    failed_when: True
The final failed_when: True is meant to stop the deploy after the rollback is finished. It's ugly, but it will get the job done.
As an alternative, you could check the return value from Ansible itself (but you won't know at what point Ansible failed so this solution would require more complexity).
ansible-playbook deploy.yml || ansible-playbook rollback.yml
What we concluded, though, is that having a proper rollback in place - like DB downgrades - is major overkill in most cases. We use Doctrine migrations, for example, and try to avoid destructive DB upgrades (like renaming columns). We create smaller steps that can be performed safely (like adding a new column and migrating the data), and do destructive tasks (like dropping the old column) one iteration later, when the code is no longer 'attached' to that part of the DB.
In any case, we decided against having a global rollback concept. The rollback is a lie!
What we did need, is a way to determine a failed deploy. So now we have a task that adds a file called "BUILD_UNFINISHED" to the release in progress. While this file exists, that release is considered unfinished. If a release exists in the releases folder that still contains the file on start of a new build, that release is considered failed and the folder is removed (automatically, by the deploy module).
This has the added benefit over Capistrano that a failed release is still available for inspection.
active release: "A-OK" ➙ deploying "BORKED" ➙ fail
(the symlink is never replaced, so A-OK is still active)
We can see the "BORKED" folder in releases, and find any problems.
active release: "A-OK" ➙ deploying "FIXED"
(before we start, "BORKED" is removed from releases folder)
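In plain tasks, the marker mechanics boil down to something like this (a sketch; in our role the deploy module takes care of it):
- file: path={{ deploy.new_release_path }}/BUILD_UNFINISHED state=touch
# ... the entire build runs here; on failure the marker stays behind ...
- file: path={{ deploy.new_release_path }}/BUILD_UNFINISHED state=absent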
I can haz callback?
In order to make a deploy role re-usable between projects you either have extremely similar projects, or you have a flexible mechanism. Without the option to "inject" your own commands, you would need to copy the entire role for each change. The challenge here was to find that sweet spot with enough flexibility without creating a monster.
The solution for us was to add a variable containing a list of commands. The default setting for this variable is an empty list: project_post_build_commands: []
The task that runs this inside the role is just:
- name: Run post_build_commands
  command: "{{ item }} chdir={{ deploy.new_release_path }}"
  with_items: project_post_build_commands
This means you can now set up any commands you want, to be run after the dependency manager has run and the shared symlinks have been created. It's not really a callback, but it does put the power back in your hands with regard to the flow of operations. For symmetry, we also added the option to run commands before the dependency manager has run, with project_pre_build_commands: [].
UPDATE: The following will *not* work in Ansible 1.6.8 and up for security reasons, and we have dropped the idea.
We will be exploring the option of adding a task that uses the older action: notation of Ansible. This means you could run any Ansible module (pseudo code):
project_post_build_actions:
  - { module: '[some_module]', parameters: 'some_param=true some_other_param=some/value' }
  - { module: '[some_module]', parameters: 'some_param=false some_other_param=some/other/value' }
And the task would turn into something like:
- name: Run post_build_actions
  action: "{{ item.module }} {{ item.parameters }}"
  with_items: project_post_build_actions
This turns into a bit of meta-Ansible, but it's mighty useful if the rest of the role is a perfect fit. This technique could save you a lot of duplication.
UPDATE: The previous will *not* work in Ansible 1.6.8 and up for security reasons, and we have dropped the idea.
How to get the right version for deploy
For a deploy, we like to have the option to deviate from the release (tag) that is to be deployed. It's not always the latest tag, but it usually is, so we'd like to suggest that as the default.
This was surprisingly hard in Ansible. When you use vars_prompt, Ansible will ask the user for input, and there is also the option of setting a default. But when you use a variable or a lookup as the default value, it is not rendered when the question is shown. So this:
vars_prompt:
  - name: "release_version"
    prompt: "Product release version"
    default: "{{ lookup('pipe','latest_git_tag.sh') }}"
Will come out looking like this:
$ ansible-playbook playbook.yml
Product release version [{{ lookup('pipe','latest_git_tag.sh') }}]:
Meanwhile, the actual value of "release_version" will be set to the output of latest_git_tag.sh. This is not acceptable, as you can't show the user which version they are about to deploy. This problem proved quite stubborn: even when you have multiple plays in your playbook, the vars_prompt part of the second play is called before the first play is run.
So for this problem we just decided to wrap Ansible in a shell script. In certain projects we now have a bin/deploy script that asks the proper questions and calls Ansible with --extra-vars on the command line. It looks something like this when used:
$ bin/deploy
Product release version [v1.2.3]:
Project environment (Staging/Production) [staging]:
Running: ansible-playbook deploy.yml --limit staging --extra-vars "environment=staging project_version=v1.2.3"
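A minimal version of such a wrapper could look like this (a sketch only; latest_git_tag.sh is the same helper script as before, and the prompts mirror the usage above):
#!/bin/sh
# suggest the latest tag as the default version
default_version=$(./latest_git_tag.sh)
printf "Product release version [%s]: " "$default_version"
read version
version=${version:-$default_version}

printf "Project environment (Staging/Production) [staging]: "
read env
env=${env:-staging}

echo "Running: ansible-playbook deploy.yml --limit $env --extra-vars \"environment=$env project_version=$version\""
ansible-playbook deploy.yml --limit "$env" --extra-vars "environment=$env project_version=$version"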
Where is my Maintenance mode!
One final thing we did not incorporate in the deploy role is a "maintenance mode" concept. Obviously you would need one if you start doing destructive tasks, but there is no universally accepted way of applying one. We just check for the existence of a file with Nginx, so the file module or even a command: touch maintenance would do just fine.
What usually happens, though, is that the potentially dangerous parts of a deploy are done in a separate role. This happens after step 4 (build tasks), but before step 5 (finalize).
We created our role to be open-ended, with a variable called project_finalize (default: true) that determines whether the BUILD_UNFINISHED file should be deleted and the current symlink replaced. If you set it to false, projects can add their own additional role to the deploy procedure, and setting/removing maintenance mode becomes the responsibility of that role.
If you want to start a deploy by setting maintenance mode, you could have a pre_tasks entry in your deploy playbook; these run before the deploy role starts. If you always want to remove maintenance mode, you could use the 'exit-code' technique described earlier.
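Put together, it could look like this in a deploy playbook (a sketch; the location of the maintenance file is an assumption):
- hosts: production
  pre_tasks:
    - name: Enable maintenance mode
      command: touch {{ project_root }}/shared/maintenance
  roles:
    - f500.project_deploy
Removing the file again would be a post_tasks entry, which only runs when the play succeeds; to guarantee removal after a failure as well, you're back to the exit-code technique.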
Example usage of Ansible.project_deploy
I'll leave you with an example deploy playbook for SweetlakePHP, our user group. It's a Symfony2 project and we use Assetic; nothing really special there. All the work is done inside the role, and we use the exact same role to deploy multiple projects (not all Symfony2). The only thing that differs from other projects is the list of vars:
---
- name: Deploy the application
  hosts: production
  remote_user: "{{ production_deploy_user }}"
  sudo: no

  vars:
    project_root: "{{ sweetlakephp_root }}"
    project_git_repo: "{{ sweetlakephp_github_repo }}"
    project_deploy_strategy: git

    project_environment:
      SYMFONY_ENV: "prod"

    project_shared_children:
      - path: "/app/sessions"
        src: "sessions"
      - path: "/web/uploads"
        src: "uploads"

    project_templates:
      - name: parameters.yml
        src: "templates/parameters_prod.yml.j2"
        dest: "/app/config/parameters_prod.yml"

    project_has_composer: yes

    project_post_build_commands:
      - "php vendor/sensio/distribution-bundle/Sensio/Bundle/DistributionBundle/Resources/bin/build_bootstrap.php"
      - "app/console cache:clear"
      - "app/console doctrine:migrations:migrate --no-interaction"
      - "app/console assets:install"
      - "app/console assetic:dump"

  roles:
    - f500.project_deploy

  post_tasks:
    - name: Remove old releases
      deploy: "path={{ project_root }} state=clean"
Do you have similar issues? Contact us at Future500.
We can tackle your technical issues while you run your business.
Check out what we do, or write us at info@future500.nl.