Rolling MongoDB Upgrades Using Fabric and Puppet
I was tasked with helping my company operationalize emerging technologies such as MongoDB, specifically by automating many of the manual steps involved in upgrades, backup and recovery, and even deployment. Today I want to talk about how to handle rolling upgrades of MongoDB with Fabric and Puppet.
Fabric
Fabric is usually documented as a tool for automated deployments, most often of Django applications, but it is also a powerful system automation tool. The Fabric website defines it this way: “Fabric is a Python (2.5 or higher) library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.”
Puppet
I’m sure most everyone knows what Puppet is. From their own website: “Puppet Enterprise is IT automation software that gives system administrators the power to easily automate repetitive tasks, quickly deploy critical applications, and proactively manage infrastructure changes, on-premise or in the cloud”.
Why not just puppet or just fabric?
While both tools offer similar features, they take very different approaches. Puppet should be used to continually enforce the configuration of your server infrastructure. Fabric is more of an orchestration tool: it can coordinate a job that spans multiple services, devices and even operating systems, making sure each step completes before the next one runs.
Scenario : Rolling Upgrade Of MongoDB
If you are not currently using these tools and you are on 1.8.5 and want to upgrade to 2.0.4, the process can be daunting, to say the least. You need to either download binaries to each host or upgrade using the packages 10gen publishes for Debian or RedHat. That sounds easy enough, but if you have multiple mongodb replica sets deployed in production, or a very large sharded dataset, it can take a long time. On top of that, the Debian package attempts to restart mongodb after the upgrade. A common mistake made by system engineers is upgrading all of the SECONDARY nodes at once and then upgrading the PRIMARY node. If you do this, the PRIMARY will step down, because not enough SECONDARY nodes are up to give it votes.
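To see why taking every SECONDARY down at once is a problem, here is a minimal sketch of the majority rule. It assumes each member has exactly one vote and ignores arbiters and priorities; the function names are hypothetical and are not part of mongodb or pymongo.

```python
def majority(voting_members):
    """Smallest number of votes that is more than half of the set."""
    return voting_members // 2 + 1


def primary_survives(voting_members, members_down):
    """True if the members still up (including the PRIMARY) form a majority."""
    members_up = voting_members - members_down
    return members_up >= majority(voting_members)


if __name__ == "__main__":
    # Three-member replica set: one PRIMARY and two SECONDARY nodes.
    print(primary_survives(3, 1))  # one secondary restarting -> True
    print(primary_survives(3, 2))  # both secondaries restarting -> False
```

With three members the PRIMARY needs two votes, so only one node can be down at any time, which is why the upgrade has to roll through the set one member at a time.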
Solution : Step One Puppet
I have published a module on GitHub called bcarpio-mongodb. This guide assumes you at least have puppet installed and running on your mongodb servers.
Simply check out the bcarpio-mongodb module into your local puppet manifests:
git clone git@github.com:bcarpio/bcarpio-mongodb.git mongodb
This will download the mongodb puppet module to your desktop. Take a look at the params.pp file and make sure the settings fit your environment. At the time of writing the latest version of mongodb is 2.0.4, and that is the version included when you check out the puppet module.
This puppet module basically downloads the precompiled binaries to /opt (or a location of your choice), untars them, and puts down a basic configuration and start script.
If your system is currently at 1.8.5 and you want to upgrade to 2.0.4, simply change $mongodb_version to 2.0.4. This won’t do anything except put the right binaries into the directory specified in params.pp. We will use fabric to make sure that all the servers in the replica set are restarted properly, with zero downtime.
Solution: Step Two Fabric
We will now use fabric to orchestrate the actual rolling upgrade of mongodb. In your fabfile directory, ~/fabfile/, make a new file called mongodb.py and start with the following task:
from fabric.api import sudo, task

@task
def stop():
    """ Stops MongoDB Process """
    sudo('stop mongodb')
This is a simple task that stops the mongodb process. Next we want tasks that start and restart mongodb. Here are the two tasks for this:
@task
def start():
    """ Starts MongoDB Process """
    sudo('start mongodb')

@task
def restart():
    """ Restarts MongoDB Process """
    stop()
    start()
Again, these are two simple tasks: one starts mongodb, and the other, restart(), calls the stop and start tasks.
Next we want to create a puppet.py fabric module with the following tasks:
from fabric.api import env, sudo, task

@task
def puppetd_test():
    """ Runs puppetd --test Which Will Update Puppet Changes """
    # Keep going when puppetd exits non-zero instead of aborting the fab run
    env.warn_only = True
    sudo('puppetd --test')
All this task does is run puppetd --test, which makes sure that the mongodb host has actually applied the puppet module that pulls down the right binaries.
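The reason for warn_only is probably puppet's exit codes: --test turns on detailed exit codes, so a run that applied changes exits with 2 rather than 0, which fabric would otherwise treat as a failure. The sketch below shows one way a return code could be interpreted. puppet_run_ok is a hypothetical helper, and the exit-code meanings are an assumption worth checking against the documentation for your puppet version.

```python
# Exit codes assumed for `puppetd --test` (detailed exit codes):
#   0 = no changes, 2 = changes applied,
#   4 = failures,   6 = changes applied and failures
PUPPET_OK_CODES = (0, 2)


def puppet_run_ok(return_code):
    """True when the run finished cleanly, with or without changes."""
    return return_code in PUPPET_OK_CODES
```

In fabric, the value returned by sudo() carries a return_code attribute, so the task could pass that to a helper like this and abort only on a real failure.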
Next, we will look at the fabric task that does all of the heavy lifting. It completes the following steps:
- Leverage pymongo (so make sure pymongo is installed on the system where you will be running your fabric task) to determine which host is the master and which hosts are secondaries
- It will leverage the puppet task defined above to apply the puppet version changes and deploy the new binaries to the first secondary in the replica set
- Next it will leverage the fabric mongodb task to restart the mongodb process with the new binaries on that single member of the replica set
- Then it will check the status of that replica set member and make sure its state returns to “2”. State 2 tells us that the host is no longer recovering or catching up with the other nodes in the replica set. This is an important part of the task: if the replica set member has not yet recovered and a second member of the replica set is taken down, there is a chance that the master will step down and cause the entire cluster to go offline
- A counter prevents the fabric job from hanging indefinitely in the event that an upgrade fails and the replica set member never recovers. If the counter limit is reached, the fab job will fail and report that intervention is required.
- The final step, after all secondary nodes have been upgraded, is to upgrade the PRIMARY node. The steps are basically the same: stop mongodb, pull puppet changes, start mongodb.
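The steps above can be sketched roughly as follows. This is a simplified illustration, not the code from the gist: upgrade_member and get_state are hypothetical callables standing in for the fabric/puppet tasks and for a pymongo replSetGetStatus lookup, and the retry numbers are placeholders. Treating state 1 as recovered, in case the old PRIMARY is re-elected after its restart, is also an assumption of this sketch.

```python
import time

PRIMARY, SECONDARY = 1, 2  # member states as reported by replSetGetStatus


def upgrade_order(members):
    """Return member names with every SECONDARY first and the PRIMARY last.

    `members` is the "members" list from a replSetGetStatus reply.
    """
    secondaries = [m["name"] for m in members if m["state"] == SECONDARY]
    primaries = [m["name"] for m in members if m["state"] == PRIMARY]
    return secondaries + primaries


def wait_for_recovery(get_state, host, max_checks=30, delay=10):
    """Poll `host` until it is healthy again, giving up after max_checks polls."""
    for _ in range(max_checks):
        if get_state(host) in (PRIMARY, SECONDARY):
            return True
        time.sleep(delay)
    return False


def rolling_upgrade(members, upgrade_member, get_state, max_checks=30, delay=10):
    """Upgrade one member at a time and stop if a member fails to recover."""
    for host in upgrade_order(members):
        upgrade_member(host)  # e.g. run puppetd --test, then restart mongodb
        if not wait_for_recovery(get_state, host, max_checks, delay):
            raise RuntimeError("%s did not recover, intervention required" % host)
```

Passing the two callables in keeps the ordering and counter logic separate from fabric and pymongo, so it can be exercised against a fake replica set before it touches production.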
Here is the actual fabric task that makes this happen:
https://gist.github.com/2252318
If anyone has any feedback on how to make my python cleaner, I’m more than happy to hear it. I’m fairly new to python and any suggestions are greatly appreciated.
Solution: Tying It All Together
Up until now you have the puppet module for mongodb and some snippets of code for fabric. While you could go and research how to wire these tasks together yourself, I’ll offer some of the actual files.
In a directory called fabfile create a file __init__.py which contains the following:
https://gist.github.com/2252382
You need to either use ssh keys to manage authentication to the hosts you are managing, or create a users.py file that defines your user and pass.
Next we will create the actual mongodb.py file. An example is here:
https://gist.github.com/2252396
We also need to create the puppet.py file. Once we have all of these files in place we simply type the following:
fab mongod.rolling_upgrade:mongod-hostname-01
The argument we specify after mongod.rolling_upgrade is one of the hosts in the replica set. It doesn’t have to be the master, because the fabric task is intelligent enough to determine which host is the master and use that host to get information about the secondaries and to check their status in the cluster.
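One way a task could find the master from any member is the isMaster command, whose reply names the current primary and lists the hosts in the set. The sketch below only parses such a reply. With pymongo the reply would come from running the isMaster command against the admin database, and primary_and_others is a hypothetical helper, not necessarily how the gist does it.

```python
def primary_and_others(is_master_reply):
    """Split an isMaster reply into (primary, [remaining hosts]).

    Returns (None, hosts) when the set currently has no primary.
    """
    primary = is_master_reply.get("primary")
    hosts = is_master_reply.get("hosts", [])
    others = [h for h in hosts if h != primary]
    return primary, others


if __name__ == "__main__":
    # Shape of a reply from a SECONDARY; the host names are made up.
    reply = {
        "setName": "rs0",
        "ismaster": False,
        "secondary": True,
        "hosts": [
            "mongod-hostname-01:27017",
            "mongod-hostname-02:27017",
            "mongod-hostname-03:27017",
        ],
        "primary": "mongod-hostname-02:27017",
    }
    print(primary_and_others(reply))
```

The remaining hosts are not guaranteed to be healthy secondaries, so their state still needs to be checked before each one is upgraded.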
Nice stuff!
A couple of things:
1. The Gist links aren’t HTML links, might want to fix that.
2. For the Primary upgrade (https://gist.github.com/2252318 Line #50), you may want to use the mongodb step down command before stopping the service. This ensures that there is another replica up, and ready to take over as primary before this one stops.
Ref: http://www.mongodb.org/display/DOCS/Replica+Set+Commands#ReplicaSetCommands-replSetStepDown
Hey Mike,
Thanks for taking the time to look at my blog. Actually I did make the step down update on my internal fab task, but currently haven’t updated the blog document to reflect that change! Thanks for the feedback.