MongoDB Evolution
I wanted to write a blog post about how MongoDB has evolved, and how I went from hating it to loving it. I think MongoDB is a GREAT open-source product and, even better, the company behind it is amazing (hopefully no one takes my early criticism too harshly).
When I first came across MongoDB I was working for a small company in Denver, Colorado, which created a high-performance policy management system geared specifically toward large international telecommunications companies. The company hired me in late 2009, and at the time MongoDB 1.4 had just been released.
My specific job function was to design the infrastructure around the policy management product, which included everything from the virtualization layer to the load balancer to the monitoring system. I was also deeply involved in how our system scaled and how the high availability design actually worked.
1.4 High Availability Doesn’t Work
Well, let me tell you, 1.4 had its fair share of bugs. One of the first, back in the 1.4 days, was with failover and high availability. At the time there were no replica sets (well, maybe 10gen had the idea, but it certainly wasn’t available yet), so we had to configure our MongoDB “clusters” as replica pairs: basically two nodes functioning in an active / passive state. When I first started testing how MongoDB’s HA design worked I was horrified, to say the least. One of the first bugs I discovered and reported was that when I powered off the “master / primary” member of the replica pair, the secondary never promoted itself to master. At the time some guy named Eliot Horowitz told me I should try an arbiter, so I tried, and MongoDB failed with the same result: when the master server was powered off (simulating a crash), the other member of the replica pair never promoted itself to master.
This issue was soon fixed in 1.5.x, which was a development release. We couldn’t deploy our application to production environments if failover was broken, so we were forced to run the development version in production (which actually ended up working fine for us). The failover time was 30 to 60 seconds, but considering we had no other options we were satisfied with the quick fix.
Next came what I am calling the failover / failback bug. When the master crashed and the secondary promoted itself to master, the old secondary could never be promoted to master again (thread). At this point I was thinking, what a nightmare product this is. It might work great for developers, but as a systems engineer trying to support this thing I wanted to pull my hair out!
1.6 Replica Sets
Finally 1.6.x came out and we had the mystical replica sets, but there was a catch: you needed an odd number of MongoDB members, or two servers and an arbiter, to handle network splits and make sure the master node always had the majority of the votes. Many of our application deployments were multi-datacenter, and we didn’t want a network split between the two datacenters to cause the “wrong” side of the replica set to be promoted to master, so we settled on a manual intervention scenario for most deployments. (Considering that even in the Oracle world that is still the “best” option, I recommend this same line of thinking to my current employers.)
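To make the odd-number rule concrete, here is a minimal sketch of the majority arithmetic. It is only an illustration of the voting idea described above, not MongoDB’s actual election code, and the function names are made up for this example.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the set."""
    return voting_members // 2 + 1


def can_elect_primary(reachable: int, voting_members: int) -> bool:
    """A partition can elect a master only if it sees a strict majority."""
    return reachable >= majority(voting_members)


# Two datacenters with two voting members each: a split leaves 2 of 4 per side.
even_split = [can_elect_primary(2, 4), can_elect_primary(2, 4)]

# Two servers plus an arbiter: one side of a split holds 2 of 3 votes.
odd_split = [can_elect_primary(2, 3), can_elect_primary(1, 3)]

print(even_split)  # [False, False]
print(odd_split)   # [True, False]
```

With an even number of votes a clean split leaves neither side able to elect a master, while with an odd number exactly one side can. That is also why the placement of the odd member decides which datacenter “wins” a split.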
Next came the dreaded disk space optimization issue (I like how that sounds). Our application was probably significantly different from others that were leveraging MongoDB, for instance Craigslist, which uses MongoDB for archive purposes and never deletes data. Ours was a high read / write / delete system where reclaiming disk space was a huge deal. We found that MongoDB was really, really bad at reusing disk space in a high-delete environment, so while the actual database size might have been only 20G, the size on disk could easily exceed 100G.

So a developer and I posted on the MongoDB Google Group about the issue, and again this guy Eliot Horowitz was there with an answer. He explained that we could run a rolling compaction across our 3-node replica set and not incur downtime for our application. I quickly scripted up a solution which ran at 4am every night and did exactly that: it repaired the non-master members of the replica set, then stepped down the master and ran mongod with --repair. Only then did we find out that this solution required as much free disk space as there was data in the database, and that under high load, when the master stepped down, no other member of the replica set would promote itself to master. I complained, but grinned and bore it. I mean, at least HA and failover worked, right!
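The two lessons from that nightly job can be sketched in a few lines. This is not the original script: the host names are hypothetical, the helper names are invented for illustration, and the free-space rule is simply the “as much free space as data” requirement described above.

```python
import shutil


def repair_fits(data_bytes: int, free_bytes: int) -> bool:
    """Repair rewrites the data files, so assume it needs data-sized free space."""
    return free_bytes >= data_bytes


def rolling_order(members: dict[str, str]) -> list[str]:
    """Return hosts in repair order: every secondary first, the master last."""
    secondaries = sorted(h for h, role in members.items() if role == "secondary")
    masters = sorted(h for h, role in members.items() if role == "primary")
    return secondaries + masters


GIB = 1024 ** 3
# Hypothetical 3-node replica set
members = {"db1": "primary", "db2": "secondary", "db3": "secondary"}

print(rolling_order(members))           # ['db2', 'db3', 'db1']
print(repair_fits(20 * GIB, 15 * GIB))  # False

# Check the volume this script runs on (result depends on the machine)
free_here = shutil.disk_usage(".").free
print(repair_fits(20 * GIB, free_here))
```

Checking free space before touching a member, and only stepping down the master once every secondary is healthy again, would have caught both surprises before 4am rather than after.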
Performance, Performance, Performance
This post needs some positive reinforcement about how great MongoDB really is! The company I was working for at the time was really excited to have a third-party vendor report on how “wicked fast” our policy management solution was. I was tasked with flying out to San Francisco and working with HP, Ixia and EANTC to prove the performance of our application.
Using a single HP C7000 blade chassis with 16 blades, we were able to prove our product could support 20 million active mobile user sessions and 28,000 transactions per second! This test was really something; the performance just blew away our competitors (who leveraged Oracle RAC as their back-end database). By this point I knew how to compensate for some of the operational issues associated with MongoDB, and I was starting to actually like this thing.
MongoDB In Demand Skill Set
Shortly after the performance test I was looking for a change of workplace scenery, for no bigger reason than that I was bored with my position. Having built most of the infrastructure around my employer’s product (monitoring, load balancing, etc.), there simply wasn’t much of a challenge left for me. So I put my resume out on the internet and, let me tell you, I was shocked! Company after company was asking about my MongoDB experience. Not to say my Linux and automation skills aren’t in high demand, but add MongoDB to the mix and I could tell my resume stood out from the rest. I had 3 job offers in hand when I decided on my final destination. And who says the job market is bad!
Support From People Who Know The Product
A few times in this post I mention this guy Eliot Horowitz. For years I had no clue who he was; I figured he was just the guy assigned to monitoring Google Groups and helping people out (thanks for all of the support, btw). It turns out this guy is actually the CTO of the company! Maybe I shouldn’t be as impressed with that as I am, but I have never worked for a company (and I’ve worked for large and small companies alike) where the CTO was actually assisting the community with support issues. In my opinion that is going above and beyond.
Currently I am working for a big multinational corporation that is attempting to adopt MongoDB for an exciting new product in partnership with Google. I’m helping the system architecture group operationalize emerging technology, specifically MongoDB. Unlike some of the other companies I have worked for in the past, they actually have money to pay for support from 10gen. At first I didn’t really care; most companies that pay for support for an open-source product are just wasting their money. I don’t know how many companies I’ve contracted out to or worked for that spent hundreds of thousands of dollars on support from Red Hat, only for no one to ever call them. I figured 10gen had some noob help desk personnel who had no real understanding of MongoDB, let alone the experience that I had with the product.
After meeting with our sales rep I was quick to learn that we would be assigned an engineer on our account who not only knew how MongoDB worked but was also required to contribute to the code base or maintain a driver (someone correct me if I am wrong here). This engineer would assist our company with issues such as backup and recovery strategies, index strategies and shard key strategies. He won’t design them for you, but he will certainly review proposed solutions and implementations. So far we are in the early stages of adopting support. Currently my company is looking to me to help with many strategies around MongoDB, so it will be exciting to see how well the support structure works at 10gen, so that I can move on to more bleeding-edge technology.
A Growing Community
If you live in the Denver or Boulder metro area you are in a great community for MongoDB. The landscape here continues to grow year after year. Recently I was at an event put on by 10gen, and 3 companies I have worked for were attending, all of which sent developers I had personally worked with. It was a great time. The event was huge; I am not 100% sure exactly how many people attended, but I’m sure there were a few hundred.
Next came the MongoDB meetup, which had a great turnout. Multiple 10gen folks were in attendance (I think due to a Drupal conference) and the free beer made it even better. It’s great to see how the community is growing, and it’s obvious to me that this skill set will continue to set me apart from other systems engineers in the future.
Future
I hope MongoDB has continued success (and that Oracle never buys them) because I feel that it’s a great database solution: it’s easy to use, it’s “Web Scale” and it just kicks ass overall. Plus the community support is top notch. Thanks for reading.
Nice post. I have had a similar experience with the community in the DC area and on the Google Group.
Minor note:
Not that it reduces the impressiveness that he’s so active on the mailing list and in the community, but I believe Eliot is actually the “CTO & Co-founder”, rather than the CEO. http://www.10gen.com/team I had the pleasure of seeing a couple of his talks at the MongoDC conference last year. Excellent talks and event overall.
Hey Wes, thanks for the comment and correction. You are right, Eliot is the CTO!