REPLY: Goodbye MongoDB
Apparently an article written by a user with the handle MacYET is making its way around the Internet. The link is here if anyone wants to read it (link). While mongodb is a new and emerging technology, it does have room for improvement. I’d like to look at what MacYET sees as the problems with mongodb and determine whether his criticisms are valid or simply a childish rant prompted by his ban from the #mongodb channel on IRC.
To give everyone a little history on this situation: MacYET has been a contributor to the #mongodb IRC channel for the last 6-9 months. I am not sure exactly how long he has been online helping users of this emerging technology, but he swings between constructive, helpful feedback and negative, insulting comments. His negative and nasty comments finally resulted in this ban:
MacYET was kicked from #mongodb by christkv [You've been kicked due to nasty behavior yesterday calling a user a moron, speak to bill if you want to have access again.]
After the ban took effect, the blog post appeared, along with his submission to Hacker News (link). While other people have written blog posts about why they decided to migrate away from mongodb, none, as far as I know, did so as a result of being banned from an IRC channel.
Goodbye MongoDB Reviewed
So I’d like to take the time to review each of the points in the Goodbye MongoDB blog post and determine which, if any, are valid and which, if any, are simply the rant of a person who has anger problems, was confronted about them, and decided to react unprofessionally through his blog. I am not going to repost each paragraph verbatim, but I will call out the key points:
- Memory Model. In the blog post the author explains how leaving memory management up to the OS is a “stupid architectural decision”, pointing out specifically that it prevents users of mongodb from running any other services on the same machine. The author fails to understand that in large-scale environments database nodes are dedicated to running only the database services. I’ve never worked in a large-scale environment where an Oracle RAC server, for instance, also ran a copy of the application server. It’s true that in Oracle environments you can and should reserve huge pages in sysctl.conf (vm.nr_hugepages); however, in most environments this setting is made as high as possible up front, so that you don’t need to change it later, because of the issues associated with changing this value on the fly. IMO there is no valid point in this paragraph: there is nothing wrong with letting the OS manage memory. The Linux kernel developers know what they are doing, and sometimes it’s best to leave certain tasks to them.
- Locking. Yes, mongodb does have a global write lock. There have been significant improvements in 2.x, specifically yielding the lock when the process has to go to disk, plus a new feature that locks only at the database level. I expect this to keep improving, toward collection-level locking and beyond. That being said, let’s look at where this might be an issue for you. I have primarily implemented mongodb in web-based (SaaS) services where the read-to-write ratio was about 90/10, so the global lock was the least of our problems. I also worked for a company that built a policy management system which used mongodb as its back-end database. In a performance test run in conjunction with EANTC, HP and IXIA, we proved our solution could scale to 28,000,000 cellular subscribers and handle 28,000 RPS on a single HP C7000 blade chassis with 16 blades. Our biggest competitor ran the same test on millions of dollars of Sun hardware running Oracle RAC and only reached about 18,000 RPS. Our performance test used mongodb 1.8 in a 50/50 read/write scenario. We ran each mongodb instance at about 30% lock percentage, on 15k SAS drives, with no performance issues in the app.
- Single Index Per Query. This is a reality of mongodb and something 10gen doesn’t keep secret. I do think, however, that there is much confusion about the single-index-per-query rule. It doesn’t mean that only a single field can be indexed and used in a query: mongodb supports compound indexes, which index multiple fields together. While I agree there is room for improvement here, it’s far from an insane design. If you have spent any time in the DBA world, you will know that using multiple indexes in a single query has been part of the evolution of nearly every commercial database on the market today, and some, including MSSQL, still don’t do it very well.
- JSON Query Language. Maybe I don’t understand the point here, but this issue is related less to JSON than to NoSQL technologies in general. The technology is called NoSQL, so why complain that there is no SQL functionality?
- Map-Reduce. While I agree that the map-reduce function in mongodb is seriously lacking, building in support for Hadoop was definitely the right move. The author seems not to understand the “automate everything” methodology. As map-reduce ecosystems become more complex, automation becomes more necessary. I currently have a Puppet module for Hadoop published on Puppet Labs; check it out.
- Sharding. I completely disagree with this point. Sharding mongodb is not as complex as the author makes it sound. Yes, you need to deploy the mongodb binaries to your app servers and configure your apps to talk to the local mongos process. You also need to deploy three config servers; again, with automation this is a trivial task. Ever tried to shard an existing Oracle database? How about MySQL? Everything in this world is relative. Compare sharding in mongodb to the big players on the market and you will quickly realize that sharding in mongodb is a no-brainer. Maybe the author hasn’t worked on large, complex Oracle or MySQL databases that suddenly needed to be sharded.
- Data-center awareness. This is a new feature coming in 2.2. While I haven’t had a chance to play with it yet, I have high hopes for it, as the company I currently work for will be leveraging it extensively. As far as I know, no commercial database technology supports this, because of the complexity involved. For 10gen to take a stab at it is an amazing feat!
- “Safe” mode is off by default. As with a lot of technologies, the vendor has to make decisions so the software is easy to use out of the box, and security and data reliability are often secondary considerations. This isn’t a missing feature of the software; it’s a complex architectural decision that needs to be discussed and tested by the company leveraging the technology. Sometimes it makes sense to turn on “safe” mode and sometimes it doesn’t. I remember when Linux was distributed without shadow passwords and you needed to install and configure the shadow password suite. Did that mean Linux was not production ready? I also remember when telnet and rsh were the industry standard for remote administration of Solaris servers, and Solaris was widely deployed at the biggest companies around the world.
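The Memory Model point is easier to judge with the mechanism in view. At the time, mongod memory-mapped its data files and left it to the kernel’s page cache to decide which pages stayed in RAM. The Python sketch below uses the standard `mmap` module on a temporary file to show that pattern; the file and the helper name `write_and_read_mapped` are hypothetical, and it does not model mongod’s actual file layout.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # the OS page size, the unit the kernel pages in and out


def write_and_read_mapped(num_pages: int = 4) -> bytes:
    """Write through a memory mapping, then read the bytes back with plain file I/O.

    The temporary file is a hypothetical stand-in for a database data file.
    """
    fd, path = tempfile.mkstemp()
    try:
        # Size the file first: a mapping cannot extend past the end of the file.
        os.ftruncate(fd, num_pages * PAGE)
        with mmap.mmap(fd, num_pages * PAGE) as mapped:
            # This looks like an ordinary memory write. The kernel keeps the page
            # in its page cache and decides when to write it back or evict it.
            mapped[PAGE:PAGE + 5] = b"hello"
            # Ask the OS to write dirty pages to disk now (msync on Unix) rather
            # than waiting for it to do so on its own schedule.
            mapped.flush()
        with open(path, "rb") as f:
            f.seek(PAGE)
            return f.read(5)
    finally:
        os.close(fd)
        os.remove(path)


print(write_and_read_mapped())  # b'hello'
```

The design cuts both ways: the database gets the kernel’s caching for free, but it also competes with every other process on the box for the same page cache, which is why dedicated database nodes make the complaint largely moot.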
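To illustrate what a compound index buys you under the single-index-per-query rule, here is a toy Python model: one sorted structure keyed on two fields, which can answer queries on the leading field alone or on both fields. It only models the idea, not how mongod stores its B-tree indexes, and the documents and field names are hypothetical. In the mongo shell the real index would be created with something like `db.people.ensureIndex({last: 1, first: 1})`, where `people` is a hypothetical collection.

```python
from bisect import bisect_left

# Hypothetical documents; the field names are illustrative only.
docs = [
    {"_id": 1, "last": "smith", "first": "anna"},
    {"_id": 2, "last": "jones", "first": "bob"},
    {"_id": 3, "last": "smith", "first": "carl"},
    {"_id": 4, "last": "brown", "first": "anna"},
]

# One compound index on (last, first): entries sorted by the key tuple.
index = sorted(((d["last"], d["first"]), d["_id"]) for d in docs)
keys = [key for key, _ in index]


def lookup(*prefix):
    """Return _ids whose index key starts with `prefix`.

    Binary-search to the first candidate entry, then scan forward while the
    leading fields still match, which is what makes prefix queries cheap.
    """
    ids = []
    i = bisect_left(keys, prefix)
    while i < len(keys) and keys[i][:len(prefix)] == prefix:
        ids.append(index[i][1])
        i += 1
    return ids


print(lookup("smith"))          # [1, 3]: uses the leading field only
print(lookup("smith", "carl"))  # [3]: uses both fields of the same index
```

A query on `first` alone cannot use this prefix search, because `first` is not the leading field; it would need its own index or a full scan. Choosing the field order of compound indexes is how you work within the one-index-per-query rule.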
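The routing work that each local mongos does in the Sharding point is simple to model. The config servers record which shard owns each range (“chunk”) of shard-key values; mongos finds the chunk for an equality query and sends it to one shard, or fans a range query out to every shard whose chunks overlap the range. The Python sketch below assumes a hypothetical string shard key, chunk table and shard names, and ignores chunk splits, balancer migrations and compound shard keys.

```python
from bisect import bisect_left, bisect_right

# Hypothetical chunk table of the kind a mongos caches from the config servers.
# Chunk i covers shard-key values in [CHUNK_LOWER_BOUNDS[i], CHUNK_LOWER_BOUNDS[i + 1]).
CHUNK_LOWER_BOUNDS = ["", "g", "n", "t"]  # "" sorts before every string, like MinKey
CHUNK_OWNERS = ["shard0", "shard1", "shard0", "shard2"]


def route(key):
    """Return the shard owning the chunk that contains `key` (a targeted query)."""
    i = bisect_right(CHUNK_LOWER_BOUNDS, key) - 1
    return CHUNK_OWNERS[i]


def route_range(lo, hi):
    """Return the shards whose chunks overlap lo <= key < hi (lo < hi assumed)."""
    first = bisect_right(CHUNK_LOWER_BOUNDS, lo) - 1
    last = bisect_left(CHUNK_LOWER_BOUNDS, hi) - 1
    return {CHUNK_OWNERS[i] for i in range(first, last + 1)}


print(route("mike"))                  # shard1
print(sorted(route_range("a", "h")))  # ['shard0', 'shard1']
```

This is also why the choice of shard key matters more than the deployment mechanics: a query that includes the shard key touches one shard, while one that doesn’t has to be broadcast to all of them.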
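What “safe” mode changes is small but important. In the drivers of this era, an unsafe write was sent and forgotten; a safe write was followed by a getLastError command so the driver could report a server-side failure. The toy Python model below is hypothetical (the class and function names are not a driver API) and shows a duplicate-key insert vanishing silently without safe mode and raising with it.

```python
class DuplicateKeyError(Exception):
    """Raised by the toy driver when an acknowledged write fails."""


class ToyServer:
    """Hypothetical stand-in for mongod: applies writes and remembers the last error."""

    def __init__(self):
        self.docs = {}
        self.last_error = None

    def insert(self, doc):
        if doc["_id"] in self.docs:
            self.last_error = "E11000 duplicate key error"
        else:
            self.docs[doc["_id"]] = doc
            self.last_error = None

    def get_last_error(self):
        return self.last_error


def insert(server, doc, safe=False):
    """Toy driver call: always sends the write, only checks the result when safe=True."""
    server.insert(doc)
    if safe:
        # The extra round trip: ask the server whether the previous write succeeded.
        err = server.get_last_error()
        if err is not None:
            raise DuplicateKeyError(err)


server = ToyServer()
insert(server, {"_id": 1, "name": "first"})
insert(server, {"_id": 1, "name": "second"})  # fails on the server; nobody notices
try:
    insert(server, {"_id": 1, "name": "third"}, safe=True)
    safe_write_failed = False
except DuplicateKeyError:
    safe_write_failed = True
print(server.docs[1]["name"], safe_write_failed)  # first True
```

The cost of turning it on is an extra round trip per write, which is exactly the trade-off the company adopting mongodb has to weigh.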
In conclusion, I agree with the author’s statement “Building large-scale systems requires knowledge, experience and expertise.” This is a no-brainer for any company leveraging any technology. Yes, companies need smart people to be successful: not just smart technologists, but smart business leaders, accountants, lawyers, etc. This is not an argument against mongodb. Because mongodb is simple to run, your skilled technical leaders can spend their time on the things that will make the business a success; but every business still needs smart technologists to be successful.
One gotcha you may not have noticed yet is that Mongo doesn’t index strings larger than 1KB. A message goes into the log, but that is the only indication. Even worse, when a query uses the index, the non-indexed documents are not returned; they may as well have disappeared. On the positive side, the 10gen team is phenomenal. When people post problems to the mailing list, they are diagnosed quickly, and issues are fixed quickly. They are really on the ball and seem to work 24 hours a day, 7 days a week.
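The index-size gotcha can be reproduced in miniature. In the toy Python model below (a hypothetical illustration, not mongod code), every document is stored, but values over about 1 KB get no index entry, only a warning in the log. A lookup that goes through the index then misses the large document, while a full collection scan still finds it.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("toy-index")

# Approximate: the comment above says "1kb"; the real limit applied to the whole
# index entry, not just the string, so treat this constant as illustrative.
INDEX_KEY_LIMIT = 1024

docs = {}   # _id -> value: the "collection"
index = {}  # value -> [_id, ...]: a hypothetical stand-in for a B-tree index


def insert(_id, value):
    docs[_id] = value  # the document itself is always stored
    if len(value.encode("utf-8")) > INDEX_KEY_LIMIT:
        # The behaviour described above: no index entry, just a log line.
        log.warning("key too large to index, _id=%s", _id)
        return
    index.setdefault(value, []).append(_id)


def find_with_index(value):
    return index.get(value, [])


def find_with_scan(value):
    return [i for i, v in docs.items() if v == value]


big = "x" * 2000
insert(1, "short")
insert(2, big)
print(find_with_index(big), find_with_scan(big))  # [] [2]
```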