MongoDB's Eliot Horowitz: The Database Renaissance Has Begun
NoSQL technologies are reshaping the database landscape as they steadily push a shift away from the relational database model. Young entrants in this alternative technology, such as MongoDB, have been gaining traction despite an acknowledged need to mature and add features.
"For MongoDB as a part of the NoSQL space, it is a matter of maturity. The model is very new -- maybe five or six years from day one of source code. In the database space, that is sort of infantile. Oracle has been around for some 35 years. It is mature. MongoDB has a long way to go to add a lot of the features people need. So there is a lot of work to do to get to where we need to be," Eliot Horowitz, CTO and cofounder of MongoDB (formerly 10gen), told LinuxInsider.
Until a name change in late August, 10gen was the company behind MongoDB, but having a company name different from the database it created caused product confusion. So 10gen refocused its branding to better cash in on MongoDB's ranking as No. 6 by DB-Engines, which ranks databases according to Internet chatter, job listings, LinkedIn mentions and Google Trends results.
MongoDB has surpassed Microsoft Access in the rankings, DB-Engines recently announced.
Perhaps one of the most beneficial traits NoSQL provides over relational databases is its scalability. The relational database model is only capable of scaling up. So database admins are forced to buy bigger servers as database load increases.
The NoSQL model allows scaling out by distributing the database across multiple hosts as load increases -- but it does much more.
In this interview, LinuxInsider talks to Horowitz about the growing strength of the NoSQL market and the issues involved with alternatives to the traditional relational database model.
LinuxInsider: Why are developers turning to NoSQL solutions?
Eliot Horowitz: Relational databases were designed with certain use cases in mind. A lot of those you cannot even do today. How people are using them today is very different from what they were designed for. So people are trying to make databases that are easier for developers to work with and are structured in a way that [is] more comfortable for modern applications. They are also trying to make the data model a little more intuitive and, in my mind, have a little bit more ... real-world support.
For example, if you start using a profile in a relational database, you have one collection that has the call information with first name/last name. Then you have another collection that has their address, and another with rights, authors, etc. This all works well from a storage standpoint, but when you want to create an entire view, you have to join a collection of tables and mash everything together.
LI: How does the NoSQL approach solve that joining and mashing?
Horowitz: In a document database you store everything about a user in a single place. You can easily edit it and add to it. So developers like it. Another thing about NoSQL databases is they let you work in a normal schema. It is easier to scale. It saves time for the developers. It saves time in the operation. Developers can actually spend time developing the product, which is what they should be doing, rather than working on scaling solutions.
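The contrast Horowitz draws can be sketched in a few lines. The example below is only an illustration, not MongoDB code: it uses Python's built-in sqlite3 module for the relational side and a plain dictionary to stand in for a document. The table names, field names and sample values are all hypothetical.

```python
import sqlite3

# Relational layout: one profile spread over three tables (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE addresses (user_id INTEGER, city TEXT);
    CREATE TABLE roles (user_id INTEGER, role TEXT);
    INSERT INTO users VALUES (1, 'Ada', 'Example');
    INSERT INTO addresses VALUES (1, 'London');
    INSERT INTO roles VALUES (1, 'author');
    INSERT INTO roles VALUES (1, 'editor');
""")

# Building the full view means joining the tables back together.
rows = conn.execute("""
    SELECT u.first_name, u.last_name, a.city, r.role
    FROM users u
    JOIN addresses a ON a.user_id = u.id
    JOIN roles r ON r.user_id = u.id
    WHERE u.id = 1
    ORDER BY r.role
""").fetchall()
conn.close()

# Document layout: the same profile kept as one nested record.
profile = {
    "_id": 1,
    "name": {"first": "Ada", "last": "Example"},
    "address": {"city": "London"},
    "roles": ["author", "editor"],
}

# Adding a new attribute needs no schema change, only an edit to the record.
profile["nickname"] = "ada"
```

The join returns one row per role, so the application still has to fold the rows back into a single profile. The document version is already in that shape.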
LI: Is it now a case of the relational database becoming so outdated that it no longer serves its original purpose?
Horowitz: Relational databases still work very well for the things they were designed to do. What is happening is that they work so well for those purposes people continue to use them for everything. Relational databases are 40 years old. People are trying to do things with them that were never thought of for the original purpose. From a use case, there are just better choices available today.
LI: What other kinds of modern-day uses does NoSQL technology serve?
Horowitz: They work well with any kind of content management system. Things built around mashup, for example, documents that have similar attributes but are otherwise totally different. To further that point, at MetLife they have 72 different data stores all pumping data into MongoDB so they can put everything about a single user in a single place, even though every person has different types of information. This approach is great for combining lots of data that is a little bit different.
LI: Is there a limit to the types of unstructured data NoSQL databases can handle?
Horowitz: Let's say you want to have a media collection. A person wants to upload pictures, video and anything that pertains to him or her. One big problem is that photos have different attributes than videos. You want to be able to store everything in the same collection but have different indexes for videos and song titles and photos, each with different fields but everything mashed all together. That is a pretty typical use example for what NoSQL databases like MongoDB can do.
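A rough sketch of that media collection, using plain Python rather than MongoDB itself, might look like the following. The documents, field names and the build_index helper are hypothetical. The helper only mimics the idea of an index that covers the documents carrying a given field, and says nothing about how MongoDB implements indexes internally.

```python
from collections import defaultdict

# One collection holding three kinds of media, each with its own fields.
media = [
    {"_id": 1, "type": "photo", "owner": "sam", "width": 1024, "height": 768},
    {"_id": 2, "type": "video", "owner": "sam", "duration_s": 95, "codec": "h264"},
    {"_id": 3, "type": "song", "owner": "sam", "title": "Example Song", "duration_s": 240},
]


def build_index(docs, field):
    """Map each value of `field` to the ids of the documents that have it."""
    index = defaultdict(list)
    for doc in docs:
        if field in doc:  # documents lacking the field are simply skipped
            index[doc[field]].append(doc["_id"])
    return dict(index)


by_title = build_index(media, "title")          # only the song has a title
by_duration = build_index(media, "duration_s")  # the video and the song
by_type = build_index(media, "type")            # every document
```

Each index covers a different subset of the same collection, which is the "different fields but everything mashed all together" arrangement described above.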
LI: In developing MongoDB, why was it important that it be an open source project?
Horowitz: I think at this point in the world, a closed-source database does not have much of a chance. Open source has proven to be a right model for this kind of software. There are lots of business models about how to have open source software and a successful company. So much open source software is out there today that you can use for free unless you want to pay money for support. That model has proven effective. The big companies have shown that they are willing to pay, so it has actually worked pretty well. For a closed-source database to make its way into the market at this point would be nearly impossible.
LI: Given the growing success of this trend away from relational database technology, what factors distinguish one NoSQL database product from another?
Horowitz: One of the key things is the data model. Different models include document, key-value and graph. I think the key is that the document model is the right one for developers.
LI: What was the reasoning behind using a document database versus key value or graph for MongoDB?
Horowitz: The document model does a lot of things that make it feel like a traditional relational database. In our case, we did not try to change things that did not need to be changed. For example, indexing in MongoDB works almost the same as it does in a relational database. A lot of the tools and the methods are basically similar. The document model is really the most comfortable and effective model for developers.
LI: How else do these different data models differ from each other?
Horowitz: Let's say I have a collection of addresses for New York state. Each one has a street location and a phone number. Searching for a particular record requires that the database have a degree of flexibility. This requires that the database has a way to look into the document, a language for searching them and a language for updating them.
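What "a way to look into the document" means can be shown with a toy query and update layer over plain Python dictionaries. This is a minimal sketch under stated assumptions: the find, update and get_path helpers are hypothetical, the dotted-path style is only loosely modeled on the dot notation document databases use for nested fields, and the address records are made up.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into nested dicts."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(docs, query):
    """Return the documents whose fields equal every value in `query`."""
    return [d for d in docs if all(get_path(d, p) == v for p, v in query.items())]


def update(docs, query, changes):
    """Set top-level fields on every matching document; return the match count."""
    matched = find(docs, query)
    for doc in matched:
        doc.update(changes)
    return len(matched)


addresses = [
    {"_id": 1, "address": {"street": "1 Main St", "city": "Albany", "state": "NY"}, "phone": "555-0100"},
    {"_id": 2, "address": {"street": "2 Elm St", "city": "Buffalo", "state": "NY"}, "phone": "555-0101"},
]

in_albany = find(addresses, {"address.city": "Albany"})
in_new_york = find(addresses, {"address.state": "NY"})
changed = update(addresses, {"_id": 2}, {"phone": "555-0199"})
```

A key-value store, by contrast, would typically hand back the whole record for a given key and leave any searching inside it to the application.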
LI: What trends do you see affecting the database market today?
Horowitz: There is a ton more data now than there was before. People are collecting from the data; they are trying to aggregate it. The amount of data flowing now is massive. People do not throw anything away.
Also, online requires that there be no down time. I remember 10 or 15 years ago, a bank's access would go down for maintenance a few hours on Saturday morning and nobody cared. Today, Facebook can't go down; Twitter can't go down for a few hours every Saturday morning just to do maintenance.
Another trend favors developers who can innovate quickly. [It's] a huge advantage for companies that can do it. A lot of times a developer would conceive an application and plan to have it out in three years. Today, the development model is changing. Now, when I want to build a perfect product, I want to get a version out quickly in a few months, see how it works, see how people like it and update it right away. That kind of model is much more suited to a document database than a relational database.
The hardware business has also changed a little bit. The notion of bigger has slowed down a little bit. ... Things are growing more parallel.
LI: What impact are these trends having?
Horowitz: All of these trends are hitting together. I think that is why we are seeing a renaissance in the database world.
LI: Does the migration to the cloud have any impact or is that an unrelated trend?
Horowitz: I think they are relatively different, but there are some things connected there. For instance, vertical scaling in the cloud is not really possible. If you want to play in EC2, you are limited to whatever the largest EC2 instance is. You cannot grow bigger than that.
LI: Is being limited to horizontal scaling with relational data in the cloud a drawback?
Horowitz: The problem is that with the relational database, the data model is not really suited for it. When you join data across machines, you can get really bad performance. Those joins are very complicated, and they require you to use the database in a way that it was not designed for.
LI: So the NoSQL design avoids a lot of those complications?
Horowitz: Yes. The data model makes it simpler to scale. If there are pieces of the query or things in the database that you want to do that tend not to scale, then we try not to add them. Then the things that you can do will scale.
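The scaling argument can be illustrated with a small sketch, again in plain Python. It assumes a simple hash-and-modulo placement scheme chosen for brevity, which is not a description of MongoDB's own sharding mechanism. The host names and the save and load helpers are hypothetical.

```python
import hashlib

HOSTS = ["db-0", "db-1", "db-2"]  # hypothetical host names


def shard_for(key, hosts=HOSTS):
    """Pick a host from a stable hash of the document key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]


# Each host is modeled as an in-memory dict keyed by document id.
shards = {host: {} for host in HOSTS}


def save(doc):
    """Store the whole document on the single host its key maps to."""
    shards[shard_for(doc["_id"])][doc["_id"]] = doc


def load(doc_id):
    """Fetch a document by asking only the one host that can hold it."""
    return shards[shard_for(doc_id)].get(doc_id)


for i in range(100):
    save({"_id": i, "name": f"user-{i}", "orders": [i, i + 1]})

user = load(42)
```

Because everything about a user lives in one document, reading a profile touches exactly one host. Spreading the same data over several tables on several hosts would require the cross-machine joins Horowitz warns about.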