Graphs are everywhere. You find them on websites that add social features. Telecommunications companies use graphs to personalize customer services. Bioinformatics researchers and other innovative organizations are adopting graph databases to model and query connected data.
Neo Technology has pioneered graph databases since 2000 and has been instrumental in bringing the power of the social graph to customers such as Adobe, Cisco and Deutsche Telekom.
Emil Eifrem, CEO of Neo Technology, cofounded the company and developed the software with cofounder and CTO Johan Svensson. Eifrem started working on graph database technology while serving as CTO at Windh, a small startup in Sweden.
He and his team of engineers faced a constantly growing challenge in building and maintaining a sophisticated enterprise customer management (ECM) system, which he describes as a huge file system that you make available right away. The team built the first generation on an off-the-shelf commercial relational database.
That in-house proprietary software fueled Eifrem's idea for a graph database that could better handle the processes relational databases struggled to manage. Now in production for nine years, the resulting open source Neo4j is a leading graph database with one of the largest ecosystems of partners and tens of thousands of successful deployments worldwide, according to Eifrem.
In this interview, LinuxInsider talks to Emil Eifrem about how he developed a new database system and why he expects newer technology to replace the open source business model he uses.
LinuxInsider: What problems with the relational database influenced you to begin working on the next big thing?
Emil Eifrem: The problem was that a lot of the data we actually processed was very connected. The ECM system is basically a big file system, and as such it was very hierarchical. It had folders nested inside folders, which had their own subfolders and, eventually, files.
Once you start marrying that big hierarchical system with user groups with read or write access to parts of that tree, all of a sudden you have a big, sophisticated, connected data structure. That’s a really poor fit for relational tables.
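To make that mismatch concrete, here is a minimal Python sketch of the structure Eifrem describes: folders containing folders and files, users belonging to groups, and groups granted read access to parts of the tree. It is an illustration only, not Neo4j's API or Windh's actual schema; the relationship names CONTAINS, MEMBER_OF and CAN_READ, the Graph class and the helper functions are all hypothetical. In this form, "can this user read this file?" becomes a short walk up the tree, whereas a relational schema would typically need a recursive query (such as SQL's WITH RECURSIVE) over a parent-folder table to answer the same question.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory graph: nodes are strings, relationships are typed edges."""

    def __init__(self):
        # (node, relationship type) -> set of nodes at the other end
        self._out = defaultdict(set)
        self._in = defaultdict(set)

    def relate(self, src, rel, dst):
        """Record a directed relationship src -[rel]-> dst."""
        self._out[(src, rel)].add(dst)
        self._in[(dst, rel)].add(src)

    def targets(self, node, rel):
        """Nodes that `node` points to via `rel`."""
        return self._out.get((node, rel), set())

    def sources(self, node, rel):
        """Nodes that point to `node` via `rel`."""
        return self._in.get((node, rel), set())


def path_to_root(g, item):
    """Follow CONTAINS relationships upward from a file or folder to the root."""
    seen = set()
    order = []
    stack = [item]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(g.sources(node, "CONTAINS"))
    return order


def can_read(g, user, item):
    """True if any of the user's groups has CAN_READ on the item or an ancestor."""
    groups = g.targets(user, "MEMBER_OF")
    return any(g.sources(node, "CAN_READ") & groups for node in path_to_root(g, item))


if __name__ == "__main__":
    # Hypothetical sample data: a three-level folder tree and two users.
    g = Graph()
    g.relate("/", "CONTAINS", "/projects")
    g.relate("/projects", "CONTAINS", "/projects/alpha")
    g.relate("/projects/alpha", "CONTAINS", "/projects/alpha/spec.doc")
    g.relate("alice", "MEMBER_OF", "engineering")
    g.relate("bob", "MEMBER_OF", "sales")
    g.relate("engineering", "CAN_READ", "/projects")

    print(can_read(g, "alice", "/projects/alpha/spec.doc"))  # True
    print(can_read(g, "bob", "/projects/alpha/spec.doc"))    # False
```

Granting access on /projects covers everything beneath it, because the check inherits permissions by walking CONTAINS relationships toward the root rather than storing a permission row for every file.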
LI: So did ECM systems need a different type of database, or was it just a matter of making the code more resilient?
Eifrem: I had 20-something engineers working under me at that time. Ten of them spent the majority of their time just wrestling with the relational database. After a while we realized that all of this data was superconnected, and that was what slowed the relational database down.
We started wondering why we couldn't just create a database that works with connected data using a graph structure. Of course, this was before others popularized the notion of the social graph. Back then, the social graph was not very well known in the mainstream. But the graph itself is an old mathematical concept, dating back to the 18th century.
All of the data is a graph: nodes, with relationships connecting them to other nodes. It all builds up into one large connected data structure. If we had that as a database, it would be perfect.
LI: How innovative was your software beyond its use in your own ECM platform at Windh?
Eifrem: There was nothing out there doing something like this. There were some tangential things, but nothing that was like what we did.
LI: How did your new software platform compare to existing products?
Eifrem: We originally built a proprietary tool. When we looked at the industry in early 2000, we thought the relational database was excellent for some things but not for everything. We did not see general market acceptance for that notion.
Back in those days, the industry was just getting over object databases. When they came out, everybody thought relational databases were going to die and we were all going to use object databases. That, of course, failed spectacularly.
LI: So the product at that point only offered relief for your company’s own database issues?
Eifrem: We did not think there was any room in the market for a new kind of database, so we built it completely proprietary, just for in-house use. It became an in-house tool that gave us a competitive advantage. As the years went on, it turned out that building a database is a substantial undertaking. We truly built it from scratch, and it took a while.
LI: What turned the situation around so your new database style had more potential for use outside your own company?
Eifrem: We saw that the product was becoming more and more mature. By 2006 or 2007, the industry discourse had changed. Google had published papers asserting that the only way it could cope with the scale of data it was handling was by moving away from the relational database. And Amazon published a paper saying that its key to survival was not throwing away the relational database altogether but augmenting and complementing it with Dynamo, a new type of database Amazon had designed.
All of a sudden, the discussion in programmers' circles was about the inevitable death of the relational database. That was only five years later. The pendulum swings quickly in our industry.
LI: Did you decide to distribute it as open source or as a commercial product?
Eifrem: At that point we felt that it was time. We had this unpublished gem that we had been building internally for many years, and by then ('06 or '07) it was pretty damn solid. It was time to release it to the world. But we were still part of the previous company, whose business was enterprise customer management, not selling databases.
We quickly realized that if we were to build a commercial venture around this open source software, it would have to be a separate company. So we did a classic spinout with a small ownership stake for Windh, open sourced the software and wrapped a new company around it.
LI: With little competition for Neo Technology’s new type of database software, was it clear sailing as a startup open source company?
Eifrem: It is never clear sailing. In my view, if it is clear sailing, you are not aiming big enough. If you are aiming for something valuable, other people are going to aim for it as well. That is what happened to us. When we started out, we were the first company to have a graph database. We did not coin the term; that happened in academia in the 1980s. We were the ones who gave it its current shape and popularized the term.
LI: What was the initial response to Neo4j?
Eifrem: For the first year or so we were completely alone in the field. No one else was talking about this. We toyed with various names for the technology, settled on calling it a graph database, and that is what resonated with the market.
LI: What were the driving factors pushing you to go open source? You started with a product that no one else had. How did you monetize that technology?
Eifrem: There are several classic drivers for open source. One is that you are a big company whose core business is something that is similar to the open source software you released. That is very different from what we did. Another classic driver for open source involves commoditizing an existing space. You have an equivalent product solving the exact same problems with some different angles to it. So you release that with an open source license and commoditize that. We are not that either.
LI: Why, then, did you decide to release Neo4j as open source?
Eifrem: We are in the third category. We set out to define a completely new market and a movement. I have to educate the audience before I can sell to them. I may or may not sell to them, but I have to get them to use my product. That is the difference between commoditizing an established market and building a new one.
LI: What hurdles does that third category place in front of you?
Eifrem: In order to get developer adoption, you have to be open source today. You can try to do it without developer adoption, but very few people have been successful that way. We found that the way for us to get into the big enterprises was through massive developer adoption. We wanted to build not just a company but a movement. For all those things to happen, we have to be open source.
LI: What do you see for the future of open source?
Eifrem: Open source is not going to go away and completely die, but it will become irrelevant. We are still going to use it, and there will be a lot of open source software available. The real advantage that open source offers companies is frictionless distribution. That is why I am using open source. A much better mechanism is available to us now that was not available even five years ago. Cloud is the new open source in that sense.
LI: What will happen to today’s open source?
Eifrem: Open source will still be around. But it will become increasingly irrelevant as the cloud becomes increasingly relevant. For example, my goal is to make my software available to lots and lots of people for free so that I can convert a small percentage of those users into paying customers in order to fund development of my software. Open source will continue to serve that purpose. But there will come a time when the cloud will become a much smarter way for releasing software to get that free adoption.
LI: So you are seeing open source morphing into a form of Software as a Service via the cloud?
Eifrem: Yes, exactly. If you look at open source as a business model today, there are only two ways people charge for the software. It is either selling tools for monitoring and managing the software, or it is running that software, which people basically call a cloud service. If you do the latter, it gives you all of the benefits of open source.
LI: Isn’t that precisely what you are doing now to monetize your own software?