A new open source project, PredictionIO, is building the MySQL of prediction.
The young company recently released version 0.7.3 of its open source machine learning server. Unlike typical prediction algorithms and open source libraries, PredictionIO is based on the concept of making machine learning available to software developers.
PredictionIO cofounder and CEO Simon Chan sees a gaping hole in open source tools to connect database programmers and software developers. The new project’s goal is to make it easier and more reliable for devs to use their database content to create predictive features.
Typically, developing functionality like personalization, recommendations and content discovery is very time-consuming. Chan is determined to make those processes simpler and faster with PredictionIO.
Chan assembled a team of developers and entrepreneurs with backgrounds in computer engineering at Google, the University of California, Berkeley and elsewhere. He is tapping the open source community to enlist more than 5,000 contributors in the project.
In this exclusive interview, LinuxInsider talks with Simon Chan about the viability of the PredictionIO project and the need to make machine learning more developer-friendly.
LinuxInsider: Why is the concept of an open source machine learning server so vital to you?
I became more focused on this mission-critical predictive software development while working on Ph.D. research in London. We started prototyping something on GitHub and worked with some large-scale system engineering. I became more active in a mission to make machine learning more accessible to more developers.
As that idea evolved, what was a research group became a company. We raised some money to enable everyone to work on the project full time. We added a data scientist from Google and an engineer from Oracle. So they all became our core team.
LI: How viable is this new project in building an open source machine learning server for developers?
On GitHub, we have more than 5,000 developers engaged in the project. We have contributors building different kinds of components to run on top of the PredictionIO system for integration into other systems. So all sorts of things are going on within the open source ecosystem.
LI: What are the developmental hurdles involved in making this project sustainable?
When we first started working on PredictionIO, it took our development team a few months to build even a really primitive predictive model. That does not include the trouble we had moving that model into production in a distributed environment.
For instance, we built a new recommendation engine. It worked perfectly in the development environment, but when we deployed it in production, it took two days just to update the predictive model. This illustrates the degree to which machine learning challenges software engineers and programmers.
LI: Is PredictionIO a revolutionary effort or just the latest newcomer in the process?
Nowadays there are a lot of open source products like Hadoop that are great tools for data scientists. You still need a few months and a team of data scientists and engineers to build a simple solution on top of those products.
LI: How competitive is the prediction software market?
There are many prediction products out there. They are great technology tools. But talking about developer needs, we do not see a lot of application-level machine learning open source solutions. Most of the open source solutions are on the data processing layer. There are also a lot of great open source learning libraries. But we do not see anyone focusing on developers for the application-level products.
LI: Why is PredictionIO unique?
The two key elements are open source and developer usage. Taken together, these two elements make it possible for us to bridge the gap between data science and software development. This will make it easier for the two groups to work together.
LI: So prediction software can lessen the need for data scientists?
We are not trying to replace data scientists with our product. Rather, because it is open source, everyone can focus on their own components. Data scientists can share their recommendations. Developers can deploy them on their own applications. We do not see models like that currently on the market.
LI: Does PredictionIO work with any MySQL-based database, or is its use more restrictive?
Currently it does not connect to MySQL directly out of the box. The model gives you an API through which you can stream data in to PredictionIO. Because this is an open source project, you can modify that so you do not have to use the data stored in PredictionIO. Instead, you can get the data directly from MongoDB or MySQL. It is very open as to what it can plug into. We built the product with developers in mind.
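To picture what pointing a prediction pipeline at an existing data store might look like, here is a minimal sketch. It is not PredictionIO code and does not use its API: the table name `user_actions`, the function names and the sample rows are all hypothetical, and Python's built-in `sqlite3` stands in for MySQL so the example runs anywhere.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (user_id, action, item_id)
Event = Tuple[str, str, str]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for an application's existing database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_actions ("
        "id INTEGER PRIMARY KEY, user_id TEXT, action TEXT, item_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO user_actions (user_id, action, item_id) VALUES (?, ?, ?)",
        [("u1", "view", "i1"), ("u1", "buy", "i2"), ("u2", "view", "i2")],
    )
    return conn


def events_from_sql(conn: sqlite3.Connection) -> Iterator[Event]:
    """Read behavior events straight from the existing table, in insert order."""
    query = "SELECT user_id, action, item_id FROM user_actions ORDER BY id"
    for user_id, action, item_id in conn.execute(query):
        yield (user_id, action, item_id)


def count_actions_per_item(events: Iterable[Event]) -> Dict[str, int]:
    """Toy consumer: count how many events touch each item."""
    counts: Dict[str, int] = {}
    for _user, _action, item_id in events:
        counts[item_id] = counts.get(item_id, 0) + 1
    return counts


if __name__ == "__main__":
    conn = build_demo_db()
    print(count_actions_per_item(events_from_sql(conn)))
```

The point of the design is that the consumer only sees a stream of events, so swapping the source for a different database would mean changing one function rather than the whole pipeline.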
LI: Are you enhancing the machine learning process or just repurposing the existing prediction software model?
We built the product from a developer’s point of view. We are looking at what makes it useful for developers when building mobile and Web applications and the Internet of Things. We made it easy for users to point it to their own existing data stores.
LI: How has your production team tapped into the cooperative nature of open source in developing PredictionIO?
We were fortunate to be selected by the Mozilla organization to participate in its WebFWD program. This program selects open source projects globally and supports them. We spent several months at their headquarters working with their open source developers. They helped us shape the product and make it more developer-friendly.
Other great organizations helped us develop PredictionIO. We are also a part of the 500 Startups network. Basically they support really early stage ideas. They helped us transform the idea for the technology into a sustainable business around the product. The goal is to continue supporting development of the technology.
Recently we started using the coworking space of Stanford’s StartX Program in California as one of the participants in the current session. This is a nonprofit business incubator associated with the university. That association gave us a chance to work with many great companies to see how they want to use machine learning and how we can shape PredictionIO to fit those needs.
LI: How are you monetizing PredictionIO to generate funds for the company?
I do not see that as our focus. Like many other open source products, I do not think revenue is the main focus, especially at this phase. We still think that the problem we are working on is challenging enough for us to focus on it in the coming years. The challenge of machine learning is more than the challenge of the database server. With databases, if you have the SQL language you can do most of the tasks. In machine learning, every prediction problem is unique.
LI: How so?
For example, product recommendation for a fashion company is totally different than for a video company. For the fashion company, the goal is to increase revenue. For a video company, the goal is to increase user engagement — like getting people to spend more time on the platform. So you need different algorithms and different parameters and different business logic as well.
For a fashion company, you might care about inventory. There is a limited number of products in the clothing line, so when I make recommendations I have to take stock inventory into account. Not so for the video company. The video company might have to take into account how much time you play each video.
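The contrast Chan draws can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the `Item` fields, the scores and both ranking rules are hypothetical, and nothing here comes from PredictionIO itself. The same candidate list yields different recommendations once each business applies its own logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    item_id: str
    score: float                    # hypothetical relevance score from some model
    stock: int = 0                  # units in inventory (fashion use case)
    avg_watch_seconds: float = 0.0  # average play time (video use case)


def recommend_fashion(items: List[Item], k: int = 2) -> List[str]:
    """Rank by score, but never recommend something that is out of stock."""
    in_stock = [it for it in items if it.stock > 0]
    ranked = sorted(in_stock, key=lambda it: it.score, reverse=True)
    return [it.item_id for it in ranked[:k]]


def recommend_video(items: List[Item], k: int = 2) -> List[str]:
    """Rank by score weighted by how long viewers actually watch each video."""
    ranked = sorted(
        items, key=lambda it: it.score * it.avg_watch_seconds, reverse=True
    )
    return [it.item_id for it in ranked[:k]]


# Made-up sample data, used only to show the two rules diverging.
catalog = [
    Item("a", score=0.9, stock=0, avg_watch_seconds=30.0),
    Item("b", score=0.8, stock=5, avg_watch_seconds=600.0),
    Item("c", score=0.5, stock=2, avg_watch_seconds=1200.0),
]

if __name__ == "__main__":
    print(recommend_fashion(catalog))  # inventory-aware ranking
    print(recommend_video(catalog))    # engagement-weighted ranking
```

With this sample data the top-scoring item "a" is dropped from the fashion list because it is out of stock, while the video ranking puts "c" first because viewers watch it longest.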
LI: How does that affect the problem solution process?
I am trying to illustrate the challenges that prediction problems pose. There are some open source prediction algorithms on the market, but not many developers can use them straight out of the box because of all these issues. With the help of all our contributors and the open source community, we can solve all of these problems.
LI: Why is that a factor in monetizing a revenue stream?
Before having a useful (as in fully developed) product, I do not see how we should focus on revenue. But having said that, the good thing about machine learning is that it is so mission-critical. If you are running a company that is trying to use PredictionIO to understand customer behavior to make smart business decisions in real time, you cannot afford any downtime.
We have received inquiries from companies asking us if we can offer enterprise support or provide enterprise editions with more security or resource management features. So certainly there is great potential for our building a revenue model as a real company behind the product, but I think we are in too early a stage for that right now.
LI: Do you see actual business success or only ongoing community development for PredictionIO?
I built three startup companies in the last 10 years. These were consumer-facing software applications for social networking and mobile apps. I have an engineering background and am heavily involved in building products. I got involved in the concept of predicting behavior based on data analysis.
In my earlier startup ventures, I was more concerned with building a great team than with making money. I think the lesson I learned from that is revenue is like oxygen for a great product. You need revenue to keep great people working together to get the product to market and educate potential users.
We definitely need to build a profitable company behind a great product. We are fortunate enough to have funding support from investors right now. We are building up our engineering team. We have a little bit of time to just focus on the product at this moment.