What's New in Open Source Search?
By Jack M. Germain
LinuxInsider
Part of the ECT News Network
09/05/07 4:00 AM PT

Unrest Grows
Some critics of existing search engine products say there is a growing need for alternatives to the proprietary search companies and the big business associated with sponsored information and ad revenue from search results. A few innovators are setting out to build new search engines that offer an alternative to the ranking influence of proprietary search platforms.
The experience of Matt Burkhardt, chief executive officer of Impari Systems, illustrates the growing user need for new search engine options. Impari Systems is a startup focused on bringing open source software to schools.
Burkhardt is unhappy with his efforts to distribute his information through Google News feeds. He put out two press releases only to find that they disappeared soon after posting. Even worse, his notices seemed to be replaced with competing information that was two years old.
That experience and others convinced Burkhardt that search is broken on the Internet. He is hoping that something better comes along.
"Existing open source caters to [a] vertical market. We need something more mainstream," he told LinuxInsider.
Different Strokes
Search engines such as Google, Yahoo and MSN differ in their methodologies and search algorithms. Search engine technology is mostly secret, given the proprietary nature of their platforms.
Preferences for one search engine over another sometimes reach fanatic status, as users rely on a favorite search platform to find content. One of the leading search product alternatives, according to Mindbridge's Christian, is Apache Lucene.
Most open source search involves a component embedded in a larger project, he noted. Likewise, most open source projects that use full-text search are built on Lucene.
These alternative open source search projects include both desktop and server-side technologies, alone or in combination, he explained.
The Lucene Model
Apache Lucene is an open source, full-featured text search engine library written in Java and suited to cross-platform search. It is available as a free download.
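At its core, a full-text search library such as Lucene builds an inverted index: a map from each term to the documents that contain it. The Python sketch below illustrates only that idea and is not Lucene's API; the `InvertedIndex` class, its methods and the raw term-frequency ranking are hypothetical simplifications assumed for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (a crude analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy full-text index mapping each term to the documents containing it."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query):
        """Return ids of documents containing every query term, best first.

        Ranking is by total term frequency, a stand-in for real relevance scoring.
        """
        terms = tokenize(query)
        if not terms:
            return []
        matches = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            matches &= set(self.postings.get(term, {}))
        scores = {d: sum(self.postings[t][d] for t in terms) for d in matches}
        return sorted(scores, key=lambda d: (-scores[d], d))


idx = InvertedIndex()
idx.add("a", "Open source search with Lucene")
idx.add("b", "Lucene is a Java library; Lucene indexes text")
idx.add("c", "Proprietary search engines")
print(idx.search("lucene"))       # ['b', 'a']
print(idx.search("open search"))  # ['a']
```

Real engines add analyzers, stored fields, on-disk segments and far more refined scoring, but lookups still start from term-to-document postings like these.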
Its June update adds a payloads package for queries. With it, the new version can boost a search term's relevance score based on the value of the payload stored at that term.
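A payload is extra data stored with a particular occurrence of a term in the index. The toy Python function below shows one way such a value could feed into ranking; it is an assumption-laden sketch, not Lucene's payload query classes or scoring formula. Each occurrence here carries a hypothetical numeric weight (for example, 4.0 for a word in a headline and 1.0 in body text), and the term's base score, the square root of its frequency, is multiplied by the average payload.

```python
import math


def payload_boosted_score(payloads):
    """Score one term in one document (toy model, not Lucene's formula).

    payloads: one numeric payload per position where the term occurs,
    e.g. a hypothetical 4.0 for headline text and 1.0 for body text.
    """
    if not payloads:
        return 0.0  # the term does not occur in this document
    base = math.sqrt(len(payloads))        # dampened term frequency
    boost = sum(payloads) / len(payloads)  # average payload at this term
    return base * boost


# One headline mention outranks four plain body-text mentions.
docs = {"headline_hit": [4.0], "body_hits": [1.0, 1.0, 1.0, 1.0]}
ranking = sorted(docs, key=lambda d: payload_boosted_score(docs[d]), reverse=True)
print(ranking)  # ['headline_hit', 'body_hits']
```

Without payloads, the four body-text mentions would win on frequency alone; the payload lets where a term appears count as much as how often.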
Lucene now also supports "point-in-time" searching over NFS (Network File System) shares. It also has a new API (application programming interface) for pre-analyzed fields.
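Point-in-time searching means a reader keeps seeing the index exactly as it was at one commit, even while a writer keeps adding documents. On a shared file system such as NFS, one way to support this is to hold off deleting older commits that readers may still be using. The sketch below models that idea in memory with a keep-the-last-N-commits deletion policy; the `SnapshotIndex` class and its policy are hypothetical illustrations, not Lucene's implementation.

```python
class SnapshotIndex:
    """In-memory toy index with numbered commit points (hypothetical design)."""

    def __init__(self, keep_last=2):
        if keep_last < 1:
            raise ValueError("keep_last must be at least 1")
        self.keep_last = keep_last  # deletion policy: how many commits to retain
        self.pending = []           # documents added since the last commit
        self.commits = {}           # generation number -> frozen tuple of documents
        self.generation = 0

    def add(self, doc):
        self.pending.append(doc)

    def commit(self):
        """Freeze the current documents as a new commit, then apply the deletion policy."""
        previous = self.commits.get(self.generation, ())
        self.generation += 1
        self.commits[self.generation] = previous + tuple(self.pending)
        self.pending = []
        cutoff = self.generation - self.keep_last
        for gen in [g for g in self.commits if g <= cutoff]:
            del self.commits[gen]
        return self.generation

    def open_reader(self, generation=None):
        """Return the snapshot for a commit (latest by default) if it still exists."""
        gen = self.generation if generation is None else generation
        if gen not in self.commits:
            raise KeyError(f"commit {gen} has been deleted")
        return self.commits[gen]


idx = SnapshotIndex(keep_last=2)
idx.add("doc1")
first = idx.commit()
snapshot = idx.open_reader()  # pinned to commit 1
idx.add("doc2")
idx.commit()
print(snapshot)               # ('doc1',)
print(idx.open_reader())      # ('doc1', 'doc2')
```

Raising `keep_last` lets slower readers hold older snapshots longer, at the cost of keeping more old data around.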
A Starting Point
Using Lucene as the basis for new open source search products may give users more choices, since the library can be integrated with existing technology.
"From a programmer's perspective, Apache Lucene has a robust API and .NET and Java compatibility. Lucene is the basis for a number of search platforms," said Christian.
.NET Framework is a software component developed by Microsoft (Nasdaq: MSFT) that is included in the Microsoft Windows operating system. It provides a large library of pre-coded instructions. Java is a programming language developed by Sun Microsystems (Nasdaq: JAVA).
Inherent Problems
Developing new search engine strategies, for both Internet and intranet use, runs the risk of other problems for potential users, warned Christian.
For example, one problem with using an alternative search product is that its components may not be able to talk to all data containers. Another is that most people are not good at managing metadata (data that describes documents and helps define the structure of various document types).
"We need to search multiple indexes and return results in a cohesive fashion. We see some companies just beginning to explore this. We need a search vehicle that will pull everything together," Christian said.
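Searching several indexes and returning one cohesive result list is often called federated search. One difficulty is that raw scores from different indexes are not directly comparable. The Python sketch below, an illustration rather than any product's method, rescales each index's scores to the 0 to 1 range (min-max normalization) and then merges them into a single ranked list; the index names, document ids and scores are made up.

```python
def normalize(scores):
    """Min-max rescale one index's raw scores into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}  # all hits tie within this index
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_results(per_index, limit=10):
    """Merge {index_name: {doc_id: raw_score}} into one ranked list.

    Returns (doc_id, index_name, normalized_score) tuples, best first.
    A document found in several indexes keeps its highest normalized score.
    """
    best = {}
    for index_name, scores in per_index.items():
        for doc, score in normalize(scores).items():
            if doc not in best or score > best[doc][0]:
                best[doc] = (score, index_name)
    ranked = sorted(best.items(), key=lambda item: (-item[1][0], item[0]))
    return [(doc, name, score) for doc, (score, name) in ranked[:limit]]


# Hypothetical raw scores on very different scales.
per_index = {
    "intranet": {"policy.pdf": 12.0, "faq.html": 6.0, "memo.doc": 3.0},
    "email": {"thread-42": 0.9, "thread-7": 0.3},
}
print([doc for doc, _, _ in merge_results(per_index)])
# ['policy.pdf', 'thread-42', 'faq.html', 'memo.doc', 'thread-7']
```

Min-max scaling is the simplest choice; it treats each index's best hit as equally good, which is a strong assumption that more careful merging schemes try to relax.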
New Approach
Perhaps one of the most promising new open source search offerings will become available by the end of this year from Wikia, which recently completed its purchase of the Grub Web crawler from LookSmart.
Jimmy Wales, Wikia chairman and Wikipedia founder, told LinuxInsider he will release the code for Grub, until now a proprietary product, as open source.
Grub is a Web crawler that builds an index of the World Wide Web using processing power donated by volunteer computers, much like the SETI@home project, which searches for extraterrestrial intelligence.
This will allow Wales to jump-start his new search product without having to build his own computer network to crawl the Web and maintain a catalog of content.
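Grub's actual client-server protocol is not described here, but the volunteer-computing idea behind it can be sketched. In this hypothetical Python model, a central coordinator hands out small batches of URLs to volunteer clients, accepts their reports, and puts a batch back in the queue if a client never reports within a timeout. The `CrawlCoordinator` class, its parameters and the timeout rule are all assumptions for illustration.

```python
from collections import deque


class CrawlCoordinator:
    """Hypothetical coordinator that farms out URLs to volunteer crawler clients."""

    def __init__(self, urls, batch_size=2, timeout=60.0):
        self.queue = deque(urls)  # URLs waiting to be crawled
        self.batch_size = batch_size
        self.timeout = timeout    # seconds before unreported work is reassigned
        self.assigned = {}        # url -> (client_id, time assigned)
        self.index = {}           # url -> page summary reported by a client

    def request_batch(self, client_id, now):
        """Give a client up to batch_size URLs to fetch on its own machine."""
        self._requeue_stale(now)
        batch = []
        while self.queue and len(batch) < self.batch_size:
            url = self.queue.popleft()
            if url in self.index or url in self.assigned:
                continue  # already crawled, or still out with another client
            self.assigned[url] = (client_id, now)
            batch.append(url)
        return batch

    def report(self, client_id, results):
        """Accept {url: summary} from a client, ignoring URLs it does not currently hold."""
        for url, summary in results.items():
            owner = self.assigned.get(url)
            if owner is not None and owner[0] == client_id:
                del self.assigned[url]
                self.index[url] = summary

    def _requeue_stale(self, now):
        for url, (_, started) in list(self.assigned.items()):
            if now - started > self.timeout:
                del self.assigned[url]
                self.queue.append(url)


coordinator = CrawlCoordinator(
    ["http://example.com/1", "http://example.com/2", "http://example.com/3"]
)
print(coordinator.request_batch("volunteer-1", now=0.0))
# ['http://example.com/1', 'http://example.com/2']
print(coordinator.request_batch("volunteer-2", now=5.0))
# ['http://example.com/3']
```

Reassigning stale batches matters because volunteer machines can go offline at any time; ignoring reports from clients that no longer hold a URL keeps a late, duplicate answer from overwriting newer work.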
"We plan to build all the software needed for free licensing for searching. I want to make all content
available license free. Nothing like this exists today," Wales said.
Wikia Search
Wales' plan for a new open source based search engine calls for an expansion of previous open source efforts begun by projects such as Lucene. His goal is to create an open and transparent search tool that does not mask its methodologies and search algorithms.
"There were several open source search projects. They were a start. Some of the pieces have existed. Now we are trying to give it full support," he said.
Wales plans to release some form of a very rough first cut of his new search offering by the first of the year. He will use an ad-based model for the Web site but is not sure about the rest of the business model yet.