Last year, when Wikipedia founder Jimmy Wales announced plans to launch a new search engine in the first half of 2007, everyday users of this now ubiquitous tool wondered what Wales could do that Google couldn’t. However, the search engine community knew better.
Of course, Google reset the benchmark for search several years ago, which led Merriam-Webster to list the company’s name in its dictionary as a transitive verb.
Taking On Google
There is little doubt among researchers and entrepreneurs in this space, though, that Google will one day face serious competition from other providers, if not outright displacement.
“It is inevitable,” Erik Hansen, president of SiteSpect, a provider of search engine marketing and Web optimization technology, told TechNewsWorld.
“It cannot stay on top forever. Yes, it is in the spotlight now, but a lot of smart and well-funded companies are developing technology that is different from what Google offers,” he noted. “Many of these firms are focused on solving niche or smaller or industry-specific issues. They are not necessarily big enough for Google to focus on now, but perhaps down the road they will be developed for general use.”
Current search engine research covers several areas but can be broadly grouped into three categories: taxonomy-based search, video and image search, and social networking.
‘Sifting’ the Web
Not surprisingly, some of the most interesting research is conducted in university settings.
For example, Larry Kerschberg, a professor at George Mason University, received a patent last year for his WebSifter software, which works alongside current search engines to improve the relevance of a user’s results.
The WebSifter system allows individuals and companies to specify a taxonomy tree of concepts relevant to their line of work and/or search needs. Rather than issuing a keyword search query directly to Google, a user first expands the search concepts with synonyms, and then submits them to Google and other search engines.
WebSifter subsequently generates many keyword-based search requests to search engines such as Yahoo, Google and others. Each search engine provides, say, the 50 best-ranked results for each search request.
WebSifter then ranks the aggregated results based on its patented ranking algorithm and presents them to the user, who provides feedback. In this way, WebSifter learns the user’s preferences and adapts to provide more precise and relevant results over time.
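Stripped of its patented details, the pipeline described above — expand a query via a taxonomy of synonyms, fan it out to several engines, fuse the ranked lists, and adapt to user feedback — can be sketched in a few lines of Python. The taxonomy, engine names and weighting scheme below are illustrative assumptions, not WebSifter’s actual algorithm:

```python
from collections import defaultdict

# Hypothetical taxonomy: each concept maps to synonyms used for query expansion.
TAXONOMY = {
    "car": ["automobile", "vehicle"],
    "engine": ["motor"],
}

def expand_query(terms, taxonomy):
    """Generate one query per synonym substitution, plus the original query."""
    queries = [" ".join(terms)]
    for term in terms:
        for synonym in taxonomy.get(term, []):
            queries.append(" ".join(synonym if t == term else t for t in terms))
    return queries

def aggregate(result_lists, engine_weights):
    """Fuse per-engine ranked URL lists with a simple reciprocal-rank score."""
    scores = defaultdict(float)
    for engine, urls in result_lists.items():
        weight = engine_weights.get(engine, 1.0)
        for rank, url in enumerate(urls):
            scores[url] += weight / (rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def learn_feedback(engine_weights, engine, liked):
    """Nudge an engine's weight up or down after the user rates its results."""
    engine_weights[engine] *= 1.1 if liked else 0.9
```

The feedback step is the key difference from a plain meta-search: over many queries, engines and expansions that please the user accumulate weight, so the fused ranking drifts toward that user’s preferences.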
Kerschberg also has created a second system for online research, based on WebSifter, called Knowledge Sifter. This application addresses a problem search engine users frequently cite: Rarely, if ever, can a user find one site that is able to address a complicated problem or question.
Current search technology indexes information on the Web and then matches the user’s keywords against that metadata, directing users to the pages where the data can be found. The algorithms developed by the various search engines, such as Google, MSN and Yahoo, each process the query against that metadata.
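A minimal sketch of that keyword-matching model is an inverted index: a map from each word to the set of pages containing it. The pages and URLs below are invented for illustration:

```python
from collections import defaultdict

# Toy "crawled" pages standing in for indexed Web metadata.
PAGES = {
    "example.com/a": "yeast stress response proteins",
    "example.com/b": "cancer treatment research",
    "example.com/c": "yeast cell biology",
}

# Inverted index: keyword -> set of pages containing it.
INDEX = defaultdict(set)
for url, text in PAGES.items():
    for word in text.split():
        INDEX[word].add(url)

def keyword_search(query):
    """Return the pages containing every keyword in the query (boolean AND)."""
    hits = [INDEX.get(word, set()) for word in query.split()]
    return set.intersection(*hits) if hits else set()
```

This is exactly the limitation Kerschberg points to: the index can say which pages contain which words, but it knows nothing about why the user is asking.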
“But let’s say someone is researching the best ways to cure a specific form of cancer with which he has been diagnosed,” Kerschberg explained. “He wants to know what treatments are available, what the current research is, what has been shown to work best. Right now, no search engine can respond to a request that says ‘What is the best way to treat my cancer?'”
To address this issue, the search engine has to understand the user’s preferences, the problem the user wants to solve and why he is asking the question in the first place, he noted.
Middleware that would include an advanced search engine or a search engine combined with a reasoning engine could take all that information and use it to match the needs of the user, Kerschberg stated.
“That is where the search technology is going in the future,” he predicted, adding, “as more and more resources are put on the Web, people will be looking not only for that one piece of information, but also a whole body of information that can be used to create knowledge.”
Six Degrees of Separation
Virginia Tech computer science professor Naren Ramakrishnan and his colleagues have developed a “creative discovery” search engine that also emphasizes the connections in data that a user might not know to ask about initially.
Called “Storyteller,” it discovers connections between pieces of information that at first appear unrelated, building a chain of concepts — a sequence of events or relationships — between specified start and end points.
Ramakrishnan likens it to a reporter chasing down leads for an article. “Some may prove to be false, some will turn out to be true. Our underlying algorithm takes up these leads and starts pursuing these multiple tracks in parallel,” he stated.
A good example of this is making a connection between Sen. John McCain and a bill about Iraq, Ramakrishnan noted. “What you do is start the search from McCain and the bill and try to see if the two meet in between. You might find out that McCain did not contribute anything directly to the bill, but that he participated in a conference where some of the proposals were discussed.”
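The chain-building idea in the McCain example can be illustrated with a toy concept graph and a breadth-first search that pursues multiple candidate paths until start and end meet. The graph below is invented for illustration and is not Storyteller’s actual algorithm or data:

```python
from collections import deque

# Hypothetical concept graph: edges link concepts that co-occur in documents.
GRAPH = {
    "McCain": ["conference"],
    "conference": ["McCain", "proposals"],
    "proposals": ["conference", "bill"],
    "bill": ["proposals"],
}

def find_chain(start, end, graph):
    """Breadth-first search for a chain of concepts linking start to end."""
    queue = deque([[start]])   # each queue entry is a candidate path (a "lead")
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path        # first chain found is also a shortest one
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None                # no chain: the lead was false
```

Like the reporter in Ramakrishnan’s analogy, the queue holds many partial chains at once; dead ends are simply abandoned when the queue moves on.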
Right now, the search engine is used primarily by biotechnology researchers, mainly because the computing power required to support larger-scale subjects is beyond the university’s resources.
Virginia Tech biochemistry faculty members Richard Helm and Malcolm Potts have used the application to discover a novel protein associated with chemical stress by searching abstracts of 140,000 publications about yeast. Keywords were developed from 3,756 abstracts containing the keywords “yeast” and “stress.”
Last August, this research was published in the proceedings of the ACM (Association for Computing Machinery) SIGKDD (Special Interest Group on Knowledge Discovery and Data Mining) International Conference on Knowledge Discovery and Data Mining, according to the university.
“This will be a functionality that general users will be requesting from search engines in a few years,” Ramakrishnan predicted, noting that the key to making it generally available is not further research and development on the algorithms, but rather providing the resources for the necessary scale.