TechNewsWorld.com

IBM to Open Source Conceptual Search

The ability to hunt through corporate data that is not stored in easily searched databases, known as unstructured data, is becoming more and more important as employees communicate and conduct business through e-mail, word-processing documents, Excel spreadsheets and PowerPoint presentations.


IBM (NYSE: IBM) Research will turn over its data search technology to the open source community, the company said today. The Unstructured Information Management Architecture (UIMA) searches stored data not by keywords but by analyzing the content of documents to see whether they fit the concepts and facts the user is researching.

It will be made available through SourceForge, a repository for open-source code, by the end of the year, IBM said.

Which Rock?

Nelson Mattos, IBM distinguished engineer and vice president of strategy for WebSphere Information Integration Solutions, used the example of the word "rock," which can mean a stone, a type of music or to move back and forth. A keyword search for "rock" will return documents that use the word in all of those senses, but a UIMA-based search will be able to filter out the irrelevant ones.

"Employees spend about one-third of their time looking for relevant information to get their job done," Mattos told TechNewsWorld. "Eighty-five percent of data stored in corporate repositories today is unstructured. Only 15 percent is things you can represent as rows and columns, and it is that 15 percent that companies use business intelligence to analyze."

Many Practical Uses

Being able to gather and analyze that unstructured majority of business data could drastically change how companies relate to their clients, Mattos said. For instance, companies will be able to extract and analyze call center information much more quickly.

The technology has applications beyond enterprises. For example, government agencies could search through all available data, and medical researchers might be able to aggregate information on patients and medications to spot patterns earlier.

UIMA, which took four years to develop, is incorporated into IBM's WebSphere Information Integrator OmniFind Edition, WebSphere Portal Server and Lotus Workplace. IBM also has the support of Attensity, ClearForest, Cognos (Nasdaq: COGN), Endeca, Factiva, Kana, Inquira, iPhrase, Inxight, nStein, QL2, SAS, Schemalogic, Semagix, SPSS (Nasdaq: SPSS) and Temis, positioning UIMA as a standard framework for searching and analyzing unstructured data.

"The framework will have broad applicability once you have companies building applications on it," Mattos said of the decision to open-source the technology.

Google (Nasdaq: GOOG), Microsoft (Nasdaq: MSFT) and Yahoo (Nasdaq: YHOO), the major search engine competitors, all offer desktop search features, but those tools are driven by keywords. With UIMA open-sourced, however, any one of these companies could take the framework and build new search strategies on top of it.


By Susan B. Shor