Big Data's Big Challenges for Content Management
Because developers often love to hate legacy technology and like to reinvent things, many may argue for abandoning traditional relational database applications and lobby for building applications from scratch on new technologies such as NoSQL-based document storage. From an architecture point of view, this is not the best approach.
01/25/12 5:00 AM PT
As many know, content is getting bigger -- way bigger -- and this is scary to many technologists. At the same time, it's also getting smarter. Applications are growing more complex, challenging IT pros as never before.
How will these changes impact content management technologies? It's difficult to predict exactly, but there are insights to be found and used to plan for the future.
Bigger by the Minute
If there's one topic that keeps cropping up when it comes to content management, it's the runaway growth of data and content. Accelerated growth, combined with new requirements and new sets of tools and technologies, is a direct consequence of enterprise software's move to the Web.
The sheer numbers, covered in most enterprise content management (ECM) analyst reports, apply across the entire information technology sector. They are prompting developers to create a new generation of software and distributed computing frameworks in an effort to cope with this scalability challenge.
Not all content management practitioners and information management professionals know about "Big Data" and "NoSQL." However, developers are paying close attention. So are IT decision makers, many of whom now question their commitment to specific technology providers.
Content growth is everywhere. From traditional data warehouses to new consolidated big data stores, IT infrastructure must be ready for this continued growth in scale, which affects the entire IT industry and ECM in particular.
Smarter by the Second
ECM technology is evolving toward a platform-based approach, enabling organizations to make their own content-centric and content-driven applications smarter. Analysts, vendors and users all agree: The time for "out-of-the-box" CMS applications has passed. Now each project can meet specific needs and individual requirements.
Content and data, more often than not, come with embedded intelligence, whether in custom metadata and in-text information or in attached media and binary files. That intelligence can be put to use whether the content is structured or unstructured.
This can be observed on many different levels across various domains. One example is the arrival of what some have started to call "Web 3.0": the semantic Web and the related technology that draws intelligence out of raw content through advancements like semantic text analysis, automated relations and categorization, and sentiment analysis, effectively giving meaning to data.
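To make the idea of content carrying its own intelligence concrete, here is a minimal sketch in Python. It is only an illustration: the `ContentItem` structure, the `CATEGORY_KEYWORDS` vocabulary and the `enrich` function are hypothetical names, and simple keyword matching stands in for the far richer semantic analysis a real engine would perform.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """Hypothetical content item: raw text plus custom metadata and attachments."""
    body: str
    metadata: dict = field(default_factory=dict)
    attachments: list = field(default_factory=list)


# Assumed, hand-written vocabulary; a real system would use a semantic engine.
CATEGORY_KEYWORDS = {
    "finance": {"invoice", "payment", "budget"},
    "legal": {"contract", "clause", "liability"},
}


def enrich(item: ContentItem) -> ContentItem:
    """Derive categories from the text and store them as metadata."""
    words = set(re.findall(r"[a-z]+", item.body.lower()))
    item.metadata["categories"] = sorted(
        name for name, keywords in CATEGORY_KEYWORDS.items() if words & keywords
    )
    return item


item = enrich(ContentItem(body="The contract covers payment terms.",
                          metadata={"author": "jdoe"}))
print(item.metadata)
```

The point of the sketch is that the derived categories sit next to the author-supplied metadata, so downstream workflow or search code can treat both the same way.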
More traditional ECM components, such as workflow, content lifecycle management and flexibility, demonstrate much the same. Smart content architecture, intelligent and adaptive workflow, and deep integration with the core applications within information systems are all making enterprise content-centric applications smarter and refining the way intelligence is brought to content.
In short, content is getting smarter on the inside as much as on the outside.
It's an Evolution, Not a Revolution
Some preconceived and simplistic notions must be left behind if technologists are to proceed effectively amid these developments. It's an exciting time to watch the evolution of technologies such as NoSQL databases and other systems that relate to Big Data.
Because developers often love to hate legacy technology and like to reinvent things, many may argue for abandoning traditional relational database applications and lobby for building applications from scratch on new technologies such as NoSQL-based document storage.
From an architecture point of view, this is not the best approach -- and it is not the way content management technology should evolve.
For years, relational databases have been developed based on real business requirements, and the same is true for Web application frameworks and content management systems. They all implement functionality for specific use cases that remain valid and are simply evolving.
In fact, such disruptive phenomena as Big Data and the new semantic technology on the scene are huge opportunities for enterprise content management solutions, not destructive upheavals. They are bringing new solutions and possibilities in business intelligence, semantic text analysis, data warehousing and caching that must be integrated into existing content-centric applications without rewriting them.
As a result, Big Data and smart content will push more of enterprise content management toward technical features such as software interoperability, extensibility and integration capabilities.
These developments will also demand a clean and adaptive architecture that is flexible enough to evolve as new standards and projects arise. Examples include the Stanbol project, which bridges CMS and semantic technologies, as well as connectors to back-end storage systems such as NoSQL document stores and connectors to text-analysis solutions.
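One way to picture such a connector-based architecture is sketched below in Python. Every name in it (`ContentStore`, `RelationalStore`, `InMemoryDocumentStore`, `publish`) is hypothetical, SQLite stands in for an existing relational back end, and a plain dictionary stands in for a NoSQL document store. It is an illustration of the principle, not any particular product's design.

```python
import json
import sqlite3
from typing import Optional, Protocol


class ContentStore(Protocol):
    """Hypothetical connector interface the application codes against."""

    def save(self, doc_id: str, document: dict) -> None: ...
    def load(self, doc_id: str) -> Optional[dict]: ...


class RelationalStore:
    """Existing relational back end (in-memory SQLite as a stand-in)."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE content (id TEXT PRIMARY KEY, body TEXT)")

    def save(self, doc_id: str, document: dict) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO content (id, body) VALUES (?, ?)",
            (doc_id, json.dumps(document)),
        )

    def load(self, doc_id: str) -> Optional[dict]:
        row = self._db.execute(
            "SELECT body FROM content WHERE id = ?", (doc_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


class InMemoryDocumentStore:
    """Stand-in for a NoSQL document store connector (assumption: a real
    connector would call a database client instead of using a dict)."""

    def __init__(self) -> None:
        self._docs: dict = {}

    def save(self, doc_id: str, document: dict) -> None:
        self._docs[doc_id] = dict(document)

    def load(self, doc_id: str) -> Optional[dict]:
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None


def publish(store: ContentStore, doc_id: str, title: str, body: str) -> Optional[dict]:
    """Application logic: identical whichever connector is plugged in."""
    store.save(doc_id, {"title": title, "body": body, "state": "published"})
    return store.load(doc_id)


for backend in (RelationalStore(), InMemoryDocumentStore()):
    print(type(backend).__name__, publish(backend, "a1", "Hello", "First post"))
```

Because `publish` depends only on the interface, a new storage or text-analysis connector can be added alongside the relational one without rewriting the application, which is the evolutionary path argued for here.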
This underscores the advancements made in the development of modular and extensible platforms for content-centric applications. Taking the traditional approach of employing large enterprise content management suites that rely on older software architecture will make it harder to leverage these new and nimble opportunities.
In order to get the most value out of smart content and refine methods of dealing with Big Data, enterprise content management architects must incorporate a modern and well-designed content management platform upon which to build -- one that not only looks at end-user features but stays true to the development side. Enterprise content management will not be reinvented; Big Data and smart content are evolutions, not revolutions, in the industry.