NoSQL, Part 2: Grappling With Big Data
Jan 14, 2014 5:00 AM PT
Back in 2009, NoSQL databases began emerging, mainly to handle modern Web-scale databases. They could handle massive amounts of data; tackle the exponential growth of newly created digital content from social media sites and websites; and help build value around data by connecting the dots -- no small task when a plethora of data is continuously being created.
Those very features led Twitter to move from sharded MySQL + Memcached to the Apache Cassandra project in 2010.
NoSQL tools really came into their own with the advent of Big Data -- data collections so large that they cannot be effectively managed or exploited using conventional data management tools such as traditional relational database management systems.
The Pain of Big Data
Social media contribute 90 percent or so of the data available today, while the use of geographical information systems, including location-based data systems, is growing rapidly.
In 2012, about 2.5 quintillion bytes of data were being produced every day. Meanwhile, the size of data sets was increasing, from a comparatively paltry few terabytes to several petabytes and, now, beyond.
Businesses want to analyze the data because they yield important customer information. Walmart, for instance, is reported to have exhaustive consumer data on more than 145 million Americans that it shares with more than 50 third parties.
Further, companies are increasingly using predictive analytics to increase customer profitability and reduce customer churn, and that again requires managing Big Data.
However, analyzing such large quantities of data requires improvements in queries, the accuracy of responses to those queries and the speed of those responses.
NoSQL Databases Make Simplicity a Virtue
The relatively simple architecture of NoSQL databases is better suited than that of RDBMSes to handling Big Data.
In an RDBMS, data for a given record is spread across many tables, requiring joins and careful coordination of a transaction across them all, Kelly Stirman, director of product marketing at MongoDB, told TechNewsWorld. That means transactions must be very sophisticated and be able to address a variety of failure scenarios.
Performing joins and transactions across tables becomes increasingly difficult as an RDBMS scales.
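The contrast Stirman describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customers` and `orders` tables, the `customer::1` key and both helper functions are hypothetical, SQLite stands in for an RDBMS, and a plain dictionary stands in for a document database.

```python
import sqlite3

# Relational layout: one customer's record is split across two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 'book');
    INSERT INTO orders VALUES (11, 1, 'lamp');
""")


def items_relational(customer_id):
    """Reassemble the record with a join across both tables."""
    rows = conn.execute(
        "SELECT o.item FROM customers c "
        "JOIN orders o ON o.customer_id = c.id "
        "WHERE c.id = ? ORDER BY o.id",
        (customer_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Document layout: the same record is one self-contained value under one key.
documents = {
    "customer::1": {
        "name": "Ada",
        "orders": [{"id": 10, "item": "book"}, {"id": 11, "item": "lamp"}],
    }
}


def items_document(key):
    """Read the whole record with a single key lookup, no join needed."""
    return [order["item"] for order in documents[key]["orders"]]
```

Both functions return the same items; the difference is that the relational version has to coordinate two tables, while the document version touches a single value.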
"The schema flexibility and distributed nature of most NoSQL databases make them complementary in various types of Big Data environments," Nick Heudecker, a research director at Gartner, told TechNewsWorld.
For example, NoSQL databases using document store technologies such as Couchbase let users address the documents through unique keys that represent each document. Many also offer an application programming interface or query language that lets users retrieve documents based on their contents, while others allow for retrieval using MapReduce.
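The three access styles mentioned above (lookup by key, retrieval by content and MapReduce) can be illustrated with a toy in-memory store. The `ToyDocumentStore` class and its method names are hypothetical and do not correspond to the API of Couchbase or any other product; real systems distribute this work across many servers.

```python
from collections import defaultdict


class ToyDocumentStore:
    """In-memory stand-in for a document database (illustration only)."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        self._docs[key] = doc

    def get(self, key):
        # Style 1: address a document through its unique key.
        return self._docs[key]

    def find(self, **criteria):
        # Style 2: retrieve keys of documents whose fields match the criteria.
        return sorted(
            key
            for key, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        )

    def map_reduce(self, map_fn, reduce_fn):
        # Style 3: map each document to (key, value) pairs, group, then reduce.
        groups = defaultdict(list)
        for doc in self._docs.values():
            for out_key, out_value in map_fn(doc):
                groups[out_key].append(out_value)
        return {k: reduce_fn(k, values) for k, values in groups.items()}


store = ToyDocumentStore()
store.put("user::1", {"name": "Ada", "city": "Paris"})
store.put("user::2", {"name": "Lin", "city": "Tokyo"})
store.put("user::3", {"name": "Sam", "city": "Paris"})

by_key = store.get("user::2")
in_paris = store.find(city="Paris")
per_city = store.map_reduce(
    lambda doc: [(doc["city"], 1)],   # emit one count per document
    lambda city, ones: sum(ones),     # add up the counts for each city
)
```

Here `by_key` is the Tokyo user's document, `in_paris` lists the two matching keys, and `per_city` counts users per city.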
Global travel and tourism industry player Amadeus is running a pilot using Couchbase as "a very efficient key-value store," Dietmar Fauser, its vice president of Architecture, Quality & Governance divisions for Research and Development, told TechNewsWorld. "The document store aspects of Couchbase will be used in a second step."
Cutting Costs With Commodity Hardware
Another plus for NoSQL databases is that they are developed to run on clusters of commodity hardware, which is inexpensive to source and replace. That also means they are distributed and have no single point of failure.
"NoSQL is about building the next generation of operational databases that have to deal with a large data set that is semi-structured and needs a flexible data schema; a distributed scale-out architecture that provides elasticity and easy scaling; high performance and low latency for billions of users at Internet scale; and an always-on architecture that allows for upgrades and maintenance of a system on-the-fly with no maintenance downtimes," Rahim Yaseen, the company's senior vice president of engineering, told TechNewsWorld.
For example, Couchbase is a scale-out topology with "a true Shared Nothing cluster architecture" so there is no contention for centralized resources, Yaseen explained. To scale out, users just add nodes to the Couchbase cluster.
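One way to picture scaling by adding nodes is a partitioning scheme in which each node owns a share of the keys and no central coordinator decides placement. The sketch below uses rendezvous (highest-random-weight) hashing, chosen here purely for illustration; it is not a description of how Couchbase actually distributes data, and the node names and helper functions are hypothetical.

```python
import hashlib


def _score(node, key):
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(nodes, key):
    """Each key lives on the node with the highest weight for that key."""
    return max(nodes, key=lambda node: _score(node, key))


def placement(nodes, keys):
    return {key: owner(nodes, key) for key in keys}


keys = [f"doc-{i}" for i in range(1000)]
before = placement(["node-a", "node-b", "node-c"], keys)
after = placement(["node-a", "node-b", "node-c", "node-d"], keys)

# Keys whose owner changed when the fourth node joined the cluster.
moved = [key for key in keys if before[key] != after[key]]
```

With this scheme, every key in `moved` ends up on the new node, so existing nodes never have to trade data among themselves when the cluster grows.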
Is ACID Necessary?
To be distributed, fault-tolerant and able to run on clusters of commodity servers, NoSQL databases made trade-offs, notably on the ACID properties -- Atomicity, Consistency, Isolation and Durability.
Some NoSQL databases offer ACID while others don't; yet others offer partial ACID support.
Couchbase does not support full ACID because "for modern Internet applications with data and users at Internet scale, and with flexible schema data, it is more important to focus on consistency, durability and atomicity," Yaseen said.
MongoDB "provides strong consistency and guarantees ACID operations at the document level, which tends to be sufficient for most applications," the company's Stirman said.
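What a guarantee "at the document level" means can be shown with a toy store in which a change to one document either applies completely or not at all. This is a conceptual sketch only, not MongoDB's implementation or API; the `ToyStore` class, the `book_seat` function and the flight documents are hypothetical.

```python
import copy
import threading


class ToyStore:
    """In-memory store whose updates are all-or-nothing per document."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def put(self, key, doc):
        with self._lock:
            self._docs[key] = copy.deepcopy(doc)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._docs[key])

    def update(self, key, mutate):
        with self._lock:
            draft = copy.deepcopy(self._docs[key])
            mutate(draft)             # if this raises, the stored copy is untouched
            self._docs[key] = draft   # otherwise every change lands at once


def book_seat(doc):
    # Two fields of the same document change together.
    doc["seats_left"] -= 1
    if doc["seats_left"] < 0:
        raise ValueError("sold out")
    doc["passengers"].append("Ada")


store = ToyStore()
store.put("flight::1", {"seats_left": 1, "passengers": []})
store.put("flight::2", {"seats_left": 0, "passengers": []})

store.update("flight::1", book_seat)      # succeeds: both fields change
try:
    store.update("flight::2", book_seat)  # fails: neither field changes
except ValueError:
    pass

booked = store.get("flight::1")
unchanged = store.get("flight::2")
```

Nothing in this sketch coordinates a change that spans `flight::1` and `flight::2` together; that is the kind of multi-document transaction the vendors quoted here disagree about.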
"With very few exceptions, first-generation NoSQL databases do not support ACID transactions and therefore do not support SQL," Nick Lavezzo, cofounder of FoundationDB, told TechNewsWorld. "The lack of distributed ACID transaction support and, therefore, lack of perfect data consistency has been the biggest barrier holding NoSQL database technologies out of mission-critical tasks at large enterprises."
Some database vendors that do not support ACID transactions "confuse the issue by attempting to redefine ACID to mean some weaker set of guarantees than it has traditionally meant," Lavezzo stated.
Moving Towards Acidity
ACID transactions let FoundationDB support multiple data models on top of a single storage engine architecture, Lavezzo said.
New-generation distributed database technologies such as Google's Spanner/F1 and FoundationDB support data models typical of NoSQL as well as true SQL, which requires ACID support, he pointed out.
However, the lack of ACID support will not be easily remedied.
"Having spent four years building FoundationDB -- and it apparently took Google about four years to build Spanner -- we know how hard this problem is," Lavezzo remarked. "Because of this, I don't see any NoSQL technologies adding true ACID transactions any time in the foreseeable future."