Processing Big Data Once You've Stored It

The amount of data we generate on a daily basis in the 21st century would have been unimaginable 10 or 20 years ago. With some entities generating upwards of 600 terabytes per day, there's seemingly no end to the data stream. At that scale, IT professionals still have to devise an efficient method for processing such large data sets. Generating and collecting the data, as it turns out, is only the beginning of the challenge.

A Heap of Data

For many, the solution to big data processing is the Hadoop platform. A Java-based, open-source framework maintained by the Apache Software Foundation, Hadoop dates back to 2006 and reached its 1.0 release in late 2011. Even so, it certainly wasn't the first, or last, system for processing big data.

In fact, there are a number of Hadoop alternatives when it comes to storing and processing big data. MapReduce, the programming model pioneered by Google, uses parallel processing to handle enormous sets of data: a map step processes each piece of the input independently, and a reduce step combines the results. Other implementations of MapReduce exist as well, including MR-MPI, Plasma MapReduce, Mapredus and more. Mincemeat gives Python users the ability to implement the functionality of MapReduce in their own frameworks.
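To make the MapReduce idea concrete, here is a minimal word-count sketch in plain Python. It is an illustration only, not the API of Hadoop, Mincemeat or any other system named above. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a thread pool stands in for the worker machines of a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across every chunk."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Assumption: threads stand in for the worker nodes of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    sample = ["big data big", "data stream", "big stream"]
    print(word_count(sample))
```

Because each chunk is mapped independently, the map step can be spread across as many workers as are available. Only the shuffle step needs to see every intermediate pair, which is why real systems spend much of their effort on moving that data between machines.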

There are other alternatives, too. Elastic Phoenix was originally developed by the IT experts at Stanford University, while Gearman is an open-source framework for distributing jobs across machines. HPCC relies on high-performance computing and parallel processing, while CloudCrowd is meant for Ruby users. Those who prefer an open-source framework might also find interest in Condor. Additional big data processing systems include Amazon Redshift, Spark, Ceph, R3, Stratosphere, QFS, Storm, GoCircuit, HaLoop and more.

Understanding and Embracing Big Data

As more professionals realize the importance of big data analytics in their industry, and as more companies take an interest in big data processing, even more big data processing engines are showing up. There are currently dozens of solutions to the problem of big data processing, and that number continues to rise.

Bill Loconzolo, vice president of data engineering with Intuit, spoke about this trend and the Hadoop platform specifically in a recent interview. He is quoted as saying: "The reality is that the tools are still emerging, and the promise of the platform is not at the level it needs to be for business to rely on it. In the past, emerging technologies might have taken years to mature. Now people iterate and drive solutions in a matter of months — or weeks."

Big Data and Deep Learning Capabilities

The emerging concept of deep learning could have a profound impact on the future of big data processing. Machine learning reached the mainstream public through IBM's Watson, the Jeopardy!-playing computer. Deep learning, a branch of machine learning, lets computers process big data, recognize specific information and even learn how to better respond to such data in the future.

Brian Hopkins, an analyst with Forrester Research, spoke about the future of big data by saying: "Big data will do things with lots of diverse and unstructured text using advanced analytic techniques like deep learning to help in ways that we only now are beginning to understand."

J.R. Johnivan 09/24/2016


Data Backup Digest
