Infographic: Big Data - The Hype and the Reality
The rise of social media platforms, the internet of things and multimedia has led to a record rate of data creation. The concept of "big data" only became widespread around 2011, referring to the staggering amount of new data generated every minute: an estimated 204 million emails sent, 1.8 million Facebook likes, 278 thousand tweets, and 200 thousand photos uploaded. Indeed, the worldwide business intelligence and analytics industry was valued at US$16.9 billion in 2016.
While the terms "data analytics" and "big data" may seem ubiquitous, their definitions vary; several research papers and industry reports define big data by the different components it encompasses. The four V's have emerged as a common framework to describe big data:
1. Volume refers to the size of the data and is the defining characteristic of big data. In terms of scale, the "digital universe" is expected to grow from 4.4 ZB to 44 ZB by 2020.
2. Velocity refers to the rate at which data is created, but also the speed at which it must be examined. As of 2016, 2.5 quintillion bytes of data are created every day.
3. Variety may be understood as the different types of data recorded: structured (e.g. sensor data and databases), semi-structured (e.g. XML) and unstructured data (e.g. social media posts).
4. Veracity, a component IBM added to the framework, refers to the quality of the data collected; poor-quality data costs the US economy an estimated $3.1 trillion annually.
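The variety dimension can be made concrete with a short sketch. The records below are hypothetical, chosen only to show the three shapes of data the framework distinguishes:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: tabular rows with a fixed schema (e.g. a sensor-reading table)
structured = io.StringIO("sensor_id,temp_c\ns1,21.5\ns2,19.0\n")
rows = list(csv.DictReader(structured))

# Semi-structured: XML carries its schema in its tags,
# but fields and attributes can vary from record to record
semi = ET.fromstring("<reading sensor='s1'><temp unit='C'>21.5</temp></reading>")
temp = float(semi.find("temp").text)

# Unstructured: free text has no schema at all and needs
# interpretation (here, a deliberately naive keyword check)
post = "Loving the cool weather today! #autumn"
mentions_weather = "weather" in post.lower()
```

Structured data can be queried directly; semi-structured data must first be parsed against its embedded tags; unstructured data requires techniques such as natural language processing before it yields anything queryable.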
Indeed, big data's applications span industries, and we can see results that were, until very recently, impossible to compute. In agriculture, farmers can better monitor crop life, increase yields and optimize spending by acquiring data from seed sensors, tractors and GPS equipment. In entertainment, IBM scientists and the movie studio 20th Century Fox used machine learning and NLP to create a movie trailer based on past data. In healthcare, a machine-learning algorithm predicted cardiac arrests by cross-checking data from 133,000 patients across 72 medical parameters.
While big data has elicited a great deal of excitement from the media, academics, governments and companies, it is nowadays very hard to distinguish what is truly happening in the industry, and whether supporting technologies and capabilities can keep pace with the growth rate of data. What we do know is that analytics has revolutionized decision-making by shifting the unknown into the known: rather than reacting, companies are now predicting.