Big Data

Big Data: What is it? Why is it important?

Big data is more of a resource than it is a technology. The simplest way to think of big data is as the new oil of the digital economy. That is not because it provides power in the traditional sense, as oil does, but because it is used to analyze and understand the systems in our everyday world that involve large or complex data sets. Big data powers our ability to gain insights into any system that can be quantified using structured or unstructured data. The data sets being generated today, however, are too big and complex for traditional data processing systems to handle, both in the time they would need and in their ability to process the data at all.

But what separates big data from other kinds of data, beyond the sheer size of the data sets? Big data is commonly defined by four characteristics:

  • Volume: each of us produces hundreds of gigabytes of data every year in structured and unstructured form (I will explain structured/unstructured in a moment). Companies produce even larger quantities of data from their employees, customers, operations and other business-related activities. Most small companies have 100 terabytes or more of data, and it's growing every day.
  • Variety: videos, photos, tweets, posts, text messages, emails, documents, PDFs and all the other forms of digital data that we produce, use and save.
  • Veracity: how reliable is the data? Is it accurate? Uncorrupted? Up to date? Clean? These are important questions about any data that is collected, stored and later used in any number of processes. Bad inputs will always generate bad outputs, hence the importance of collecting and storing data correctly. Companies in the United States alone lose over $3 billion a year due to poor data quality.
  • Velocity: data that streams constantly, 24/7/365, needs to be analyzed in real time to give individuals, companies and governments accurate information. As of 2017 there were more than 20 billion network connections transmitting data every day, and the number will only grow.
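
To make the veracity point a little more concrete, here is a minimal sketch of what checking data before using it might look like. It is only an illustration: the field names (`id`, `value`, `updated_at`), the 30-day freshness limit and the helper functions are all assumptions of mine, not part of any particular system, and real data quality work involves far more than this.

```python
from datetime import datetime, timedelta

# Hypothetical field names; a real data set would define its own schema.
REQUIRED_FIELDS = ("id", "value", "updated_at")


def check_record(record, now, max_age=timedelta(days=30)):
    """Return a list of quality problems found in one record (empty if clean)."""
    problems = []
    # Accuracy/completeness: every required field must be present and not None.
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            problems.append(f"missing {field}")
    # Up to date: flag records older than the assumed freshness limit.
    updated_at = record.get("updated_at")
    if updated_at is not None and now - updated_at > max_age:
        problems.append("stale")
    return problems


def split_clean_and_bad(records, now):
    """Separate clean records from bad ones, treating repeated ids as duplicates."""
    clean, bad, seen_ids = [], [], set()
    for record in records:
        problems = check_record(record, now)
        record_id = record.get("id")
        if record_id is not None and record_id in seen_ids:
            problems.append("duplicate id")
        if problems:
            bad.append((record, problems))
        else:
            clean.append(record)
        if record_id is not None:
            seen_ids.add(record_id)
    return clean, bad
```

Only the records in the `clean` list would be passed on to later processing, while the `bad` list keeps each rejected record together with the reasons it was rejected, so the bad inputs can be fixed at the source rather than silently producing bad outputs.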

How is Big Data Generated?

Big data is generated by everything we do on any device connected, directly or indirectly, to the internet. Computers, cell phones, cars, ATM transactions and wearable devices all contribute data that can be used by us personally, or by companies, governments and other organizations, to track, analyze or otherwise look for insights.

This group of internet-connected, data-producing devices is called the Internet of Things (IoT). I've written up a separate section on the Internet of Things and how it is shaping the future of work, technology, machine learning and artificial intelligence.

How Big Data Powers Machine Learning and Artificial Intelligence

Before effective machine learning systems were developed, from 2007 onward, the majority of the massive amounts of data companies had available could not be processed with any of the existing technology. Machine learning has made it possible to understand big data and, at the same time, has encouraged the creation of more data sets.

Machine learning is a primary way to train, operate and gain insights from artificial intelligence systems. This has created a symbiotic relationship between AI, ML and big data: increased use of one also increases the uses and advancement of the others.