What is big data, anyway? And how is it impacting people’s livelihoods in 2022?

Pollicy
Aug 30, 2022

“If someone sends you a 20MB dataset file and claims they have sent you big data, send them a link to this blog post.” Why? That might be a large dataset, but it isn’t big data. So what is big data, then? Big data refers to data so large in volume that it is difficult or impossible to process with traditional methods. It is complex and keeps growing exponentially over time. Techniques for storing and analyzing large datasets have existed for quite some time, but the concept of big data only became popular in the early 2000s. The origin of the term is unclear, but big data is often described by its 4 V’s: Volume, Variety, Velocity, and Variability.

Volume

The size of the data is crucial in determining whether it can be considered big data. In the past, data storage was often challenging and costly, so people stored only what they thought they needed. Today, cheaper storage options such as data lakes, data warehouses, and the cloud have eased this burden.

Variety

The characteristic of variety looks at the different forms data can take. Spreadsheets and databases were initially the main sources considered by most applications, but today we also have unstructured data, which is not arranged according to a pre-set schema. This includes text documents, sensor data, images, audio files, etc. The variety of this unstructured data is what makes it difficult to store, process, and analyze with traditional methods.

Velocity

The characteristic of velocity looks at the speed at which data is generated. Ever wondered what happens online in 60 seconds? Over 4 million Google searches are made, over 120 hours of video are uploaded to YouTube, over 41,000 Instagram photos are uploaded, over 204 million emails are sent, over 50 billion WhatsApp messages are sent, over 3.3 million posts are made on Facebook, over 350,000 tweets are posted, and a lot more happens on other platforms. This data flow is massive and continuous, which is exactly what makes it big data.
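To get a feel for these rates, here is a minimal Python sketch that converts a few of the per-minute figures quoted above into per-second and per-day rates. The numbers are the lower bounds from this post ("over ..."), and the helper names `per_second` and `per_day` are made up for illustration.

```python
# A few of the per-minute figures quoted in this post (lower bounds).
PER_MINUTE = {
    "Google searches": 4_000_000,
    "Instagram photos": 41_000,
    "emails": 204_000_000,
    "tweets": 350_000,
}


def per_second(per_minute: int) -> float:
    """Convert a per-minute count into a per-second rate."""
    return per_minute / 60


def per_day(per_minute: int) -> int:
    """Convert a per-minute count into a per-day count (60 min x 24 h)."""
    return per_minute * 60 * 24


for name, count in PER_MINUTE.items():
    print(f"{name}: ~{per_second(count):,.0f} per second, ~{per_day(count):,} per day")
```

Even the smallest of these streams never stops, which is why velocity is treated as a characteristic in its own right rather than as part of volume.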

Variability

This characteristic looks at the inconsistency and uncertainty of the data. Because much of it is user-generated, especially data from social media, videos, blogs, etc., it is often difficult to clean, transform, and link across systems. Where data is machine-generated, e.g. sensor data, it may be invalid for various reasons such as broken sensors or broken programs.
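As a small illustration of what cleaning machine-generated data can involve, the Python sketch below drops sensor readings that are missing, unparseable, or physically implausible. The valid range and the sample readings are hypothetical and only there to show the idea.

```python
import math

# Hypothetical valid range for a temperature sensor, in degrees Celsius.
VALID_MIN, VALID_MAX = -40.0, 60.0


def clean_readings(raw):
    """Keep only readings that parse as finite numbers within the valid range."""
    cleaned = []
    for value in raw:
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue  # e.g. None or "error" written by a broken program
        if math.isfinite(number) and VALID_MIN <= number <= VALID_MAX:
            cleaned.append(number)
    return cleaned


# Made-up readings: some valid, some broken in different ways.
raw = [21.5, "22.1", None, "error", float("nan"), 999.0, -3.0]
print(clean_readings(raw))
```

Real pipelines usually also log what was discarded and why, since a sudden spike in rejected readings is itself a sign of a broken sensor.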

We have talked about the difficulties of storing, processing, and analyzing big data, and one may wonder why anyone should go through all this trouble.

Why mine big data anyway?

To understand the importance of big data, I will use a very common example of big data at play: Google, one of the most popular search engines.

Any keyword search in Google returns a bunch of results. On the results page, one can spot big data being used in results ranking to show the user the most relevant websites. For a news website whose topics change by the minute, big data plays a phenomenal role in showing you which news articles are most relevant at a particular point in time. Let’s not forget that these search results will also differ based on your previous search history or other features such as the country you live in.

Going back to the Google results page, we can also spot big data being used to analyze user behavior (what did other users search for?) and to understand images (what images are associated with your particular search term?). These and more are questions that big data can help answer, depending on your application.

Aside from improving the efficiency of online platforms such as search engines and social media sites, the key strength of big data is considered to be its influence on improving people’s lives. Many governments and other sectors have embraced digital transformation, which has in turn magnified the amount of data generated per day. Around 90% of the world’s digitized data was captured over the last two years, with the global annual data growth rate projected at 40%. As a result, even governments today are embracing big data to address real-world challenges. Here are some real-world applications:

Big data and Education

In education, big data can be used to build customized programs, understand performance, and reduce the number of dropouts. To address challenges such as high dropout rates, big data can be used for predictive analysis that estimates how students might perform in the future and, from that, whether they might drop out. Purdue University in the USA applied predictive modeling to student data such as class prep, academic performance, engagement, etc., and was able to predict which students were at risk of dropping out. This way, early interventions could be made to tackle these challenges.
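To make "predictive modeling" a little more concrete, below is a minimal Python sketch of one common approach, logistic regression, trained on a handful of made-up student records. This is not Purdue's model or data: the features (attendance rate and average grade, both scaled 0 to 1), the records, and the function names are all assumptions chosen for illustration.

```python
import math

# Hypothetical, made-up records: ((attendance rate, average grade), dropped_out)
STUDENTS = [
    ((0.95, 0.80), 0), ((0.90, 0.75), 0), ((0.85, 0.90), 0), ((0.80, 0.65), 0),
    ((0.40, 0.35), 1), ((0.30, 0.50), 1), ((0.50, 0.30), 1), ((0.35, 0.40), 1),
]


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=2000, lr=0.5):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in data:
            pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = pred - label  # gradient of the log loss w.r.t. the score
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def dropout_risk(model, features):
    """Estimated probability that a student with these features drops out."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


model = train(STUDENTS)
print(f"low attendance, low grades:   {dropout_risk(model, (0.40, 0.40)):.2f}")
print(f"high attendance, high grades: {dropout_risk(model, (0.90, 0.85)):.2f}")
```

A real system would use far more students and features, and would be checked for accuracy and fairness before anyone acted on its predictions. The point here is only that a risk score is learned from past records and then applied to current students.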

Big data and smart cities

With population growth, challenges such as pollution, congestion, and traffic problems are bound to arise. To address them, governments can use smart traffic lights and sensor or signal data to monitor and respond to these challenges in real time. Smart traffic lights and signals can be interconnected across traffic grids to offer analytical insights into traffic patterns. Understanding traffic flow and predicting transport problems can be one step towards building smarter cities. Traffic21 is a real-world example of big data being used to develop and deploy intelligent transportation systems for the twenty-first century and beyond.

Big data and Misinformation

Big data can be used to fight misinformation and curb fake news. With social media analytics and network analysis, governments can trace the origin of fake news and the key culprits behind the spread of misinformation, which can then be countered with correct information. In 2020, the DFRLab used social media analytics to identify influence and sockpuppet Twitter networks that were amplifying pro-government content aimed at suppressing and delegitimizing the #EndSARS protests in Nigeria.
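The idea of tracing a piece of content back to its origin can be sketched in a few lines of Python. The account names and share records below are entirely hypothetical, and real investigations such as the DFRLab's rely on much richer signals; this only shows the basic network-analysis step of following reshares back to the first poster and counting who gets amplified the most.

```python
from collections import Counter

# Hypothetical share records: (account, account it reshared the post from).
# None marks an account that posted the content without resharing anyone.
SHARES = [
    ("acct_a", None),
    ("acct_b", "acct_a"),
    ("acct_c", "acct_a"),
    ("acct_d", "acct_b"),
    ("acct_e", "acct_d"),
    ("acct_f", "acct_b"),
]


def trace_origin(account, parent_of):
    """Follow the reshare chain upwards until reaching the original poster."""
    seen = set()
    while parent_of.get(account) is not None and account not in seen:
        seen.add(account)  # guard against accidental cycles in the data
        account = parent_of[account]
    return account


parent_of = dict(SHARES)

# Which account does each share ultimately lead back to?
origins = Counter(trace_origin(account, parent_of) for account in parent_of)

# Which accounts were reshared directly, and how often?
direct_reshares = Counter(source for _, source in SHARES if source is not None)

print("Origin counts:", dict(origins))
print("Direct reshares:", dict(direct_reshares))
```

In this toy network every share leads back to a single account, while a second account does much of the amplifying, which is the kind of pattern analysts look for.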

As mentioned above, big data will undoubtedly play a key role in improving people’s livelihoods. As more and more governments modernize their services and make more data available for analytics, every sector, including healthcare, agriculture, business, and transportation, will be impacted.

Written by Arthur Kakande (Data Products Lead at Pollicy)
