Data creation seems to be a natural human inclination: from the cave paintings of Lascaux in France to the 80 million photos published daily on Instagram, leaving a trace of our existence seems to be a basic necessity. Data and information have been around since the beginning of civilization; what has changed significantly is the volume.
The large volumes of data we collect and analyze today have consequences for people’s lives.
Volume is the first of the “three Vs” (volume, velocity, and variety) that experts use to define Big Data. We are talking about data sets so massive that they simply cannot be processed by traditional methods. This is where units such as terabytes (one trillion bytes) and petabytes (one thousand trillion bytes) come into play.
These numbers are no surprise considering that all of our digital activities generate data. Every second we spend on computers and mobile devices generates data, and a wide variety of it. Variety is the second V of Big Data. Data is not just numbers; it is also photos, videos, audio recordings, social interactions, and more.
Everything seems to be happening faster and faster, and speed, or velocity, is the third characteristic of Big Data. Data is generated quickly and often needs to be processed just as quickly, sometimes in real time. However, what can get lost amid all this volume, velocity, and variety is the humanity behind the data.
“Most definitions of Big Data do not take into account its inherent humanity, nor do they significantly address its implications in the relationship between technology and the changing ways in which we define ourselves.”
Rebecca Lemov, Professor of History of Science at Harvard
But even worse than forgetting the human element behind the data is using that data against people.
Big data comes with big responsibilities
As with any technological advance, Big Data has the potential for both good and bad. Huge amounts of data help businesses and organizations identify opportunities, reduce costs, and make better decisions. But all these benefits come at a price. One of the main concerns associated with Big Data is privacy.
The constant monitoring and analysis of each of our activities and interactions can be concerning. And it will only grow! Experts predict that more and more devices and objects will be connected to the network, which means more ways to intrude on our privacy.
We cannot deny the importance of addressing privacy in the age of Big Data, but there are other, perhaps less obvious, issues that can be far more disturbing and harmful. Another danger we must explore is hidden in the very tools used to analyze these massive amounts of data: algorithms.
An algorithm is a defined, ordered set of rules and instructions for carrying out a task. We usually think of algorithms as perfect, incorruptible instruments, but in reality they are designed by humans and are therefore susceptible to biases and errors.
Blind faith in algorithms
Cathy O’Neil, who holds a Ph.D. in mathematics from Harvard University and is the author of the book Weapons of Math Destruction, has witnessed the devastating effects of Big Data firsthand. O’Neil worked as an analyst on Wall Street but left the world of finance to become an activist after realizing the damage that statistical modeling caused during the US mortgage crisis.
The author argues that algorithms are neither objective nor impartial and that they can be encoded, consciously or unconsciously, with biases that can promote inequality. Because most people lack advanced mathematical training, they tend to accept algorithms as irrefutable truths and trust them blindly.
O’Neil offers the following example of how bias can creep into an algorithm:
Let’s imagine that a company has a high turnover of engineers and decides to use an algorithm to improve its recruitment process. When creating the algorithm, the company must define what it considers a successful result; in this case, success is defined as an engineer who stays with the company for more than two years and receives at least one promotion during that period.
Then, the company’s historical data is used to “train” the algorithm. Suppose that, throughout the company’s history, no female engineer has lasted more than two years in her position. Based on this pattern, the algorithm will automatically discard every CV submitted by a woman.
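To make O’Neil’s scenario concrete, here is a minimal sketch in Python. It assumes a tiny, invented history of (gender, years at the company, promoted) records and a deliberately naive, hypothetical “model” that simply learns the historical success rate of each group. Real screening systems use many more features, but they can fail in the same way. The names HISTORY, train, and screen are illustrative only and do not come from any real system.

```python
from collections import defaultdict

# Hypothetical historical records: (gender, years_at_company, promoted).
# Invented for illustration only; this is not real company data.
HISTORY = [
    ("male", 3, True),
    ("male", 4, True),
    ("male", 1, False),
    ("male", 3, True),
    ("female", 1, False),
    ("female", 1, False),
    ("female", 1, False),
]


def is_success(years, promoted):
    """The company's definition of success: more than two years and at least one promotion."""
    return years > 2 and promoted


def train(history):
    """Learn the historical success rate of each gender (a deliberately naive 'model')."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for gender, years, promoted in history:
        totals[gender] += 1
        successes[gender] += is_success(years, promoted)  # True counts as 1
    return {group: successes[group] / totals[group] for group in totals}


def screen(model, applicants, threshold=0.5):
    """Keep only applicants whose group's historical success rate meets the threshold."""
    return [name for name, gender in applicants if model.get(gender, 0.0) >= threshold]


model = train(HISTORY)
applicants = [("Ana", "female"), ("Luis", "male"), ("Maria", "female")]
print(model)                      # {'male': 0.75, 'female': 0.0}
print(screen(model, applicants))  # ['Luis']
```

Nothing in the code looks at skill or qualifications: Ana and Maria are rejected only because no woman in the invented history met the success definition, while Luis passes because most men in it did.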
“Since algorithms are considered ‘mathematically sophisticated’, and people are not considered ‘mathematically sophisticated’, they feel they have no right to question these things.”
This example reveals a fundamental truth: algorithms detect patterns but do not understand them. What O’Neil proposes, then, is a healthy dose of skepticism toward the Big Data phenomenon so that we are not carried away by its promises.
She also invites us to question algorithms (even if we do not have a doctorate in mathematics) and to seek to understand the statistical models being used, especially when they can directly affect our lives.
Learning to live with Big Data
Given the negative evidence against the use of Big Data, what path can we take? Experts say that Big Data will not disappear; on the contrary, it will become even more pervasive. Antonio Conde, Director of IoT and Digital Transformation at Cisco Spain, explained that data is “the new oil” and that it is becoming a key part of both society and the economy.
Big Data seems to be inevitable, but this fact should not fill us with pessimism. Our salvation, argues Professor Rebecca Lemov, is to raise awareness about Big Data and never forget its human component.
“It is necessary to keep in mind that data is not only generated about individuals but is also made of individuals. They are human data.”
For her part, Cathy O’Neil advises all people who work with Big Data to take ethical aspects very seriously, even if this represents a great challenge. For the general public, the advice is to have the courage to question algorithms when necessary and to remember that correlation in the data does not necessarily imply causation.
“All computer science majors and all academic data science programs must include ethics.”