It Does You Good
The flavour of the month may be data, especially in its ‘big’ form. But are we deluding ourselves into believing that big is always beautiful? Sure, big data identifies trends, helps businesses understand and target their customers, recognises and optimises business processes, and improves mechanical performance. It also has a role in public health, scientific research, and financial trading.
But should we show caution when it extends unchallenged into security and law enforcement, or the ‘optimisation’ of cities and countries? We cannot assume that all data will ultimately be used for social good. Sometimes projects built on mass data increase inequality, and consequently harm the very people they were designed to help.
Bigger the Better
In 1907, Charles Darwin’s cousin Sir Francis Galton asked 787 villagers to guess the weight of an ox at a country fair. None of them got the right answer, but when Galton averaged their guesses he arrived at a near-perfect estimate, beating not only most of the individual guesses but also those of alleged cattle experts. Thus the ‘wisdom of the crowd’ was born.
Groups of people pool their abilities to demonstrate collective intelligence, and their average judgement converges on the right solution. It’s a pleasing theory, and tempting to apply to all sorts of decision-making processes. Until, that is, you realise that the crowd is far from infallible. Good crowd judgement only arises when people’s decisions are independent of one another. Influenced by others’ guesses, they are more likely to drift towards a misplaced bias. In other words, groups fed with information tend towards a consensus, to the detriment of accuracy. Witness the recent election polling predictions.
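The independence condition is easy to see in a toy simulation. The sketch below is entirely hypothetical (the noise level, anchor value, and weighting are made up for illustration): independent errors cancel when averaged, but once every guess is nudged towards one loud, wrong ‘anchor’, the crowd’s average drifts with it.

```python
import random

random.seed(42)
TRUE_WEIGHT = 1198  # Galton's ox weighed roughly 1,198 lb; the guesses are simulated

# Independent guesses: each villager errs on their own, so noise cancels out.
independent = [TRUE_WEIGHT + random.gauss(0, 150) for _ in range(787)]

# Anchored guesses: everyone leans 40% towards one early, wrong public guess.
ANCHOR = 900
anchored = [0.6 * g + 0.4 * ANCHOR for g in independent]

def mean(xs):
    return sum(xs) / len(xs)

print(f"independent crowd: {mean(independent):.0f} lb")  # lands close to 1198
print(f"anchored crowd:    {mean(anchored):.0f} lb")     # dragged towards 900
```

The same arithmetic underlies polling failures: averaging many correlated opinions does not buy you the error cancellation that averaging many independent ones does.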
Analysing the Detail
Nothing in data analysis is neutral. How data is collected, cleaned, and stored; which models are constructed; what questions are asked: all tend towards discrimination.
As danah boyd asks in her excellent article ‘Toward Accountability’: “How do we define discrimination? Most people think about unjust and prejudicial treatment based on protected categories. But discrimination as a concept has mathematical and economic roots that are core to data analysis. The practices of data cleaning, clustering data, running statistical correlations, etc. are practices of using information to discern between one set of information and another. They are a form of mathematical discrimination. The big question presented by data practices is: Who gets to choose what is acceptable discrimination? Who gets to choose what values and trade-offs are given priority?”
Even so, making data available to the public must be a good thing – it’s democratising – right? But what if it’s not? For instance, what happens when big data is used in conjunction with a computer algorithm to predict crime? In theory, analysing large amounts of crime data should spot patterns in the way criminals behave. Resources could then be deployed more effectively in the areas of predicted criminal activity. Result!
Or, what happens when parents are encouraged to select their children’s school places on the basis of an education data ‘dashboard’? Benchmarking every aspect of a school’s performance against the mean should tell you everything you need to know to make a rational decision about your child’s future. Simple!
Lastly, how good would it be if, when you applied for a job online, you were swiftly shortlisted for interview on the basis of your merits? Your CV having been analysed against the qualities of those who had previously succeeded in that role. Brilliant!
But wait! Critics of this kind of data analysis raise a number of ethical concerns. They claim predictive policing, for instance, leads to victimisation and unnecessary stop-and-searches in areas with high crime rates; displacement of crime elsewhere; the gathering of sensitive data, leading to invasions of privacy; and, lastly, that it ignores the social, economic and cultural factors that cause crime. Advocates, on the other hand, argue that a variety of policing approaches are necessary; that research has found no evidence of victimisation; and that it makes police decision-making less biased.
Surely no one can argue that giving parents access to school data is a bad thing? But what data? What constitutes a good school? Is it test scores, student makeup, parent ratings, or facilities? Presented with the data, does every parent have the time, language skills, and ability to interrogate the statistics? And, if they do, is everyone equally able to act upon their findings by dint of wealth or mobility?
Oh yes, that job you applied for! Being filtered for interview on the basis of your abilities is one thing. But what about your gender, ethnicity, or sickness record? You’ll never know, because you won’t get the chance to explain. Not that anyone would be so crass as to filter on that basis. But subtle clues, like blips in your career timeline or your postcode, may result in unwarranted inferences. Combine these factors with feedback loops and machine learning and, before you know it, you may never work for a large company again.
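How a feedback loop entrenches an initially arbitrary bias can be sketched in a few lines. This toy model is entirely hypothetical (the postcodes, penalty values, and thresholds are invented for illustration, and real screening systems are far more opaque): the filter only ever ‘learns’ from candidates it previously let through, so a small early penalty against one postcode is never corrected, only reinforced.

```python
# Toy model of a CV filter with a feedback loop (all values hypothetical).

def score(candidate, postcode_penalty):
    """Score = raw ability, minus any penalty attached to the postcode."""
    return candidate["ability"] - postcode_penalty.get(candidate["postcode"], 0.0)

def run_rounds(candidates, rounds=5, threshold=0.5):
    postcode_penalty = {"B": 0.2}  # one small, arbitrary initial bias
    for _ in range(rounds):
        hired = [c for c in candidates if score(c, postcode_penalty) >= threshold]
        # Feedback: postcodes under-represented among past hires get penalised more.
        for pc in {c["postcode"] for c in candidates}:
            share = sum(c["postcode"] == pc for c in hired) / max(len(hired), 1)
            if share < 0.3:
                postcode_penalty[pc] = postcode_penalty.get(pc, 0.0) + 0.1
    return postcode_penalty

candidates = (
    [{"postcode": "A", "ability": 0.6} for _ in range(50)]
    + [{"postcode": "B", "ability": 0.6} for _ in range(50)]  # equally able
)
print(run_rounds(candidates))  # penalty on "B" keeps growing; "A" is never touched
```

Both groups are identical in ability, yet after a few rounds the model has ‘confirmed’ its own initial prejudice, because the only evidence it sees is the evidence its past decisions produced.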
“Data scientists”, said Mike Loukides, VP of O’Reilly Media, “are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.” So, I remain conflicted on the benefits of big data. It has its uses. But, rather than thoughtlessly surrender ourselves to its machinations – in the belief that the outcome will always serve the interests of humanity – we should remain sceptical, questioning, and downright belligerent. Especially when told that it’s for ‘our own good’. I plan to keep in mind a quote from Ronald Coase, winner of the Nobel Prize in Economics, when he said, “Torture the data, and it will confess to anything.”
Sources / Further Reading: