Monday, 23 May 2016

Dislike This: Facebook’s experimental ethics

Dhiraj Murthy is a Reader in Sociology at Goldsmiths, University of London. His current research explores social media, virtual organizations, and big data quantitative analysis. His work on social networking technologies in virtual breeding grounds was funded by the U.S. National Science Foundation, Office of Cyberinfrastructure. Dhiraj is also the author of a book about Twitter, the first on the subject, published by Polity Press. His work on innovative digital research methods has been cited widely. For further information, visit his website.

The Facebook psychology ‘experiment’, which manipulated the emotional content seen by nearly 700,000 users, shows that corporations need the kind of ethical review procedures for social media research that universities have been developing for some years. In a university context, Institutional Review Boards (IRBs) are responsible for monitoring the ethics of any research conducted at the university. The US government’s Department of Health and Human Services publishes very detailed guidance for human subjects research. Section 2(a) of their IRB guidelines states that “for the IRB to approve research […] criteria include, among other things […] risks, potential benefits, informed consent, and safeguards for human subjects”. Most IRBs take this mission quite seriously and err on the side of caution, as people’s welfare is at stake.
The reason for this is simply to protect human subjects. Indeed, part of an IRB review evaluates whether particularly vulnerable populations (e.g. minors, people with mental or physical disabilities, pregnant women, and various other groups depending on context) could be additionally harmed by the research. Animal research protocols follow similar logics. Before university researchers conduct social research, the ethical implications of the research are evaluated broadly, alongside other criteria. If any human subject is participating in a social experiment or any other social research, most studies require either signed informed consent or a similar protocol that informs participants of any risks associated with the research and gives them the option to opt out if they do not agree with the risks or any other parameters of the research.
Therefore, I was tremendously saddened to read the Proceedings of the National Academy of Sciences (PNAS) paper co-authored by Facebook data scientist Adam D. I. Kramer, Jamie E. Guillory of the University of California, San Francisco and Jeffrey T. Hancock of Cornell University, titled ‘Experimental evidence of massive-scale emotional contagion through social networks’. The authors of this study argue that agreement to Facebook’s ‘Data Use Policy’ constitutes informed consent (p. 8789). The paper uses a Big Data (or in their words ‘massive’) perspective to evaluate the emotional behavior of 689,003 Facebook users. Specifically, the authors designed an experiment with a control group and an experimental group, in which they manipulated the emotional content of a selection of Facebook users’ feeds by omitting positive or negative text content. They concluded that the presence of positive emotion in feed content encouraged users to post more positive emotional content. They also found that the presence of negative emotion in feed content encouraged the production of negative content (hence the disease metaphor of contagion). In my opinion, any potential scientific value of these findings (however valuable they may be) is outweighed by gross ethical negligence.
This experiment should never have gone ahead. Why? Because manipulating people’s emotional behavior ALWAYS involves risks. Or, as Walden succinctly put it, ‘Facebook intentionally made thousands upon thousands of people sad.’
In some cases, participants may consider an emotional intervention justifiable. But it is potential research subjects who should (via informed consent) make that decision. Without informed consent, a researcher is playing God. And the consequences are steep. In the case of the Facebook experiment, hundreds of thousands of users were subjected to negative content in their feeds. We do not know whether suicidal users were part of the experimental group, or individuals with severe depression, eating disorders, or conditions of self-harm. We will never know what harm this experiment did (which could even have ranged from low-level malaise to suicide). Some users had a higher percentage of positive or negative content omitted than others (between 10% and 90%, according to Kramer and his co-authors). Importantly, some users had up to 90% of positive content stripped out of their feeds, which is significant. And users stripped of negative content could reasonably argue that they were subjected to social engineering.

To conduct a psychological experiment that is properly scientific, ethics needs to be central. That is clearly not the case here. Facebook and its academic co-authors have conducted bad science and given the field of data science a bad name. PNAS is a respected journal, and the authors should have complied with accepted ethical guidelines even though Facebook is not an academic institution. Additionally, two of the authors are at academic institutions and, as such, have professional ethical standards to adhere to. In the case of the lead author from Facebook, the company’s Data Use Policy has been used as a shockingly poor proxy for a full human subjects review with informed consent. What is particularly upsetting is that this was an experiment that probably did real harm. Some have argued that Facebook at least published its experiment, while other companies remain ultra-secretive. Rather than praising Facebook for this, we should recognize that such experiments cast light on the major ethical issues behind corporate research on our online data, and on our need to bring these debates into the public sphere.

Monday, 9 May 2016

Putting data science in the service of social science

Carl Miller (@carljackmiller), Centre for the Analysis of Social Media, Demos

The rise of social media has been important; that is no great revelation. It has wrought profound social change, buffeted our institutions and altered, for many of us, our way of life. New identities, dialects, cultures, affiliations and movements have all bloomed and spread across the digital world, and spilled out of it into mainstream public life.

Back in 2012, we at Demos could see that social media was changing research too. The transfer of social activity onto digital spaces was ‘datafying’ social life. Huge new datasets were routinely being created, which we saw as treasure troves of behavioural evidence: often very large, in real-time, rich, linked and unmediated. It was a massive new opportunity to learn about how people and society worked.
Unlocking these datasets presented an enormous challenge. The sheer scale of social media data also meant that conventional social research methods couldn’t cope. Powerful new analytical techniques - modelling, entity extraction, machine learning, algorithmic clustering - were needed to make sense of what was happening. However, the true challenge wasn’t a technological one alone. It was how to deploy the new tools of data science in the service of social science. Getting better at counting people is not the same as getting better at understanding them.

We established the Centre for the Analysis of Social Media, which brought together social and policy researchers at Demos and technologists from the University of Sussex, with the explicit aim of confronting this challenge. The first layer of the challenge has been the technology itself. The tools of big data analysis needed to be put into the hands of non-technical researchers: the subject matter experts who have long understood social science, and now needed to be able to do it in a new way. We built a technology platform, Method52, which allows non-technical users to conduct big data analytics flexibly through a graphical user interface with drag-and-drop components, rather than being faced with a screen full of code. It was especially important to make accessible a vital technique called natural language processing. Coupled with machine learning, it is one of the crucial ways of understanding bodies of primarily text-based data (like tweets or Facebook posts) that are too large to read manually.

However, any technology, even one that learns, is just a tool, and the second layer has been to learn how to slot all the technology into a broader social scientific methodology. We’ve just concluded a major study with the pollsters Ipsos MORI on how to use tools like natural language processing within a broader framework that stands up to social scientific scrutiny. Much of this has been to develop a process of big data analysis that cares about the same things that social science cares about: the possible biases introduced in how the data is sampled and collected; the non-representative skews in who uses social media; the danger of analyst preconceptions and bias in how the data is measured and handled; the difficulty of measuring, at great scale, the textured, complex utterances of people in specific social contexts; and the importance of interpreting the results in the light of the norms, cultures, languages and practices of social media itself.

But even beyond this, the third layer has been to get social science to govern the whole endeavour: the questions that are asked, the implications that are drawn, how the research is used, and, of course, the ethical frameworks that control its use.

The big data revolution will not slow down; it will only gather pace. The scales of data will only increase, and the technologies and techniques to harness data are becoming more capable and powerful at a bewildering rate. To my mind, this means that social science, qualitative as well as quantitative, has never been more important. It has never been more crucial to point out the inherent difficulties in studying people in all their messy and chaotic complexity, all the pitfalls of reducing human behaviour into something that can be counted and aggregated, and how understanding society doesn’t stop with a series of raw metrics, however large they are.

This article was originally published in the National Centre for Research Methods’ Newsletter 2016:2.


1 More information on its work is available at:
2 For more information on Method52, see Jamie Bartlett, Carl Miller, Jeremy Reffin, David Weir, Simon Wibberly, ‘Vox Digitas’ (Demos: 2014): Vox_Digitas_-_web.pdf?1408832211
3 For a further description of natural language processing, see Jeremy Reffin, ‘Why Natural Language Processing is the most important technology you’ve never heard of’, Demos Quarterly 8, Spring 2016, http://quarterly.
4 See ‘the wisdom of the crowd’, Ipsos MORI, ourexpertise/digitalresearch/sociallistening/wisdomofthecrowd.aspx
5 For more information on this work, see representivity_final.pdf?1441811336

Further Reading
On the current work of the Centre for the Analysis of Social Media at Demos,
A technology edition of Demos Quarterly, Issue 8, Spring 2016,