Laura Griffith, University of Birmingham
Big data is not a totally new concept, but the sheer amount of data being processed is increasing and the ways in which it is processed are changing. This is especially the case in relation to the health service’s potential use of the so-called care.data.
As with data collected from a range of sources, from speed cameras to social media, traditional methods of analysis cannot process the volume of data that is being produced. The Caldicott review, published in April 2013, acknowledged public concern over the sharing of patient data on a large scale. It also looked into the ways that detailed information-sharing on a larger scale could help improve patient care and move towards the better integration of health and social care – a well-known and long-standing weakness of the care system.
Anyone who has ever tried to get a piece of research through an NHS ethics committee will know that the NHS does not easily grant access to its data and its records even when the public benefits of doing secondary analysis on existing data sets are relatively clear. So the question remains: will the huge potential benefits of consolidating a large patient record system and using this as a dataset outweigh the risks that will arise from this operation? The risks are becoming clearer, yet few people are aware of the potential benefits. As Ben Goldacre has pointed out, the project has been sold to the public via two contradictory axes: “we will use your data for lifesaving research, and we will give it to the private sector for commercial exploitation, creating billions for the UK economy”. This dual use of patient data remains a concern for many.
Care.data – what is it?
Few members of the public actually know what care.data is, despite leaflets being sent out to households about it. But they are fast becoming aware of the controversy surrounding it. The public have an option to opt out, rather than to opt in. The press coverage has been almost universally sub-standard, and this has led Lord Darzi to speak out in the HSJ about the way the issue has been covered, and to speak up for the potential of the data-sharing project.
- Since the 1980s the NHS has been systematically collecting data about hospital admissions nationwide, and this information has been used for planning acute care, commissioning services and monitoring the quality of those services. This has, in part, led to the uncovering of substandard care, as in the Mid-Staffordshire example.
- However, data will now be extracted from GP practices and consolidated. The information will include details such as family history, vaccinations, diagnoses, referrals, and biological information (such as blood pressure, BMI and cholesterol).
- Some conditions will be coded as ‘sensitive data’ – including details of infertility, abortions, gender identity matters and abuse – and will remain with the GP.
- Once extracted, this data would be used by the Health and Social Care Information Centre to analyse trends in patient care pathways, amongst other things.
- The first data extraction was due to take place in spring 2014. However, at the time of writing, this has been postponed by six months due to concerns voiced by the BMA, the Royal College of GPs and Healthwatch England. Members of the cross-party Health Select Committee then delivered strong criticism of the Department of Health’s handling of the care.data project.
Much of the debate about care.data has focused on the security of individual records. Recently it was revealed that the NHS Information Centre, the predecessor to the Health and Social Care Information Centre, had already handed over 13 years’ worth of data, covering 47 million patients, to the Institute and Faculty of Actuaries. According to NHS England, handing over data to insurers would be illegal, but it has in fact already happened. The handover of pseudonymised data – so-called amber data (see below) – means that the controversy around care.data is no longer about communicating properly with the public, but is instead about transparency and the law.
Care.data is coded in three different categories: green, amber and red. Green data is aggregated anonymous data depicting average values for large numbers of patients. Amber data is ‘pseudonymised data’ – data which is stripped of personal identifiers such as name, date of birth, NHS number and postcode. Red data is ‘personal confidential data’ and would only be made available when there was a legal requirement to do so, such as in an epidemic.
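To make the three categories concrete, here is a minimal sketch in Python. It is an illustration only, not how the Health and Social Care Information Centre actually processes records: the field names, the toy records and the helper functions `to_amber` and `to_green` are all invented for this example, and the list of identifiers simply follows the examples given above.

```python
from statistics import mean

# Assumed identifier fields, based only on the examples named in the text.
IDENTIFIERS = {"name", "date_of_birth", "nhs_number", "postcode"}

# "Red" data: invented toy records with identifiers attached (not real patients).
RED_RECORDS = [
    {"name": "Patient A", "date_of_birth": "1950-01-01", "nhs_number": "0000000001",
     "postcode": "ZZ1 1ZZ", "systolic_bp": 120, "bmi": 24.0},
    {"name": "Patient B", "date_of_birth": "1962-06-15", "nhs_number": "0000000002",
     "postcode": "ZZ2 2ZZ", "systolic_bp": 140, "bmi": 30.0},
    {"name": "Patient C", "date_of_birth": "1978-11-30", "nhs_number": "0000000003",
     "postcode": "ZZ3 3ZZ", "systolic_bp": 130, "bmi": 27.0},
]


def to_amber(record):
    """Return a copy of one record with the assumed identifier fields removed."""
    return {key: value for key, value in record.items() if key not in IDENTIFIERS}


def to_green(records, fields):
    """Return average values across all records, with no per-patient rows."""
    return {field: mean(record[field] for record in records) for field in fields}


amber_records = [to_amber(record) for record in RED_RECORDS]
green_summary = to_green(RED_RECORDS, ["systolic_bp", "bmi"])

print(amber_records[0])
print(green_summary)
```

Amber data in this sketch still holds one row per patient, which is why it is useful for tracing care pathways and also why its handling is contested. Removing the named fields does not by itself guarantee that a row cannot be linked back to an individual.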
Amber data has huge importance in relation to planning and commissioning health care services, most of which currently rely on disjointed data. Care.data has the potential to show how patients move around the system, and better data on care pathways has enormous potential. This information is useful in showing how particular populations, such as those with chronic conditions, use the health service, and it might help the NHS devise services aimed at prevention as well as treatment. It would also show at what point patients enter the system. For instance, some patients are repeatedly admitted to hospital but rarely visit their GP. There have been concerns that data would be sold to private companies; however, this is not strictly true, as the charge levied would represent the cost of data extraction. Serious concerns remain, however, about how this data may be used in the future, about what exceptions there might be to data confidentiality (for example in relation to the criminal justice system), and about how data security could be regulated in practice.
Secondary data use
It is probably simpler to start by thinking of secondary data use on a smaller scale in healthcare research, before dealing with the implications of scale, ownership and security in the case of care.data. The cost of primary data collection in the health services is vast compared with the cost of using data already collected. In addition, primary research can be intense and a burden on both patients and the public. (I did the research for my PhD in Tower Hamlets, and at one point I did wonder whether there were more social scientists wandering the streets than members of the populations we were allegedly studying.) The use of care.data, i.e. data that is already routinely collected, seeks to maximise the potential of existing data rather than commissioning expensive new research. Crucially, it is the reframing of patient data as a (commodified) ‘data set’ that is proving problematic for some. The potential problems of secondary analysis can be demonstrated by the extreme case of Philip Morris International, the makers of Marlboro cigarettes: they put in an anonymous Freedom of Information request to the University of Stirling to see the raw data from a study composed of thousands of interviews about children’s attitudes to smoking. This illustrates how companies could gather data from other studies when, if they conducted such sensitive primary research themselves, they would get into trouble.
Introducing markets into the research agenda
A longer-term risk, and one that is often missed when discussing data security, is that of the marketisation of the research agenda and the resulting influence over which public health issues should attract the most attention. Outside the NHS, care.data will also be made available to think tanks, universities, charities, and private companies, as if they were all equivalent entities with equivalent motivations. Dr. Geraint Lewis, NHS England’s Chief Data Officer, said that “we think it would be wrong to exclude private companies simply on ideological grounds; instead, the test should be how the company wants to use the data to improve NHS care”. The distinction between public and private goods is tricky in the NHS as the term public is not used consistently. Although the NHS is a public sector organisation, it largely provides a private good, i.e. individual healthcare. Public services, however, have to split their attention between increasing the quality of private goods (such as the individual quality of care) and public goods – such as issues of equity in the health services, creating healthy communities and controlling contagion. Whilst it is conceivable that care.data could be used relatively equally among public, private and third sector organisations to increase the quality of individual services and treatments, larger public goods like ensuring the equality of outcomes in healthcare should not move out of sight.
The lack of public ownership of NHS data sets carries with it the same risks as the commercialisation of academic research and teaching. NICE (National Institute for Health and Care Excellence) demonstrates the importance of an independent and public body to assess the effectiveness of particular treatments across the board, in the face of market relations and sector-based interests. A research agenda dominated by market relations may exacerbate existing public health inequalities. Despite assurances that decisions on the availability of data will be made on a case-by-case basis in relation to whether the organisation wants to improve patient care, there is the possibility that – structurally and not individually – market interests would distort the use of this rich data set, which could otherwise be used to address populations that often fall through the cracks. Public health research should be able to represent the needs of populations affected by structural inequalities. Already particular conditions such as Alzheimer’s disease, with its associated expensive drug treatments and powerful campaigning lobby, draw in more NHS and research money and specialist services than research and service provision for older people with complex needs who may not fit easily into a diagnostic category. Likewise in mental health, huge amounts of money are focused on biomedical treatments and genetic research, whereas comparatively little is spent on improving people’s experience of mental health services, or tackling the complex problems that arise from poverty and discrimination – something that service user-led research into mental health repeatedly shows would make the biggest difference to people’s lives.
The marketisation of NHS data sets, even if individual organisations have good intentions, means that key public goods such as equality are at risk.
Laura Griffith is a lecturer at the Health Services Management Centre in Birmingham. Her research concerns the experience of health and illness, mental health, and inequalities in health – particularly ethnic inequalities. More recently her work has involved the application of qualitative research to health services improvement and health policy.