Going by what’s been playing out over the last couple of months, care.data definitely seems to be in the ‘embarrassing stumble’ category. However, that other giant leap started out that way as well. The first US attempt to launch a satellite into space, on a Vanguard rocket, ended after four feet of ‘flight’ when the rocket caught fire and the satellite, blasted off the top, rolled behind some bushes. Moscow sent their condolences.
Pale blue dot – taken on Apollo 11 on the way to the moon. Picture courtesy of NASA
Despite that bumpy start, we’re celebrating the 45th anniversary of Neil Armstrong’s small step this week. It took a while for the US to really get into the space race, but they did manage to fly a man to the moon. The big data race has been going on for a while now. While companies have realized it can be very lucrative to monetize the lust for information that was previously considered boring, governments have been late to the data party. Perhaps rightly so, seeing the current debate on whether government should be allowed to sell its citizens’ data, or use it for any purpose other than the one it was collected for.
Care.data was to be England’s giant leap, not only to catch up with what has been going on in the Scandinavian countries since the 70s, but to take the lead. Unfortunately, the communication on care.data has been abysmal. Much has been written about it already, so I won’t add to that ever-growing canon. The debate about how to move forward, if at all, is only getting started though. Having an information source such as care.data would be an amazing impulse for the next couple of years of public health science, epidemiology, health informatics and a whole host of other disciplines, but we should look further than that.
To come back to the space race, NASA chose to make compromises in getting to the moon. As Kennedy had given them a deadline, they chose options that would get them there within that time, rather than options that could keep them going into deep space for longer. The results: a very successful Apollo programme, but a rather disappointing 45 years of flying around Earth to follow (despite the very, very cool space plane used for those flights).
The Health and Social Care Information Centre (HSCIC) have that same choice to make now: go for quick fixes (pseudonymisation at source* for instance) and get care.data rolled out soon, or think long term and build the trust and support systems that science can build on for decades to come. The problems surrounding care.data are already making it difficult for researchers to do their job: HSCIC is not giving out any data, pending an internal review of how they have been working and how they should work. This means studies that were funded and approved by ethics committees (and assigned deadlines) have been put on hold because there is no data to work with. This is particularly sour for people on short-term contracts (like me at the moment) or students who are suddenly left without a project.
It would be great to have care.data as a data source for research. But I’m also just starting out as a researcher, and although I would like to fly to the moon, I also want to go beyond and have an academic career rather than one shining moment. As Michael Collins**, the man who circled the moon while Aldrin and Armstrong landed on it, said: “Man has always gone where he has been able to go, it is a basic satisfaction of his inquisitive nature, and I think we all lose a little bit if we choose to turn our back on further exploration.” So let’s get working on making that giant leap, but make sure we don’t lose sight of where we may want to land in the future.
*Pseudonymisation at source would turn any identifiable data (e.g. a date of birth) into a string of letters and numbers that looks like nonsense (this string is called a ‘hash’). There’s no way to get back to the original date of birth. Different organisations (GP practices, hospitals etc) would use the same programme and same key to create the same hash, so HSCIC can still link records together. Problem solved? Not so much. Although no identifiable data would leave the practice, the records can still be linked longitudinally. This means that if you knew that a male of a certain age was admitted on a specific date to a specific hospital with a heart condition, you’d still have a good chance of finding him. A bigger problem would be that everyone involved in administrative data would have to use the same programme and the same key, have data in the exact same format (dates of birth that are saved as 09-01-87 and 09/01/1987 would turn into different hashes, and you wouldn’t be able to recognise them as being the same) and not make any typos. This is a big restriction and would severely limit the chances of linking data to other sources.
**His autobiography, Carrying the Fire, is amazing. I’d recommend it for some summer reading.
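For anyone who wants to see the formatting problem from the pseudonymisation footnote in action, here is a minimal Python sketch. It is only an illustration, not how HSCIC or anyone else actually does it: I’m assuming the ‘hash with a key’ is an HMAC using SHA-256, and the key, the function names and the list of date formats (all day-first) are made up for the example.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical shared key: in the scheme from the footnote, every
# organisation would have to hold exactly the same one.
SHARED_KEY = b"example-key-not-a-real-secret"


def pseudonymise(value: str, key: bytes = SHARED_KEY) -> str:
    """Return a keyed SHA-256 hash (HMAC) of the value as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def normalise_dob(raw: str) -> str:
    """Parse a few assumed day-first formats and return YYYY-MM-DD."""
    for fmt in ("%d-%m-%y", "%d/%m/%Y", "%d-%m-%Y", "%d/%m/%y"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue  # try the next assumed format
    raise ValueError(f"unrecognised date format: {raw!r}")


# The same date of birth, as two organisations might have stored it.
gp_record = "09-01-87"
hospital_record = "09/01/1987"

# Hashing the raw strings: the hashes differ, so the records cannot be linked.
same_without_cleaning = pseudonymise(gp_record) == pseudonymise(hospital_record)

# Hashing after agreeing on one format: the hashes match again.
same_after_cleaning = (
    pseudonymise(normalise_dob(gp_record))
    == pseudonymise(normalise_dob(hospital_record))
)

print(same_without_cleaning, same_after_cleaning)  # False True
```

The cleaning step only works because both sources agreed on it before hashing. Once the data has left the practice as a hash, a typo or an unexpected format can’t be repaired any more, which is exactly the limitation described in the footnote.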