Header picture by Ken Teegardin
Trying to make sense of nonsense
Going by what’s been playing out over the last couple of months, care.data definitely seems to be in the ’embarrassing stumble’ category. However, that other giant leap started out that way as well. The first US attempt to launch a satellite into space, on a Vanguard rocket, ended after four feet of ‘flight’ when the rocket caught fire and the satellite, blasted off the top, rolled behind some bushes. Moscow sent their condolences.
Despite that bumpy start, we’re celebrating the 45th anniversary of Neil Armstrong’s small step this week. It took a while for the US to really get into the space race, but they did manage to fly a man to the moon. The big data race has been going on for a while now. While companies have realised it can be very lucrative to monetise the lust for information that was previously considered boring, governments have been late to the data party. Perhaps rightly so, seeing the current debate on whether government should be allowed to sell its citizens’ data, or use it for purposes other than those for which it was collected in the first place.
Care.data was to be England’s giant leap, not only to catch up with what has been going on in the Scandinavian countries since the 70s, but to take the lead. Unfortunately, the communication on care.data has been abysmal. Much has been written about it already, so I won’t add to that ever-growing canon. The debate about how to move forward, if at all, is only getting started though. Having an information source such as care.data would be an amazing impulse for the next couple of years of public health science, epidemiology, health informatics and a whole host of other disciplines, but we should look further than that.
To come back to the space race, NASA chose to make compromises in getting to the moon. As Kennedy had given them a deadline, they chose options that would get them there within that time, rather than options that could keep them going into deep space for longer. The result: a very successful Apollo programme, but a rather disappointing 45 years of flying around Earth to follow (despite the very, very cool space plane used for those flights).
The Health and Social Care Information Centre (HSCIC) have that same choice to make now: go for quick fixes (pseudonymisation at source* for instance) and get care.data rolled out soon, or think long term and build the trust and support systems that science can build on for decades to come. The problems surrounding care.data are already making it difficult for researchers to do their jobs: HSCIC is not giving out any data, pending an internal review of how they have been working and how they should work. This means studies that were funded and approved by ethics committees (and assigned deadlines) have had to be put on hold because there is no data to work with. This is particularly sour for people on short-term contracts (like me at the moment) or students who are suddenly left without a project.
It would be great to have care.data as a data source for research. But I’m also just starting out as a researcher, and although I would like to fly to the moon, I also want to go beyond it and have an academic career rather than one shining moment. As Michael Collins**, the man who circled the moon while Aldrin and Armstrong landed on it, said: “Man has always gone where he has been able to go, it is a basic satisfaction of his inquisitive nature, and I think we all lose a little bit if we choose to turn our back on further exploration.” So let’s get working on making that giant leap, but make sure we don’t lose sight of where we may want to land in the future.
*Pseudonymisation at source would turn any identifiable data (e.g. a date of birth) into a string of letters and numbers that looks like nonsense (this string is called a ‘hash’). There’s no way to get back to the original date of birth. Different organisations (GP practices, hospitals etc.) would use the same programme and the same key to create the same hash, so HSCIC could still link records together. Problem solved? Not so much. Although no identifiable data would leave the practice, the records can still be linked longitudinally. This means that if you knew that a male of a certain age was admitted on a specific date to a specific hospital with a heart condition, you’d still have a good chance of finding him. A bigger problem is that everyone involved in administrative data would have to use the same programme and the same key, have data in exactly the same format (dates of birth saved as 09-01-87 and 09/01/1987 would turn into different hashes, and you wouldn’t be able to recognise them as being the same) and never make a typo. This is a big limitation and would severely limit the chances of linking the data to other sources.
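To make the format problem concrete, here is a minimal sketch of what keyed hashing at source might look like. Everything here is hypothetical: the function name, the shared key and the dates are made up for illustration, and real schemes involve much more than this.

```python
import hashlib
import hmac

def pseudonymise(value: str, key: bytes) -> str:
    """Turn an identifiable value into a nonsense-looking hash.

    The same value hashed with the same key always produces the
    same hash, so different organisations could still link records.
    """
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

key = b"shared-secret"  # hypothetical key shared by every organisation

a = pseudonymise("09-01-1987", key)
b = pseudonymise("09-01-1987", key)  # same date, same format
c = pseudonymise("09/01/1987", key)  # same date, different format

print(a == b)  # True: identical inputs link up
print(a == c)  # False: a formatting difference breaks the linkage
```

The last line is the whole problem in miniature: one stray slash and the two records for the same person no longer match.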
**His autobiography, Carrying the Fire, is amazing. I’d recommend it for some summer reading.
If you’ve been following the Olympics in Sochi a bit (like I have), you might have seen the occasional person in orange fly by. Us Dutchies are not particularly well equipped when it comes to the winter Olympics, but we usually win a couple of medals in our favourite sport: long track ice speed skating. Sure, we’ll lose out on the occasional medal because some American inline skater decided their sport wasn’t going to turn Olympic any time soon, but we win our fair share. Until Sochi.
With our previous record being a total of 11 medals at the Nagano winter games in 1998 (all speed skating), the aim for the 2014 Games was to win about 9 medals. We’ve won a spectacular 22 so far, all but one in our favourite sport: speed skating (and the one not in speed skating, we won in short track speed skating). We’ve done so well, we’ve won almost 75% of speed skating medals:
We’re doing so well, that we’ve even managed to make it to the top 10 of the overall medal table*:
As a small country (we like to emphasise that we’re small when we lose, and even more when we win), we’re doing pretty well for ourselves. So you might think our skaters would be happy with their medals. Well not all of them (prepare for Tumblr style end of post).
*As of 19/02 – will update results at the end of the games
There are a lot of scary things to face when doing a PhD: supervisor’s ideas of ‘normal’ working hours, reviewers whose sole aim in life is to reject as many papers as possible, or the experimental equipment that only works when the right amount of blu-tack is in the right place and you karate chop the on-button. But possibly the scariest of all is the journalist.
This is why Sense About Science has set up their Standing up for Science media workshop: a one-day workshop, specifically for early career scientists, that gives a bit of insight into how science gets translated into news. It’s a great workshop that combines a session of scientists talking about successful (and less successful) experiences with journalists, with a session of journalists talking about what they actually do during their busy days. But most of all, it gets us early career scientists away from our lab benches for a day to talk about why we think it is so scary in the first place.
Most of us grad students (and scientists in general) are funded by public money, so it is a reasonable expectation that we try to feed our results back to the public. That’s easier said than done though. As scientists, we spend a lot of time getting the right results, and even more so, getting them just right on paper. Even though a scientific article might be only 3,000 words, it has to represent years of blood, sweat and tears.
So it might be understandable that we can be a bit hesitant when we have to hand this over to a journalist not familiar with our particular brand of science. We’ll just have to stand by while they condense it into a catchy headline and accompanying article that is often shorter than any summary we could write ourselves. Everyone knows someone for whom this has gone horribly wrong. Stories abound about how a basic science paper on cells in a petri dish ended up promising to have found the cure-all pill for cancer, or how bacon is apparently responsible for doubling our (already 100%) risk of death.
It’s great to hear from experienced people like Dr Deirdre Hollingsworth and Prof Stephen Keevil that talking to the media gets easier after a while, and that the mess-ups are rarely remembered by anyone but yourself. Even talking to a news outlet with a reputation like Fox News can be a good experience, according to Dr Emily So, who talked to them live on air after the Fukushima earthquake and tsunami.
In the Q&A session afterwards, Dr Hollingsworth advises us not to be afraid of silence (unless you’re The Doctor, in which case you’re right to be afraid). It’s up to the journalist to ask questions, and if you try to fill the void you might end up saying things you didn’t intend to.
The journalist session is equally enlightening. Jason Palmer (BBC), Richard Van Noorden (Nature) and Jane Symons (former health editor at the Sun) assure us they’re not out to get us: they want to get the science right as much as we do. However, they do have a product to sell and a deadline to make (not to mention a mythical sleepy granny to keep awake), so it would be helpful to them if we do pick up the phone when they call. If we don’t, they might go for someone even less qualified to answer their question.
Helpfully, Sense About Science has provided a booklet with some easy tips (and even a checklist) on talking to the media. Sense About Science are organising the Standing up for Science workshop again in September (London) and in November (Glasgow).
Happy international women’s day! If last week’s article proved anything, it’s that there are lot of extraordinary statisticians who also happen to be women out there (keep those names coming!). So what better way to celebrate today than with the first lady of statistics?
Gertrude Cox didn’t intend to become a statistician. After graduating from high school in 1918, she decided she wanted to be a deaconess in the Methodist Episcopal Church. Thinking that some knowledge of psychology and craft could be useful in her chosen career, she enrolled at Iowa State University to study these subjects. However, she chose to major in mathematics as that subject had come naturally to her in high school. In order to pay her college expenses, she landed a job in the computing lab of her calculus professor, George Snedecor. Encouraged by this experience, she went on to study statistics, receiving Iowa State’s first Master’s degree in statistics a couple of years later.
Read more at Significance
We read about statistics every day: be it the predicted winner of a football league, the association between the weather and mortality, or a newly discovered link between an inanimate object and cancer. Statistics are everywhere. And perhaps even more so this year, as 2013 has been hailed as the International Year of Statistics. Despite all this attention to numbers, we generally don’t know a lot about the people hiding behind their computers churning them out. With media attention for people like Nate Silver and Hans Rosling, some are now able to name at least one statistician, but, stepping it up a level, could you name a female statistician?
Statistics is definitely not the only branch of STEM subjects suffering from a lack of distinguished women. Just take a look at the list of Nobel Prize winners (44 out of 839 Laureates), fellows of the Royal Society (currently 5%), or scientists on television. This is not due to a lack of women in statistics; there are many. So with this being the year of statistics, I thought it might be the perfect time to highlight some of the women who work(ed) in statistics.
Dr Janet Lane-Claypon: epidemiologic pioneer
Dr Janet made quite a few important contributions to epidemiology by using and improving its use of statistics. Born in 1877 in Lincolnshire, she moved to London to study physiology at the London School of Medicine for Women (today part of UCL). She spent a few years there collecting an impressive list of titles: a BSc, DSc and MD, making her one of the first doctor-doctors, irrespective of gender.
All very exciting of course, but what has she got to do with statistics? Her run-in with statistics started in 1912, when she published a Report to the Local Government Board upon the Available Data in Regard to the Value of Boiled Milk as a Food for Infants and Young Animals. It’s an impressive report (available at the British Library, in case you’d like to leaf through it on a rainy Saturday afternoon), and the first of its kind. In it, Lane-Claypon compares the weights of infants fed on breast and cows’ milk, to find whether the type of milk had an effect on how fast babies grew. To answer this question, she used, for the very first time, a retrospective cohort study, description of confounding, and the t-test.
Before she started her study, Dr Janet realised she would need a large number of healthy babies who had been fed cows’ milk and a similar number of babies on breast milk. More importantly, she realised that in order to compare the two groups, she would need the babies to come, “as far as possible”, from the same social environments. She ended up travelling to Berlin, where babies from the working classes regularly attended Infant Consultations where their diet and weight were registered, resulting in the perfect dataset to answer her question.
This visit resulted in data on just over 500 infants making up the first retrospective cohort study (many others would follow, but not till some 30 years later), which was ready to be analysed. However, although all babies came from working class parents, Dr Janet realised that their social environments could still be slightly different, leading to different rates of weight gain between the groups. She explains:
“It does not, however, necessarily follow that the difference of food has been the causative factor, and it becomes necessary to ask whether there can be any other factor at work which is producing the difference found. The social class of the children seemed a possible factor, and it was considered advisable to investigate the possible significance of any difference which existed between the social conditions of the homes.”
Dr Janet compared the wages of the fathers of the infants, for the first time taking confounding into account, and found that they looked the same for the two groups. Still not satisfied that the difference she had found between breast- and cows’-milk-fed children was real, she decided to use a complicated new technique that had been published four years earlier but hadn’t been used in epidemiology up till then: Student’s t-test. Chances are that you’ve heard about this test, as it is now one of the most commonly used tests in any branch of science. Although it was developed to monitor the quality of stout by W.S. Gosset, Janet Lane-Claypon was the first to use it in medical statistics.
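For a flavour of what the t-test actually computes, here is a minimal sketch comparing two groups of weekly weight gains in grams. The numbers are entirely made up for illustration; they are not Lane-Claypon’s data.

```python
import math
from statistics import mean, variance

# Hypothetical weekly weight gains (grams) for two feeding groups
breast_fed = [200, 220, 210, 230, 205]
cows_milk = [180, 190, 185, 200, 175]

def two_sample_t(x, y):
    """Student's two-sample t statistic, using a pooled variance."""
    nx, ny = len(x), len(y)
    # Weighted average of the two sample variances
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    # Difference in means, scaled by its estimated standard error
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))

t = two_sample_t(breast_fed, cows_milk)
print(round(t, 2))  # a large |t| suggests the difference isn't just chance
```

The idea Gosset formalised is exactly this: the difference in means is only convincing once you scale it against how much the groups vary internally.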
Dr Janet’s pioneering didn’t stop there. She went on to conduct the first ever case-control study, in 1926, on the causes of breast cancer, which possibly used the first ever questionnaire to gather health data (so think about her next time you see a pop-up window/email asking if you’ve got a few minutes to spare). Her results were used by two other famous statisticians, Nathan Mantel and William Haenszel, who developed the Mantel-Haenszel test to adjust results for confounding. Her findings included most of the currently recognised risk factors for breast cancer. Dr Janet continued to work till 1929, when she had to retire at 52 for the silly reason that married women weren’t allowed to work in the civil service.
Some further reading on Dr Janet:
Lane-Claypon JE. Report to the Local Government Board upon the Available Data in Regard to the Value of Boiled Milk as a Food for Infants and Young Animals. 1912
Lane-Claypon JE. A Further Report on Cancer of the Breast with Special Reference to its Associated Antecedent Conditions. Reports on Public Health and Medical Subjects. 1926
Winkelstein W. Vignettes of the history of epidemiology: Three firsts by Janet Elizabeth Lane-Claypon. American Journal of Epidemiology 2004;160(2):97
Winkelstein W. Janet Elizabeth Lane-Claypon: a forgotten epidemiologic pioneer. Epidemiology 2006;17(6):705
So last week a pretty interesting looking study appeared in the BMJ. With a title like ‘Comparison of treatment effect sizes associated with surrogate and final patient relevant outcomes in randomised controlled trials: meta-epidemiological study‘ (and breathe…) I wouldn’t be too surprised if many people just skipped over it. Nevertheless, it has some pretty interesting results.
But first we’ll go on a journey back in time to 1997. That year, the BMJ dedicated an entire issue to the topic of meta-epidemiology. Specifically, it looked at meta-analyses, the branch of epidemiology that combines the results from all relevant studies to try to come to some form of agreement on a particular question. Meta-analyses are regarded as the highest form of evidence, being able to pool all available evidence into a final answer.
However, it turned out that this form of analysis wasn’t as infallible as some liked to believe. There was a problem we had been trying to ignore: publication bias. Studies with interesting results and large effect sizes were more likely to be published than studies that didn’t find anything. While these ‘negative trials’ gathered dust in researchers’ drawers, the people meta-analysing studies were lulled into thinking that the treatments they were evaluating were more effective than they actually were.
These results had a big impact on the way meta-analyses were viewed and performed, bringing publication bias and the importance of unpublished studies to the fore. This new study tries to shine a similar light on how we try to assess whether a new treatment works.
As the title of the study suggests, it’s looking at the difference between surrogate and final patient relevant outcomes. While patient relevant outcomes (such as, does this pill I’m taking for heart disease actually make me live longer, or does it lower my chance of a heart attack?) are what we’re really interested in, often trials will look at surrogate outcomes. For instance, while statins are prescribed to lower the chance of heart disease (which could require years of following very large groups of patients), trials often measure whether they lower cholesterol (which requires a couple of months) as we know this is related to future heart disease.
Looking at surrogate or intermediate outcomes makes trials shorter, smaller, and importantly, a lot cheaper. Instead of having to wait ten years to find out whether a drug has an effect, we can find out in a year. With the budget for health research getting ever smaller, it would be great if we could exchange patient relevant outcomes for equally valid surrogate outcomes. Whether that is possible is exactly what this new study is researching.
The researchers compared 84 trials using surrogate outcomes with 101 patient relevant trials published in six of the highest-rated medical journals in 2005 and 2006. They found that trials using surrogate outcomes tend to find larger treatment effects: the drugs tested in these trials appeared to be about 47% more effective than in trials using patient relevant outcomes. This was true across all the fields of epidemiological research they included, and couldn’t be explained by any of the factors explored, such as the size of the trial or whether it was funded by Big Pharma.
So why does this matter? Although trials using either type of outcome found different effect sizes, they still came to the same overall conclusion: either the drug worked or it didn’t. Other studies have found the same for other drugs that got licensed based on (mainly) data on surrogate outcomes. Unfortunately, the opposite has also happened. A drug for non-small cell lung cancer (a particular type of lung cancer), Gefitinib, was licensed by the FDA based on surrogate outcomes. When the data on patient relevant outcomes became available (whether the drug makes people live longer in this case), it turned out that it didn’t work.
As the paper concludes, policy makers and regulators should be cautious when the only data available on a new drug is on surrogate outcomes, as it could turn out that the drug they’re trying to evaluate is a lot less effective than the research seems to imply. And in rare cases, it might even not work at all.
With only a couple of days left until presents are expected to magically appear under trees, here are a few (affordable) suggestions for gifts for that special epidemiologist in your life.
Naturally, you could get him/her a John Snow mug (though beware true coffee/tea addicts: the mug is a bit on the small side), Florence Nightingale, or a brain-eating amoeba, or perhaps a cuddly, but evil Poisson distribution (oh, it promises to be discreet, but as soon as you say something negative it bails on you). There’s even some stuff if you want to be more traditional and go with jewellery: a π necklace for instance, or a necklace spelling out ‘I am star stuff’ in amino acids (the shop is closed at the moment unfortunately). And best of all, there’s the Sciencegrrl calendar – and tote bag, badges and memory sticks – which is pretty awesome and features epidemiology girl Soozaphone as April.
But hey, if you’re anything like me, you’re planning to spend the entire Christmas break reading on your parents’ couch, so here are my favourite three 2012 books vaguely related to epidemiology:
3. Ben Goldacre: Bad Pharma
A good book on an important topic, that happens to partially coincide with my PhD, so I’m probably a bit biased. It’s not a book to read in one go, if only because your blood will boil, and as the trials on blood pressure drugs are a bit dodgy, that might not be a good thing.
The title of the book might be a tad bit misleading though as Big Pharma isn’t inherently bad, we (regulators, academics, governments, patient groups, the public) just let them get away with it. Google, Amazon and Starbucks are ‘morally wrong’ in trying to pay the least possible amount of tax, but we don’t put the sole blame on them. The same principle goes for Big Pharma: we let them do it. Let’s change that.
So why only third place? Well, there happened to be two even more awesome books vaguely related to epidemiology published this year (that, and I can’t figure out the braille joke on the cover, which has been bugging me for weeks).
2. Jon Ronson: The Psychopath Test
A mystery package from Sweden arrives in an academic’s pigeonhole in London. There is no return address. Inside the package is a book of which every other page is blank, the pages with words on them have words cut out, and it is written by a ‘Joe K’. Intrigue follows: many academics all over the world, in distant corners such as Tibet and Iran, have received the exact same package. The London academic decides to enlist Jon Ronson to find out what’s going on and a journey into the madness industry follows.
The book might be a particularly good read with DSM-5 coming up in 2013. Psychology Today has a nice overview of everything that might be wrong with this new edition of the ‘bible of mental health disorders’ (calling it that for one). Perhaps everything will be all right and the new DSM will just create more psychiatric atheism among those wielding the power to diagnose, but with normal behaviour such as grieving longer than two weeks being classed as a mental disorder, I have my doubts.
1. David Quammen: Spillover
“If you look at the world from the point of view of a hungry virus, or even a bacterium – we offer a magnificent feeding ground with all our billions of human bodies, where, in the very recent past, there were only half as many people. In some 25 or 27 years, we have doubled in number. A marvellous target for any organism that can adapt itself to invading us.” William H. McNeill – historian
I’ve grown up in a part of the world that was hit by epidemics almost every other year, or so it seemed at the time. It was horrible. Going to school every morning and not knowing who would be victimised next. Luckily, they weren’t epidemics affecting humans, but livestock. We had classical swine fever in 1997/98, foot-and-mouth disease in 2001, and blue tongue in 2006/07. During those epidemics, the farms of my school friends would be hit one by one. They had to stand by as professionals came in to kill off thousands of animals which they loved and were their families’ only source of income. Later that same day it would all be repeated on the TV during the eight o’clock news, and the next day the trucks would pull up at their neighbours’. It was hard. And it became even harder after 2007, when it turned out that one of those epidemics, Q-fever, was affecting humans.
When we think about where the Next Big One might come from, a rural village in the Netherlands doesn’t tend to be high on the list. Nevertheless, it features in ‘Spillover’ as one of the places where a spillover, the transmission of an infectious disease from animal to human, happened recently. The Dutch story might not be as thrilling as capturing bats potentially infected with a deadly virus (Marburg), tailing gorillas who could be the host of the elusive Ebola virus or tracking down stories on the origins of SARS, HIV and Nipah. The latter, though relatively unknown, caused an outbreak in Malaysia when it spread from fruit bats, via pigs to humans. A million pigs had to be killed. “There’s no easy way to kill a million pigs,” notes Dr Hume Field, one of the experts followed by David Quammen in the book. Later he corrects himself: It was in fact 1.1 million pigs. The difference might seem like just a rounding error, he tells Quammen, but if you ever had to kill an “extra” hundred thousand pigs and dispose of their bodies in bulldozed pits, you’d remember the difference as significant.
Spillover is, without doubt, the most intriguing book I’ve read all year.
*But perfectly timed for my birthday in January 😉
Sport is a man’s world. At least that’s the impression I get when I watch any. Reporters are (mostly) men, reporting on (mostly) men, except where beach volleyball is concerned, and then it’s still seemingly just for men to look at. Not surprisingly then, the Olympics were a breath of fresh air this summer. Everyone cheered when Jess Ennis finally won that gold medal, when Lizzie Armitstead was the first Brit to step onto the podium, and when the aquatics centre exploded after Ellie Simmonds made it to the finish first. After all those Olympic success stories, you might expect women to get a bit more attention on the BBC’s sports pages*. However, as this tweet in @EverydaySexism’s timeline made clear, it doesn’t seem to have happened. Out of three months of sports coverage highlighted on the BBC’s Facebook page, only 5.4% of posts covered women’s sports. Abysmal seems an appropriate description here.
So how does the sports coverage measure up? The BBC Sport’s Facebook page highlights only some of its sports coverage, so there is a chance that women are covered, but they just don’t make it onto the Facebook page. Unfortunately, the Beeb isn’t very good at archiving their sports material, so apart from collecting data from Facebook, there doesn’t seem to be an easy way to retrospectively see how often they covered women in sport**. Luckily, the Dutch public broadcaster – the NOS – does archive all their sports coverage by date on their website. And to make this international comparison complete, I also added a bit of Belgian (or rather: Flemish – my French isn’t what it used to be) sports coverage by adding their public broadcaster’s sport Facebook page (Sporza). For all three broadcasters, I gathered all their (highlights of) sport coverage for the whole of November.
So how do these three countries’ public broadcasters compare on reporting on men and women in sport? Well, not very well. The BBC had no posts relating to women’s sport at all, while the Belgians only specifically covered women in sport once: a story on professional female cyclists appearing in a ‘sensual’ calendar. The Dutch, whose count includes all stories, not just highlights, score ever so slightly better. Though with only 11% of articles covering women, it’s hardly an improvement.
But hey, what about confounders, things that might mess this quick analysis up? Maybe there’s one particular sport (let’s call it football) that’s skewing these results. Women’s football is famous for its lack of coverage, so maybe these nations’ football obsession is partly to blame for the lack of coverage of women in sports? Well, football does make up the majority of posts and articles, especially when looking at Facebook highlights.
So what happens if we just ignore football (something we ought to do more often)? Well, nothing much really. Women are still massively underrepresented on the sports pages, with 1 in 5 articles focussing on them at best, and even fewer articles mentioning both men and women.
What about those other sports? The UK and Belgian Facebook highlights didn’t include too many sports besides football or other male-dominated sports such as Formula 1. The Dutch broadcaster did include a lot of other sports though, so surely there must be some sports where reporting is more equal?
And yes, there is some good news: for skiing, swimming and long-track ice speed skating*** the gender balance looks a lot more promising, with generally equal coverage. There are also some disappointments though, with cycling and tennis, which I expected to be more balanced.
For cycling especially, I expected better. Mostly because coverage tends to be biased towards people or teams who win, and the current women’s world and Olympic road cycling champion happens to be Dutch. The men in Dutch cycling, on the other hand, have hardly anything to boast about this year, apart from being mentioned in relation to Lance Armstrong just a bit too often. Still, only one single story covered women’s cycling.
Broadcasters aren’t the only ones to blame though. Especially in the case of road cycling, there just isn’t that much to broadcast. While the men will be plastered over the television every weekend from March till September, there are just two women’s races that tend to be covered live on Dutch television: the world championships and the Olympics. For almost all other classic road cycling races such as the Tour de France, Paris-Roubaix or Milan-San Remo, there are no races for women. The men race, amateurs often get a chance to go on the course, and there might even be a special race for under-23s, but not for women.
This seems to lead to a vicious circle: there are fewer events for female athletes, leading to less media coverage, which in turn makes women’s sport less interesting for sponsors who’d like some air time by plastering their logo onto some sporty people, resulting in less money to actually put on those events.
It has to change. Sports like skiing seem to make it work. Lindsey Vonn is so far ahead of the rest of the field that she has asked to compete with the men. She’d stand a fair chance. A mixed gender relay was introduced in swimming so men and women can compete together rather than having separate events. We loved seeing our women competing in the Olympics as much as we did the men. Now let’s get them back on their screens.
*And after the amazing Paralympics you might also expect some more attention for those athletes. However, the BBC Disability Sport page still seems to be stuck in September.
**Do let me know if I missed something obvious – the overwhelming onslaught of male pheromones and yellow banners screaming off the page may have messed up my brain.
***i.e. the greatest sport on earth: athletes reach speeds of up to 70km/h (43mph) while basically wearing nothing but razor sharp knives under their feet and a tight fitting body suit – no helmets or any other type of protection is used. Its popularity is sadly restricted to Holland and Holland alone – though we like it that way as we can scoop up lots of Olympic medals without anyone else noticing.
By predicting the outcome of the US elections correctly in 50 out of 50 states (after an already impressive 49/50 in the 2008 elections), Nate Silver of the NY Times’ FiveThirtyEight blog has managed to convince even the most sceptical data deniers of the power of his prediction models. So much so that his perfect prediction started a Twitter trend (#natesilverfacts) and led to him being labelled a witch. So how impressive was this feat really? Is Nate Silver really a wizard from the future aiming for world domination through the power of numbers? Let’s use some stats to assess his stats!
Let’s start by toning down Silver’s amazing feat of predicting the election outcomes in 50 separate states. In most US states, no complex prediction models were needed to come to a reliable estimate of the election outcome: some results, such as in the District of Columbia, where over 90% of voters chose Obama, were uncontested. The same goes for other blue Obama-voting states such as California (59%), Hawaii (71%), Maryland (62%) or New York (63%), or red Romney states such as Oklahoma (67% voted GOP), Utah (73%), Alabama (61%) or Kansas (60%).
Only in the swing states, which could go either way, would Nate Silver have needed his number crunching to decide on a future winner. If we go by the NY Times’ numbers, only 7 states were a toss-up between the Democrats and Republicans: Colorado, Florida, Iowa, New Hampshire, Ohio, Virginia and Wisconsin. Treating those 7 states as coin tosses – each outcome having an equal 50% probability – we can test the hypothesis that Nate Silver is a witch, Hwitch, against the competing hypothesis that he is a completely non-magical human being, Hmuggle. If Nate is a witch, we assume he predicts each state’s election result correctly, witches having perfect knowledge of all future events. The probability of this happening is expressed in a fancy maths equation like this: p(7 right|Hwitch) – read the equation as: the probability of Nate getting 7 right, given that he is a witch. The probability in this case is 100%, or 1. But even if Nate is devoid of magical abilities, there is still a small chance he would guess all 7 election results correctly. We can calculate this probability: p(7 right|Hmuggle) = (1/2)^7 = 1/128. If we take the ratio of the two, 1/(1/128) = 128, it is about 128 times more likely that Nate is a witch than a muggle.
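To make the arithmetic concrete, here is a minimal sketch of the likelihood-ratio calculation in Python (the variable names are mine; the numbers are the ones from the text):

```python
# Likelihood of a perfect 7/7 swing-state prediction under each hypothesis.
p_data_given_witch = 1.0            # a witch foresees every result
p_data_given_muggle = (1 / 2) ** 7  # 7 fair coin tosses: 1/128

# How many times more likely the data are under the witch hypothesis
# than under the muggle hypothesis.
likelihood_ratio = p_data_given_witch / p_data_given_muggle
print(likelihood_ratio)  # 128.0
```

The ratio only compares how well each hypothesis explains the data; it says nothing yet about how plausible witchcraft was to begin with – that is where the Bayesian part comes in later.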
Whatever the truth is about Nate Silver, it appears he’s pulled off something pretty extraordinary. Unfortunately for him, he’s still one step removed from being the world’s best predictor as Paul the psychic octopus managed to correctly predict the outcomes of 8 football matches at the 2010 World Cup. World’s best human predictor will have to do for now then.
However, as with Paul, Nate wasn’t the only person making predictions. Paul only gained the street cred necessary to be taken seriously as a clairvoyant cephalopod after a bout of predicting Eurocup results (and getting one wrong), and the same could be said for Nate Silver. If he hadn’t pulled off a similar feat in the previous elections, no one would have paid much attention to his blog this time round. His 2008 prediction was perhaps even more impressive than his latest one: he might have missed Indiana, but got the results for the remaining 10 swing states right.
As polls get about the same amount of coverage as the actual elections (if not more), there are a lot of people trying to pitch in. Let’s take a guess and say there were 50 people trying to predict the state-by-state 2008 election outcomes. Each of them had a 1/2^8 = 1/256 chance of getting 8 swing states right by luck, so the chance that at least someone would get at least 8 of the 11 swing states correct (assuming this is the threshold to attract the attention of witch hunters) is 1 − (255/256)^50 ≈ 0.18 (for the reasoning behind this calculation, read David Spiegelhalter’s blog on the numbers behind Paul being a completely normal, if slightly lucky, octopus). So there was about a 1 in 5 chance of at least someone coming up with some remarkably correct predictions.
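The same back-of-the-envelope number can be checked in a few lines of Python (a sketch, assuming 50 independent predictors, each with the 1/256 per-person chance used above):

```python
# Chance that at least one of 50 pundits hits a 1-in-256 lucky streak.
n_pundits = 50
p_lucky = 1 / 2 ** 8                  # 1/256, as in the Paul-the-octopus maths
p_none = (1 - p_lucky) ** n_pundits   # probability nobody gets lucky
p_at_least_one = 1 - p_none
print(round(p_at_least_one, 2))  # 0.18
```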
So we now know that frequentist statisticians would label Silver as a witch, but what about the much cooler Bayesians? (no bias at all here…) Bayesian statistics differs from frequentist statistics in that it takes prior knowledge into account when putting a probability on an event. Or, put another way: Bayesian statistics is probably a cool branch of stats, but if you know XKCD thinks so too, it’s suddenly a lot more probable to be true (the coolness of a specific branch of statistics is conditional on XKCD endorsement).
To calculate the posterior probability of Nate Silver being a witch, we need to know a few things: the prior probability that Nate is a witch (how plausible we found Hwitch before seeing any predictions), the probability of his perfect predictions given that he is a witch (1), and the probability of those predictions given that he is a muggle (1/128).
Now that we know all this we can fill out Bayes’ formula for the posterior probability: p(witch|7 right) = p(7 right|witch) × p(witch) / [p(7 right|witch) × p(witch) + p(7 right|muggle) × p(muggle)].
The resulting posterior probability is pretty slim, though at 5%, we can’t be sure he isn’t a witch. However, going back to the 2008 elections, there were already some suspicions about Nate Silver’s potential Wiccan background. If we instead start with the 0.18 probability we arrived at earlier as our prior, the posterior probability of Nate Silver being a witch rises to 0.96, or 96%. So yes, Nate Silver is probably a witch. Alternatively, you could of course exchange ‘witch’ for ‘statistician’ and conclude with 96% confidence that he’s just very good at his job.
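That last Bayesian update can be sketched in Python too, using the 0.18 prior from the 2008 analysis and the 1/128 muggle likelihood from earlier (variable names are mine):

```python
# Posterior probability that Nate is a witch, given a perfect 7/7 prediction.
prior_witch = 0.18             # suspicion raised by the 2008 elections
p_data_given_witch = 1.0       # a witch predicts all toss-ups correctly
p_data_given_muggle = 1 / 128  # a muggle guessing 7 fair coin tosses

posterior = (p_data_given_witch * prior_witch) / (
    p_data_given_witch * prior_witch
    + p_data_given_muggle * (1 - prior_witch)
)
print(posterior)  # ≈ 0.966, the roughly 96% quoted above
```

Swapping in a much more sceptical prior is what drags the posterior down towards the 5% figure: with Bayes, the conclusion is only ever as witchy as your prior allows.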