Trying to make sense of nonsense

Dutch Dominance & Sankey Diagrams

I haven’t been writing a lot lately – partly because I have been busy finishing up my PhD thesis, partly because writing said thesis has drained all the joy from writing for me (hopefully temporarily). So instead, I’ve been tinkering with data and graphs, which led me to discover Sankey diagrams earlier this week. They’re pretty neat, and easy to put together in Excel with the spreadsheet over on Excel Liberation. Unfortunately, they use some JavaScript that WordPress blogs don’t allow, so for now, I’ve just taken some screenshots of the diagrams I’ve been putting together.
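If you want to roll your own, the data behind a Sankey diagram is just a list of weighted source → target links. Here’s a minimal Python sketch of that tallying step – the country/sport/medal tuples below are made-up illustrations, not the actual Sochi results:

```python
from collections import Counter

# Hypothetical medal records (country, sport, medal colour) -
# illustrative only, not the real Sochi tallies.
medals = [
    ("Netherlands", "Speed skating", "gold"),
    ("Netherlands", "Speed skating", "silver"),
    ("Netherlands", "Short track", "bronze"),
    ("Norway", "Biathlon", "gold"),
    ("Norway", "Cross-country", "silver"),
]

# Each (source, target) pair with its count is one ribbon in the diagram.
links = Counter()
for country, sport, medal in medals:
    links[(country, sport)] += 1   # country -> sport flows
    links[(sport, medal)] += 1     # sport -> medal-colour flows

for (source, target), value in sorted(links.items()):
    print(f"{source} -> {target}: {value}")
```

Feed those weighted links into whichever charting tool you like (the Excel Liberation spreadsheet does exactly this behind the scenes, just with more polish).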

If you’ve been following the Olympics in Sochi a bit (like I have), you might have seen the occasional person in orange fly by. We Dutchies are not particularly well equipped when it comes to the winter Olympics, but we usually win a couple of medals in our favourite sport: long track speed skating. Sure, we’ll lose out on the occasional medal because some American inline skater decided their sport wasn’t going to turn Olympic any time soon, but we win our fair share. Until Sochi.

With our previous record being a total of 11 medals at the Nagano winter games in 1998 (all in speed skating), the aim for the 2014 Games was to win about 9 medals. We’ve won a spectacular 22 so far, all but one in our favourite sport: long track speed skating (and the one exception we won in short track speed skating). We’ve done so well, we’ve won almost 75% of all long track speed skating medals:

Olympic medals - long track speedskating

Sochi Olympic medals in long track speedskating

We’re doing so well that we’ve even managed to make it into the top 10 of the overall medal table*:

Sochi overall medal table

Overall Sochi medal table as of 19/02 showing medals won by each country, and the sports they won them for (this was supposed to be a fancy JavaScript-enabled graph that would allow you to hover over the lines to see where they’re going, but alas, WordPress doesn’t allow JS).

As a small country (we like to emphasise that we’re small when we lose, and even more when we win), we’re doing pretty well for ourselves. So you might think our skaters would be happy with their medals. Well, not all of them (prepare for a Tumblr-style end of post).

Disappointed Dutch Olympic silver medallists:

Ireen Wüst – didn’t manage to defend the 1,500 meters gold medal she won in Vancouver (though she has won a gold medal on the 3km, and two more silver medals on the 1,000 meters and 5km) – picture by de Volkskrant

Sven Kramer (left) – did not win his favourite distance: the 10km. In Vancouver, he missed the gold medal after being disqualified for forgetting to change lanes. Won a further gold medal on the 5km (picture by het Parool).

Jan Smeekens – was 0.012 seconds removed from an Olympic Gold medal (picture by schaatsupdate.nl)

Koen Verweij – 0.003 seconds short of an Olympic gold medal on the 1,500 meters (picture by schaatsen.nl)

*As of 19/02 – will update results at the end of the games

Normal service will resume shortly

Standing up for Science: beating the silence

The Silence: not to be feared by doctors (only The Doctor) – picture courtesy of geekygirlnyc

There are a lot of scary things to face when doing a PhD: supervisor’s ideas of ‘normal’ working hours, reviewers whose sole aim in life is to reject as many papers as possible, or the experimental equipment that only works when the right amount of blu-tack is in the right place and you karate chop the on-button. But possibly the scariest of all is the journalist.

This is why Sense about Science has set up their Standing up for Science media workshop: a one-day workshop, specifically for early career scientists, that gives a bit of insight into how science gets translated into news. It’s a great workshop that combines a session of scientists talking about successful (and less successful) experiences with journalists with a session of journalists talking about what they actually do during their busy days. But most of all, it gets us early career scientists away from our lab benches for a day to talk about why we think it is so scary in the first place.

Most of us grad students (and scientists in general) are funded by public money, so it is a reasonable expectation that we try to feed our results back to the public. That’s easier said than done though. As scientists, we spend a lot of time getting the right results, and even more time getting them just right on paper. Even though a scientific article might be only 3,000 words, it has to represent years of blood, sweat and tears.

So it might be understandable that we can be a bit hesitant when we have to hand this over to a journalist not familiar with our particular brand of science. We’ll just have to stand by while they condense it into a catchy headline and accompanying article that is often shorter than any summary we could write ourselves. Everyone knows someone for whom this has gone horribly wrong. Stories abound about how a basic science paper on cells in a petri dish ended up promising the cure-all pill for cancer, or how bacon is apparently responsible for doubling our (already 100%) risk of death.

It’s great to hear from experienced people like Dr Deirdre Hollingsworth and Prof Stephen Keevil that talking to the media gets easier after a while, and that the mess-ups are rarely remembered by anyone but yourself. Even talking to a news outlet with a reputation like Fox News can be a good experience, according to Dr Emily So, who talked to them live on air after the Fukushima earthquake and tsunami.

In the Q&A session afterwards, Dr Hollingsworth advises us not to be afraid of silence (unless you’re The Doctor, in which case you’re right to be afraid). It’s up to the journalist to ask questions, and if you try to fill the void you might end up saying things you didn’t intend to.

The journalist session is equally enlightening. Jason Palmer (BBC), Richard Van Noorden (Nature) and Jane Symons (former health editor at the Sun) assure us they’re not out to get us: they want to get the science right as much as we do. However, they do have a product to sell and a deadline to make (not to mention a mythical sleepy granny to keep awake), so it would be helpful to them if we do pick up the phone when they call.  If we don’t, they might go for someone even less qualified to answer their question.

Helpfully, Sense about Science has provided a booklet with some easy tips (and even a checklist) on talking to the media. Sense about Science is organising the Standing up for Science workshop again in September (London) and in November (Glasgow).

International Women’s Day: Gertrude Cox – the first lady of statistics

Happy International Women’s Day! If last week’s article proved anything, it’s that there are a lot of extraordinary statisticians out there who also happen to be women (keep those names coming!). So what better way to celebrate today than with the first lady of statistics?

Gertrude Cox's advice for starting a career in statistics

Gertrude Cox’s advice for starting a career in statistics

Gertrude Cox didn’t intend to become a statistician. After graduating from high school in 1918, she decided she wanted to be a deaconess in the Methodist Episcopal Church. Thinking that some knowledge of psychology and craft could be useful in her chosen career, she enrolled at Iowa State University to study these subjects. However, she chose to major in mathematics, as that subject had come naturally to her in high school. In order to pay her college expenses, she landed a job in the computing lab of her calculus professor, George Snedecor. Encouraged by this experience, she went on to study statistics, receiving Iowa State’s first Master’s degree in statistics a couple of years later.

Read more at Significance

Can you name a female statistician?

We read about statistics every day: be it the predicted winner of a football league, the association between the weather and mortality, or a newly discovered link between an inanimate object and cancer. Statistics are everywhere. And perhaps even more so this year, as 2013 has been hailed as the International Year of Statistics. Despite all this attention for numbers, we generally don’t know a lot about the people hiding behind their computers churning them out. With media attention for people like Nate Silver and Hans Rosling, some are now able to name at least one statistician, but, stepping it up a level, could you name a female statistician?

Statistics is definitely not the only branch of STEM subjects suffering from a lack of distinguished women. Just take a look at the list of Nobel Prize winners (44 out of 839 Laureates), fellows of the Royal Society (currently 5%), or scientists on television. This is not due to a lack of women in statistics; there are many. So with this being the year of statistics, I thought it might be the perfect time to highlight some of the women who work(ed) in statistics.

Dr Janet Lane-Claypon: epidemiologic pioneer

Dr Janet made quite a few important contributions to epidemiology by using and improving its statistical methods. Born in 1877 in Lincolnshire, she moved to London to study physiology at the London School of Medicine for Women (today part of UCL). She spent a few years there collecting an impressive list of titles – a BSc, DSc and MD – making her one of the first ‘doctor-doctors’, irrespective of gender.

Dr Janet on a board with distinguished London School of Medicine for Women students (currently in the hall of the UCL medical school at the Royal Free Hospital)

Dr Janet on a board with distinguished London School of Medicine for Women students (currently in the hall of the UCL medical school at the Royal Free Hospital). The scholarship she won mentioned on the board was the first ever MRC (British Medical Society at the time) scholarship awarded to a woman.

All very exciting of course, but what has she got to do with statistics? Her run-in with statistics started in 1912, when she published a Report to the Local Government Board upon the Available Data in Regard to the Value of Boiled Milk as a Food for Infants and Young Animals. It’s an impressive report (available at the British Library, in case you’d like to leaf through it on a rainy Saturday afternoon), and the first of its kind. In it, Lane-Claypon compares the weights of infants fed on breast milk and cows’ milk, to find out whether the type of milk had an effect on how fast babies grew. To answer this question, she used, for the very first time, a retrospective cohort study, a description of confounding, and the t-test.

Before she started the study, Dr Janet realised she would need a large number of healthy babies who had been fed cows’ milk and a similar number of babies on breast milk. More importantly, she realised that in order to compare the two groups, she would need the babies to come, “as far as possible”, from the same social environments. She ended up travelling to Berlin, where babies from the working classes regularly attended Infant Consultations where their diet and weight were registered, resulting in the perfect dataset to answer her question.

This visit resulted in data on just over 500 infants making up the first retrospective cohort study (many others would follow, but not till some 30 years later), which was ready to be analysed. However, although all babies came from working class parents, Dr Janet realised that their social environments could still be slightly different, leading to different rates of weight gain between the groups. She explains:

“It does not, however, necessarily follow that the difference of food has been the causative factor, and it becomes necessary to ask whether there can be any other factor at work which is producing the difference found. The social class of the children seemed a possible factor, and it was considered advisable to investigate the possible significance of any difference which existed between the social conditions of the homes.”

Dr Janet compared the wages of the fathers of the infants, for the first time taking confounding into account, and found that they looked the same for the two groups. Still not satisfied that the difference she had found between breast and cows’ milk fed children was real, she decided to use a new, complicated technique that had been published 4 years earlier but hadn’t been used in epidemiology up till then: Student’s t-test. Chances are that you’ve heard of this test, as it is now one of the most commonly used tests in any branch of science. Although it was developed by W.S. Gosset to monitor the quality of stout, Janet Lane-Claypon was the first to use it in medical statistics.
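For the curious, the heart of the test is simple enough to sketch in a few lines of Python. The weight figures below are invented purely for illustration (nothing to do with Lane-Claypon’s actual Berlin data), and I’ve used the unequal-variances (Welch) form of the two-sample t statistic:

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical weekly weight gains (grams) for two feeding groups -
# made-up numbers for illustration only.
breast = [180, 200, 190, 210, 175, 195]
cows = [170, 185, 160, 175, 180, 165]

def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances."""
    # Standard error of the difference between the two group means.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

t = welch_t(breast, cows)
print(round(t, 2))
```

A large |t| relative to the t distribution suggests the difference in mean weight gain is unlikely to be chance alone, which is exactly the question Lane-Claypon was asking of her two groups of babies.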

Dr Janet Lane-Claypon, leading the way for stats in medicine 100 years ago

Dr Janet’s pioneering didn’t stop there. In 1926, she went on to conduct the first ever case-control study, on the causes of breast cancer, which possibly used the first ever questionnaire to gather health data (so think about her next time you see a pop-up window/email asking if you’ve got a few minutes to spare). Her results were used by two other famous statisticians, Nathan Mantel and William Haenszel, who developed the Mantel-Haenszel test to adjust results for confounding. Her findings included most of the currently recognised risk factors for breast cancer. Dr Janet continued to work till 1929, when she had to retire at 52 for the silly reason that married women weren’t allowed to work in the civil service.

Some further reading on Dr Janet:

Lane-Claypon JE. Report to the Local Government Board upon the Available Data in Regard to the Value of Boiled Milk as a Food for Infants and Young Animals. 1912

Lane-Claypon JE. A Further Report on Cancer of the Breast with Special Reference to its Associated Antecedent Conditions. Reports on Public Health and Medical Subjects. 1926

Winkelstein W. Vignettes of the history of epidemiology: Three firsts by Janet Elizabeth Lane-Claypon. American Journal of Epidemiology 2004;160(2):97

Winkelstein W. Janet Elizabeth Lane-Claypon: a forgotten epidemiologic pioneer. Epidemiology 2006;17(6):705

Meta-epidemiology: the science of taking a step back

So last week a pretty interesting looking study appeared in the BMJ. With a title like ‘Comparison of treatment effect sizes associated with surrogate and final patient relevant outcomes in randomised controlled trials: meta-epidemiological study’ (and breathe…) I wouldn’t be too surprised if many people just skipped over it. Nevertheless, it has some pretty interesting results.

But first we’ll go on a journey back in time to 1997. That year, the BMJ dedicated an entire issue to the topic of meta-epidemiology. Specifically, it looked at meta-analyses, the branch of epidemiology that combines the results from all relevant studies to try to come to some form of agreement on a particular question. Meta-analyses are regarded as the highest form of evidence, being able to pool all available evidence into a final answer.

However, it turned out that this form of analysis wasn’t as infallible as some liked to believe. There was a problem we had been trying to ignore: publication bias. Studies with interesting results and large effect sizes were more likely to be published than studies that didn’t find anything. While these ‘negative trials’ gathered dust in researchers’ drawers, the people meta-analysing studies were lulled into thinking that the treatments they were evaluating were more effective than they actually were.
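The mechanism is easy to see in a toy simulation (all numbers below are invented): even when the true treatment effect is exactly zero, if only ‘interesting’ estimates make it into print, the average published effect looks convincingly positive:

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the toy example is reproducible

# Toy setup: the true treatment effect is zero. Each of 200 small trials
# estimates it with noise; only estimates beyond an arbitrary threshold
# (a stand-in for "interesting results") get published.
true_effect = 0.0
estimates = [random.gauss(true_effect, 1.0) for _ in range(200)]
published = [e for e in estimates if e > 1.0]  # the rest gather drawer dust

print(f"mean of all trials:       {mean(estimates):+.2f}")
print(f"mean of published trials: {mean(published):+.2f}")
```

The first number hovers around zero, the second doesn’t – and a meta-analysis that only sees the published trials would happily conclude the treatment works.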

These results had a big impact on the way meta-analyses were viewed and performed, bringing publication bias and the importance of unpublished studies to the fore. This new study tries to shine a similar light on how we try to assess whether a new treatment works.

As the title of the study suggests, it’s looking at the difference between surrogate and final patient relevant outcomes. While patient relevant outcomes (such as: does this pill I’m taking for heart disease actually make me live longer, or does it lower my chance of a heart attack?) are what we’re really interested in, trials will often look at surrogate outcomes. For instance, while statins are prescribed to lower the chance of heart disease (which could require years of following very large groups of patients to measure), trials often measure whether they lower cholesterol (which requires a couple of months), as we know this is related to future heart disease.

Looking at surrogate or intermediate outcomes makes trials shorter, smaller, and importantly, a lot cheaper. Instead of having to wait ten years to find out whether a drug has an effect, we can find out in a year. With the budget for health research getting ever smaller, it would be great if we could exchange patient relevant outcomes for equally valid surrogate outcomes. Whether that is possible is exactly what this new study is researching.

The researchers compared 84 trials using surrogate outcomes with 101 patient relevant trials published in six of the highest-rated medical journals in 2005 and 2006. They found that trials using surrogate outcomes tend to find larger treatment effects: the drugs tested in these trials appeared to be about 47% more effective than those in trials using patient relevant outcomes. This was true across all the fields of epidemiological research they included, and couldn’t be explained by any of the factors explored, such as the size of the trial or whether it was funded by Big Pharma.

So why does this matter? Although trials using either type of outcome found different effect sizes, they still came to the same overall conclusion: either the drug worked or it didn’t. Other studies have found the same for other drugs that got licensed based on (mainly) data on surrogate outcomes. Unfortunately, the opposite has also happened. A drug for non-small cell lung cancer (a particular type of lung cancer), Gefitinib, was licensed by the FDA based on surrogate outcomes. When the data on patient relevant outcomes became available (whether the drug makes people live longer in this case), it turned out that it didn’t work.

As the paper concludes, policy makers and regulators should be cautious when the only data available on a new drug concerns surrogate outcomes, as it could turn out that the drug they’re trying to evaluate is a lot less effective than the research seems to imply. And in rare cases, it might not work at all.

Have yourself a merry epi-Christmas: gift ideas for epidemiologists (and possibly a few statisticians)

With only a couple of days left until presents are expected to magically appear under trees, here are a few (affordable) suggestions for gifts for that special epidemiologist in your life.

Naturally, you could get him/her a John Snow mug (though beware, true coffee/tea addicts: the mug is a bit on the small side), a Florence Nightingale, or a brain-eating amoeba, or perhaps a cuddly but evil Poisson distribution (oh, it promises to be discreet, but as soon as you say something negative it bails on you). There’s even some stuff if you want to be more traditional and go with jewellery: a π necklace for instance, or a necklace spelling out ‘I am star stuff’ in amino acids (the shop is closed at the moment unfortunately). And best of all, there’s the Sciencegrrl calendar – and tote bag, badges and memory sticks – which is pretty awesome and features epidemiology girl Soozaphone as April.

Evil Poisson distribution, conniving its evil plans to discreetly take over the world of distributions

But hey, if you’re anything like me, you’re planning to spend the entire Christmas break reading on your parents’ couch, so here are my favourite three 2012 books vaguely related to epidemiology:

3. Ben Goldacre: Bad Pharma
A good book on an important topic, that happens to partially coincide with my PhD, so I’m probably a bit biased. It’s not a book to read in one go, if only because your blood will boil, and as the trials on blood pressure drugs are a bit dodgy, that might not be a good thing.
The title of the book might be a tad misleading though, as Big Pharma isn’t inherently bad; we (regulators, academics, governments, patient groups, the public) just let them get away with it. Google, Amazon and Starbucks are ‘morally wrong’ in trying to pay the least possible amount of tax, but we don’t put the sole blame on them. The same principle goes for Big Pharma: we let them do it. Let’s change that.
So why only third place? Well, there happened to be two even more awesome books vaguely related to epidemiology published this year (that, and I can’t figure out the braille joke on the cover, which has been bugging me for weeks).

2. Jon Ronson: The Psychopath Test
A mystery package from Sweden arrives in an academic’s pigeonhole in London. There is no return address. Inside the package is a book of which every other page is blank, the pages with words on them have words cut out, and it is written by a ‘Joe K’. Intrigue follows: many academics all over the world, in distant corners such as Tibet and Iran, have received the exact same package. The London academic decides to enlist Jon Ronson to find out what’s going on and a journey into the madness industry follows.

The book might be a particularly good read with DSM-5 coming up in 2013. Psychology Today has a nice overview of everything that might be wrong with this new edition of the ‘bible of mental health disorders’ (calling it that, for one). Perhaps everything will be all right and the new DSM will just create more psychiatric atheism among those wielding the power to diagnose, but with normal behaviour such as grieving for longer than two weeks being classed as a mental disorder, I wouldn’t count on it.

1. David Quammen: Spillover

@david_dobbs’ copy. Mine looks pretty much the same, though less colourful, as I just dog-eared the whole book

“If you look at the world from the point of view of a hungry virus, or even a bacterium – we offer a magnificent feeding ground with all our billions of human bodies, where, in the very recent past, there were only half as many people. In some 25 or 27 years, we have doubled in number. A marvellous target for any organism that can adapt itself to invading us.” William H. McNeill – historian

I’ve grown up in a part of the world that was hit by epidemics almost every other year, or so it seemed at the time. It was horrible. Going to school every morning and not knowing who would be victimised next. Luckily, they weren’t epidemics affecting humans, but livestock. We had classical swine fever in 1997/98, foot-and-mouth disease in 2001, and blue tongue in 2006/07. During those epidemics, the farms of my school friends would be hit one by one. They had to stand by as professionals came in to kill off thousands of animals which they loved and were their families’ only source of income. Later that same day it would all be repeated on the TV during the eight o’clock news, and the next day the trucks would pull up at their neighbours’. It was hard. And it became even harder after 2007, when it turned out that one of those epidemics, Q-fever, was affecting humans.
When we think about where the Next Big One might come from, a rural village in the Netherlands doesn’t tend to be high on the list. Nevertheless, it features in ‘Spillover’ as one of the places where a spillover, the transmission of an infectious disease from animal to human, happened recently. The Dutch story might not be as thrilling as capturing bats potentially infected with a deadly virus (Marburg), tailing gorillas who could be the host of the elusive Ebola virus or tracking down stories on the origins of SARS, HIV and Nipah. The latter, though relatively unknown, caused an outbreak in Malaysia when it spread from fruit bats, via pigs to humans. A million pigs had to be killed. “There’s no easy way to kill a million pigs,” notes Dr Hume Field, one of the experts followed by David Quammen in the book. Later he corrects himself: It was in fact 1.1 million pigs. The difference might seem like just a rounding error, he tells Quammen, but if you ever had to kill an “extra” hundred thousand pigs and dispose of their bodies in bulldozed pits, you’d remember the difference as significant.
Spillover is, without doubt, the most intriguing book I’ve read all year.

A Sciencegrrl Christmas!

A Sciencegrrl Christmas!

*But perfectly timed for my birthday in January ;)

It’s a man’s world: gender imbalance in sports reporting

Sport is a man’s world. At least that’s the impression I get when I watch any. Reporters are (mostly) men, reporting on (mostly) men, except where beach volleyball is concerned, and then it’s still seemingly just for men to look at. Not surprisingly, then, the Olympics were a breath of fresh air this summer. Everyone cheered when Jess Ennis finally won that gold medal, when Lizzie Armitstead was the first Brit to step onto the podium, and when the aquatics centre exploded after Ellie Simmonds made it to the finish first. After all those Olympic success stories, you might expect women to get a bit more attention on the BBC’s sports pages*. However, as this tweet in @EverydaySexism’s timeline made clear, it doesn’t seem to have happened. Out of three months of sports coverage highlighted on the BBC’s Facebook page, only 5.4% of posts covered women’s sports. Abysmal seems an appropriate description here.

So how does the sports coverage measure up? The BBC Sport Facebook page highlights only some of its sports coverage, so there is a chance that women are covered, but they just don’t make it onto the Facebook page. Unfortunately, the Beeb isn’t very good at archiving their sports material, so apart from collecting data from Facebook, there doesn’t seem to be an easy way to retrospectively see how often they covered women in sport**. Luckily, the Dutch public broadcaster – the NOS – does archive all their sports coverage by date on their website. And to make this international comparison complete, I also added a bit of Belgian (or rather: Flemish – my French isn’t what it used to be) sports coverage by adding their public broadcaster’s sport Facebook page (Sporza). For all three broadcasters, I gathered all their (highlights of) sport coverage for the whole of November.

So how do these three countries’ public broadcasters compare on reporting on men and women in sport? Well, not very well. The BBC had no posts relating to women’s sport at all, while the Belgians specifically covered women in sport only once: a story on professional female cyclists appearing in a ‘sensual’ calendar. The Dutch, who contributed all stories rather than just highlights, score ever so slightly better. Though with only 11% of articles covering women, it’s hardly an improvement.

Public broadcasters' articles on sports

Public broadcasters’ articles on sports

But hey, what about confounders, things that might mess this quick analysis up? Maybe there’s one particular sport (let’s call it football) that’s skewing these results. Women’s football is famous for its lack of coverage, so maybe these nations’ football obsession is partly to blame for the lack of coverage of women in sports? Well, football does make up the majority of posts and articles, especially when looking at Facebook highlights.

Sports reporting - sports by country

So what happens if we just ignore football (something we ought to do more often)? Well, nothing much really. Women are still massively underrepresented on the sports pages, with 1 in 5 articles focussing on them at best, and even fewer articles mentioning both men and women.

Ignoring football, there is still massive gender bias

Ignoring football, there is still massive gender bias

What about those other sports? The UK and Belgian Facebook highlights didn’t include too many sports besides football or other male-dominated sports such as Formula 1. The Dutch broadcaster did include a lot of other sports though, so surely there must be some sports where reporting is more equal?

And yes: there seems to be some good news. For skiing, swimming and long track speed skating*** the gender balance looks a lot more promising, with generally equal coverage. There are also some disappointments, though, with cycling and tennis, which I expected to be more balanced.

Coverage of men and women by sport (all the more reason to like long track ice speed skating!)

Coverage of men and women by sport (all the more reason to like long track ice speed skating!)

For cycling especially, I personally expected better. Mostly because coverage tends to be biased towards people or teams who win, and the current women’s world and Olympic road cycling champion happens to be Dutch. The men in Dutch cycling, on the other hand, have hardly anything to boast about this year, apart from being mentioned in relation to Lance Armstrong just a bit too often. Still, only one single story covered women’s cycling.

Broadcasters aren’t the only ones to blame though. Especially in the case of road cycling, there just isn’t that much to broadcast. While the men will be plastered over the television every weekend from March till September, there are just two women’s races that tend to be covered live on Dutch television: the world championships and the Olympics. For almost all other classic road cycling races such as the Tour de France, Paris-Roubaix or Milan-San Remo, there are no races for women. The men race, amateurs often get a chance to go on the course, and there might even be a special race for under-23s, but not for women.

This seems to lead to a vicious circle: there are fewer events for female athletes, leading to less media coverage, which in turn makes women’s sport less interesting for sponsors who’d like some air time by plastering their logo onto some sporty people, resulting in less money to actually put on those events.

It has to change. Sports like skiing seem to make it work. Lindsey Vonn is so far ahead of the rest of the field that she has asked to compete with the men. She’d stand a fair chance. A mixed gender relay was introduced in swimming so men and women can compete together rather than having separate events. We loved seeing our women competing in the Olympics as much as we did the men. Now let’s get them back on their screens.


*And after the amazing Paralympics you might also expect some more attention for those athletes. However, the BBC Disability Sport page still seems to be stuck in September.

**Do let me know if I missed something obvious – the overwhelming onslaught of male pheromones and yellow banners screaming off the page may have messed up my brain.

***i.e. the greatest sport on earth: athletes reach speeds of up to 70km/h (43mph) while basically wearing nothing but razor sharp knives under their feet and a tight fitting body suit – no helmets or any other type of protection is used. Its popularity is sadly restricted to Holland and Holland alone – though we like it that way as we can scoop up lots of Olympic medals without anyone else noticing.

Is Nate Silver a witch?

Tentative evidence of how Nate Silver was able to make a perfect prediction (image via TechCrunch)

By predicting the outcome of the US elections correctly in 50 out of 50 states (after an already impressive 49/50 in the 2008 elections), Nate Silver of the NY Times’ FiveThirtyEight blog has managed to convince even the most sceptical data deniers of his prediction models. So much so that his perfect prediction started a Twitter trend (#natesilverfacts) and led to him being labelled a witch. So how impressive was this feat really? Is Nate Silver really a wizard from the future aiming for world domination through the power of numbers? Let’s use some stats to assess his stats!

Let’s start by toning down Silver’s amazing feat of predicting the election outcomes in 50 separate states. In most US states, the outcome of the election didn’t need complex prediction models to come to a reliable estimate: some results, such as in the District of Columbia where over 90% of the population voted Obama, were uncontested. The same goes for other blue Obama-voting states such as California (59%), Hawaii (71%), Maryland (62%) or New York (63%), or red Romney states such as Oklahoma (67% voted GOP), Utah (73%), Alabama (61%) or Kansas (60%).

Only in the swing states – those that could go either way – would Nate Silver have needed his number crunching to decide on a future winner. If we go by the NY Times’ numbers, only 7 states were a toss-up between the Democrats and Republicans: Colorado, Florida, Iowa, New Hampshire, Ohio, Virginia and Wisconsin. Treating those 7 states as coin tosses – each outcome has an equal 50% probability – we can test the hypothesis that Nate Silver is a witch, Hwitch, against the competing hypothesis that he is a completely non-magical human being, Hmuggle. If Nate is a witch, we assume he predicts each state’s election result correctly, witches having perfect knowledge of all future events. The probability of this happening is expressed in a fancy maths equation like this: p(7 right|Hwitch) – read the equation as: the probability of Nate getting 7 right, given that he is a witch. The probability in this case is 100%, or 1. But even if Nate is devoid of magical abilities, there is still a small chance he would guess all 7 election results correctly. We can calculate this probability: p(7 right|Hmuggle) = 1/2^7 = 1/128. If we take the ratio of the two, 1/(1/128), it is about 128 times more likely that Nate is a witch than a muggle.
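The coin-toss arithmetic above is simple enough to check in a few lines of Python (a quick sketch, not part of the original post):

```python
from fractions import Fraction

# Seven toss-up states, each treated as a fair coin flip.
# P(7 right | witch) = 1: a witch foresees every result.
p_given_witch = Fraction(1)

# P(7 right | muggle): guessing 7 independent 50/50 outcomes.
p_given_muggle = Fraction(1, 2) ** 7  # = 1/128

# Likelihood ratio (Bayes factor) in favour of witchcraft.
bayes_factor = p_given_witch / p_given_muggle
print(bayes_factor)  # 128
```

Using exact fractions rather than floats keeps the 1/128 from picking up rounding noise.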

Whatever the truth is about Nate Silver, it appears he’s pulled off something pretty extraordinary. Unfortunately for him, he’s still one step removed from being the world’s best predictor as Paul the psychic octopus managed to correctly predict the outcomes of 8 football matches at the 2010 World Cup. World’s best human predictor will have to do for now then.

However, as with Paul, Nate wasn’t the only person making predictions. Paul only gained the street cred necessary to be taken seriously as a clairvoyant cephalopod after a bout of predicting Eurocup results (and getting one wrong), and the same could be said for Nate Silver. If he hadn’t pulled off a similar feat in the previous elections, no one would have paid much attention to his blog this time round. His 2008 prediction was perhaps even more impressive than his latest one: he might have missed Indiana, but got the results for the remaining 10 swing states right.

As polls get about the same amount of coverage as the actual elections (if not more), there are a lot of people trying to pitch in. Let’s take a guess and say there were 50 people trying to predict the state-by-state 2008 election outcomes. The chance that at least one of them would get at least 8 of the 11 swing states correct (assuming this is the threshold to attract the attention of witch hunters) is 1 − (255/256)^50 ≈ 0.18 (for the reasoning behind this calculation, read David Spiegelhalter’s blog on the numbers behind Paul being a completely normal, if slightly lucky, octopus). So there was about a 1 in 5 chance of at least someone coming up with some remarkably correct predictions.
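The 0.18 figure can be reproduced in a couple of lines (a sketch following the Spiegelhalter-style reasoning used above, with a per-predictor success chance of 1/2^8):

```python
# One predictor's chance of nailing 8 coin-toss outcomes: 1/2**8 = 1/256.
p_one = 1 / 2**8

# With 50 independent predictors, P(at least one succeeds)
# is the complement of all 50 of them failing.
p_at_least_one = 1 - (1 - p_one) ** 50
print(round(p_at_least_one, 2))  # 0.18
```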

XKCD: Frequentists vs Bayesians

XKCD endorsement for Bayesian stats

So we now know that frequentist statisticians would label Silver a witch, but what about the much cooler Bayesians? (No bias at all here…) Bayesian statistics differs from frequentist statistics in that it takes prior knowledge into account when putting a probability on an event. Or, to put it another way: Bayesian statistics is probably a cool branch of stats, but if you know XKCD thinks so too, it suddenly becomes a lot more probable to be true (the coolness of a specific branch of statistics is conditional on XKCD endorsement).

To calculate the posterior probability of Nate Silver being a witch, we need to know a few things:

  • p(W), or the prior probability that Nate Silver is a witch, regardless of any other information. This will depend on the prevalence of witches in Silver’s hometown, New York. According to this NY Meetup page, there are 3023 witches in NY. Considering the population of the whole city (8,244,910 according to the US census), the prior probability of a random person in NY being a witch is 3023/8,244,910 ≈ 0.0004.
  • p(W’), or the probability that Nate is a muggle regardless of any other information, and that’s 1 – 0.0004 or 0.9996 in this case.
  • p(P|W), the probability of Nate making a perfect prediction, given that he’s a witch: 100%, or 1.
  • p(P|W’), the probability of Nate making a perfect prediction as a muggle, which we put at 1/128, or about 0.008, earlier.
  • p(P), the probability of making a perfect prediction, regardless of any other information. Using the law of total probability – weighting each hypothesis’ prediction probability by its prior and summing – this is 1×0.0004 + 0.008×0.9996 ≈ 0.0084.

Now that we know all this we can fill out the formula for calculating posterior probability:

p(W|P) = p(P|W) × p(W) / p(P) = (1 × 0.0004) / 0.0084 ≈ 0.048

That’s pretty slim, though at 5%, we can’t be sure he isn’t a witch. However, going back to the 2008 elections, there were already some suspicions of Nate Silver’s potential Wiccan background. If we start with the 0.18 probability we arrived at earlier, the posterior probability of Nate Silver being a witch rises to 0.96, or 96%. So yes, Nate Silver is probably a witch. Alternatively, you could of course exchange ‘witch’ for ‘statistician’ and conclude with 96% confidence that he’s just very good at his job.
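The whole calculation can be wrapped in a small Python function (a sketch using the post’s rounded figures of 0.0004 and 0.008, so the outputs match the numbers above):

```python
def posterior_witch(prior_w, p_perfect_muggle):
    """Bayes' rule with p(perfect prediction | witch) = 1:
    p(W|P) = p(W) / [p(W) + p(P|W') * (1 - p(W))]."""
    return prior_w / (prior_w + p_perfect_muggle * (1 - prior_w))

# NY-census prior (0.0004) and the rounded muggle probability (0.008):
print(posterior_witch(0.0004, 0.008))  # ≈ 0.048, i.e. about 5%

# Prior informed by the 2008 predictions instead:
print(posterior_witch(0.18, 0.008))    # ≈ 0.96
```

Plugging in different priors makes it easy to see how much the 2008 track record does the heavy lifting here.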

The Saga of Seroxat

“My name is Gail Griffith and I serve as the patient representative on this committee, and I would just like to take this opportunity to say why I am here. First, I am not a medical professional; I am a consumer. I have suffered from major depression since I was a teen.  Second, I have a son who suffers from major depression and three years ago, at age 17, after he was diagnosed and placed on a regimen of antidepressants he attempted suicide by overdosing intentionally on all his medications. He nearly died. So, I know this illness. I know what it does to adolescents.

“For the record, I would simply like to state that I have no professional ties to any advocacy group or any patient constituency. I also wish to affirm that I have no ties to any pharmaceutical company, nor do I hold any investments in pharmaceutical manufacturers. My sole responsibility is to ensure that the interests of concerned parents and families are represented”

If you have read Dr Ben Goldacre’s Bad Pharma, you may have been surprised at how the pharmaceutical industry has been able to distort the evidence for the effectiveness of medicines in their favour. As about 1/3 of my PhD focuses on antidepressant use in children, I have been a bit less surprised. Throughout the two years I’ve been looking into this particular topic, I’ve been finding out about the industry’s (and regulators’ and academics’) transgressions on an almost weekly basis. Even more so when I started reading the FDA’s seemingly endless meeting reports, as I was having some problems falling asleep. Surely these would be the perfect long, dry reads I needed to combat my insomnia? Well, to cut a long story short, they weren’t. But in order to get to the particular meeting Ms Griffith was introducing herself to above, we need to step back to the ’80s.

In the ’80s, Eli Lilly was trying very hard to get a market authorisation for their new antidepressant fluoxetine. It was a revolutionary drug, as it was the first of a whole new class, the selective serotonin reuptake inhibitors or SSRIs, to go on the market (though not the first to be discovered). But before the drug could become available for doctors to prescribe to patients, Eli Lilly had to get approval from national regulators. In this case, the German Bundesgesundheitsamt (BGA for short) was chosen for a first attempt. Eli Lilly submitted their trial data, and on 25 May 1984 they received a damning fax telling them the BGA did not intend to approve their drug.

Fax courtesy of German blogger Lothar Schröder

The main reason for the BGA labelling fluoxetine ‘totally unsuitable for the treatment of depression’ was that they found that 16 people taking the drug in trials had attempted suicide, 2 of them successfully. You might think at this point that a drug like this would never make it past the regulators. You would be wrong. After being rejected again by the BGA in 1988, it finally made it onto the market in 1991. By that time fluoxetine had already made a name for itself in most of the rest of the world. You might know it better by its other name: Prozac.

Soon after Prozac started being prescribed to people, doctors started to worry. Some noticed that some patients they prescribed the drug to became aggressive or suicidal, which they hadn’t been before1. In response to this, the FDA got together the experts in the field to discuss the matter, and concluded that it wasn’t the drug, but the depression that the drug was trying to treat that caused the excess in suicidal behaviour. Prozac remained on the market, and no warnings were given out.

As Prozac and other ‘me-too’ SSRIs started to make billions for pharmaceutical companies, trials were started to see whether they were effective for treating depression in teenagers as well. GSK ran three trials between 1995 and 2001 on their SSRI paroxetine. “The results of the studies were disappointing. The possibility of obtaining a safety statement from this data was considered but rejected,” read an internal email referring to the studies. Nonetheless, an article appeared in 2001, in the Journal of the American Academy of Child and Adolescent Psychiatry (that’s a pretty good journal to be published in), showing that paroxetine was indeed very effective in treating depression in children2. At the same time, GSK organised meetings where they told GPs that paroxetine “demonstrates remarkable efficacy and safety in the treatment of adolescent depression.”

Academics wrote to the journal criticising the paper, drawing the attention of Shelley Jofre, a journalist for BBC’s Panorama. From 2002 to 2007 she made 4 documentaries focussing on Seroxat, as paroxetine is sold in the UK, and how GSK had distorted and hidden data on this drug. The last documentary in the series forms a nice half hour summary of, as one person in the documentary puts it, ‘the despicable actions of GSK’. I’d very much recommend watching it, but just to give a bit of an idea: GSK held back the data from two trials that showed paroxetine was no better than placebo and distorted the evidence in the study that it did publish. In the peer-reviewed study the primary outcome had changed (when using the original primary outcome paroxetine was no better than placebo or the best available treatment at the time), and as it turned out, the study was ghostwritten (Professor Martin Keller, first ‘author’ on the published paper, admits never having seen the raw data, just neatly prepared summary tables).

Paxil (the US name for Seroxat/paroxetine) protest

But most worryingly of all: the serious adverse effects (suicidal behaviour as it turns out) had been downplayed. In the trial, 11 teenagers taking paroxetine experienced serious adverse events, compared to only five for imipramine (the ‘active’ control – imipramine is a different type of antidepressant), and two for placebo; the group sizes were equal. Serious side effects are the kind that lands you in hospital. However, in the article, only 1 case (out of 11) was deemed to be related to paroxetine. When the study was later reanalysed, it turned out 10 of the 92 teenagers in the paroxetine group had experienced a potential suicidal reaction. None of this was mentioned in the article. The study, which should be retracted on the basis of distortion, is still available on the journal’s website and has been cited 475 times as of 11 October 2012.

Almost every single issue raised in Bad Pharma seems to have happened in the saga of Seroxat. In June 2003, the MHRA (after being informally told by GSK that there might be an issue with suicidal behaviour in patients taking their drug Seroxat) issued a warning, stating that paroxetine shouldn’t be prescribed to people under the age of 18. A larger investigation into all antidepressants followed, and at the end of the same year, the MHRA extended their warning to all other SSRIs, with the exception of fluoxetine.

Enter the meeting this story started with. On 13 September 2004, in a Holiday Inn in Bethesda, Maryland, the FDA held a meeting to discuss similar worries that antidepressants might increase the risk of teenagers committing or attempting suicide. Gail Griffith, whose son had attempted suicide by taking an overdose of his antidepressants three and a half years before, attended this meeting as a patient representative. As the only member of the public in a meeting with 37 academics and medics, it must have been intimidating.

The meeting took two whole days, and concluded that all antidepressants should carry a black box warning (a message on the patient information leaflet in each box of medicine, emphasised by a surrounding black box) stating these drugs might increase the risk of ‘suicidality’ in children. Don’t know what that word, ‘suicidality’, means? Don’t worry, neither does the FDA:

“Dr. Irwin (committee member): Is there a word ‘suicidality’?
Dr. Goodman (committee chairman): Every time I write it in Word, it gets red underlined.
Dr. Irwin: It seems to me, I mean to me, I am not certain anyone really knows what it is that we are saying and what you are voting on, or, to me, I would like to know what suicidality is.
Dr. Goodman: I don’t think it is in an Oxford Dictionary either.
Ms. Griffith (patient representative): It is not in Webster’s.
Dr. Irwin: In a sense, it confounds things by, you know, the front page of the paper today [The Wall Street Journal carried a story stating that SSRIs had caused children to commit suicide], I think may lead to kind of a misrepresentation.
Dr. Pollock (committee member): Can’t we just use the explicit language?
Dr. Goodman: That is, in part, what I would favor, is that if we use it, I think we need to at least parenthetically define what we mean when we are answering the question.
Dr. Temple (FDA associate director for medical policy): Yes, that is what we do. I think that is what we actually did in labeling. Whether we should coin a new word is debatable, obviously, but it means suicidal behaviour plus suicidal ideation. That is what we use it to mean as those items.
Dr. Goodman: Would it be fair for us to slightly modify the question, or do we have to take
it as it is, because what I would say, if we could use the definition that corresponds to Outcome 3, I would feel most comfortable, because that corresponds to the reclassification and the way you approach the dataset. So, suicidality, suicide attempt, preparatory action/or suicidal ideation.
Dr. Katz (FDA Supv. Medical Officer and Director of Division of Neuropharmacological Drug Products): Yes, you can certain amend the question. We called it suicidal behavior and ideation, but it is clearly what is embodied in Codes 1, 2, and 6.
Dr Goodman: ‘I think we have a clarification on that and hopefully, the public will understand what we mean, too, and that, I think we will leave it to the press to do their job in trying to best define what we mean and don’t mean by that term, specifically, that we are not talking about actual completed suicide if we are restricting our deliberations to the clinical trials, because there weren’t any instances.’”

(Emphasis added)

So this is where our story ends, for now. The confusing warnings by the MHRA (no SSRIs apart from Prozac for children), the FDA (no antidepressants at all for people under the age of 25) and the EMA (no SSRIs, but other antidepressants are kind of okay for children) still stand, and are supposedly based on exactly the same evidence. Initially, in the UK, GPs seemed to adhere to the MHRA’s advice3 (yup, that’s my study). However, the change didn’t last long: from 2005, SSRI prescription rates in teenagers have been on the increase again. And more than that: research comparing the Netherlands and the UK suggests that the negative media attention (which happened mostly in the UK) and regulatory warnings overall didn’t have much effect on the increasing trends in SSRI prescriptions4. Even more worrying: apart from not knowing whether antidepressants are safe for children, we also don’t know whether they work at relieving their depression. The Cochrane Library has two separate reviews looking at whether the older tricyclic antidepressants and the newer SSRIs work, but the results aren’t too comforting, to say the least5,6.

Please don’t think GSK is the only company in the wrong. It just so happens that, because of their $3 billion fine earlier this year (partly because they marketed paroxetine to children), a lot of the evidence used in this piece came to light. It can all be found on the US Department of Justice’s website.

(Quotes from Ms Gail Griffith and the discussion on ‘suicidality’ are directly taken from transcripts of the Joint meeting of the Psychopharmacologic Drugs Advisory Committee and the Pediatric Advisory Committee on 13/14 September 2004 – excerpt 1 from page 12-13 from 13 September meeting, and excerpt 2 from page 213-16 from the 14 September meeting. Both can be found on the FDA’s CDER 2004 Meeting Documents page – I won’t link directly to the PDF’s as they’re quite big)

1)     Teicher MH, Glod C, Cole JO: Emergence of intense suicidal preoccupation during fluoxetine treatment. Am J Psychiatry 1990; 147(2):207-10

2)     Keller MB, Ryan ND, Strober M, et al: Efficacy of paroxetine in the treatment of adolescent major depression: a randomized, controlled trial. J Am Acad Child Adolesc Psychiatry 2001; 40(7):762-72

3)     Wijlaars LPMM, Nazareth I, Petersen I (2012) Trends in Depression and Antidepressant Prescribing in Children and Adolescents: A Cohort Study in The Health Improvement Network (THIN). PLoS ONE 7(3): e33181. doi:10.1371/journal.pone.0033181

4)     Hernandez JF, Mantel-Teeuwisse AK, van Thiel GJMW, Belitser SV, Warmerdam J, et al. (2012) A 10-Year Analysis of the Effects of Media Coverage of Regulatory Warnings on Antidepressant Use in The Netherlands and UK. PLoS ONE 7(9): e45515. doi:10.1371/journal.pone.0045515

5)     Hetrick SE, Merry SN, McKenzie J, et al. (2009) Selective serotonin reuptake inhibitors (SSRIs) for depressive disorders in children and adolescents. The Cochrane Library

6)     Hazell P, O’Connell D, Heathcote D, et al. (2010) Tricyclic drugs for depression in children and adolescents. The Cochrane Library

