Trying to make sense of nonsense

The Saga of Seroxat

“My name is Gail Griffith and I serve as the patient representative on this committee, and I would just like to take this opportunity to say why I am here. First, I am not a medical professional; I am a consumer. I have suffered from major depression since I was a teen.  Second, I have a son who suffers from major depression and three years ago, at age 17, after he was diagnosed and placed on a regimen of antidepressants he attempted suicide by overdosing intentionally on all his medications. He nearly died. So, I know this illness. I know what it does to adolescents.

“For the record, I would simply like to state that I have no professional ties to any advocacy group or any patient constituency. I also wish to affirm that I have no ties to any pharmaceutical company, nor do I hold any investments in pharmaceutical manufacturers. My sole responsibility is to ensure that the interests of concerned parents and families are represented”

If you have read Dr Ben Goldacre’s Bad Pharma, you may have been surprised at how the pharmaceutical industry has been able to distort the evidence for the effectiveness of medicines in their favour. As about 1/3 of my PhD focuses on antidepressant use in children, I have been a bit less surprised. Throughout the two years I’ve been looking into this particular topic, I’ve been finding out about the industry’s (and regulators’ and academics’) transgressions on an almost weekly basis. Even more so when I started reading the FDA’s seemingly endless meeting reports as I had some problems falling asleep. Surely, these would be the perfect long dry reads I needed to combat my insomnia? Well, to cut a long story short, they didn’t. But in order to get to the particular meeting Ms Griffith was introducing herself to above, we need to step back to the ’80s.

In the ’80s, Eli Lilly was trying very hard to get a marketing authorisation for their new antidepressant fluoxetine. It was a revolutionary drug as it was the first in a whole new class, the selective serotonin reuptake inhibitors or SSRIs, to go on the market (though not the first to be discovered). But before their drug could become available for doctors to prescribe to patients, Eli Lilly had to get approval from the national regulators. In this case, the German Bundesgesundheitsamt (BGA for short) was chosen for a first attempt. Eli Lilly submitted their trial data and on 25 May 1984 they received a damning fax telling them the BGA did not intend to approve their drug.

Fax courtesy of German blogger Lothar Schröder

The main reason for the BGA labelling fluoxetine ‘totally unsuitable for the treatment of depression’ was that they found that 16 people taking the drug in trials had attempted suicide, 2 of them successfully. You might think at this point that a drug like this would never make it past the regulators. You would be wrong. After being rejected again by the BGA in 1988, it finally made it onto the market in 1991. By that time fluoxetine had already made a name for itself in most of the rest of the world. You might know it better by its other name: Prozac.

Soon after Prozac started being prescribed to people, doctors started to worry. Some noticed that some patients they prescribed the drug to became aggressive or suicidal, which they hadn’t been before [1]. In response to this, the FDA got together the experts in the field to discuss the matter, and concluded that it wasn’t the drug, but the depression that the drug was trying to treat that caused the excess in suicidal behaviour. Prozac remained on the market, and no warnings were given out.

As Prozac and other ‘me-too’ SSRIs started to make billions for pharmaceutical companies, trials were started to see whether they were effective for treating depression in teenagers as well. GSK ran three trials between 1995 and 2001 on their SSRI paroxetine. “The results of the studies were disappointing. The possibility of obtaining a safety statement from this data was considered but rejected,” read an internal email referring to the studies. Nonetheless, an article appeared in 2001, in the Journal of the American Academy of Child and Adolescent Psychiatry (that’s a pretty good journal to be published in), showing that paroxetine was indeed very effective in treating depression in children [2]. At the same time, GSK organised meetings where they told GPs that paroxetine “demonstrates remarkable efficacy and safety in the treatment of adolescent depression.”

Academics wrote to the journal criticising the paper, drawing the attention of Shelley Jofre, a journalist for BBC’s Panorama. From 2002 to 2007 she made 4 documentaries focussing on Seroxat, as paroxetine is sold in the UK, and how GSK had distorted and hidden data on this drug. The last documentary in the series forms a nice half hour summary of, as one person in the documentary puts it, ‘the despicable actions of GSK’. I’d very much recommend watching it, but just to give a bit of an idea: GSK held back the data from two trials that showed paroxetine was no better than placebo and distorted the evidence in the study that it did publish. In the peer-reviewed study the primary outcome had changed (when using the original primary outcome paroxetine was no better than placebo or the best available treatment at the time), and as it turned out, the study was ghostwritten (Professor Martin Keller, first ‘author’ on the published paper, admits never having seen the raw data, just neatly prepared summary tables).

Paxil (the US name for Seroxat/paroxetine) protest

But most worryingly of all: the serious adverse effects (suicidal behaviour as it turns out) had been downplayed. In the trial, 11 teenagers taking paroxetine experienced serious adverse events, compared to only five for imipramine (the ‘active’ control – imipramine is a different type of antidepressant), and two for placebo; the group sizes were equal. Serious side effects are the kind that land you in hospital. However, in the article, only 1 case (out of 11) was deemed to be related to paroxetine. When the study was later reanalysed, it turned out 10 of the 92 teenagers in the paroxetine group had experienced a potential suicidal reaction. None of this was mentioned in the article. The study, which should be retracted on the basis of distortion, is still available on the journal’s website and has been cited 475 times as of 11 October 2012.

Almost every single issue raised in Bad Pharma seems to have happened in the saga of Seroxat. In June 2003, the MHRA (after being informally told by GSK that there might be an issue with suicidal behaviour in patients taking their drug Seroxat) issued a warning, stating that paroxetine shouldn’t be prescribed to people under the age of 18. A larger investigation into all antidepressants followed, and at the end of the same year, the MHRA extended their warning to all other SSRIs, with the exception of fluoxetine.

Enter the meeting this story started with. On 13 September 2004, in a Holiday Inn in Bethesda, Maryland, the FDA held a meeting to discuss similar worries that antidepressants might increase the risk of teenagers committing or attempting suicide. Gail Griffith, whose son had attempted suicide by taking an overdose of his antidepressants three and a half years before, attended this meeting as a patient representative. As the only member of the public in a meeting with 37 academics and medics, it must have been intimidating.

The meeting took two whole days, and concluded all antidepressants should carry a black box warning (a warning on the patient information leaflet in each box of medicine, emphasised by a black box surrounding the message), warning these drugs might increase the risk of ‘suicidality’ in children. Don’t know what that word, ‘suicidality’, means? Don’t worry, neither does the FDA:

“Dr. Irwin (committee member): Is there a word ‘suicidality’?
Dr. Goodman (committee chairman): Every time I write it in Word, it gets red underlined.
Dr. Irwin: It seems to me, I mean to me, I am not certain anyone really knows what it is that we are saying and what you are voting on, or, to me, I would like to know what suicidality is.
Dr. Goodman: I don’t think it is in an Oxford Dictionary either.
Ms. Griffith (patient representative): It is not in Webster’s.
Dr. Irwin: In a sense, it confounds things by, you know, the front page of the paper today [The Wall Street Journal carried a story stating that SSRIs had caused children to commit suicide], I think may lead to kind of a misrepresentation.
Dr. Pollock (committee member): Can’t we just use the explicit language?
Dr. Goodman: That is, in part, what I would favor, is that if we use it, I think we need to at least parenthetically define what we mean when we are answering the question.
Dr. Temple (FDA associate director for medical policy): Yes, that is what we do. I think that is what we actually did in labeling. Whether we should coin a new word is debatable, obviously, but it means suicidal behaviour plus suicidal ideation. That is what we use it to mean as those items.
Dr. Goodman: Would it be fair for us to slightly modify the question, or do we have to take it as it is, because what I would say, if we could use the definition that corresponds to Outcome 3, I would feel most comfortable, because that corresponds to the reclassification and the way you approach the dataset. So, suicidality, suicide attempt, preparatory action/or suicidal ideation.
Dr. Katz (FDA Supv. Medical Officer and Director of Division of Neuropharmacological Drug Products): Yes, you can certain amend the question. We called it suicidal behavior and ideation, but it is clearly what is embodied in Codes 1, 2, and 6.
Dr Goodman: ‘I think we have a clarification on that and hopefully, the public will understand what we mean, too, and that, I think we will leave it to the press to do their job in trying to best define what we mean and don’t mean by that term, specifically, that we are not talking about actual completed suicide if we are restricting our deliberations to the clinical trials, because there weren’t any instances.’”

(Emphasis added)

So this is where our story ends, for now. The confusing warnings by the MHRA (no SSRIs apart from Prozac for children), the FDA (no antidepressants at all for people under the age of 25) and the EMA (no SSRIs, but other antidepressants are kind of okay for children) still stand, and are supposedly based on exactly the same evidence. Initially, in the UK, GPs seemed to adhere to the MHRA’s advice [3] (yup, that’s my study). However, the change didn’t last long: from 2005 rates for SSRI prescription in teenagers have been on the increase again. And more than that: research comparing the Netherlands and the UK suggests that the negative media attention (which mostly only happened in the UK) and regulatory warnings overall didn’t have much effect on the increasing trends in SSRI prescriptions [4]. Even more worrying: apart from not knowing whether antidepressants are safe for children, we also don’t know whether they work at relieving their depression. The Cochrane Library has two separate reviews looking at whether the older tricyclic antidepressants and the newer SSRIs work, but the results aren’t too comforting to say the least [5,6].

Please don’t think GSK is the only company in the wrong. It just so happens that because of their $3 billion fine earlier this year (partly because they marketed paroxetine to children) a lot of the evidence used in this piece was unearthed. It can all be found on the US Department of Justice’s website.

(Quotes from Ms Gail Griffith and the discussion on ‘suicidality’ are taken directly from transcripts of the Joint meeting of the Psychopharmacologic Drugs Advisory Committee and the Pediatric Advisory Committee on 13/14 September 2004 – excerpt 1 from pages 12-13 of the 13 September meeting, and excerpt 2 from pages 213-16 of the 14 September meeting. Both can be found on the FDA’s CDER 2004 Meeting Documents page – I won’t link directly to the PDFs as they’re quite big)

1)     Teicher MH, Glod C, Cole JO: Emergence of intense suicidal preoccupation during fluoxetine treatment. Am J Psychiatry 1990; 147(2):207-10

2)     Keller MB, Ryan ND, Strober M, et al: Efficacy of paroxetine in the treatment of adolescent major depression: a randomized, controlled trial. J Am Acad Child Adolesc Psychiatry 2001; 40(7):762-72

3)     Wijlaars LPMM, Nazareth I, Petersen I (2012) Trends in Depression and Antidepressant Prescribing in Children and Adolescents: A Cohort Study in The Health Improvement Network (THIN). PLoS ONE 7(3): e33181. doi:10.1371/journal.pone.0033181

4)     Hernandez JF, Mantel-Teeuwisse AK, van Thiel GJMW, Belitser SV, Warmerdam J, et al. (2012) A 10-Year Analysis of the Effects of Media Coverage of Regulatory Warnings on Antidepressant Use in The Netherlands and UK. PLoS ONE 7(9): e45515. doi:10.1371/journal.pone.0045515

5)     Hetrick SE, Merry SN, McKenzie J, et al. (2009) Selective serotonin reuptake inhibitors (SSRIs) for depressive disorders in children and adolescents. The Cochrane Library

6)     Hazell P, O’Connell D, Heathcote D, et al. (2010) Tricyclic drugs for depression in children and adolescents. The Cochrane Library


Stata: it’s what the cool kids use

You might have heard it somewhere already, but in just a couple of weeks, the year of statistics will start. I expect there will be lots of posts on this sexy science, featuring references to bikinis, the exciting things they and statistics are supposed to hide, so you might as well get in on the action early. One thing we statistically-minded people all love is our statistical software package. It’s a bit like browser preference, where you try to tell someone’s competence by the package or browser they use. SPSS is the Internet Explorer equivalent for instance, while Stata, R and SAS would be Firefox (add-ons and a new update every week), Chrome (free, but you can’t see what’s going on), and SAS (the fancy industry people use it most), respectively.


My favourite is Stata, and for a very simple reason: someone wrote a program to play hangman with it (and perhaps also a bit because of the Statasaur t-shirt pictured above). Unfortunately for us epidemiologists, Marek Hlavac (the creator of Stata hangman) is an economist, meaning that the words to be guessed are not very epidemiology or statistics-based. So I made an epi add-on.

Or rather, I started on one as I found out I don’t know that many famous statisticians (though Wikipedia helped) or epidemiologists. Moreover, I included scarcely any women: I was a bit hesitant about including anyone still alive (with some notable exceptions though), so I didn’t get much further than Florence Nightingale and Ada Lovelace (as there would be no Stata without computers). But as Ada Lovelace day is coming up next week (16th October everyone!), I thought this might be a nice time to try to find out about all the female epidemiologists and statisticians out there.

I haven’t added too many myself, as I didn’t just want to add anyone I know of: it might get to the point where names are just impossible to guess in the game. At the same time: it might just be the thing to do as you’d learn about lots of epi-women (maybe I’ll adapt the program to include a Google command so you can find out about all these women immediately!)

To try out Stata hangman, you can download the original do-file (which contains the programming for the game) and the .dta file (containing the words-to-guess) from this website.

As I don’t think I can upload .dta files on WordPress: here is an Excel file of the epi-addon: hangman_epi_addon (.xlsx file)

To import it into Stata and save it as a .dta file:

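* Read the Excel sheet: the first row holds the variable names and every column is imported as a string
import excel "path_where_you_saved_it\hangman_epi_addon.xlsx", sheet("Sheet1") firstrow allstring clear
* Save the result as a Stata dataset in the same folder as the hangman do-file
save "path_where_hangman_do-file_is_saved\hangman_epi_addon.dta", replace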

But more importantly: who else should be included on the list?

Edit: found the Wikipedia-epidemiologists list and added a lot more epi-women!

Epidemiology courses: getting an idea of what’s out there

So I started my final year as a PhD student yesterday. Or so I am told, as the REF exercise is coming up, and PhD students finishing on time seem to make up at least part of that score. Also, my funding runs out in 12 months – 1 day which might be a bit more of a personal motivator to actually try and achieve that still seemingly unattainable goal (I seem to be slightly stuck in the valley of shit-part of my PhD at the moment).

Thesis relationship: it's complicated

As I go into that scary final year, others are just starting their perilous journey of pre-doctorhood and one of the questions that seems to keep popping up is what courses are good (all those newbies seem to swim in money!). I’ve been lucky enough to go on quite a few, and hear about even more during my two years of thesis-slavery, so I thought I’d try and make a dreadfully incomplete overview of epi & stats related courses. There’s bound to be lots more out there (apologies for the London/South-East England-bias) and I haven’t been able to find a nice summary of what’s available anywhere else. Though I suspect there’s probably a reason for that, which I’m about to find out.

So without further ado: some of the awesome epi education the UK (and again, very selective bits of Europe) have to offer:


London: UCL – Institute of Child Health All year round short stats courses, from basic courses on logistic or linear regression, to Bayesian analysis and missing data

London: UCL – Primary Care & Population Health Organises courses in October/November on the use and analysis of electronic health records

London: UCL – Infection and Population Health Perhaps a bit too specific for this list, but IPH organises 2 courses in June for HIV/sexual health researchers (nothing wrong with a bit of cross-disciplinary education)

London: LSHTM All year courses on a wide variety of statistics, epidemiology and public health topics. The causal inference course (in November) is particularly good.

London: Imperial – Statistical Advisory Service All year round stats courses: introduction to Stata or SPSS and design and analysis of clinical trials

London: The Royal Statistical Society What better place to learn about statistics? They do lots of different stats courses, all year round. They also run a good course on presenting data (equally important as getting some results).

Bristol: Uni of Bristol – Social and Community Medicine All year round courses on lots of different topics related to statistics (mainly Stata focussed), epidemiology and social medicine

Cambridge: MRC Biostatistics Unit One day course on practical use of multiple imputation to handle missing data in Stata 12 – usually held once a year but exact date varies

Cambridge: Uni of Cambridge – The Psychometrics Centre run short courses on structural equation modelling once a year

Reading: Uni of Reading – Statistical Services Centre The SSC runs lots of different statistics courses, from introduction courses to advanced level using (almost?) every statistical software imaginable (30% academic discount!).

Leeds: Uni of Leeds – Statistical Thinking Courses on statistics for non-statisticians. Most courses seem to run in February/March, and there’s a summer school as well.

Manchester: Uni of Manchester – The Cathy Marsh Centre for Census and Survey Research The CCSR run all year round stats courses ranging from intro to advanced levels, as well as some courses on (analysing) survey data.

Southampton: Population Health Sciences Research Network A 3-day course on epidemiology for clinicians covering measures of disease occurrence and risk, cohort, case-control and cross-sectional studies, randomised controlled trials, getting started in research, introduction to statistical analysis, statistical genetics, interpreting findings, and genetic epidemiology.

Lancaster: Uni of Lancaster – Department of Mathematics and Statistics Another university running all year stats courses from introduction to advanced level, using R, SPSS, Stata and AMOS.

Southampton: Uni of Southampton – Courses for Applied Social Surveys The name says it all: all the statistical and analytical skills you need to analyse (complex) survey data in one place.

Colchester: Uni of Essex – Summer School in Social Science Data Analysis Six weeks of mathematics and statistics, in collaboration with the universities of Oxford and Mannheim. The six weeks are split up in three sessions, each with courses increasing in difficulty. It’s a very mixed bag of courses, so there ought to be something interesting for everyone.

If you’re interested in courses on infectious disease epidemiology, the IDRN have a great overview on their website.


The Netherlands: Erasmus University Rotterdam (Winter / Summer) Three weeks of courses focussed on epidemiology in winter (February/March) and summer (August) with some excellent international speakers (PhD students get 50% discount on the course fees!)

The Netherlands: Utrecht University Summer School In July and August, Utrecht University organises 6 weeks of courses ranging from art history to theoretical physics. There are plenty of epidemiology and statistics related courses as well (mainly focussing on pharmacoepidemiology and environmental/occupational epi).

Switzerland: Epi Winter School in Wengen The course everyone wants to go on: lectures in the morning, skiing in the afternoon, practicals in the evening and it is actually relevant to epidemiology so you’ve got an excuse to go (though I didn’t manage to convince my supervisor of this just yet).

Italy: European Educational Programme in Epidemiology in Florence Just in case skiing isn’t your thing, there’s this course in summery Tuscany. Pasta, Pisa and P-values, what more could one want (clinically significant results and a publication in the Lancet, since you’re asking)?


US: University of Michigan summer school The school of Public Health is organising this one with courses varying from 1 to 3 weeks in length, and some online/distance-learning courses as well.

Canada: McGill Summer session Organised by the department of Epidemiology, Biostatistics and Occupational Health, so there are topics for everyone!



Coursera: Free online courses, too many different ones to create a list here (new ones keep getting added), but to pick just one as an example: Computing for Data Analysis (an intro to R)

EdX: Similar to Coursera (also free!), lots of different courses, but EdX has got a specific public health course, run by Harvard – Health in Numbers: Quantitative Methods in Clinical & Public Health Research

Stata NetCourses: Online courses on how to use Stata, how to program in Stata and some time series modelling. They’re very affordable, and even more so with the current dollar/pound exchange rate 🙂

Elevate: Online courses organised by the University of Utrecht in the Netherlands on epidemiology and biostatistics, and also a few public/global health ones! Courses are priced similar to offline courses (which is my way of saying I think they’re quite expensive).

NIHES: The same people organising the Erasmus Winter & Summer programmes, but online this time. At the moment there’s only a course in diagnostic research, but I’m sure more will follow. You’ll have to miss out on visiting the Erasmus bridge and famous Dutch stroopwafels though.

UCLA: Quite possibly the best stats resource on the internet. You can find web books, video lectures, explanations on how to run all commonly used statistical tests in Stata, SPSS or SAS, and lots more. Once you get into it, it’s a bit like that XKCD comic.

I’ll add courses once I run into them, but please let me know if there’s anything I should add (there are lots, I am sure!).


Thanks to @gingerly_onward,  @rlodw, @CedarUK, @lou_hurst, @jeanmadams, @rob_aldridge, @rebeccalacey and @Peter_Tennant for suggestions!

How to publish a paper: a student’s perspective (epilogue)

Thank you all for reading my posts! I’ve managed to survive actually presenting it, and got some good feedback so I thought I’d share some other nifty things I’ve learned about.

The session on publishing included Dr Sean Hennessy, editor for the Americas for Pharmacoepidemiology and Drug Safety, and Dr Tarek Hammad, deputy division director of the FDA’s Department of Epidemiology. As not-yet-doctor and the person with the shortest ‘this is what I did to get here’-slide I did suffer from imposter syndrome quite a bit, but hearing from other students afterwards, I seemed to have hidden it quite well.

Dr Hennessy, as both an academic and editor, gave an outline of what should be in your article, and where it ought to be (and as his presentation hasn’t magically appeared on the internet yet, I definitely need to work on improving the legibility of my handwriting so I can actually make sense of it afterwards).  Dr Hammad’s presentation leaned on years of experience of publishing papers, and boiled it all down into 10 useful tips, which I will let you read yourself. His first tip, publishing under the influence, is maybe the most relevant one. Although many PhDs seem to focus on getting that coveted thesis written, Dr Hammad emphasised that as a grad student you are also in the perfect position to get some peer-reviewed papers out. You’ve got your supervisors who can help you with the actual writing and hopefully give you lots of feedback so you know what you need to work on most (and as a student, you can still benefit from courses organised by your grad school).

The most interesting topics came up during the panel discussion afterwards. After I had assured the room that cover letters were very important, Dr Hennessy told us that most editors don’t actually read them. Several academics in the room gasped as if they had just collectively missed a grant deadline. After I tweeted about it, @Peter_Tennant enlightened me on the fact that other editors are of the exact opposite opinion. A few days later Dr Hennessy came up to me to tell me that after inquiring with some other editors at his journal, they do seem to read cover letters. Phew, so I didn’t spend all that time writing convoluted sentences about how great the journals I want to submit my article to are for nothing. (As a side note: I suspect the importance of your cover letter might depend on the type of editor – part-time editors at specialist journals who are also academics might head straight for the article while full time editors read cover letters. It would be interesting to find out whether that’s what was behind Dr Hennessy’s statement).

The discussion on self-plagiarism was also interesting. A lot, if not most of the people attending ISPE work with electronic health databases. Be they electronic health care records, claims data or registries, the number of large databases available is on the increase. Given the size and range of information available in these databases, they can be used over and over again for new research. The team I work in has already published over 35 papers using THIN data. A problem arises when trying to describe this data source in a paper, a necessary and important bit of the methods section. In peer-reviewed papers (and theses, while we’re on it), you’re not allowed to copy text from another paper, even if it is your own. This means that every time we write a new paper using the same database, we have to find a new way to describe it. Every single time. You’d imagine there are only a limited number of ways to switch between the active and passive voice, mention different aspects of the database or slightly rearrange the order of the words, but the GPRD/CPRD have managed to pull off over 400 research papers, so other options must be there.

A second tip came from a student: learn how to use Word. Most people getting into science now will have grown up with Word, so it might seem a bit too basic. However, there are lots of clever things Word can do that you might not know about as you didn’t need them when you were learning how to use it at age 8 (ah, the times when Comic Sans, Wingdings and the flashy gifs on geocities ruled the world). Again, grad schools might offer good short courses on what Word is actually capable of.


Finally, I got pointed in the direction of Jane. Jane is an amazing piece of software, written by the biosemantics group at the Erasmus MC in Rotterdam, the Netherlands. You put in an abstract or article title, and it finds journals and authors that have published similar stuff: an ideal tool for creating that list of potential journals to submit to, and to identify potential reviewers at the same time. Journals are listed by relevance, and listed with their Article Influence score, rather than those evil impact factors. As a bonus, it also finds relevant papers that you might want to cite. Jane’s perfect for impressing your supervisors with a ready made super-relevant list of journals.

So that’s it for now – I’m sure there are lots more helpful tips out there, so if you could add anything: I would love to hear from you!

How to publish a paper: a student’s perspective (part 2)

Well, thank you all for reading the first part – I got more visitors than I normally get in a month! Hope you like the second half as much as the first! Any comments/other tips are of course very welcome.

Step 3: Submission and waiting

Before you submit, you should make some final checks, for which most journals supply a handy checklist (sometimes you don’t run into these until you actually register with their submission system, so have a look around there early on). Are you complying with the relevant reporting guidelines? Do you have all necessary forms, such as conflict of interest statements, and the perfect abstract and cover letter (those are two things an editor is sure to look at) to convince the journal your article is worth reviewing? Right then: hit those buttons and submit.

Peer-review wait

Waiting for my paper to go through the peer-review mill (it got accepted at this journal (International Journal of Obesity) so it was worth the wait!)

And now the wait starts. If you don’t hear from the journal within the next couple of days, your paper has probably been sent out for review, which is good, but could take weeks, if not months. Luckily, there are things you can do to slightly speed this process up, such as suggesting potential reviewers in your cover letter (if the journal doesn’t provide that option in their submission system). Even though the journal might have published lots of similar studies, it is always helpful to make some recommendations.

Step 4: Results!

Unless you are submitting to the journal of universal rejection, you can never be sure what the outcome of a submission is going to be. There are three possible outcomes: your paper gets rejected, the editor wants you to revise your paper, or your paper is accepted without changes (it’s theoretically possible, I’ve been told). In case of rejection, you can either appeal the decision or move on to the next journal.

Personally, I have never appealed, but it is possible to do so when you feel you’ve been unfairly rejected. Maybe the reviewers didn’t display any knowledge of the topic area (you’d be surprised how many reviewers agree to review papers on topics they have little or no expertise in), or the decision of the editor doesn’t seem to add up with the opinions of the reviewers. It happens. One thing to keep in mind is that an appeal can take a long time: if the editor appears to have made the wrong call, associate editors will have to make a decision, which can be fairly quick. However, if the reviewers were in the wrong, the editor will have to establish their incompetence, and find new reviewers. It might be faster to submit to a different journal (which might be preferable in the case of looming grant application deadlines).

The second option, revision, comes in two flavours: minor and major. Although the first one gives you a better chance of eventual acceptance, it’s still not certain you’re going to get in. The vestiges of published, peer-reviewed science are guarded well, or that is the intended function of peer review at least. Major revisions will require re-analysis, new tests or experiments, rewrites or explanations of unexplained concepts: the list is endless really.

Step 5: Responding to reviewers’ comments

Comic by PhD Comics: Addressing reviewer comments

One thing to take into account when responding to reviewers’ comments is that it is not personal, and that reviewers rarely agree. A large meta-analysis [1] actually found that peer reviewers only agree about 1 in 3 times (or even less if you focus on the larger studies with smaller confidence intervals). However, the editor would like to see your study published (that’s what is paying their salary) and the reviewers’ comments are meant to be constructive, so it’s important to stay in character and be polite when you answer.

Bornmann et al. - Inter-Rater Reliability between reviewers

This might seem like pointing out the obvious, but under the guise of anonymity, some reviewers tend to lose composure. Although you might be tempted to give in and give such reviewers a piece of your mind, it will be the editor who will read your response first, so it’s better to hold your fire. Some reviewers might have a vested interest in whether or not your paper gets published: they work in your field, so they will have an opinion on whether what you’re doing is correct and in line with their work. Other peculiar behaviour might happen when someone remarks you only cited one single paper by the distinguished Dr Scientist. Maybe you could also cite these other eight (barely relevant) papers by the honourable Dr Scientist, who you’re not supposed to guess is the actual reviewer?

Working through comments can get very frustrating. Here’s a beautiful pair of comments I got back from some reviewers (both on the same paper):

  • Reviewer #1: “The analysis and purpose of the study is confusing. The quality of the data is likely suspect.”
  • Reviewer #2: “I found the paper to be well written, the analysis rigorous and well conceived and the conclusions supported by the data and analysis.”

And this was just the start of both reviews: the disagreements between the reviewers got worse with every paragraph they dealt with. As mentioned before, these inconsistencies between reviewers are common, which is why it is an editor making the final decision, rather than the reviewers battling it out amongst themselves. Working through them can become a bit tedious, to say the least.

The final part of responding to reviewers’ comments (and you have to respond to all of them) is writing the rebuttal or revision letter. I like to start by thanking the editor for giving me the opportunity to revise and respond to the comments. That will take up one page, which I structure a bit like my cover letter (department-headed paper and all). Then I start the actual rebuttal:

“We thank the reviewers for their comments on our paper. We have changed our paper accordingly and addressed all the comments as listed below:”

[Short summary of major changes]

[Copy and paste comments from reviewers and write a short response to each of them for instance:]

Reviewer #1:

1. In figure 1, the authors have not included in which units the y-axis is labelled.

We thank the reviewer for noticing this omission. We have now correctly labelled our y-axis in rate per 100 person-years.

(the colour helps to distinguish between reviewer’s and my words) 

It might take a few pages to get through all of them, but it makes it easier for the editor to see what I did and why I did it – hopefully shortening my waiting time a bit. Then it’s time to resubmit the whole thing again. If the comments were only minor, it’s usually the editor who will make the final decision. If there were any major comments, the paper is likely to go back to the initial reviewers and you’ll have to wait a bit longer.

Alternative ways to get published

Writing papers isn’t the only way to get your name out there: give blogging a go! Or offer to write a book review (free books!), write science news articles (a good way to keep on top of what is happening in your field, and to practice those abstract writing skills) or enter a science writing competition. (I’m obviously not entirely objective here). Significance is always on the lookout for new bloggers, so why not try them if you’re tempted?

Last resorts

Nature efforts: because you tried really hard

So every journal on your list rejected your paper? Why not try the Journal of Negative Results in Biomedicine, the All Results Journal, the Journal of Pharmaceutical Negative Results or even the Journal of Articles in Support of the Null Hypothesis?

And now go read some author guidelines! They’re likely to be shorter than this post.

Resources & Reference:

1. Bornmann L, Mutz R, Daniel H-D (2010) A Reliability-Generalization Study of Journal Peer Reviews: A Multilevel Meta-Analysis of Inter-Rater Reliability and Its Determinants. PLoS ONE 5(12): e14331. doi:10.1371/journal.pone.0014331

LaTeX: http://en.wikibooks.org/wiki/LaTeX

Wordle: http://www.wordle.net/

Zotero: http://www.zotero.org/

PhD2Published.com / @acwri (organisers of a fortnightly twitter chat – Thursdays at 7pm BST)

Twitter hashtags: #PhDchat / #ECRchat / #acwri <- useful to ask questions and find other good resources. If you don’t use Twitter, no worries: PhD-chat has an off-twitter wiki, and ECR (early career researcher) chat has a blog.

How to publish a paper: a student’s perspective (part 1)

With only 3 papers with my name on them, I’m definitely not an expert when it comes to getting papers published. However, those 3 papers (and one that’s waiting in some reviewers’ inboxes right now) have been rejected a total number of 12 times, giving me at least some experience in preparing and submitting them. Maybe that’s why I got invited to give a talk on publishing papers at the student skills workshop at the ICPE – the International Conference on Pharmacoepidemiology and Risk Management. Or it might have been that a publisher dropped out last minute, and the organisers (one of whom I happen to share a supervisor with) really needed someone who had already booked their tickets to Barcelona.

Either way, I’ve got a presentation to prepare for, and in doing so, I’ve found that actually I have developed a bit of a five-step system when it comes to preparing papers. Even more so: when talking with other students and staff around my department, I’ve found some interesting tips and thoughts on how to get published. I felt it might be worth it to actually write all of this up in a blog (and get some last minute considerations to add to my talk?), as a reference to build on, so here we go:

Step 1: Selecting a journal

Nature vs. Science - Comic by PhD Comics

I’ll start at the point where you’ve done all your analyses and have a pretty good idea what you want your paper to be about. Maybe you’re working on that first draft, or you’re already on version 17.3, but at some point you’ll have to start considering what journal to submit to. As my first supervisor told me when my very first paper was rejected by JAMA: “If the first journal you submit your paper to accepts it, you didn’t aim high enough”.

And there you immediately have your first problem: what constitutes aiming high? Impact factors are one determinant of ‘high’, though we all know now that using those in any decision making will only prove you are statistically illiterate. Rather, you could aim at submitting to one of the general medicine journals, such as the New England Journal of Medicine, JAMA, the Lancet or BMJ. All of those boast large regular reader counts and even larger rejection rates.

The scopes of these journals are wide, but they will only consider the studies that will keep their impact factors up, so it might be good to consider some more specialist journals as well. You might not reach as many researchers, but you are more likely to reach the right ones. To find out which are best for your research, go over the papers you’re citing: there are bound to be some relevant journals there. Or ask an expert; your collaborators (if you have any) will probably be able to make some suggestions.

These ideas can then form a list of potential journals to submit to. Being rejected becomes a lot less painful if you have your plan B at the ready. Final considerations will depend on your funder (should the journal be open access?) and funding (yes, it costs money to be published).

Step 2: Formatting and editing

However much I’d like it, there is no getting out of formatting or weeding through formatting guidelines (at least not until you’re senior enough to have someone do it for you), but there are some little things that can make it easier. One of these is Wordle, which creates a ‘word cloud’ highlighting the words you’ve used most often. The first time I copied a paper of mine in there (luckily just before I meant to send it to my supervisors), one word stood out like a sore thumb: However. Without really noticing it, I had started using the word in every other sentence in the discussion. Apart from highlighting unnecessary repetitions, it’s also a very nice tool for identifying key words in your paper: if the right ones don’t come up, you’re probably using too many different terms to describe one phenomenon.

Also important: use a reference manager (I like Zotero – it’s free and integrates with Firefox/Chrome, so you can use it on any computer without needing to bring the most recent database file with you). Different programs will have different (dis)advantages, so shop around a bit before you decide upon one: there are a lot of options out there.

Draft approved - comic by PhD Comics

Another tip is to read your paper aloud. After taking six years of Latin, I’ve really come to love subordinate clauses and the dactylic hexameter. Unfortunately, they don’t work so well for academic writing, and reading sections aloud really helps in locating the overly complicated sentences I can come up with when left to myself for too long (enter joke about ablative absolutes). It works even better when you leave your paper for a few days or even weeks, and then come back to it. Instead of reading what you think is there, you’ll suddenly be able to see what is actually there.

When you finally come round to sending it to your co-authors, make sure you give them enough time. Or even better: decide on a revision plan. How many times will each co-author see the paper, and in which order will you send it round? It can be hugely inefficient to send it to everyone at the same time, as you will end up with lots of similar or contradictory comments. Of course this will get more difficult with increasing numbers of co-authors, but it is important to keep at least the PI and supervisors involved.

One last formatting tip: LaTeX. It’s amazing. Just as reference managers format your references, LaTeX can format your entire paper. It will take a bit of coding (unless you opt for a program like LyX – thank you @JStreetley), but it will be worth it. One downside: the resulting text will be in PDF, making it harder for some reviewers to write comments or make changes.

Naturally, publishing involves a lot of waiting. So as my post is already past the 1,000 word mark, I’ll leave you to wait for part 2 (submitting & final checks, results!, and responding to reviewers’ comments) tomorrow.

Missing data: looking for information from beyond the veil

Image by forklift

In my last post, I promised to go a bit deeper into dealing with missing data. Although it might sound a bit paradoxical, it is pivotal to consider how to deal with what is not there in epidemiological studies. Missing data has the potential to skew the results of a study in unexpected directions if it is overlooked. As I tried to show in the example of the three general practices, the same group of people can give very different results if their information goes missing through one of the three mechanisms of data missingness. However, through some sleight of hand and a bit of cold reading, some of that missing information can be recovered.

In the first example, a practice hit by a computer virus that randomly deletes some of the information recorded by the GP, data goes missing completely at random (MCAR). Luckily, the information that is left is still representative of the patient group as a whole. Dealing with this type of disappearing data is easy, as you can just go about your analysis without taking any special precautions. The results might be a bit less precise than initially hoped for, but they will be accurate.

The opposite is true when data has gone missing not at random (MNAR). In the example, a GP only recorded systolic blood pressure if it was over the threshold of 140 mmHg. As a result, it is nigh on impossible to predict what the blood pressure is of the group who don’t show up in this GP’s records. The only thing we know is that the patients’ blood pressure is probably lower than 140, but nothing else.

Of course, a lot of other studies will have used blood pressure before, so it would be possible to make an educated guess as to what the blood pressures of the other patients would be, based on their age and gender. But this would require an external source of data, which, if you’re doing something a bit more complicated than measuring blood pressure, might not be available.

Things get a bit more complicated if data is missing at random (MAR). In this case, some information has gone missing, but whether it is there or not is related to something you have measured. In our case, the GP was more likely to subject older patients to a quick measurement. What is essential in this case is that although the information is more likely to be unavailable for younger patients, there is still some information there. Using the right type of imputation method – imputation is the substitution of some value for a missing data point, or ‘filling in the blanks’ – you can look beyond the veil and find the information on your missing persons.
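
To make the three mechanisms a bit more concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration: the patient population, the relation between age and blood pressure, and the probabilities of a value going missing. It simply compares the mean blood pressure you would calculate from the recorded values under each mechanism with the true mean.

```python
import random
import statistics

random.seed(42)

# Hypothetical patients: systolic blood pressure (mmHg) rises with age.
patients = []
for _ in range(20000):
    age = random.uniform(20, 90)
    sbp = 100 + 0.5 * age + random.gauss(0, 10)
    patients.append((age, sbp))

true_mean = statistics.mean(sbp for _, sbp in patients)


def observed_mean(keep):
    """Mean blood pressure among patients whose value was recorded."""
    return statistics.mean(sbp for age, sbp in patients if keep(age, sbp))


# MCAR: a 'virus' deletes 40% of values, regardless of age or blood pressure.
mcar_mean = observed_mean(lambda age, sbp: random.random() > 0.4)

# MAR: older patients are more likely to be measured (depends on age only).
mar_mean = observed_mean(lambda age, sbp: random.random() < age / 90)

# MNAR: a value is only recorded when it exceeds 140 mmHg.
mnar_mean = observed_mean(lambda age, sbp: sbp > 140)

print(f"true {true_mean:.1f} | MCAR {mcar_mean:.1f} | "
      f"MAR {mar_mean:.1f} | MNAR {mnar_mean:.1f}")
```

In this made-up population the MCAR mean stays close to the truth, while the MAR and MNAR means drift upwards. The difference between the last two is that under MAR the drift can be explained by age, which we did record.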

I’m getting a ‘J’…
Imputing missing data is a bit like a psychic reading, with the statistician in the role of the psychic. Like a psychic using cold reading, the simplest method starts with the most general option available: using the mean value to fill in the blanks. A psychic might start contacting the other side with a very general statement, naming just one letter of a name of someone he or she is contacting. As with using a mean value, or mean imputation as statisticians like to call it, a systolic blood pressure of 120 for instance, this will ring true for a lot of people. However, because you are predicting the same thing for everyone, a lot of people will be left out. In other words, there isn’t enough variation in your prediction to take the varying nature of a measure like blood pressure into account.
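
A small sketch of mean imputation, again in Python and again with invented numbers (the blood pressure distribution and the 30% of values deleted at random are assumptions). It shows the lack of variation mentioned above: the spread of the filled-in data is smaller than the spread of the values that were actually measured.

```python
import random
import statistics

random.seed(1)

# Hypothetical systolic blood pressures (mmHg) for 1,000 patients.
full = [random.gauss(125, 15) for _ in range(1000)]

# Roughly 30% of the values go missing completely at random (None = missing).
observed = [x if random.random() > 0.3 else None for x in full]

# Mean imputation: every blank gets the mean of the measured values.
known = [x for x in observed if x is not None]
fill = statistics.mean(known)
imputed = [fill if x is None else x for x in observed]

print(f"SD of measured values:   {statistics.stdev(known):.1f}")
print(f"SD after mean imputation: {statistics.stdev(imputed):.1f}")
```

The mean itself is untouched by this procedure, but everyone with a missing value ends up with exactly the same blood pressure, so the variation is understated.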

A ‘J’… John? I’m hearing from a John…
In order to introduce a bit of variation, statisticians can use something called regression imputation. Rather than just using the mean value of the whole group of people that was measured, you take your other variables into account as well. For instance, when a psychic sees someone responding to the ‘J’, they look for other clues. Maybe there is an elderly woman in the audience, who is likely to be there to contact her husband who has passed away. ‘John’ is a common name, so guessing that she is probably there to contact a male, the psychic has used a bit of extra information to predict the missing information. Likewise, to predict the missing measurements on blood pressure, you can take account of the age or gender of the person you are measuring.
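
As a rough illustration of regression imputation, the sketch below fits a straight line of blood pressure on age to the measured patients and uses it to fill in the blanks. The data are simulated under assumptions of my own (a linear relation with age, and older patients being more likely to be measured, as in the MAR example), and the line is fitted with plain least squares written out by hand.

```python
import random
import statistics

random.seed(7)

# Hypothetical patients: blood pressure depends on age plus some noise.
ages = [random.uniform(20, 90) for _ in range(2000)]
sbps = [100 + 0.5 * a + random.gauss(0, 10) for a in ages]

# MAR: the older the patient, the more likely the GP measured them.
measured = [random.random() < a / 90 for a in ages]

obs_age = [a for a, m in zip(ages, measured) if m]
obs_sbp = [s for s, m in zip(sbps, measured) if m]

# Ordinary least squares of blood pressure on age, measured patients only.
mean_age = statistics.mean(obs_age)
mean_sbp = statistics.mean(obs_sbp)
slope = (sum((a - mean_age) * (s - mean_sbp) for a, s in zip(obs_age, obs_sbp))
         / sum((a - mean_age) ** 2 for a in obs_age))
intercept = mean_sbp - slope * mean_age

# Regression imputation: predict each missing value from the patient's age.
imputed = [s if m else intercept + slope * a
           for a, s, m in zip(ages, sbps, measured)]

true_mean = statistics.mean(sbps)
print(f"true mean:             {true_mean:.1f}")
print(f"measured patients only: {mean_sbp:.1f}")
print(f"after imputation:       {statistics.mean(imputed):.1f}")
```

Because the guess for each patient now depends on their age, the imputed mean lands much closer to the truth than the mean of the measured patients alone.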

Does the name John mean anything to you? John. Or Jonah, Jonathan, Jack, Jake…
Although regression imputation is a big step forward from the monotony of mean imputation, there are still some issues. As with the psychic guessing the name ‘John’, a single guess might still be off. Therefore, a psychic often repeats the trick, going through a list of potential names till they hit the jackpot. Statisticians can use similar techniques when using multiple imputation. Similar to regression imputation, a value is predicted using information that is already there, but rather than going with the first attempt, multiple predictions are made. Unfortunately, statisticians don’t have a willing audience telling them when their prediction is right, so we use a set of rules, called Rubin’s rules (after Donald Rubin, a professor of statistics at Harvard), to combine the results into a single, accurate and precise estimate.
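
The sketch below is a simplified version of multiple imputation, not a substitute for a proper statistics package. It assumes the same invented age and blood pressure data as before, uses a bootstrap of the measured patients to vary the regression line between imputations, adds random noise to each prediction, and then pools the mean blood pressure across 20 completed datasets with Rubin’s rules. The helper `fit_line` and all numbers are made up for this example.

```python
import random
import statistics

random.seed(3)

# Hypothetical patients, with older patients more likely to be measured (MAR).
n = 500
ages = [random.uniform(20, 90) for _ in range(n)]
sbps = [100 + 0.5 * a + random.gauss(0, 10) for a in ages]
measured = [random.random() < a / 90 for a in ages]
observed = [(a, s) for a, s, m in zip(ages, sbps, measured) if m]


def fit_line(pairs):
    """Least-squares line through (age, sbp) pairs: intercept, slope, residual SD."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in pairs]
    sd = (sum(r * r for r in resid) / (len(pairs) - 2)) ** 0.5
    return intercept, slope, sd


M = 20  # number of imputed datasets
estimates, variances = [], []
for _ in range(M):
    # Refit the line on a bootstrap sample so each imputation uses a
    # slightly different model (reflecting uncertainty about the line).
    b0, b1, sd = fit_line(random.choices(observed, k=len(observed)))
    # Fill each blank with a prediction plus random residual noise.
    completed = [s if m else b0 + b1 * a + random.gauss(0, sd)
                 for a, s, m in zip(ages, sbps, measured)]
    estimates.append(statistics.mean(completed))
    variances.append(statistics.variance(completed) / n)

# Rubin's rules: pool the estimates and combine the two sources of variance.
pooled = statistics.mean(estimates)           # pooled estimate
within = statistics.mean(variances)           # average within-imputation variance
between = statistics.variance(estimates)      # between-imputation variance
total_var = within + (1 + 1 / M) * between    # total variance

print(f"pooled mean {pooled:.1f}, standard error {total_var ** 0.5:.2f}")
```

The between-imputation part of the variance is what a single guess can never give you: it is the extra uncertainty that comes from not knowing what the missing values really were.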

The most difficult part of taking missing data into account is deciding on the mechanism of missingness. There is no test to see whether the blanks are completely random; even if the computer virus has a slight preference for larger numbers this assumption will be invalid. Nevertheless, many research studies, especially clinical trials, like to assume that their data is missing completely at random, and use this as a justification to completely ignore the problem.

In reality, this is very rare. Often the people who quit trials will have their reasons: they are the ones experiencing side effects, or the pill the trial is testing is not having the effect they were expecting and they’ve gone back to usual care. Ignoring these reasons for missing information can have big effects. If the people for whom your treatment wasn’t working disappear, you’re only left with the ‘responders’: patients for whom the medication works. Basing the analysis on this group might give some overly optimistic results. Therefore, it is important to consider the problem of missing data when reading journal articles claiming to have found a new wonder-drug, or when designing your own research. Although I’ve only touched upon some of the methods for dealing with missing data, there are lots of options available (missingdata.org.uk is an excellent place to start). So get that crystal ball out and start filling in the blanks!
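
For anyone who wants to see the ‘responders only’ problem in numbers, here is one last sketch. It assumes a hypothetical trial of a drug that does nothing on average, in which 70% of the patients who did not improve drop out. Both the symptom scores and the dropout probability are invented for the example.

```python
import random
import statistics

random.seed(11)

# Hypothetical trial: change in symptom score per patient (negative = improved).
# The drug has no effect on average, so the changes centre on zero.
changes = [random.gauss(0, 10) for _ in range(5000)]

# Patients who did not improve drop out with 70% probability;
# patients who improved all stay until the end.
completers = [c for c in changes if c < 0 or random.random() > 0.7]

print(f"all patients:    {statistics.mean(changes):+.1f}")
print(f"completers only: {statistics.mean(completers):+.1f}")
```

Analysing only the completers makes a useless drug look like it improves symptoms, purely because of who was left in the trial.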

Dude, where’s my data?

In almost all research, data goes missing. Maybe the dog ate your lab book, or you’ve got some office mates with a score to settle. Luckily there is a whole field of missing data research that can come to your rescue. However, in order to use any methods to deal with missing data, you first have to try and figure out what the mechanism behind your data missingness is. And to help you find out what is going on in your data: my first attempt at data visualisation (or more concept visualisation in this case).

To conclude: there are three mechanisms of missingness, with their own catchy names. As you might guess at this point, the missingness mechanism determines how you should go about analysing your data. But more on that in another post that’s coming up soon.

Live below the line day 1: that CAN do attitude

Inequality and deprivation are some of the most important problems in the world today. They limit access to health care, create a whole range of specific health care problems and can even limit the lifespan of people living in a developed city such as London. As Sir Michael Marmot famously pointed out: from Westminster to Canning Town, separated by just 8 stops on the Jubilee line, the average life expectancy drops by 6 years.

To demonstrate the problem in a more concrete way, I’m participating in Live Below the Line this week. I’m living off £1 a day, the UK equivalent of the extreme poverty line, something which 1.4 billion people on this planet have to do every day. Also, to try and make the world a tiny bit better, I’m trying to raise money for Positive Women, a charity set up to empower women and children in Swaziland. You can support them via my sponsor page.

And now for my discoveries on my first day of living on £1: you really need that can do attitude. And I mean that quite literally: all I could afford for my £1 were cans.

Cans. And instant noodles (£0.11!!)

Fresh vegetables and fruit are definitely off the table for the week. As is meat, but as a vegan I’m pretty much okay with that. I’ve tried to bulk buy stuff for the whole week, and have managed to keep my spending within budget so far. I shall see how I fare the rest of the week.

Live below the line day 1: bulk buying

I’ll try and blog about problems I run into this week, and introduce an occasional begging slot to make my goal of raising £100 for Positive Women (Please donate! Even if it’s just £1!). See you on the other side!

Theses the world: a world of theses?

Today, a copy of the thesis of a friend of mine from the Netherlands dropped through my mailbox. If you, like me, are from the Netherlands, that sentence will contain nothing out of the ordinary. People from Britain, however, will be in awe at the apparent size of my mailbox, if it can muster up the capacity to have an entire thesis pushed through it.

It was one of the first things my now supervisor told me when I met her to discuss my potential PhD project: the thesis. Did I realise how much work it was going to be? Especially compared to Dutch theses? I’d always secretly dreamt about one day writing a book, so working on a magnum opus on three years of my own research seemed like it might be right up my alley. And anyway, I’d read a couple of (Dutch) theses and they seemed entirely within reach. Even if the British version was to be twice as big (which I was sure was an overestimation, a worst case scenario), it still would be attainable. The world of academia is full of PhDs, how hard could it really be?

As you might guess at this point, I was slightly baffled, to say the least, when I first encountered the doorstop that is a thesis in this part of the world. Now, a year and a half later, when I seem to have accepted what I’ve let myself in for, my friend’s thesis reminded me of what I could have gotten away with, so to say. And it got me thinking.

What should a thesis contain? How big should it be? I’ve heard many stories from PhD hopefuls and PhD completees over the last year and there seems to be a huge variation in theses. Not only between countries (I’ve only seen theses from the UK and the Netherlands), but also between universities, and even between different departments within the same university.

According to Wikipedia, a dissertation or thesis is a document submitted in support of candidature for an academic degree or professional qualification presenting the author’s research and findings. That description allows for many interpretations, and I am sure many exist.

A bulky British and dainty Dutch thesis

The most important difference between the two theses in the picture above is their aim. While the thesis is an aim in its own right in the world of UK universities, in the Netherlands it is merely a tangible summary of the work you’ve accomplished. Publishing your results is deemed more important, and the thesis functions as a binder of those studies, with a short general introduction and discussion to hold the whole thing together (the Dutch thesis pictured above is actually rather bulky as it contains five papers, rather than the standard three). Whether you’ve passed your examination depends more on your thesis defence and publication record than what you’ve actually put in your little paperback.

Meanwhile, in the UK you can pass your viva and become a doctor of philosophy without even a single publication, as long as you’ve done the work on your thesis. Writing the thesis, almost as much as doing the necessary research, becomes a rite of passage.

I’m sure there are many more varieties out there of the written account of completing a doctoral degree. A whole world of theses. So what do they look like? Do they look like the playful Dutch paperbacks, or are more of them in the serious-looking UK corner? And what do they contain? Published papers? Extensive accounts of every single piece of research? Every single graph and table ever produced? And maybe the most important question: what should be in them? I hope to hear from you in the comments!