Science Needs Data Sharing Like Sports Needs Doping Controls

I’ve got a good story for you. One that goes back to the early days of this blog, a time when I wrote about cycling.

It turns out that the biggest medical news thus far in 2016 has a connection, albeit slight, to the recent doping news out of Belgium.

You’ve heard the news from Cyclocross Worlds: a woman racing in the under-23 category was caught with a motor in her bike. Although the mainstream media was surprised, one Italian journalist called it “old-stuff.”

How does cheating in sports relate to medical science? The connection is trust and confidence–the ability to believe.

Medical science is in the midst of a confidence crisis. One problem is replication. More and more, one group’s discoveries cannot be replicated by another group. For a cycling connection, think about the records set in the famous climbs of the Tour during the EPO era: they cannot be replicated–because they were not real.

Another problem with medical science is that negative studies get published less often than positive studies. A drug company does four studies on its new drug. Two studies show positive effects and two are neutral. Which studies do you think make it into journals? This is called publication bias–and it has the effect of making a treatment look more positive than it is. It would be like racing a bike only on days when you were good.
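
To put rough numbers on the four-study example, here is a minimal sketch in Python. The effect sizes are invented purely for illustration (they come from no real trial), and the names `trials` and `average_effect` are my own. It compares the average effect across all four studies with the average a reader would see if only the two positive studies were published.

```python
from statistics import mean

# Hypothetical effect sizes for four trials of the same drug.
# These numbers are made up for illustration; they are not real data.
trials = [
    {"name": "A", "effect": 6.0, "positive": True},
    {"name": "B", "effect": 5.0, "positive": True},
    {"name": "C", "effect": 0.5, "positive": False},   # neutral result
    {"name": "D", "effect": -0.5, "positive": False},  # neutral result
]


def average_effect(studies):
    """Return the mean effect size over the given studies."""
    return mean(s["effect"] for s in studies)


# What the full body of evidence says.
all_trials = average_effect(trials)

# What the journals would show if only the positive studies got published.
published_only = average_effect([t for t in trials if t["positive"]])

print(f"All four trials:           {all_trials:.2f}")
print(f"Published (positive) only: {published_only:.2f}")
```

With these made-up numbers, the published record suggests an effect twice as large as the full set of trials supports.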

Then there is something called data-dredging. A major study collects data from numerous centers across the globe. The raw datasheet is massive.

Picture a cone. In the same way that cutting a cone in different ways leads to different shapes, a group of scientists can analyze their data sets in ways that produce positive results. There are supposed to be rules to prevent this, but the rigor with which statistics are judged varies. Consider the case when peer reviewers are friends of the author, or, say, when the journal editors know a study will lead to prestige or big sales of reprints.
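
Why does slicing the data many ways matter? A rough sketch in Python, under two assumptions of mine: each analysis is independent of the others, and each uses the conventional 0.05 significance threshold. The function names are hypothetical. The first computes the chance that at least one analysis looks "positive" when the treatment does nothing at all; the second shows the stricter per-test threshold of the Bonferroni correction, one common guard against this.

```python
def chance_of_false_positive(num_analyses, alpha=0.05):
    """Probability that at least one of several independent analyses
    crosses the significance threshold by chance alone."""
    return 1 - (1 - alpha) ** num_analyses


def bonferroni_threshold(num_analyses, alpha=0.05):
    """Per-analysis threshold under the Bonferroni correction."""
    return alpha / num_analyses


for k in (1, 5, 20):
    print(
        f"{k:>2} analyses: "
        f"chance of a spurious hit = {chance_of_false_positive(k):.1%}, "
        f"Bonferroni threshold = {bonferroni_threshold(k):.4f}"
    )
```

Under these assumptions, twenty different cuts of the same data give roughly a 64% chance of at least one spurious "positive" finding. Real analyses are rarely independent, so treat the figure as an illustration, not an estimate.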

The point is that in medical science and in sports, there are great rewards, but only for great results.

That’s why a new proposal from a group of medical editors is big news. The proposal would require that authors of studies share their raw data as a condition of publication in the journal. Medical researchers would have six months after a study is published to share the data.

This utterly disrupts the status quo. Until now, raw data have remained the property of either the scientists or the sponsor of the trial. It’s hard work designing a trial, collecting the data, and doing the analyses. Scientists, therefore, would often write many papers from the datasheets, sometimes over the course of years. These datasheets provided the delayed gratification for the grunt work of research. Publications are the currency of progress in academics.

If this proposal is enacted, outside researchers, who had no connection to the study, can look at the data and do their own analyses. They can verify the findings, try to disprove the findings, or ask different questions. Think about that. Is it right that almost anyone with a laptop can benefit from the hard work of the original researchers?

Another issue here is the importance of honoring the patients who volunteered to be experimented on. These people are owed a great debt: to maximize the knowledge generated from the experiment. Would any patient consent to a study if their data was not used to advance science?

My essay on this was one of the toughest I’ve written. It took hours upon hours. I tried to see the proposal from both sides. An outsider sees obvious advantages, but researchers have serious concerns.

The title is linked below:

To Believe in Science Is to Believe in Data Sharing



  1. So much research is paid for in whole or in large part by public money. The public serve as guinea pigs for the randomized treatments.

    If we have to throw around terms like ‘data parasites’, the people who hoard the products of this treasure and sacrifice for themselves might qualify.

    That said, we need a warning. Real science is messy. It has twists and turns. Decisions are made at every step of the way that can influence, knowingly or unknowingly, what is recorded. There is a difference between what happens and what is written down. There is a difference between science written down and science written up. You can’t pass all of that over to someone else in a spreadsheet.

    Taking a data set and squeezing it until something useful comes out is not statistically sound practice. Bonferroni lives!

    Starting with better questions will yield more useful information than all the re-analysis in the world.

  2. As the son of an internationally recognized neurosurgeon, chief of neurosurgery at NYC’s Columbia-Presbyterian Hospital, now retired and deceased, I know my father (and I) would be totally in favor of data-sharing. It is how the field moves forward and how people get helped. Thank you for bringing this and other key subjects to our attention.

    Eugene Pool

  3. I’d like to thank you for your very informative blog. As I’m French-speaking, it took me some time to find your blog, and some time again to read your numerous posts about AFib.
    As comments are closed for your article « Treating atrial fibrillation in athletes : tough choices », I’m posting here. Sorry for that (and for the mistakes in my writing).
    I’d like to comment about this :
    « The number of emails that come from fellow cyclists (and endurance athletes) with heart rhythm issues amazes me. I am more convinced than ever that our “hobby” predisposes us to electrical issues like atrial fibrillation (AF)—that the science is right. »
    I have no stats, but there are well known facts. The first one is that a lot of people don’t know they have atrial fibrillation because they don’t feel it. The second one is that those who know they have it don’t necessarily worry about it : they live with their Afib (and, for some, with appropriate medicine).
    Conversely, I’m convinced most athletes feel their Afib immediately and worry about it. They go see doctors and they do their research on the Web. And they e-mail you because, as you’re both an athlete and a cardiac electrophysiologist (and have a blog about that), they believe you can answer their questions.
    Again, I have no stats, but I’m wondering : does our « hobby » really predispose us to AF, at least as strongly as one may think, reading your posts ?
    Three months ago, I got my first experience with AFib, and it was frightening. My doctors explained to me that I was suffering from both atrial fibrillation and atrial flutter. And they told me those very annoying disorders had, in my case, nothing to do with any other pathology. I’m 58 and my heart is normal, my blood pressure is normal, my veins and arteries look normal, my BMI is normal… and so on, and so on. I’ve a nearly stressless job, a good family life and no problems with the neighbours. I ride my bike about 5h/week, commuting or touring. No competition. My average speed is a quiet 22-23 km/h (and there are no mountains here in Belgium). I moderately drink wine, beer, coffee or tea… Just one glass of wine or beer from time to time, with my main meal ; rarely more (but often less) than three cups of coffee a day. I’m mainly drinking water. You know what I mean : I’m probably like a good part of your readers. Those who send you e-mails.
    Had I chosen not to be a sportsman, would I now have a BMI of 30 and some pathologies predisposing to Afib ? Would I have Afib without knowing ? Both my parents have Afib (probably for years) without knowing. And athletes they were not.
    Don’t you think, and particularly considering sportive people having Afib in a « normal » heart, that it might be some form of mistake telling them that their « hobby » predisposes them to this disease ? Consider they’re probably in good health for years, thanks to their beloved activity. Sports have benefits we don’t have to underestimate.
