Methodology

Our methodology

Below are the core principles we try to follow, with reservations. They may change over time as we steelman the techniques, but we had to start somewhere.

Use structured reasons and arguments (Dutilh Novaes & Zalta, 2021; Walton, 1988).
If possible, use arguments based on evidence (Kelly & Zalta, 2016), science (Hepburn et al., 2021), and rationalism (Markie et al., 2021).
Use a hierarchy of evidence (Schünemann et al., 2022) that generally values certain types of evidence higher than others in the following order:
1. Experimental studies (Franklin et al., 2021) by experts in peer-reviewed, scientific journals
  1. Meta-analyses, systematic reviews (Lasserson et al., 2022) and umbrella reviews (Ioannidis, 2009) of the following types of studies:
  2. Randomized controlled trial (RCT) experiments (Reiss et al., 2022; Kendall, 2003): Patients are randomly assigned to an intervention group or a control group (not receiving the intervention) and the groups are compared, with randomization intended to control for confounding variables (Lu, 2009):
    1. Placebo-controlled: A placebo is assumed to have no or minimal effect
    2. Non-placebo-controlled
    3. Versions of the above two types:
      1. Quadruple blinded (patient, experimenters, data analyst, and caregivers)
      2. Triple blinded (patient, experimenters, and data analyst)
      3. Double blinded (patient and experimenters)
      4. Single blinded (patient)
      5. Unblinded
  3. Experiments without a control group
    1. Experiments on groups
    2. Experiments on individuals (case studies)
  4. Experiments exploring a mechanism of action (Craver et al., 2019)
2. Correlational/observational studies by experts in peer-reviewed, scientific journals
  1. Meta-analyses, systematic reviews, and umbrella reviews of the following types of studies:
  2. Cohort studies (Euser et al., 2009): Follow one exposed group and a non-exposed group (control) and compare outcomes.
    1. Prospective: A baseline is assessed and researchers then actively follow patients over time, which generally allows more accurate data collection
    2. Retrospective: Historical analysis of existing data
  3. Case-control studies (Lu, 2009): Identify one group with an outcome and another without the outcome (control) and compare prior exposure.
  4. Cross-sectional studies (Lu, 2009): Analyze whether individuals were exposed and whether they had certain outcomes and compare to those that didn’t (control).
  5. Observations without a control group
    1. Observations of groups
    2. Observations of individuals (case studies)
  6. Ecological studies (Lu, 2009): Similar to cross-sectional studies but groups are analyzed instead of individuals
3. Simulated model (Frigg et al., 2020) results by experts in peer-reviewed, scientific journals
4. Opinions by experts in peer-reviewed, scientific journals
  1. Groups of experts
  2. Individual experts
5. All of the above but not in peer-reviewed, scientific journals
6. All of the above but not by experts
Use a burden of proof
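As a rough illustration of how the default ordering in the hierarchy above could be applied, the following Python sketch sorts a few pieces of evidence. The class, field names, and example labels are hypothetical, and the handling of items 5 and 6 (non-expert work ranked below non-peer-reviewed expert work) is our assumed reading of the list rather than a precise rule. It captures only the default order and not the case-by-case exceptions discussed later on this page.

```python
from dataclasses import dataclass

# Design ranks follow items 1-4 of the hierarchy (lower number = higher rank).
DESIGN_RANK = {
    "experimental": 1,
    "observational": 2,
    "simulation": 3,
    "opinion": 4,
}


@dataclass(frozen=True)
class Evidence:
    label: str           # hypothetical description of the source
    design: str          # one of the keys of DESIGN_RANK
    peer_reviewed: bool
    by_expert: bool


def hierarchy_key(e: Evidence) -> tuple:
    """Sort key: tier first (items 5 and 6), then study design (items 1-4)."""
    if not e.by_expert:
        tier = 2         # item 6: not by experts (assumed to rank last)
    elif not e.peer_reviewed:
        tier = 1         # item 5: by experts, but not peer reviewed
    else:
        tier = 0         # items 1-4: experts in peer-reviewed journals
    return (tier, DESIGN_RANK[e.design])


items = [
    Evidence("expert blog post describing an RCT", "experimental", False, True),
    Evidence("peer-reviewed expert opinion", "opinion", True, True),
    Evidence("peer-reviewed cohort study", "observational", True, True),
    Evidence("non-expert observational write-up", "observational", False, False),
]

ranked = sorted(items, key=hierarchy_key)
for e in ranked:
    print(hierarchy_key(e), e.label)
```

A finer-grained key (for example, ranking blinded above unblinded RCTs) could be added as further tuple elements in the same way.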

Causation and Correlation

Most arguments presuppose some theory of causation (Gallow, 2022) in which one or more events are necessary for, sufficient for, and/or contribute to one or more other events.

A related concept is correlation, in which one or more events are associated, with some probability, with one or more other events.

However, things can be highly correlated yet causally unrelated, which is called a spurious correlation (Aldrich, 1995); hence the common warning that “correlation does not [necessarily] imply causation”. This has been recognized since the late 19th century (Pearson, 1897).
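One common way a spurious correlation can arise is through a shared cause. The Python sketch below simulates this under assumed, made-up parameters: a hidden variable z drives both x and y, while x and y have no effect on each other. The variable names, noise levels, and sample size are illustrative only, and subtracting z directly works here only because the simulation knows exactly how z enters x and y.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 5000

# z is a common cause; x and y each depend on z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

# x and y are strongly correlated even though neither causes the other.
r_xy = pearson(x, y)

# Removing the common cause leaves only the independent noise terms.
x_res = [xi - zi for xi, zi in zip(x, z)]
y_res = [yi - zi for yi, zi in zip(y, z)]
r_res = pearson(x_res, y_res)

print(f"correlation of x and y:            {r_xy:.2f}")
print(f"correlation after removing z:      {r_res:.2f}")
```

In real data the common cause is often unknown or unmeasured, which is part of why the study designs in the hierarchy above differ in how well they handle confounding.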

Burden of Proof

A burden of proof is an expectation by one side about the strength of argument required by another side for persuasion. A burden of proof may be useful to establish the core of a debate and avoid an argument going on indefinitely (Walton, 1988b). A burden of proof may assert controversial philosophical or ethical premises, but we think this is still valuable in clarifying the context and exit criteria of an argument.

Potential problems of science

Potential problems with a hierarchy of evidence and evidence-based medicine

There are potential problems with the concept of a hierarchy of evidence and the related “evidence-based medicine” (EBM) movement (Murad et al., 2016; Anglemyer et al., 2014; Frieden, 2017; Stegenga, 2018; Blunt, 2015; Jureidini & McHenry, 2022; Rawlins, 2008; Charlton, 2009; Charlton & Miles, 1998).
A lack of evidence higher in the hierarchy is not necessarily problematic due to infeasibility (Smith & Pell, 2003; Prasad & Jena, 2013), unnecessary risks (Glasziou et al., 2007), ethical issues, etc., although there are risks to making such assumptions (Prasad et al., 2011; Prasad et al., 2013; Haslam et al., 2021; Herrera-Perez et al., 2019; Rossouw et al., 2002; Powell & Prasad, 2022).
In some cases, evidence lower in the hierarchy may be stronger; for example, a well-done experiment might be stronger than a poorly done RCT.

General potential problems with science

General low quality (Altman, 1994; Altman, 2002)
Failure to retract (Doshi, 2015)
Successful results with small sample sizes but failures with large sample sizes (Hwang et al., 2016)
Published results tend to be overly optimistic about effect sizes because of low power and selection on statistical significance (Ioannidis, 2008)
Poor rates of replication (Errington et al., 2021; Open Science Collaboration, 2015; Henry & Fitzpatrick, 2015)
Incomplete or distorted reporting of results (BMJ, 2012; The Cochrane Collaboration, 2014; Le Noury et al., 2015)
Poor incentives (Horton, 2015; Nosek et al., 2012)
Ostensibly useful results that are instead likely due to noise or poor data quality, despite honesty and transparency (Gelman, 2017)
Incomplete analysis of results (BMJ, 2012)
Lack of awareness of problems and lack of analytical skills by medical professionals (Ioannidis et al., 2017)
Poor evidence for either effectiveness or harm (Ioannidis, 2023)
Statistically significant but false effects due to low statistical power (Button et al., 2013)
Lacking or poor reproduction, or too much reproduction (Ioannidis & Trikalinos, 2007; Henry & Fitzpatrick, 2015)
Non-random sampling (Carlisle, 2017)
Difficulties analyzing data with multilevel structure (Gelman & Brown, 2024)
Misinterpretation of the literature (Gelman & Brown, 2024)
High heterogeneity violating random-effects model assumptions (Stanley et al., 2022) and creating invalid generalizations (Bryan et al., 2021)
Poor external validity (Abaluck et al., 2025; Reiss et al., 2022; Rawlins, 2008; Hill, 1966)
Indirect interpretations (Guyatt et al., 2011)
Incorrect statistical controls (Westfall & Yarkoni, 2016)
Medical reversals (Prasad & Cifu, 2012)
Mistakes (e.g. data entry, variable coding, statistics, over-interpretation, etc.) (Gelman, 2017; Brown & Heathers, 2017)
Multiple comparisons (Bennett et al., 2009)
p-hacking, researcher degrees of freedom, multiple potential comparisons, or fishing expeditions (Wasserstein & Lazar, 2016; Wasserstein et al., 2019; Simmons et al., 2016; Humphreys et al., 2013; Gelman & Loken, 2013)
Misconduct and fraud (Fanelli, 2009; Piller, 2024; Smith, 2021; Crocker, 2011; Van Noorden, 2022; Adam, 2019; Fang et al., 2012; Thacker, 2021; Else, 2019)
Collider bias (Holmberg & Andersen, 2022)
Incomplete reporting of what a “placebo” is in studies (Hong et al., 2023; Golomb et al., 2010)
Advertising an “inactive” placebo that has not been evaluated for inactivity (Tomljenovic & McHenry, 2024)
Conflicts of interest and funding distortions (Lexchin et al., 2003; Lundh et al., 2010; Angell, 2009)
Improper understanding and/or disclosure of blinding (Schulz & Grimes, 2002)
Impacts of lack of double blinding (Hrobjartsson et al., 2012)
Improper use of placebo run-in (Scott et al., 2022)
The problem of multiplicity in frequentist analysis (Rawlins, 2008b)
Directional errors inhibiting scientific self-correction (Agley et al., 2025)
Questionable exclusion criteria (He et al., 2020; Rawlins, 2008)
Questionable objectivity (whether the bias is conscious or subconscious) of researchers (Angell, 2009b)
Questionable objectivity (whether the bias is conscious or subconscious) of clinicians (Angell, 2009b)
Questionable objectivity (whether the bias is conscious or subconscious) of organizations (Angell, 2009b)
Poor instrument or method reliability (Vul et al., 2009)
Limited post-publication critique (Hardwicke et al., 2022)
Experimenter effects (Sorge et al., 2014; Schlitz et al., 2006)
Publication selection bias (Bartos et al., 2022) and difficulty detecting it (Tang & Liu, 2000)
Institutional inertia and politics (Rigas et al., 1999), etc.
Peer review failures, biases, and gate keeping (Huber et al., 2022; Ferguson et al., 2014; Siler et al., 2015; Sackett, 2000)
Low quality evidence (Howick et al., 2022)
Under-reporting of harms (Howick et al., 2022)
Poor data sharing and transparency (Hardwicke et al., 2022b; Gabelica et al., 2022; Gelman, 2017)
Outcome switching (Altman et al., 2017)
Failures of pre-registration (Brodeur et al., 2022)
Incorrect sub-group analyses (Peto, 2011)
Reporting biases (Weinerova et al., 2022)
Lack of post-publication review (Gelman, 2017)
Fake studies (Brainard, 2023; Van Noorden, 2023)
Undisclosed financial incentives of public health authorities (Lenzer, 2015)
High rates of retractions (Van Noorden, 2023b)
Significant variation in analyses of complex subjective data, even by experts with honest intentions (Silberzahn et al., 2018)
White hat bias (the ends justify the means, or medical Machiavellianism) (Cope & Allison, 2010)
Overconfident claims such as “safe and effective” (Doshi, 2015)
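Two items in the list above, multiple comparisons and low statistical power, can be made concrete with simple arithmetic. The Python sketch below is illustrative only. The first function assumes the tests are independent and that every null hypothesis is true. The second uses the positive predictive value formula common in this literature, PPV = (power × R) / (power × R + alpha), where R is the pre-study odds that a tested effect is real. The specific values of alpha, power, and R are hypothetical.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


def positive_predictive_value(power: float, alpha: float, prior_odds: float) -> float:
    """Share of statistically significant findings that reflect a true effect,
    given the pre-study odds (prior_odds) that a tested effect is real."""
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


# Twenty comparisons, each tested at alpha = 0.05.
fwer_20 = familywise_error_rate(0.05, 20)

# The same twenty comparisons with a Bonferroni-corrected threshold.
fwer_bonferroni = familywise_error_rate(0.05 / 20, 20)

# Hypothetical field where 1 in 5 tested effects is real (odds 0.25).
ppv_low_power = positive_predictive_value(0.2, 0.05, 0.25)
ppv_high_power = positive_predictive_value(0.8, 0.05, 0.25)

print(f"family-wise error rate, 20 tests:   {fwer_20:.2f}")          # 0.64
print(f"with Bonferroni correction:         {fwer_bonferroni:.3f}")  # 0.049
print(f"PPV at 20% power:                   {ppv_low_power:.2f}")    # 0.50
print(f"PPV at 80% power:                   {ppv_high_power:.2f}")   # 0.80
```

Under these assumptions, twenty uncorrected comparisons have roughly a two-in-three chance of producing at least one false positive, and at 20% power only about half of the significant findings would reflect real effects.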

Potential problems with specific scientific methods

Meta-analyses
1. Study selection and reviewer bias (Jørgensen et al., 2018; Rawlins, 2008d)
2. Expert disagreement and subjectivity (Bauchner & Ioannidis, 2024)
3. Transparency (Coyne et al., 2010)
4. Poor quality (Ioannidis, 2016)
5. Poor reproducibility (Bodnaruc et al., 2025)
RCTs
1. Biased results (Krauss, 2018; Vinkers et al., 2021)
2. Simpson’s Paradox (Sprenger et al., 2021)
3. Selective reporting (Baasan et al., 2022)
4. Insufficient power calculations to detect adverse events (Rawlins, 2008c)
5. Misunderstanding the number needed to treat (NNT) (Bauchner & Ioannidis, 2024b)
6. Poorly described random sequence generation (Baasan et al., 2022)
7. Poorly described allocation concealment (Baasan et al., 2022)
8. Lack of clinical insights from excessive blinding (Hill, 1966b)
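Two of the RCT items above, Simpson’s Paradox and the number needed to treat, lend themselves to small worked examples. The Python sketch below uses entirely hypothetical counts. In the first part, treatment A has the higher recovery rate within each severity stratum, yet treatment B looks better once the strata are pooled, because the two treatments were given to very different mixes of patients. In the second part, NNT is computed as the reciprocal of the absolute risk reduction, rounded up.

```python
import math
from fractions import Fraction

# Hypothetical counts: (recovered, total) per treatment and severity stratum.
counts = {
    "A": {"mild": (18, 20), "severe": (48, 80)},
    "B": {"mild": (68, 80), "severe": (10, 20)},
}


def rate(recovered: int, total: int) -> Fraction:
    return Fraction(recovered, total)


def pooled_rate(strata: dict) -> Fraction:
    """Recovery rate after ignoring severity and pooling all patients."""
    recovered = sum(r for r, _ in strata.values())
    total = sum(t for _, t in strata.values())
    return Fraction(recovered, total)


a_better_in_every_stratum = all(
    rate(*counts["A"][s]) > rate(*counts["B"][s]) for s in ("mild", "severe")
)
b_better_overall = pooled_rate(counts["B"]) > pooled_rate(counts["A"])


def number_needed_to_treat(control_events, control_n, treated_events, treated_n) -> int:
    """NNT = 1 / absolute risk reduction, rounded up to a whole patient."""
    arr = Fraction(control_events, control_n) - Fraction(treated_events, treated_n)
    if arr <= 0:
        raise ValueError("no absolute risk reduction, so NNT is undefined")
    return math.ceil(1 / arr)


# Hypothetical trial: 100 of 1000 controls and 80 of 1000 treated patients
# have the bad outcome, i.e. an absolute risk reduction of 2 percentage points.
nnt = number_needed_to_treat(100, 1000, 80, 1000)

print("A better in every stratum:", a_better_in_every_stratum)  # True
print("B better overall:         ", b_better_overall)           # True
print("NNT:", nnt)                                               # 50
```

In the hypothetical trial, the relative risk reduction is 20%, but on average 50 patients would need to be treated to prevent one bad outcome.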
Observational studies
1. Equally justifiable but different ways of analyzing data, each of which may produce different results (Wang et al., 2024)
Hierarchical/multilevel regression models
1. Degenerate covariance matrix estimates that do not have a practical interpretation, commonly for multilevel models when data are noisy and the number of groups is small (Chung et al., 2015)

Expert opinions on potential problems with science

Dr. Marcia Angell, physician and former editor-in-chief of The New England Journal of Medicine (Angell, 2009)

It is simply no longer possible to believe much of the clinical research that is published, or to rely on the judgment of trusted physicians or authoritative medical guidelines. I take no pleasure in this conclusion, which I reached slowly and reluctantly over my two decades as an editor of The New England Journal of Medicine.
Dr. Richard Horton, editor-in-chief of The Lancet (Horton, 2015)

The case against science is straightforward: much of the scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, science has taken a turn towards darkness. […] Can bad scientific practices be fixed? Part of the problem is that no-one is incentivised to be right. […] Following several high-profile errors, the particle physics community now invests great effort into intensive checking and re-checking of data prior to publication. By filtering results through independent working groups, physicists are encouraged to criticise. Good criticism is rewarded. The goal is a reliable result, and the incentives for scientists are aligned around this goal. Weidberg worried we set the bar for results in biomedicine far too low. In particle physics, significance is set at 5 sigma—a p value of 3 × 10⁻⁷ or 1 in 3·5 million (if the result is not true, this is the probability that the data would have been as extreme as they are).
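The 5 sigma figure in the quote above can be checked with a short calculation. The Python sketch below assumes the one-sided tail of a standard normal distribution, which is the usual convention behind the particle physics threshold. The function name is ours, and the comparison with 1.645 sigma (roughly a one-sided p value of 0.05) is added only for scale.

```python
import math


def one_sided_tail(sigma: float) -> float:
    """P(Z >= sigma) for a standard normal variable Z."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))


p_five_sigma = one_sided_tail(5)
one_in = 1 / p_five_sigma
p_conventional = one_sided_tail(1.645)

print(f"p value at 5 sigma:     {p_five_sigma:.2e}")        # 2.87e-07
print(f"about 1 in {one_in / 1e6:.1f} million")              # 3.5
print(f"p value at 1.645 sigma: {p_conventional:.3f}")       # 0.050
```

The result, about 2.9 × 10⁻⁷, matches the rounded “3 × 10⁻⁷ or 1 in 3·5 million” in the quote.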
Dr. John Ioannidis, Professor of Medicine, Epidemiology and Population Health, Statistics and Biomedical Data Science at Stanford (Freedman, 2010)

Science is a noble endeavor, but it’s also a low-yield endeavor. I’m not sure that more than a very small percentage of medical research is ever likely to lead to major improvements in clinical outcomes and quality of life. We should be very comfortable with that fact.
Dr. Irving Langmuir, winner of the Nobel Prize in Chemistry (Langmuir & Hall, 1989)

These are cases where there is no dishonesty involved but where people are tricked into false results by a lack of understanding about what human beings can do to themselves in the way of being led astray by subjective effects, wishful thinking or threshold interactions. These are examples of pathological science.
Austin Bradford Hill, one of the pioneers of randomized controlled trials (Hill, 1966)

poorly-constructed trials not only teach us nothing but may even be dangerously misleading - particularly when their useless data are spuriously supported by all the latest statistical techniques and jargon.

Another problem lies in the biological variation of the human material with which we have to deal. […] the lack of clear-cut evidence for or against […] may come rather from the fact that their benefits are only marginal. It is that marginality, though perhaps combined with biological variability, that makes the results obscure.

Of course, as Pasteur observed, if we are looking for something there is the danger that we may find it. And there is the danger that if we take twenty bites at the cherry we shall at one time bite off a “significant” chunk. In short, in our comprehensive search, we may be misled by an association that is not causation. But surely, to parody the poet, “tis better to have looked and lost than never to have looked at all”. To seek through one’s data for clues, with an exacting conscience and with a cautious outlook, is demanded of every investigator. The clue may well be no more than a clue. Certainly we may not wish to draw conclusions. But with a bit of luck we may have learned something that we can put to the test in future observations, and perhaps in a further trial.

a related criticism of the present controlled trial - that it does not tell the doctor what he wants to know. It may be so constituted as to show without any doubt that treatment A is on the average better than treatment B. On the other hand, that result does not answer the practising doctor’s question what is the most likely outcome when this drug is given to a particular patient?

The trouble is that with many diseases and many treatments we are too ignorant to know where even to begin to look.
Doug Altman, statistician and recipient of a BMJ Lifetime Achievement Award for outstanding contribution to the improvement of the scientific and medical research literature (Altman, 1994; BMJ, 2010)

What should we think about a doctor who uses the wrong treatment, either wilfully or through ignorance, or who uses the right treatment wrongly (such as by giving the wrong dose of a drug)? Most people would agree that such behaviour was unprofessional, arguably unethical, and certainly unacceptable.

What, then, should we think about researchers who use the wrong techniques (either wilfully or in ignorance), use the right techniques wrongly, misinterpret their results, report their results selectively, cite the literature selectively, and draw unjustified conclusions? We should be appalled. Yet numerous studies of the medical literature, in both general and specialist journals, have shown that all of the above phenomena are common. This is surely a scandal.

Bailar suggested that there may be greater danger to the public welfare from statistical dishonesty than from almost any other form of dishonesty.

The poor quality of much medical research is widely acknowledged, yet disturbingly the leaders of the medical profession seem only minimally concerned about the problem and make no apparent efforts to find a solution. Manufacturing industry has come to recognise, albeit gradually, that quality control needs to be built in from the start rather than the failures being discarded, and the same principles should inform medical research. The issue here is not one of statistics as such. Rather it is a more general failure to appreciate the basic principles underlying scientific research, coupled with the “publish or perish” climate.
Richard Smith, former editor of the British Medical Journal (BMJ) (Smith, 2014)

In his editorial entitled, “The Scandal of Poor Medical Research,” Altman wrote that much research was “seriously flawed through the use of inappropriate designs, unrepresentative samples, small samples, incorrect methods of analysis, and faulty interpretation.” Twenty years later I fear that things are not better but worse.

Ethics committees, who had to approve research, were ill equipped to detect scientific flaws, and the flaws were eventually detected by statisticians, like Altman, working as firefighters. Quality assurance should be built in at the beginning of research not the end, particularly as many journals lacked statistical skills and simply went ahead and published misleading research.

Sadly, the BMJ could publish this editorial almost unchanged again this week. Small changes might be that ethics committees are now better equipped to detect scientific weakness and more journals employ statisticians. These quality assurance methods don’t, however, seem to be working as much of what is published continues to be misleading and of low quality. Indeed, we now understand that the problem doesn’t arise from amateurs dabbling in research but rather from career researchers.

In January 1994 at age 41, when we published Altman’s editorial, I had confidence that things would improve. In 2002 I spent eight marvellous weeks in a 15th century palazzo in Venice writing a book on medical journals, the major outlets for medical research, and reached the dismal conclusion that things were badly wrong with journals and the research they published. I wondered after the book was published if I’d struck too sour a note, but now I think it could have been sourer. My confidence that “things can only get better” has largely drained away, but I’m not a miserable old man. Rather I’ve come to enjoy observing and cataloguing human imperfections, which is why I read novels and history rather than medical journals.

Other general methodology points

We use the heuristic that mistakes are generally made due to incompetence rather than malice (Bloch, 2003), although the latter is certainly possible.
When comparing arguments, the number of points doesn’t necessarily matter. For example, a strong RCT or mechanistic study may outweigh dozens of alternative points.


References

139 references

(Abaluck et al., 2025):
“This paper finds that cancer patients who are at risk of serious adverse events are substantially less likely to be enrolled in clinical trials yet more likely to experience adverse events as a causal result of drug treatment.”

Abaluck, J., Agha, L., & Shah, S. (2025). Trials Avoid High Risk Patients and Underestimate Drug Harms (No. w34534). National Bureau of Economic Research. DOI: 10.3386/w34534. https://doi.org/10.3386/w34534 ; Recommended: https://www.nber.org/system/files/working_papers/w34534/w34534.pdf
(Adam, 2019):
Adam, D. (2019). How a data detective exposed suspicious medical trials. Nature, 571(7766), 462-465. DOI: 10.1038/d41586-019-02241-z. https://doi.org/10.1038/d41586-019-02241-z
(Agley et al., 2025):
Agley, J., Deemer, S. E., & Allison, D. B. (2025). “Non-Markovian” and “directional” errors inhibit scientific self-correction and can lead fields of study astray: an illustration using gardening and obesity-related outcomes. BMC Medical Research Methodology, 25(1), 137. DOI: 10.1186/s12874-025-02590-6. https://doi.org/10.1186/s12874-025-02590-6 ; Recommended: https://link.springer.com/content/pdf/10.1186/s12874-025-02590-6.pdf
(Aldrich, 1995):
Aldrich, J. (1995). Correlations genuine and spurious in Pearson and Yule. Statistical science, 364-376. DOI: 10.1214/ss/1177009870. https://doi.org/10.1214/ss/1177009870
(Altman, 1994):
Altman, D. G. (1994). The scandal of poor medical research. BMJ, 308(6924), 283-284. DOI: 10.1136/bmj.308.6924.283. https://doi.org/10.1136/bmj.308.6924.283 ; Recommended: https://www.bmj.com/content/308/6924/283/
(Altman, 2002):
Altman, D. G. (2002). Poor-quality medical research: what can journals do?. Jama, 287(21), 2765-2767. DOI: 10.1001/jama.287.21.2765. https://doi.org/10.1001/jama.287.21.2765
(Altman et al., 2017):
Altman, D. G., Moher, D., & Schulz, K. F. (2017). Harms of outcome switching in reports of randomised trials: CONSORT perspective. BMJ, 356. DOI: 10.1136/bmj.j396. https://doi.org/10.1136/bmj.j396
(Angell, 2009):
Angell, M. (2009). Drug companies & doctors: A story of corruption. The New York Review of Books, 56(1), 8-12. Retrieved August, 2022 from https://www.nybooks.com/articles/2009/01/15/drug-companies-doctorsa-story-of-corruption/
(Angell, 2009b):
“It is simply no longer possible to believe much of the clinical research that is published, or to rely on the judgment of trusted physicians or authoritative medical guidelines. I take no pleasure in this conclusion, which I reached slowly and reluctantly over my two decades as an editor of The New England Journal of Medicine.”

Angell, M. (2009). Drug companies & doctors: A story of corruption. The New York Review of Books, 56(1), 8-12. Retrieved August, 2022 from https://www.nybooks.com/articles/2009/01/15/drug-companies-doctorsa-story-of-corruption/
(Anglemyer et al., 2014):
Anglemyer, A., Horvath, H. T., & Bero, L. (2014). Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database of Systematic Reviews, (4). DOI: 10.1002/14651858.MR000034.pub2. https://doi.org/10.1002/14651858.MR000034.pub2 ; Recommended: https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.MR000034.pub2/epdf/full/en
(Baasan et al., 2022):
Baasan, O., Freihat, O., Nagy, D. U., & Lohner, S. (2022). Methodological quality and risk of bias assessment of cardiovascular disease research: analysis of randomized controlled trials published in 2017. Frontiers in Cardiovascular Medicine, 9, 830070. DOI: 10.3389/fcvm.2022.830070. https://doi.org/10.3389/fcvm.2022.830070 ; Recommended: https://www.frontiersin.org/journals/cardiovascular-medicine/articles/10.3389/fcvm.2022.830070/pdf
(Bartos et al., 2022):
Bartoš, F., Maier, M., Wagenmakers, E. J., Nippold, F., Doucouliagos, H., Ioannidis, J., … & Stanley, T. D. (2022). Footprint of publication selection bias on meta-analyses in medicine, economics, and psychology. arXiv preprint arXiv:2208.12334. DOI: 10.48550/arXiv.2208.12334. https://doi.org/10.48550/arXiv.2208.12334 ; Recommended: https://arxiv.org/pdf/2208.12334
(Bauchner & Ioannidis, 2024):
“Experts often subjectively disagree on how they interpret the same evidence and what recommendations they derive from it. Meticulous processes to resolve diverging views in guideline development efforts, for example, may not remove subjectivity. Even the most prestigious organizations sometimes have different guideline recommendations. Subjective disagreements can be common, extreme, and unsettling when evidence is limited and rapidly evolving—as in many questions related to COVID-19. However, subjectivity exists, and differences ensue even for common diseases where evidence has accrued and been evaluated for decades. For example, the American College of Physicians, the American Cancer Society, and the US Preventive Services Task Force (USPSTF) vary on when to initiate screening for colorectal cancer and the preferred screening methods. Breast cancer and depression screening recommendations have been debated for decades.”

Bauchner, H., & Ioannidis, J. P. (2024). The subjective interpretation of the medical evidence. In JAMA Health Forum (Vol. 5, No. 3, pp. e240213-e240213). American Medical Association. DOI: 10.1001/jamahealthforum.2024.0213. https://doi.org/10.1001/jamahealthforum.2024.0213
(Bauchner & Ioannidis, 2024b):
“Even in a positive RCT, the number needed to treat can be large; that is, most individuals will not benefit.”

Bauchner, H., & Ioannidis, J. P. (2024). The subjective interpretation of the medical evidence. In JAMA Health Forum (Vol. 5, No. 3, pp. e240213-e240213). American Medical Association. DOI: 10.1001/jamahealthforum.2024.0213. https://doi.org/10.1001/jamahealthforum.2024.0213
(Bennett et al., 2009):
Bennett, C. M., Baird, A. A., Miller, M. B., & Wolford, G. L. (2009). Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction. Poster presented at Human Brain Mapping conference. https://www.psychology.mcmaster.ca/bennett/psy710/readings/BennettDeadSalmon.pdf
(Bloch, 2003):
Bloch, A. (2003). Murphy’s law. Penguin. https://archive.org/details/murphyslawbooktw00bloc/page/52/mode/2up
(Blunt, 2015):
Blunt, C. (2015). Hierarchies of evidence in evidence-based medicine (Doctoral dissertation, London School of Economics and Political Science). Retrieved July, 2022, from https://etheses.lse.ac.uk/3284/1/Blunt_heirachies_of_evidence.pdf
(BMJ, 2010):
“The quality of worldwide health research owes a great debt to Douglas Altman, one of the world’s leading experts in health research methodology, statistics, and reporting.”

BMJ (2010). BMJ Group Lifetime Achievement Award. BMJ, 340. DOI: 10.1136/bmj.c242. https://doi.org/10.1136/bmj.c242
(BMJ, 2012):
“Why aren’t all clinical trial data routinely available for independent scrutiny once a regulatory decision has been made? How have commercial companies been allowed to evaluate their own products and then to keep large and unknown amounts of the data secret even from the regulators? Why should it be up to the companies to decide who looks at the data and for what purpose? Why should it take legal action (as in the case of GlaxoSmithKline’s paroxetine and rosiglitazone), strong arm tactics by national licensing bodies (Pfizer’s reboxetine), and the exceptional tenacity of individual researchers and investigative journalists (Roche’s oseltamivir) to try to piece together the evidence on individual drugs? […] the Cochrane group has told the BMJ that about 60% of Roche’s data from phase III trials of oseltamivir have never been published. And although the European Medicines Agency (EMA) could have requested these data from Roche, it did not do so. This means that tax payers in the United Kingdom and around the world have spent billions of dollars stockpiling a drug for which no one except the manufacturer has seen the complete evidence base. Indeed the EMA’s unprecedented infringement proceedings launched against Roche last month suggest that even the manufacturer has never fully evaluated evidence it has collected on the drug’s adverse effects.”

BMJ (2012). Clinical trial data for all drugs in current use. BMJ, 345. DOI: 10.1136/bmj.e7304. https://doi.org/10.1136/bmj.e7304 ; Recommended: https://www.bmj.com/content/bmj/345/bmj.e7304.full.pdf
(Bodnaruc et al., 2025):
Bodnaruc, A. M., Khan, H., Shaver, N., Bennett, A., Wong, Y. L., Gracey, C., … & Moher, D. (2025). Reliability and reproducibility of systematic reviews informing the 2020–2025 Dietary Guidelines for Americans: a pilot study. The American Journal of Clinical Nutrition, 121(1), 111-124. DOI: 10.1016/j.ajcnut.2024.10.013. https://doi.org/10.1016/j.ajcnut.2024.10.013
(Brainard, 2023):
Brainard, J. (2023). Fake scientific papers are alarmingly common. Science, 380(6645), 568-569. DOI: 10.1126/science.adi6523. https://doi.org/10.1126/science.adi6523 ; Recommended: https://www.science.org/content/article/fake-scientific-papers-are-alarmingly-common
(Brodeur et al., 2022):
Brodeur, A., Cook, N., Hartley, J., & Heyes, A. (2022). Do Pre-Registration and Pre-analysis Plans Reduce p-Hacking and Publication Bias?. Available at SSRN. DOI: 10.2139/ssrn.4180594. https://doi.org/10.2139/ssrn.4180594
(Brown & Heathers, 2017):
Brown, N. J., & Heathers, J. A. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369. DOI: 10.1177/1948550616673876. https://doi.org/10.1177/1948550616673876
(Bryan et al., 2021):
Bryan, C. J., Tipton, E., & Yeager, D. S. (2021). Behavioural science is unlikely to change the world without a heterogeneity revolution. Nature human behaviour, 5(8), 980-989. DOI: 10.1038/s41562-021-01143-3. https://doi.org/10.1038/s41562-021-01143-3
(Button et al., 2013):
Button, K. S., Ioannidis, J., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature reviews neuroscience, 14(5), 365-376. DOI: 10.1038/nrn3475. https://doi.org/10.1038/nrn3475
(Carlisle, 2017):
Carlisle, J. B. (2017). Data fabrication and other reasons for non‐random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia, 72(8), 944-952. DOI: 10.1111/anae.13938. https://doi.org/10.1111/anae.13938 ; Recommended: https://associationofanaesthetists-publications.onlinelibrary.wiley.com/doi/pdfdirect/10.1111/anae.13938?download=true
(Charlton, 2009):
Charlton, B. G. (2009). The Zombie science of evidence‐based medicine: a personal retrospective. A commentary on Djulbegovic, B., Guyatt, GH & Ashcroft, RE (2009). Cancer Control, 16, 158–168. Journal of Evaluation in Clinical Practice, 15(6), 930-934. DOI: 10.1111/j.1365-2753.2009.01267.x. https://doi.org/10.1111/j.1365-2753.2009.01267.x
(Charlton & Miles, 1998):
Charlton, B. G., & Miles, A. (1998). The rise and fall of EBM. QJM: monthly journal of the Association of Physicians, 91(5), 371-374. DOI: 10.1093/qjmed/91.5.371. https://doi.org/10.1093/qjmed/91.5.371
(Chung et al., 2015):
Chung, Y., Gelman, A., Rabe-Hesketh, S., Liu, J., & Dorie, V. (2015). Weakly informative prior for point estimation of covariance matrices in hierarchical models. Journal of Educational and Behavioral Statistics, 40(2), 136-157. DOI: 10.3102/1076998615570945. https://doi.org/10.3102/1076998615570945
(Cope & Allison, 2010):
Cope, M. B., & Allison, D. B. (2010). White hat bias: examples of its presence in obesity research and a call for renewed commitment to faithfulness in research reporting. International Journal of Obesity, 34(1), 84-88. DOI: 10.1038/ijo.2009.239. https://doi.org/10.1038/ijo.2009.239
(Coyne et al., 2010):
Coyne, J. C., Thombs, B. D., & Hagedoorn, M. (2010). Ain’t necessarily so: review and critique of recent meta-analyses of behavioral medicine interventions in health psychology. Health Psychology, 29(2), 107. DOI: 10.1037/a0017633. https://doi.org/10.1037/a0017633
(Craver et al., 2019):
Craver, C., Tabery, J., & Zalta, E. (Ed.) (2019). Mechanisms in Science. The Stanford Encyclopedia of Philosophy (Summer 2019 Edition). https://plato.stanford.edu/archives/sum2019/entries/science-mechanisms/ ; Recommended: https://plato.stanford.edu/entries/science-mechanisms/
(Crocker, 2011):
Crocker, J. (2011). The road to fraud starts with a single step. Nature, 479(7372), 151-151. DOI: 10.1038/479151a. https://doi.org/10.1038/479151a
(Doshi, 2015):
“A major reanalysis just published in The BMJ of tens of thousands of pages of original trial documents from GlaxoSmithKline’s infamous Study 329, has concluded that the antidepressant paroxetine is neither safe nor effective in adolescents with depression. This conclusion, drawn by independent researchers, is in direct contrast to that of the trial’s original journal publication in 2001, which had proclaimed paroxetine “generally well tolerated and effective.”

In 2012, GSK was fined a record $3bn (£2bn; €2.7bn), in part for fraudulently promoting paroxetine.

Then there are the matters of “editorial assistance” and undisclosed financial conflicts of interests of one of the paper’s authors. The first draft of the manuscript ultimately published in the Journal of the American Academy of Child and Adolescent Psychiatry (JAACAP) was not written by any of the 22 named authors but by an outside medical writer hired by GSK.

It is often said that science self corrects. But for those who have been calling for a retraction of the Keller paper for many years, the system has failed. None of the paper’s 22 mostly academic university authors, nor the journal’s editors, nor the academic and professional institutions they belong to, have intervened to correct the record. The paper remains without so much as an erratum, and none of its authors—many of whom are educators and prominent members of their respective professional societies—have been disciplined.

Ivan Oransky, cofounder of the Retraction Watch blog, says that transparency is vital. “GSK agreed to pay a $3bn fine and you’re [Martin] saying you had completely different results? Great. Show me.” Oransky described Martin’s silence as part of the “typical scientific playbook.” “It has certainly been our experience that journals and researchers and institutions can be incredibly stubborn about failing to retract a paper, about ignoring calls, or not responding favourably to calls to retract.”

It’s often argued that fairness in journalism requires getting “both sides” of the story, but in the story of Study 329, the “other side” does not seem interested in talking.”

Doshi, P. (2015). No correction, no retraction, no apology, no comment: paroxetine trial reanalysis raises questions about institutional responsibility. BMJ, 351. DOI: 10.1136/bmj.h4629. https://doi.org/10.1136/bmj.h4629
(Dutilh Novaes & Zalta, 2021):
Dutilh Novaes, C., & Zalta, E. (Ed.) (2021). Argument and Argumentation. The Stanford Encyclopedia of Philosophy (Fall 2021 Edition). https://plato.stanford.edu/archives/fall2021/entries/argument/ ; Recommended: https://plato.stanford.edu/entries/argument/
(Else, 2019):
Else, H. (2019). What universities can learn from one of science’s biggest frauds. Nature, 570(7761), 287-289. DOI: 10.1038/d41586-019-01884-2. https://doi.org/10.1038/d41586-019-01884-2
(Errington et al., 2021):
Errington, T. M., Mathur, M., Soderberg, C. K., Denis, A., Perfito, N., Iorns, E., & Nosek, B. A. (2021). Investigating the replicability of preclinical cancer biology. eLife, 10, e71601. DOI: 10.7554/eLife.71601. https://doi.org/10.7554/eLife.71601
(Euser et al., 2009):
Euser, A. M., Zoccali, C., Jager, K. J., & Dekker, F. W. (2009). Cohort studies: prospective versus retrospective. Nephron Clinical Practice, 113(3), c214-c217. DOI: 10.1159/000235241. https://doi.org/10.1159/000235241
(Fanelli, 2009):
Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS ONE, 4(5), e5738. DOI: 10.1371/journal.pone.0005738. https://doi.org/10.1371/journal.pone.0005738
(Fang et al., 2012):
Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences, 109(42), 17028-17033. DOI: 10.1073/pnas.1212247109. https://doi.org/10.1073/pnas.1212247109
(Ferguson et al., 2014):
Ferguson, C., Marcus, A., & Oransky, I. (2014). The peer-review scam. Nature, 515(7528), 480. DOI: 10.1038/515480a. https://doi.org/10.1038/515480a
(Franklin et al., 2021):
Franklin, A., Perovic, S., & Zalta, E. (Ed.) (2021). Experiment in Physics. The Stanford Encyclopedia of Philosophy (Summer 2021 Edition). https://plato.stanford.edu/archives/sum2021/entries/physics-experiment/ ; Recommended: https://plato.stanford.edu/entries/physics-experiment/
(Freedman, 2010):
Freedman, D. H. (2010). Lies, damned lies, and medical science. The Atlantic, 306(4), 76-84. https://www.theatlantic.com/magazine/archive/2010/11/lies-damned-lies-and-medical-science/308269/
(Frieden, 2017):
Frieden, T. R. (2017). Evidence for health decision making—beyond randomized, controlled trials. New England Journal of Medicine, 377(5), 465-475. DOI: 10.1056/NEJMra1614394. https://doi.org/10.1056/NEJMra1614394
(Frigg et al., 2020):
Frigg, R., Hartmann, S., & Zalta, E. (Ed.) (2020). Models in Science. The Stanford Encyclopedia of Philosophy (Spring 2020 Edition). https://plato.stanford.edu/archives/spr2020/entries/models-science/ ; Recommended: https://plato.stanford.edu/entries/models-science/
(Gabelica et al., 2022):
Gabelica, M., Bojčić, R., & Puljak, L. (2022). Many researchers were not compliant with their published data sharing statement: a mixed-methods study. Journal of Clinical Epidemiology, 150, 33-41. DOI: 10.1016/j.jclinepi.2022.05.019. https://doi.org/10.1016/j.jclinepi.2022.05.019
(Gallow, 2022):
Gallow, D. (2022). The Metaphysics of Causation. The Stanford Encyclopedia of Philosophy (Fall 2022 Edition). https://plato.stanford.edu/archives/fall2022/entries/causation-metaphysics/ ; Recommended: https://plato.stanford.edu/entries/causation-metaphysics/#Inst
(Gelman, 2017):
Gelman, A. (2017). Ethics and statistics: Honesty and transparency are not enough. Chance, 30(1), 37-39. DOI: 10.1080/09332480.2017.1302720. https://doi.org/10.1080/09332480.2017.1302720
(Gelman & Brown, 2024):
Gelman, A., & Brown, N. J. (2024). How statistical challenges and misreadings of the literature combine to produce unreplicable science: An example from psychology. Advances in Methods and Practices in Psychological Science, 7(4), 25152459241276398. DOI: 10.1177/25152459241276398. https://doi.org/10.1177/25152459241276398 ; Recommended: https://stat.columbia.edu/~gelman/research/published/healing3.pdf
(Gelman & Loken, 2013):
“P-values are a method of protecting researchers from declaring truth based on patterns in noise, and so it is ironic that, by way of data-dependent analyses, p-values are often used to lend credence to noisy claims based on small samples. To put it another way: without modern statistics, we find it unlikely that people would take seriously a claim about the general population of women, based on two survey questions asked to 100 volunteers on the internet and 24 college students. But with the p-value, a result can be declared significant and deemed worth publishing in a leading journal in psychology.”

“absent pre-registration, our data analysis choices will be data-dependent, even when they are motivated directly from theoretical concerns. When pre-registered replication is difficult or impossible (as in much research in social science and public health), we believe the best strategy is to move toward an analysis of all the data rather than a focus on a single comparison or small set of comparisons”

“In fields where new data can readily be gathered (such as in all four of the examples discussed above), perhaps the two-part structure of Nosek et al. (2013) will be a standard for future research. Instead of the current norm in which several different studies are performed, each with statistical significance but each with analyses that are contingent on data, perhaps researchers can perform half as many original experiments in each paper and just pair each new experiment with a pre-registered replication.”

Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University, 348, 1-17. https://stat.columbia.edu/~gelman/research/unpublished/forking.pdf
(Glasziou et al., 2007):
Glasziou, P., Chalmers, I., Rawlins, M., & McCulloch, P. (2007). When are randomised trials unnecessary? Picking signal from noise. BMJ, 334(7589), 349-351. DOI: 10.1136/bmj.39070.527986.68. https://doi.org/10.1136/bmj.39070.527986.68
(Golomb et al., 2010):
“Placebos were seldom described in randomized, controlled trials of pills or capsules. Because the nature of the placebo can influence trial outcomes, placebo formulation should be disclosed in reports of placebo-controlled trials.”

Golomb, B. A., Erickson, L. C., Koperski, S., Sack, D., Enkin, M., & Howick, J. (2010). What’s in placebos: who knows? Analysis of randomized, controlled trials. Annals of internal medicine, 153(8), 532-535. DOI: 10.7326/0003-4819-153-8-201010190-00010. https://doi.org/10.7326/0003-4819-153-8-201010190-00010
(Guyatt et al., 2011):
Guyatt, G. H., Oxman, A. D., Kunz, R., Woodcock, J., Brozek, J., Helfand, M., … & GRADE Working Group. (2011). GRADE guidelines: 8. Rating the quality of evidence—indirectness. Journal of clinical epidemiology, 64(12), 1303-1310. DOI: 10.1016/j.jclinepi.2011.04.014. https://doi.org/10.1016/j.jclinepi.2011.04.014
(Hardwicke et al., 2022):
Hardwicke, T. E., Thibault, R. T., Kosie, J. E., Tzavella, L., Bendixen, T., Handcock, S. A., … & Ioannidis, J. P. (2022). Post-publication critique at top-ranked journals across scientific disciplines: a cross-sectional assessment of policies and practice. Royal Society Open Science, 9(8), 220139. DOI: 10.1098/rsos.220139. https://doi.org/10.1098/rsos.220139
(Hardwicke et al., 2022b):
Hardwicke, T. E., Thibault, R. T., Kosie, J. E., Wallach, J. D., Kidwell, M. C., & Ioannidis, J. P. (2022). Estimating the prevalence of transparency and reproducibility-related research practices in psychology (2014–2017). Perspectives on Psychological Science, 17(1), 239-251. DOI: 10.1177/1745691620979806. https://doi.org/10.1177/1745691620979806
(Haslam et al., 2021):
Haslam, A., Gill, J., Crain, T., Herrera-Perez, D., Chen, E. Y., Hilal, T., … & Prasad, V. (2021). The frequency of medical reversals in a cross-sectional analysis of high-impact oncology journals, 2009–2018. BMC cancer, 21, 1-9. DOI: 10.1186/s12885-021-08632-8. https://doi.org/10.1186/s12885-021-08632-8
(He et al., 2020):
He, J., Morales, D. R., & Guthrie, B. (2020). Exclusion rates in randomized controlled trials of treatments for physical conditions: a systematic review. Trials, 21, 1-11. DOI: 10.1186/s13063-020-4139-0. https://doi.org/10.1186/s13063-020-4139-0 ; Recommended: https://link.springer.com/content/pdf/10.1186/s13063-020-4139-0.pdf
(Henry & Fitzpatrick, 2015):
“Despite the importance of reproducibility in research, clinical trials are rarely subject to independent reanalysis.

In a recent review, Ebrahim and colleagues identified just 37 published reanalyses of clinical trials. Only five were conducted by investigators not associated with the original report. A third of the reanalyses led to interpretations that were different from those of the original articles.

Data sharing, however, is not without its risks. As Ebrahim and colleagues point out, threats to patient confidentiality, data dredging with a risk of chance findings, and “rogue reanalyses” by investigators with their own agenda must be considered.”

Henry, D., & Fitzpatrick, T. (2015). Liberating the data from clinical trials. BMJ, 351. DOI: 10.1136/bmj.h4601. https://doi.org/10.1136/bmj.h4601
(Hepburn et al., 2021):
Hepburn, B., Andersen, H., & Zalta, E. (Ed.) (2021). Scientific Method. The Stanford Encyclopedia of Philosophy (Summer 2021 Edition). https://plato.stanford.edu/archives/sum2021/entries/scientific-method/ ; Recommended: https://plato.stanford.edu/entries/scientific-method/
(Herrera-Perez et al., 2019):
Herrera-Perez, D., Haslam, A., Crain, T., Gill, J., Livingston, C., Kaestner, V., … & Prasad, V. (2019). A comprehensive review of randomized clinical trials in three medical journals reveals 396 medical reversals. eLife, 8, e45183. DOI: 10.7554/eLife.45183. https://doi.org/10.7554/eLife.45183
(Hill, 1966):
“Any belief that the controlled trial is the only way would mean not that the pendulum had swung too far but that it had come right off its hook.

Invariably we wish to generalize from our results-that this treatment is of value in the treatment of a certain type of patient. Implicit, therefore, in the design of any trial must be a very careful definition of the type, or types, that we shall admit to it, and a very careful attempt to admit a true cross-section of patients conforming to those types.

Only thus can we safely generalize, and-equally important-only thus can we realize that outside this defined group we are extrapolating from our results.

Yet too often, in my capacity as a member of the Committee on Safety of Drugs and of its Sub-Committee on Clinical Trials, I am faced with trials on such an ill-defined, or undefined, pot pourri of patients that I can but hopelessly speculate upon who got what and when and usually why?

These poorly-constructed trials not only teach us nothing but may even be dangerously misleading- particularly when their useless data are spuriously supported by all the latest statistical techniques and jargon. “Blinding with science” becomes almost a meiosis.”

Hill, A. B. (1966). Reflections on controlled trial. Annals of the rheumatic diseases, 25(2), 107. DOI: 10.1136/ard.25.2.107. https://doi.org/10.1136/ard.25.2.107
(Hill, 1966b):
“There is one feature of the modern controlled trial that frequently hampers the clinician in making acute and discriminating observations of his patient-and that is the double-blind procedure.

This precaution may well be indispensible in dealing with highly subjective signs and symptoms, such as the assessment by patient and doctor of degrees of pain, discomfort, or anxiety. It may well be valuable in allowing, without bias or fear of bias, a clinical judgement of the patient’s state of well or ill-being at any given time.

But in some situations I believe it may be inexpedient and, indeed, injurious to the trial. As Cromie has said, it is “ridiculous to scorn subjective assessments in subjective symptoms, and it is unrealistic to make artificially objective assessments”.”

Hill, A. B. (1966). Reflections on controlled trial. Annals of the rheumatic diseases, 25(2), 107. DOI: 10.1136/ard.25.2.107. https://doi.org/10.1136/ard.25.2.107
(Holmberg & Andersen, 2022):
“Selection bias is a general term describing bias that occurs when study participants are identified in a manner such that they are no longer representative of the target population. This can occur when an exposure and outcome each influence a common third variable—the collider—and that variable has been controlled for in the statistical analysis of the study data. Collider bias threatens the internal validity of a study and the accurate estimation of causal relationships.”

Holmberg, M. J., & Andersen, L. W. (2022). Collider bias. JAMA, 327(13), 1282-1283. DOI: 10.1001/jama.2022.1820. https://doi.org/10.1001/jama.2022.1820
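The collider mechanism in the quotation above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two variables are independent standard normals, and the selection rule (admit a participant only when exposure plus outcome exceeds 1) is hypothetical and not taken from Holmberg & Andersen. Because the rule depends on both variables, the selected subgroup shows a negative correlation that does not exist in the full population.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate_collider(n=20000, threshold=1.0, seed=0):
    """Draw independent exposure/outcome values, then condition on a collider.

    The selection rule `exposure + outcome > threshold` is an illustrative
    assumption standing in for "being included in the study sample".
    """
    rng = random.Random(seed)
    exposure = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [rng.gauss(0, 1) for _ in range(n)]

    # Correlation in the whole (unselected) population: close to zero.
    r_all = pearson(exposure, outcome)

    # Keep only participants who pass the collider-based selection rule.
    selected = [(e, o) for e, o in zip(exposure, outcome) if e + o > threshold]
    r_selected = pearson([e for e, _ in selected], [o for _, o in selected])
    return r_all, r_selected, len(selected)


r_all, r_selected, n_selected = simulate_collider()
print(f"all participants: r = {r_all:+.3f}")
print(f"selected only:    r = {r_selected:+.3f} (n = {n_selected})")
```

The exposure and outcome are generated independently, so any association seen in the selected subgroup is produced purely by the selection step.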
(Hong et al., 2023):
“Of the 113 trials, placebo content was described in 22 (19.5%) journal publications and 51 (45.1%) study protocols. The amount of each placebo ingredient was described in 15 (13.3%) journal publications and 47 (41.6%) study protocols. None of the journal publications explained the rationale for the choice of placebo ingredients, whereas a rationale was provided in 4 (3.5%) study protocols. […] There is no accessible record of the composition of placebos for approximately half of high-impact RCTs, even with access to study protocols. This impedes reproducibility and raises unanswerable questions about what effects—beneficial or harmful—the placebo may have had on trial participants, potentially confounding an accurate assessment of the experimental intervention’s safety and efficacy. Considering that study protocols are unabridged, detailed documents describing the trial design and methodology, the fact that less than half of the study protocols described the placebo contents raises concerns about clinical trial transparency.”

Hong, K., Rowhani-Farid, A., & Doshi, P. (2023). Definition and rationale for placebo composition: Cross-sectional analysis of randomized trials and protocols published in high-impact medical journals. Clinical Trials, 20(5), 564-570. DOI: 10.1177/17407745231167756. https://doi.org/10.1177/17407745231167756
(Horton, 2015):
Horton, R. (2015). Offline: What is medicine’s 5 sigma. Lancet, 385(9976), 1380. DOI: 10.1016/S0140-6736(15)60696-1. https://doi.org/10.1016/S0140-6736(15)60696-1
(Howick et al., 2022):
Howick, J., Koletsi, D., Ioannidis, J. P., Madigan, C., Pandis, N., Loef, M., … & Schmidt, S. (2022). Most healthcare interventions tested in Cochrane Reviews are not effective according to high quality evidence: a systematic review and meta-analysis. Journal of clinical epidemiology. DOI: 10.1016/j.jclinepi.2022.04.017. https://doi.org/10.1016/j.jclinepi.2022.04.017
(Hrobjartsson et al., 2012):
“On average, non-blinded assessors of subjective binary outcomes generated substantially biased effect estimates in randomised clinical trials, exaggerating odds ratios by 36%.”

Hróbjartsson, A., Thomsen, A. S. S., Emanuelsson, F., Tendal, B., Hilden, J., Boutron, I., … & Brorson, S. (2012). Observer bias in randomised clinical trials with binary outcomes: systematic review of trials with both blinded and non-blinded outcome assessors. BMJ, 344. DOI: 10.1136/bmj.e1119. https://doi.org/10.1136/bmj.e1119 ; Recommended: https://www.bmj.com/content/bmj/344/bmj.e1119.full.pdf
(Huber et al., 2022):
Huber, J., Inoua, S., Kerschbamer, R., König-Kersting, C., Palan, S., & Smith, V. L. (2022). Nobel and novice: Author prominence affects peer review. Proceedings of the National Academy of Sciences, 119(41), e2205779119. DOI: 10.1073/pnas.2205779119. https://doi.org/10.1073/pnas.2205779119
(Humphreys et al., 2013):
Humphreys, M., De la Sierra, R. S., & Van der Windt, P. (2013). Fishing, commitment, and communication: A proposal for comprehensive nonbinding research registration. Political Analysis, 21(1), 1-20. DOI: 10.1093/pan/mps021. https://doi.org/10.1093/pan/mps021
(Hwang et al., 2016):
Hwang, T. J., Carpenter, D., Lauffenburger, J. C., Wang, B., Franklin, J. M., & Kesselheim, A. S. (2016). Failure of investigational drugs in late-stage clinical development and publication of trial results. JAMA internal medicine, 176(12), 1826-1833. DOI: 10.1001/jamainternmed.2016.6008. https://doi.org/10.1001/jamainternmed.2016.6008
(Ioannidis, 2005):
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS medicine, 2(8), e124. DOI: 10.1371/journal.pmed.0020124. https://doi.org/10.1371/journal.pmed.0020124
(Ioannidis, 2008):
Ioannidis, J. P. (2008). Why Most Discovered True Associations Are Inflated. Epidemiology, 19(5), 640-648. DOI: 10.1097/EDE.0b013e31818131e7. https://doi.org/10.1097/EDE.0b013e31818131e7 ; Recommended: https://osf.io/mfsba/download
(Ioannidis, 2009):
Ioannidis, J. P. (2009). Integration of evidence from multiple meta-analyses: a primer on umbrella reviews, treatment networks and multiple treatments meta-analyses. CMAJ, 181(8), 488-493. DOI: 10.1503/cmaj.081086. https://doi.org/10.1503/cmaj.081086 ; Recommended: https://www.cmaj.ca/content/cmaj/181/8/488.full.pdf
(Ioannidis, 2016):
Ioannidis, J. P. (2016). The mass production of redundant, misleading, and conflicted systematic reviews and meta‐analyses. The Milbank Quarterly, 94(3), 485-514. DOI: 10.1111/1468-0009.12210. https://doi.org/10.1111/1468-0009.12210
(Ioannidis, 2023):
Ioannidis, J. P. (2023). Medical necessity under weak evidence and little or perverse regulatory gatekeeping. Clinical Ethics, 18(3), 330-334. DOI: 10.1177/14777509231169898. https://doi.org/10.1177/14777509231169898
(Ioannidis et al., 2017):
Ioannidis, J. P., Stuart, M. E., Brownlee, S., & Strite, S. A. (2017). How to survive the medical misinformation mess. European journal of clinical investigation, 47(11), 795-802. DOI: 10.1111/eci.12834. https://doi.org/10.1111/eci.12834 ; Recommended: https://onlinelibrary.wiley.com/doi/pdfdirect/10.1111/eci.12834?download=true
(Ioannidis & Trikalinos, 2007):
Ioannidis, J. P., & Trikalinos, T. A. (2007). An exploratory test for an excess of significant findings. Clinical trials, 4(3), 245-253. DOI: 10.1177/1740774507079441. https://doi.org/10.1177/1740774507079441
(Jorgensen et al., 2018):
Jørgensen, L., Gøtzsche, P. C., & Jefferson, T. (2018). The Cochrane HPV vaccine review was incomplete and ignored important evidence of bias. BMJ Evidence-Based Medicine, 23(5), 165-168. DOI: 10.1136/bmjebm-2018-111012. https://doi.org/10.1136/bmjebm-2018-111012 ; Recommended: https://ebm.bmj.com/content/ebmed/23/5/165.full.pdf
(Jureidini & McHenry, 2022):
Jureidini, J., & McHenry, L. B. (2022). The illusion of evidence based medicine. BMJ, 376. DOI: 10.1136/bmj.o702. https://doi.org/10.1136/bmj.o702
(Kelly & Zalta, 2016):
Kelly, T., & Zalta, E. (Ed.) (2016). Evidence. The Stanford Encyclopedia of Philosophy (Winter 2016 Edition). https://plato.stanford.edu/archives/win2016/entries/evidence/ ; Recommended: https://plato.stanford.edu/entries/evidence/
(Kendall, 2003):
Kendall, J. (2003). Designing a research project: randomised controlled trials and their principles. Emergency medicine journal: EMJ, 20(2), 164. DOI: 10.1136/emj.20.2.164. https://doi.org/10.1136/emj.20.2.164 ; Recommended: https://emj.bmj.com/content/emermed/20/2/164.full.pdf
(Krauss, 2018):
“Randomised controlled trials (RCTs) are commonly viewed as the best research method to inform public health and social policy. Usually they are thought of as providing the most rigorous evidence of a treatment’s effectiveness without strong assumptions, biases and limitations. Objective: This is the first study to examine that hypothesis by assessing the 10 most cited RCT studies worldwide. Data sources: These 10 RCT studies with the highest number of citations in any journal were identified by searching Scopus (the largest database of peer-reviewed journals). Results: This study shows that these world-leading RCTs that have influenced policy produce biased results by illustrating that participants’ background traits that affect outcomes are often poorly distributed between trial groups, that the trials often neglect alternative factors contributing to their main reported outcome and, among many other issues, that the trials are often only partially blinded or unblinded. The study here also identifies a number of novel and important assumptions, biases and limitations not yet thoroughly discussed in existing studies that arise when designing, implementing and analysing trials.”

Krauss, A. (2018). Why all randomised controlled trials produce biased results. Annals of medicine, 50(4), 312-322. DOI: 10.1080/07853890.2018.1453233. https://doi.org/10.1080/07853890.2018.1453233 ; Recommended: https://www.tandfonline.com/doi/epdf/10.1080/07853890.2018.1453233?needAccess=true
(Langmuir & Hall, 1989):
Langmuir, I., & Hall, R. N. (1989). Pathological science. Physics Today, 42(10), 36-48. DOI: 10.1063/1.881205. https://doi.org/10.1063/1.881205
(Lasserson et al., 2022):
Lasserson, T. J., Thomas, J., & Higgins, J. P. T. (2022). Cochrane handbook for systematic reviews of interventions. Cochrane. Retrieved July, 2022, from https://training.cochrane.org/handbook/current/chapter-01#section-1-1
(Le Noury et al., 2015):
“Our RIAT analysis of Study 329 showed that neither paroxetine nor high dose imipramine was effective in the treatment of major depression in adolescents, and there was a clinically significant increase in harms with both drugs. This analysis contrasts with both the published conclusions of Keller and colleagues and the way that the outcomes were reported and interpreted in the CSR.

With regard to adverse events, there were large and clinically meaningful differences between the data as analysed by us, those summarised in the CSR using the ADECS methods, and those reported by Keller and colleagues. These differences arise from inadequate and incomplete entry of data from case report forms to summary data sheets in the CSR, the ADECS coding system used by SKB, and the reporting of these data sheets in Keller and colleagues. SKB reported 338 adverse events with paroxetine and Keller and colleagues reported 265, whereas we identified 481 from our analysis of the CSR, and we found a further 23 that had been missed from the 93 case report forms that we reviewed.

There was a major difference between the frequency of suicidal thinking and events reported by Keller and colleagues and the frequency documented in the CSR, as shown in table 6.

Our findings are consistent with those of other studies, including a recent examination of 142 studies of six psychotropic drugs for which journal articles and clinical trial summaries were both available. Most deaths (94/151, 62%) and suicides (8/15, 53%) reported in trial summaries were not reported in journal articles. Only one of nine suicides in olanzapine trials was reported in published papers.

Our review of case report forms disclosed significant under-recording of adverse events.

The effect of reporting only adverse events that have a frequency of more than 5% is compounded when, for instance, agitation might be coded under agitation, anxiety, nervousness, hyperkinesis, and emotional lability; thus, a problem occurring at a rate of >10% could vanish by being coded under different subheadings such that none of these reach a threshold rate of 5%.

The extent of the clinically significant increases in adverse events in the paroxetine and imipramine arms, including serious, severe, and suicide related adverse events, became apparent only when the data were made available for reanalysis.”

Le Noury, J., Nardo, J. M., Healy, D., Jureidini, J., Raven, M., Tufanaru, C., & Abi-Jaoude, E. (2015). Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence. BMJ, 351. DOI: 10.1136/bmj.h4320. https://doi.org/10.1136/bmj.h4320 ; Recommended: https://www.bmj.com/content/bmj/351/bmj.h4320.full.pdf
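The coding effect described in the quotation above (a problem occurring at more than 10% that disappears once it is split across subheadings and a 5% reporting threshold is applied) can be shown with simple arithmetic. The sketch below uses invented counts for a hypothetical arm of 100 patients; the term names follow the quotation, but the numbers and the helper `reported_terms` are assumptions made for illustration, not data from Study 329.

```python
def reported_terms(counts, n_patients, threshold=0.05):
    """Return the coded terms whose event rate exceeds the reporting threshold."""
    rates = {term: count / n_patients for term, count in counts.items()}
    return {term: rate for term, rate in rates.items() if rate > threshold}


# Hypothetical arm of 100 patients. Counts are invented for illustration and
# assume each affected patient is coded under exactly one subheading.
n_patients = 100
split_coding = {
    "agitation": 4,
    "anxiety": 3,
    "nervousness": 2,
    "hyperkinesis": 2,
    "emotional lability": 1,
}

# The same 12 patients grouped under a single term.
combined_coding = {"agitation (all related terms)": sum(split_coding.values())}
combined_rate = sum(split_coding.values()) / n_patients

print("combined rate:", combined_rate)
print("reported when split:   ", reported_terms(split_coding, n_patients))
print("reported when combined:", reported_terms(combined_coding, n_patients))
```

With split coding, no subheading passes the threshold and nothing is reported; grouped under one term, the same events clearly exceed it.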
(Lenzer, 2015):
Lenzer, J. (2015). Centers for Disease Control and Prevention: protecting the private good?. BMJ, 350. DOI: 10.1136/bmj.h2362. https://doi.org/10.1136/bmj.h2362
(Lexchin et al., 2003):
Lexchin, J., Bero, L. A., Djulbegovic, B., & Clark, O. (2003). Pharmaceutical industry sponsorship and research outcome and quality: systematic review. BMJ, 326(7400), 1167-1170. DOI: 10.1136/bmj.326.7400.1167. https://doi.org/10.1136/bmj.326.7400.1167
(Lu, 2009):
Lu, C. Y. (2009). Observational studies: a review of study designs, challenges and strategies to reduce confounding. International journal of clinical practice, 63(5), 691-697. DOI: 10.1111/j.1742-1241.2009.02056.x. https://doi.org/10.1111/j.1742-1241.2009.02056.x
(Lundh et al., 2010):
Lundh, A., Barbateskovic, M., Hróbjartsson, A., & Gøtzsche, P. C. (2010). Conflicts of interest at medical journals: the influence of industry-supported randomised trials on journal impact factors and revenue–cohort study. PLoS medicine, 7(10), e1000354. DOI: 10.1371/annotation/7e5c299c-2db7-4ddf-8eff-ab793511eccd. https://doi.org/10.1371/annotation/7e5c299c-2db7-4ddf-8eff-ab793511eccd ; Recommended: https://journals.plos.org/plosmedicine/article/file?id=10.1371/journal.pmed.1000354&type=printable
(Markie et al., 2021):
Markie, P., Folescu, M., & Zalta, E. (Ed.) (2021). Rationalism vs. Empiricism. The Stanford Encyclopedia of Philosophy (Fall 2021 Edition). https://plato.stanford.edu/archives/fall2021/entries/rationalism-empiricism/ ; Recommended: https://plato.stanford.edu/entries/rationalism-empiricism/
(Murad et al., 2016):
Murad, M. H., Asi, N., Alsawas, M., & Alahdab, F. (2016). New evidence pyramid. BMJ Evidence-Based Medicine, 21(4), 125-127. DOI: 10.1136/ebmed-2016-110401. https://doi.org/10.1136/ebmed-2016-110401
(Nosek et al., 2012):
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6), 615-631. DOI: 10.1177/1745691612459058. https://doi.org/10.1177/1745691612459058 ; Recommended: https://journals.sagepub.com/doi/pdf/10.1177/1745691612459058
(Open Science Collaboration, 2015):
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. DOI: 10.1126/science.aac4716. https://doi.org/10.1126/science.aac4716
(Pearson, 1897):
Pearson, K. (1897). Mathematical contributions to the theory of evolution.—on a form of spurious correlation which may arise when indices are used in the measurement of organs. Proceedings of the royal society of london, 60(359-367), 489-498. DOI: 10.1098/rspl.1896.0076. https://doi.org/10.1098/rspl.1896.0076 ; Recommended: https://royalsocietypublishing.org/doi/pdf/10.1098/rspl.1896.0076
(Peto, 2011):
Peto, R. (2011). Current misconception 3: that subgroup-specific trial mortality results often provide a good basis for individualising patient care. British journal of cancer, 104(7), 1057-1058. DOI: 10.1038/bjc.2011.79. https://doi.org/10.1038/bjc.2011.79
(Piller, 2024):
Piller, C. (2024). Picture imperfect. Science, 385(6716), 1406-1412. DOI: 10.1126/science.adt3535. https://doi.org/10.1126/science.adt3535
(Powell & Prasad, 2022):
Powell, K., & Prasad, V. (2022). Where are randomized trials necessary: Are smoking and parachutes good counterexamples?. European journal of clinical investigation, 52(5), e13730. DOI: 10.1111/eci.13730. https://doi.org/10.1111/eci.13730
(Prasad et al., 2011):
Prasad, V., Gall, V., & Cifu, A. (2011). The frequency of medical reversal. Archives of internal medicine, 171(18), 1675-1676. DOI: 10.1001/archinternmed.2011.295. https://doi.org/10.1001/archinternmed.2011.295
(Prasad et al., 2013):
Prasad, V., Vandross, A., Toomey, C., Cheung, M., Rho, J., Quinn, S., … & Cifu, A. (2013). A decade of reversal: an analysis of 146 contradicted medical practices. Mayo Clinic Proceedings, 88(8), 790-798. DOI: 10.1016/j.mayocp.2013.05.012. https://doi.org/10.1016/j.mayocp.2013.05.012
(Prasad & Cifu, 2012):
Prasad, V., & Cifu, A. (2012). A medical burden of proof: towards a new ethic. BioSocieties, 7, 72-87. DOI: 10.1057/biosoc.2011.25. https://doi.org/10.1057/biosoc.2011.25
(Prasad & Jena, 2013):
Prasad, V., & Jena, A. B. (2013). Prespecified falsification end points: can they validate true observational associations?. JAMA, 309(3), 241-242. DOI: 10.1001/jama.2012.96867. https://doi.org/10.1001/jama.2012.96867
(Rawlins, 2008):
“Hierarchies attempt to replace judgement with an oversimplistic, pseudo-quantitative, assessment of the quality of the available evidence. Decision makers have to incorporate judgements, as part of their appraisal of the evidence, in reaching their conclusion. Such judgements relate to the extent to which each of the components of the evidence base is ‘fit for purpose’. Is it reliable? Is it generalisable? Do the intervention’s benefits outweigh its harms? And so on.”

Rawlins, M. (2008). De testimonio: on the evidence for decisions about the use of therapeutic interventions. The Lancet, 372(9656), 2152-2161. DOI: 10.1016/S0140-6736(08)61930-3. https://doi.org/10.1016/S0140-6736(08)61930-3
(Rawlins, 2008b):
“The difficulties in interpreting frequentist p values become convoluted when seeking to decide, during a clinical trial, whether a study should be terminated prematurely; or how (and whether) to assess outcomes in subgroups of patients once the trial has been completed. A similar problem occurs during the safety analysis of RCTs. In all of these instances, repeated tests of statistical significance – adopting the conventional p value (<0.05) – are increasingly likely to produce one or more ‘significant’ results. If, for example, 10 separate assumptions are tested, the probability of one being apparently significant (at p<0.05) is 40%. This is known as the ‘problem of multiplicity’. There are, though, very divergent views among statisticians as to how to deal with this difficulty in devising stopping rules and in subgroup analyses.”

Rawlins, M. (2008). De testimonio: on the evidence for decisions about the use of therapeutic interventions. The Lancet, 372(9656), 2152-2161. DOI: 10.1016/S0140-6736(08)61930-3. https://doi.org/10.1016/S0140-6736(08)61930-3
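The 40% figure in the Rawlins (2008b) quote above can be checked with a short calculation. This is a minimal sketch, assuming the 10 tests are independent and that every null hypothesis is true (the quote does not state these conditions explicitly); the function name `familywise_error_rate` is our own, chosen for illustration.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one 'significant' result among n_tests
    independent tests when every null hypothesis is true."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all n_tests avoid one with probability (1 - alpha) ** n_tests.
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(f"{n:>2} tests: {familywise_error_rate(n):.3f}")
```

For 10 tests at p<0.05 this gives roughly 0.40, matching the quoted 40%. Correlated tests, as is common with subgroups and safety endpoints in a single trial, would generally give a different number.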
(Rawlins, 2008c):
“RCTs are designed to ensure that the statistical power will be sufficient to demonstrate clinical benefit. Such power calculations do not, however, usually take harms into account. As a consequence, although RCTs can identify the more common adverse reactions, they singularly fail to recognise less common ones or those with a long latency (such as malignancies). Most RCTs, even for interventions that are likely to be used by patients for many years, are only of six- to 24-months duration. And, if adverse events are detected at a statistically significant level, it is easy to dismiss them as being due to chance rather than a real difference between the groups. The analysis of RCTs, for harms, poses yet another unresolved multiplicity problem. In large-scale, long-term studies it will be almost inevitable that some statistically significant effects will be observed. Distinguishing those that are iatrogenic, from those that are intercurrent and non-causal, or just random error, is as much an art as a science.”

Rawlins, M. (2008). De testimonio: on the evidence for decisions about the use of therapeutic interventions. The Lancet, 372(9656), 2152-2161. DOI: 10.1016/S0140-6736(08)61930-3. https://doi.org/10.1016/S0140-6736(08)61930-3
(Rawlins, 2008d):
“Because many observational studies have not been consistently ‘tagged’ in electronic bibliographic databases, it is difficult to ensure that conventional search strategies have identified them in an unbiased manner. Many reviewers have therefore relied on personal collections of papers, their own (or others’) memories, or studies identified in previous systematic reviews. The possibility of ‘reviewer bias’ is therefore not inconsiderable.”

Rawlins, M. (2008). De testimonio: on the evidence for decisions about the use of therapeutic interventions. The Lancet, 372(9656), 2152-2161. DOI: 10.1016/S0140-6736(08)61930-3. https://doi.org/10.1016/S0140-6736(08)61930-3
(Reiss et al., 2022):
Reiss, J., Ankeny, R., & Zalta, E. (Ed.) (2022). Philosophy of Medicine. The Stanford Encyclopedia of Philosophy (Spring 2022 Edition). https://plato.stanford.edu/archives/spr2022/entries/medicine/ ; Recommended: https://plato.stanford.edu/entries/medicine/
(Rigas et al., 1999):
Rigas, B., Feretis, C., & Papavassiliou, E. D. (1999). John Lykoudis: an unappreciated discoverer of the cause and treatment of peptic ulcer disease. The Lancet, 354(9190), 1634-1635. DOI: 10.1016/S0140-6736(99)06034-1. https://doi.org/10.1016/S0140-6736(99)06034-1
(Rossouw et al., 2002):
Rossouw, J. E., Anderson, G. L., Prentice, R. L., LaCroix, A. Z., Kooperberg, C., Stefanick, M. L., … & Writing Group for the Women’s Health Initiative Investigators. (2002). Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized controlled trial. JAMA, 288(3), 321-333. DOI: 10.1001/jama.288.3.321. https://doi.org/10.1001/jama.288.3.321
(Sackett, 2000):
Sackett, D. L. (2000). The sins of expertness and a proposal for redemption. BMJ, 320(7244), 1283. DOI: 10.1136/bmj.320.7244.1283. https://doi.org/10.1136/bmj.320.7244.1283 ; Recommended: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1118019/pdf/1283.pdf#page=3
(Schünemann et al., 2022):
“Not downgrading [Non-randomized Studies of Interventions] from high to low certainty needs transparent and detailed justification for what mitigates concerns about confounding and selection bias (Schünemann et al 2018). Very few examples of where not rating down by two levels is appropriate currently exist.”

Schünemann, HJ., Higgins, JPT., Vist, GE., Glasziou, P., Akl, EA., Skoetz, N., & Guyatt, GH. (2022). Cochrane handbook for systematic reviews of interventions. Cochrane. Retrieved July, 2022, from https://training.cochrane.org/handbook/current/chapter-14#section-14-2
(Schlitz et al., 2006):
Schlitz, M., Wiseman, R., Watt, C., & Radin, D. (2006). Of two minds: Sceptic‐proponent collaboration within parapsychology. British Journal of Psychology, 97(3), 313-322. DOI: 10.1348/000712605X80704. https://doi.org/10.1348/000712605X80704
(Schulz & Grimes, 2002):
“Terms such as single blind, double blind, and triple blind mean different things to different people. Moreover, many medical researchers confuse blinding with allocation concealment. Such confusion indicates misunderstandings of both. […] Rather than solely relying on terminology like double blinding, researchers should explicitly state who was blinded, and how. We recommend placing greater credence in results when investigators at least blind outcome assessments, except with objective outcomes, such as death, which leave little room for bias. If investigators properly report their blinding efforts, readers can judge them. Unfortunately, many articles do not contain proper reporting. If an article claims blinding without any accompanying clarification, readers should remain sceptical about its effect on bias reduction.”

Schulz, K. F., & Grimes, D. A. (2002). Blinding in randomised trials: hiding who got what. The Lancet, 359(9307), 696-700. DOI: 10.1016/S0140-6736(02)07816-9. https://doi.org/10.1016/S0140-6736(02)07816-9
(Scott et al., 2022):
“PRI [placebo run-in] studies do not observe larger drug-placebo differences, suggesting that they do not increase trial sensitivity. As such, given the resources and probable deception required and risk to external validity, the practice of using PRI periods in RCTs of antidepressants should be ended.”

Scott, A. J., Sharpe, L., Quinn, V., & Colagiuri, B. (2022). Association of single-blind placebo run-in periods with the placebo response in randomized clinical trials of antidepressants: a systematic review and meta-analysis. JAMA Psychiatry, 79(1), 42-49. DOI: 10.1001/jamapsychiatry.2021.3204. https://doi.org/10.1001/jamapsychiatry.2021.3204
(Silberzahn et al., 2018):
Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., … & Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3), 337-356. DOI: 10.1177/2515245917747646. https://doi.org/10.1177/2515245917747646 ; Recommended: https://journals.sagepub.com/doi/pdf/10.1177/2515245917747646?download=true
(Siler et al., 2015):
Siler, K., Lee, K., & Bero, L. (2015). Measuring the effectiveness of scientific gatekeeping. Proceedings of the National Academy of Sciences, 112(2), 360-365. DOI: 10.1073/pnas.1418218112. https://doi.org/10.1073/pnas.1418218112
(Simmons et al., 2016):
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2016). False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. https://psycnet.apa.org/doi/10.1037/14805-033 ; Recommended: https://journals.sagepub.com/doi/pdf/10.1177/0956797611417632
(Smith, 2014):
Smith, R. (2014). Medical research—still a scandal. BMJ Opinion, 31. https://blogs.bmj.com/bmj/2014/01/31/richard-smith-medical-research-still-a-scandal/
(Smith, 2021):
“Stephen Lock, my predecessor as editor of The BMJ, became worried about research fraud in the 1980s, but people thought his concerns eccentric. Research authorities insisted that fraud was rare, didn’t matter because science was self-correcting, and that no patients had suffered because of scientific fraud. All those reasons for not taking research fraud seriously have proved to be false, and, 40 years on from Lock’s concerns, we are realising that the problem is huge, the system encourages fraud, and we have no adequate way to respond. It may be time to move from assuming that research has been honestly conducted and reported to assuming it to be untrustworthy until there is some evidence to the contrary.

Richard Smith was the editor of The BMJ until 2004.”

Smith, R. (2021). Time to assume that health research is fraudulent until proven otherwise. The BMJ Opinion. Retrieved August, 2022, from https://blogs.bmj.com/bmj/2021/07/05/time-to-assume-that-health-research-is-fraudulent-until-proved-otherwise/
(Smith & Pell, 2003):
Smith, G. C., & Pell, J. P. (2003). Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials. BMJ, 327(7429), 1459-1461. DOI: 10.1136/bmj.327.7429.1459. https://doi.org/10.1136/bmj.327.7429.1459
(Sorge et al., 2014):
Sorge, R. E., Martin, L. J., Isbester, K. A., Sotocinal, S. G., Rosen, S., Tuttle, A. H., … & Mogil, J. S. (2014). Olfactory exposure to males, including men, causes stress and related analgesia in rodents. Nature methods, 11(6), 629-632. DOI: 10.1038/nmeth.2935. https://doi.org/10.1038/nmeth.2935
(Sprenger et al., 2021):
Sprenger, J., Weinberger, N., & Zalta, E. (Ed.) (2021). Simpson’s Paradox. The Stanford Encyclopedia of Philosophy (Summer 2021 Edition). https://plato.stanford.edu/archives/sum2021/entries/paradox-simpson/ ; Recommended: https://plato.stanford.edu/entries/paradox-simpson/
(Stanley et al., 2022):
Stanley, T. D., Doucouliagos, H., & Ioannidis, J. P. (2022). Beyond Random Effects: When Small-Study Findings Are More Heterogeneous. Advances in Methods and Practices in Psychological Science, 5(4), 25152459221120427. DOI: 10.1177/25152459221120427. https://doi.org/10.1177/25152459221120427
(Stegenga, 2018):
Stegenga, J. (2018). Medical nihilism. Oxford University Press. DOI: 10.1093/oso/9780198747048.001.0001. https://doi.org/10.1093/oso/9780198747048.001.0001
(Tang & Liu, 2000):
Tang, J. L., & Liu, J. L. (2000). Misleading funnel plot for detection of bias in meta-analysis. Journal of clinical epidemiology, 53(5), 477-484. DOI: 10.1016/S0895-4356(99)00204-8. https://doi.org/10.1016/S0895-4356(99)00204-8
(Thacker, 2021):
Thacker, P. D. (2021). Covid-19: Researcher blows the whistle on data integrity issues in Pfizer’s vaccine trial. BMJ, 375. DOI: 10.1136/bmj.n2635. https://doi.org/10.1136/bmj.n2635
(The Cochrane Collaboration, 2014):
“Since 2002, governments around the world have spent billions of dollars stockpiling neuraminidase inhibitors (NIs) such as Tamiflu® (oseltamivir) and Relenza® (zanamivir) in anticipation of an influenza pandemic. This trend increased dramatically following the outbreak of the H1N1 virus (swine flu) in April 2009. It was initially believed that NIs would reduce hospital admissions and complications of influenza, such as pneumonia, during influenza pandemics. However, the original evidence presented to government agencies around the world was incomplete, raising questions about the accuracy of these claims and the efficacy of both preparations. […] This latest Cochrane Review has benefited from access to more complete reports of the original research, now made available by the manufacturers, Roche and GlaxoSmithKline. Along with documenting evidence of harms from use of NIs, the review raises the question of whether global stockpiling of the drugs is still justifiable given the lack of reliable evidence to support the original claims of its benefits. […] Initially thought to reduce hospitalisations and serious complications from influenza, the review highlights that [NIs are] not proven to do this, and it also seems to lead to harmful effects that were not fully reported in the original publications. This shows the importance of ensuring that trial data are transparent and accessible.”

The Cochrane Collaboration (2014). Tamiflu and Relenza: getting the full evidence picture. Retrieved January, 2025, from https://www.cochrane.org/news/tamiflu-and-relenza-getting-full-evidence-picture
(Tomljenovic & McHenry, 2024):
“The advertising material for the trial and the informed consent forms stated that the placebo was saline or an inactive substance, when, in fact, it contained Merck’s proprietary highly reactogenic aluminum adjuvant which does not appear to have been properly evaluated for safety. Several trial participants experienced chronic disabling symptoms, including some randomized to the adjuvant “placebo” group.”

Tomljenovic, L., & McHenry, L. B. (2024). A reactogenic “placebo” and the ethics of informed consent in Gardasil HPV vaccine clinical trials: A case study from Denmark. International Journal of Risk & Safety in Medicine, 35(2), 159-180. DOI: 10.3233/JRS-230032. https://doi.org/10.3233/JRS-230032 ; Recommended: https://journals.sagepub.com/doi/reader/10.3233/JRS-230032
(Van Noorden, 2022):
Van Noorden, R. (2022). Exclusive: investigators found plagiarism and data falsification in work from prominent cancer lab. Nature, 607(7920), 650-652. DOI: 10.1038/d41586-022-02002-5. https://doi.org/10.1038/d41586-022-02002-5 ; Recommended: https://media.nature.com/original/magazine-assets/d41586-022-02002-5/d41586-022-02002-5.pdf
(Van Noorden, 2023):
Van Noorden, R. (2023). Medicine is plagued by untrustworthy clinical trials. How many studies are faked or flawed?. Nature, 619(7970), 454-458. DOI: 10.1038/d41586-023-02299-w. https://doi.org/10.1038/d41586-023-02299-w
(Van Noorden, 2023b):
Van Noorden, R. (2023). More than 10,000 research papers were retracted in 2023—a new record. Nature, 624(7992), 479-481. DOI: 10.1038/d41586-023-03974-8. https://doi.org/10.1038/d41586-023-03974-8
(Vinkers et al., 2021):
Vinkers, C. H., Lamberink, H. J., Tijdink, J. K., Heus, P., Bouter, L., Glasziou, P., … & Otte, W. M. (2021). The methodological quality of 176,620 randomized controlled trials published between 1966 and 2018 reveals a positive trend but also an urgent need for improvement. PLoS Biology, 19(4), e3001162. DOI: 10.1371/journal.pbio.3001162. https://doi.org/10.1371/journal.pbio.3001162
(Vul et al., 2009):
Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on psychological science, 4(3), 274-290. DOI: 10.1111/j.1745-6924.2009.01125.x. https://doi.org/10.1111/j.1745-6924.2009.01125.x
(Walton, 1988):
“This description of reasoned dialogue as a process of deepened insight into one’s own position on a controversial issue is consistent with the Socratic view of dialogue as a means to attain self-knowledge. For Socrates, the process of learning was an ascent from the depths of the cave towards the clearer light of self-knowledge through the process of reasoned, and primarily verbal, dialogue with another discussant, on controversial issues. What Socrates emphasized as a most important benefit or gain of dialogue was self-knowledge. It was somehow through the process of articulation and testing of one’s best arguments against an able opponent in dialogue that real knowledge was to be gained.

This Socratic point of view draws our attention to the more hidden and subtle benefit of good, reasoned dialogue. Not only does it enable one to rationally persuade an opponent or co-participant in discussion, but it is also the vehicle that enables one to come to better understand one’s own position on important issues, one’s own reasoned basis behind one’s deeply held convictions. It is the concept of burden of proof that makes such shifts of rational persuasion possible, and thereby enables dialogue to contribute to knowledge.”

Walton, D. N. (1988). Burden of proof. Argumentation, 2(2), 233-254. DOI: 10.1007/BF00178024. https://doi.org/10.1007/BF00178024
(Walton, 1988b):
“One of the most trenchant and fundamental criticisms of reasoned dialogue as a method of arriving at a conclusion is that argument on a controversial issue can go on and on, back and forth, without a decisive conclusion ever being determined by the argument. The only defence against this criticism lies in the use of the concept of the burden of proof within reasoned dialogue. Once a burden of proof is set externally, then it can be determined, after a finite number of moves in the dialogue, whether the burden has been met or not. Only by this device can we forestall an argument from going on indefinitely, and thereby arrive at a definite conclusion for or against the thesis at issue.”

Walton, D. N. (1988). Burden of proof. Argumentation, 2(2), 233-254. DOI: 10.1007/BF00178024. https://doi.org/10.1007/BF00178024
(Wang et al., 2024):
Wang, Y., Pitre, T., Wallach, J. D., de Souza, R. J., Jassal, T., Bier, D., … & Zeraatkar, D. (2024). Grilling the data: application of specification curve analysis to red meat and all-cause mortality. Journal of Clinical Epidemiology, 168, 111278. DOI: 10.1016/j.jclinepi.2024.111278. https://doi.org/10.1016/j.jclinepi.2024.111278
(Wasserstein et al., 2019):
Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p< 0.05”. The American Statistician, 73(sup1), 1-19. DOI: 10.1080/00031305.2019.1583913. https://doi.org/10.1080/00031305.2019.1583913
(Wasserstein & Lazar, 2016):
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: context, process, and purpose. The American Statistician, 70(2), 129-133. DOI: 10.1080/00031305.2016.1154108. https://doi.org/10.1080/00031305.2016.1154108
(Weinerova et al., 2022):
Weinerová, J., Szűcs, D., & Ioannidis, J. P. (2022). Published correlational effect sizes in social and developmental psychology. Royal Society Open Science, 9(12), 220311. DOI: 10.1098/rsos.220311. https://doi.org/10.1098/rsos.220311
(Westfall & Yarkoni, 2016):
Westfall, J., & Yarkoni, T. (2016). Statistically controlling for confounding constructs is harder than you think. PloS one, 11(3), e0152719. DOI: 10.1371/journal.pone.0152719. https://doi.org/10.1371/journal.pone.0152719 ; Recommended: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0152719&type=printable
