Our methodology

Below are the core principles we try to follow. These may change over time as we steelman the techniques, but we had to start somewhere.

  1. Use structured reasons and arguments (Dutilh Novaes & Zalta, 2021; Walton, 1988).
  2. If possible, use arguments based on evidence (Kelly & Zalta, 2016), science (Hepburn et al., 2021), and rationalism (Markie et al., 2021).
  3. Use a hierarchy of evidence (Schünemann et al., 2022) that generally values certain types of evidence higher than others in the following order:
    1. Experimental studies (Franklin et al., 2021) by experts in peer-reviewed, scientific journals
      1. Meta-analyses, systematic reviews (Lasserson et al., 2022) and umbrella reviews (Ioannidis, 2009) of the following types of studies:
      2. Randomized controlled trial (RCT) experiments (Reiss et al., 2022; Kendall, 2003): Patients are randomly assigned to an intervention group or a control group (not receiving the intervention) and the groups are compared, ideally controlling for confounding variables (Lu, 2009):
        1. Placebo-controlled: A placebo is assumed to have no or minimal effect
        2. Non-placebo-controlled
        3. Versions of the above two types:
          1. Quadruple blinded (patient, experimenters, data analyst, and caregivers)
          2. Triple blinded (patient, experimenters, and data analyst)
          3. Double blinded (patient and experimenters)
          4. Single blinded (patient)
          5. Unblinded
      3. Experiments without a control group
        1. Experiments on groups
        2. Experiments on individuals (case studies)
      4. Experiments exploring a mechanism of action (Craver et al., 2019)
    2. Correlational/observational studies by experts in peer-reviewed, scientific journals
      1. Meta-analyses, systematic reviews, and umbrella reviews of the following types of studies:
      2. Cohort studies (Euser et al., 2009): Follow one exposed group and a non-exposed group (control) and compare outcomes.
        1. Prospective: Baseline data are collected and researchers then actively follow patients over time; data collection is generally more accurate
        2. Retrospective: Historical analysis of existing data
      3. Case-control studies (Lu, 2009): Identify one group with an outcome (cases) and another without it (controls) and compare their past exposures.
      4. Cross-sectional studies (Lu, 2009): Assess exposures and outcomes in individuals at a single point in time and compare those with an exposure or outcome to those without (control).
      5. Observations without a control group
        1. Observations of groups
        2. Observations of individuals (case studies)
      6. Ecological studies (Lu, 2009): Similar to cross-sectional studies but groups are analyzed instead of individuals
    3. Simulated model (Frigg et al., 2020) results by experts in peer-reviewed, scientific journals
    4. Opinions by experts in peer-reviewed, scientific journals
      1. Groups of experts
      2. Individual experts
    5. All of the above but not in peer-reviewed, scientific journals
    6. All of the above but not by experts
  4. Use a burden of proof
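To make the ordering in principle 3 concrete, here is a minimal Python sketch (Python 3.9+) that sorts pieces of evidence by the hierarchy's top-level tiers. The class and function names are hypothetical. The sketch collapses the sub-levels (meta-analyses, blinding, and so on) into their parent tier, and it encodes one assumed reading of the last two tiers: evidence not by experts ranks below all evidence by experts, and within each group evidence outside peer-reviewed journals ranks below peer-reviewed evidence. It also ignores study quality, which can outweigh study design.

```python
from dataclasses import dataclass
from enum import IntEnum


class Design(IntEnum):
    """Top-level tiers in the order the hierarchy lists them (lower = generally stronger)."""
    EXPERIMENTAL = 1
    OBSERVATIONAL = 2
    SIMULATION = 3
    EXPERT_OPINION = 4


@dataclass(frozen=True)
class Evidence:
    label: str            # hypothetical free-text description
    design: Design
    peer_reviewed: bool
    by_experts: bool


def hierarchy_key(e: Evidence) -> tuple[bool, bool, int]:
    # Assumed reading of the final two tiers. False sorts before True,
    # so "not by_experts" puts expert evidence first, and likewise for peer review.
    return (not e.by_experts, not e.peer_reviewed, int(e.design))


def rank_evidence(items: list[Evidence]) -> list[Evidence]:
    """Order evidence from generally strongest to generally weakest."""
    return sorted(items, key=hierarchy_key)


items = [
    Evidence("blog post by non-expert", Design.EXPERT_OPINION, False, False),
    Evidence("preprint cohort study", Design.OBSERVATIONAL, False, True),
    Evidence("published RCT", Design.EXPERIMENTAL, True, True),
    Evidence("published simulation", Design.SIMULATION, True, True),
]
print([e.label for e in rank_evidence(items)])
# ['published RCT', 'published simulation', 'preprint cohort study', 'blog post by non-expert']
```

Because the hierarchy only says what is generally stronger, a sort like this is at most a starting point for weighing evidence, not a verdict.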
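The cohort and case-control designs listed above compare different things, and this shows up in the effect measure each one usually reports. A cohort study follows exposure groups and can estimate the risk of the outcome in each, giving a risk ratio. In a case-control study the investigators choose how many cases and controls to include, so absolute risk cannot be estimated directly and the odds ratio of past exposure is used instead. The Python sketch below illustrates both calculations; the function names and all counts are made up for illustration.

```python
def risk_ratio(exposed_cases: int, exposed_total: int,
               unexposed_cases: int, unexposed_total: int) -> float:
    """Cohort design: ratio of outcome risk in the exposed vs. the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def odds_ratio(cases_exposed: int, cases_unexposed: int,
               controls_exposed: int, controls_unexposed: int) -> float:
    """Case-control design: ratio of the odds of past exposure in cases vs. controls."""
    return (cases_exposed / cases_unexposed) / (controls_exposed / controls_unexposed)


# Hypothetical cohort: 30 of 1000 exposed and 10 of 1000 unexposed develop the outcome.
print(round(risk_ratio(30, 1000, 10, 1000), 2))  # 3.0
# Hypothetical case-control: 60 of 100 cases and 30 of 100 controls were exposed.
print(round(odds_ratio(60, 40, 30, 70), 2))      # 3.5
```

When the outcome is rare, the odds ratio approximates the risk ratio, which is one reason case-control studies remain useful despite not measuring risk directly.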

Causation and Correlation

Most arguments presuppose some theory of causation (Gallow, 2022) in which one or more things happening are necessary for, sufficient for, and/or contribute to one or more other things happening.

A related concept is correlation, where one or more things happening may be associated, with some probability, with one or more other things happening.

However, it’s possible for things to be highly correlated yet causally unrelated, which is called a spurious correlation (Aldrich, 1995); hence the common warning that “correlation does not [necessarily] imply causation”. This has been known since at least the late 19th century (Pearson, 1897).

Burden of Proof

A burden of proof is one side’s expectation of how strong the other side’s argument must be to persuade it. A burden of proof may be useful for establishing the core of a debate and for keeping an argument from going on indefinitely (Walton, 1988b). A burden of proof may rest on controversial philosophical or ethical premises, but we think it is still valuable for clarifying the context and exit criteria of an argument.
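Walton’s point is that once the burden is fixed in advance, whether it has been met can be decided after a finite number of moves. The toy Python sketch below (Python 3.9+) illustrates only that structural point. It assumes, purely for illustration, that each argument can be given a numeric weight and that the allowed number of moves is agreed beforehand; real arguments do not reduce to numbers, and the names Move and burden_met are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    side: str      # hypothetical labels: "pro" argues for the thesis, "con" against it
    weight: float  # assumed numeric strength of one argument (a big simplification)

    def __post_init__(self) -> None:
        if self.side not in ("pro", "con"):
            raise ValueError(f"unknown side: {self.side!r}")


def burden_met(moves: list[Move], burden: float, max_moves: int) -> bool:
    """Decide, after at most max_moves moves, whether the "pro" side met the burden.

    The burden and the move limit are both fixed before the dialogue starts,
    so the check always finishes with a definite yes or no.
    """
    net = sum(m.weight if m.side == "pro" else -m.weight for m in moves[:max_moves])
    return net >= burden


moves = [Move("pro", 3.0), Move("con", 1.0), Move("pro", 2.0), Move("con", 2.5)]
print(burden_met(moves, burden=2.0, max_moves=4))  # False: the net weight is 1.5
print(burden_met(moves, burden=1.0, max_moves=4))  # True
```

The same exchange can meet a lower burden and fail a higher one, which is why agreeing on the burden up front matters.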


  1. There are potential issues with the concept of a hierarchy of evidence and the related “evidence-based medicine” (EBM) movement (Murad et al., 2016; Anglemyer et al., 2014; Frieden, 2017; Stegenga, 2018; Blunt, 2015; Jureidini & McHenry, 2022; Charlton, 2009; Charlton & Miles, 1998).
  2. A lack of evidence higher in the hierarchy is not necessarily a problem, because such evidence may be infeasible to obtain (Smith & Pell, 2003; Prasad & Jena, 2013), unnecessary or needlessly risky (Glasziou et al., 2007), unethical, etc., although there are risks in making such assumptions (Prasad et al., 2011; Prasad et al., 2013; Haslam et al., 2021; Herrera-Perez et al., 2019; Rossouw et al., 2002; Powell & Prasad, 2022).
  3. There are potential issues with specific types of evidence; for example:
    1. Meta-analyses: Study selection (Jørgensen et al., 2018), transparency (Coyne et al., 2010), poor quality (Ioannidis, 2016), etc.
    2. RCTs: Simpson’s Paradox (Sprenger et al., 2021), bias (Vinkers et al., 2021), etc.
  4. In some cases, evidence lower in the hierarchy may be stronger; for example, a well-done experiment might be stronger than a poorly done RCT.
  5. There are potential issues with all types of evidence (Ioannidis, 2005); for example:
    1. Successful results with small sample sizes but failures with large sample sizes (Hwang et al., 2016)
    2. Poor incentives (Horton, 2015; Nosek et al., 2012)
    3. Ostensibly useful results that are instead likely due to noise or poor data quality, despite honesty and transparency (Gelman, 2017)
    4. Statistically significant but false effects due to low statistical power (Button et al., 2013)
    5. Lacking or poor replication, or suspiciously consistent replication, i.e., an excess of significant findings (Ioannidis & Trikalinos, 2007)
    6. Non-random sampling (Carlisle, 2017)
    7. High heterogeneity violating random-effects model assumptions (Stanley et al., 2022) and creating invalid generalizations (Bryan et al., 2021)
    8. Poor external validity (Reiss et al., 2022)
    9. Incorrect statistical controls (Westfall & Yarkoni, 2016)
    10. Mistakes (data entry, variable coding, statistics, over-interpretation, etc.) (Gelman, 2017; Brown & Heathers, 2017)
    11. Multiple comparisons (Bennett et al., 2009)
    12. p-hacking, researcher degrees of freedom, multiple potential comparisons, or fishing expeditions (Wasserstein & Lazar, 2016; Wasserstein et al., 2019; Simmons et al., 2016; Humphreys et al., 2013; Gelman & Loken, 2013)
    13. Misconduct and fraud (Fanelli, 2009; Smith, 2021; Crocker, 2011; Van Noorden, 2022; Adam, 2019; Fang et al., 2012; Thacker, 2021)
    14. Conflicts of interest and funding distortions (Lexchin et al., 2003; Lundh et al., 2010; Angell, 2009)
    15. Poor instrument or method reliability (Vul et al., 2009)
    16. Limiting post-publication critique (Hardwicke et al., 2022)
    17. Potential experimenter effects (Sorge et al., 2014; Schlitz et al., 2006)
    18. Publication selection bias (Bartos et al., 2022) and difficulty detecting it (Tang & Liu, 2000)
    19. Institutional inertia and politics (Rigas et al., 1999), etc.
    20. Peer review failures, biases, and gatekeeping (Huber et al., 2022; Ferguson et al., 2014; Siler et al., 2015; Sackett, 2000)
    21. Evidence may be low quality (Howick et al., 2022)
    22. Potential under-reporting of harms (Howick et al., 2022)
    23. Poor data sharing and transparency (Hardwicke et al., 2022b; Gabelica et al., 2022; Gelman, 2017)
    24. Outcome switching (Altman et al., 2017)
    25. Failures of pre-registration (Brodeur et al., 2022)
    26. Incorrect sub-group analyses (Peto, 2011)
    27. Reporting biases (Weinerova et al., 2022)
    28. Lack of post-publication review (Gelman, 2017)
    29. Small effect sizes
    30. Poor study design or methodology
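Two of the issues above, multiple comparisons and low statistical power, can be made concrete with short calculations. The Python sketch below computes the chance of at least one false positive across several tests, the Bonferroni-corrected per-test threshold, and the bias-free positive predictive value (PPV) formula from Ioannidis (2005), where R is the pre-study odds that a tested relationship is true. It assumes independent tests and no bias, the function names are hypothetical, and the input numbers are illustrative rather than estimates for any field.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests of true null hypotheses."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


def positive_predictive_value(power: float, prior_odds: float, alpha: float) -> float:
    """Share of "significant" findings that are true, ignoring bias (Ioannidis, 2005).

    prior_odds is R, the ratio of true to false relationships among those tested.
    """
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


print(f"{family_wise_error_rate(0.05, 20):.2f}")                         # 0.64
print(f"{family_wise_error_rate(bonferroni_alpha(0.05, 20), 20):.3f}")   # 0.049
print(f"{positive_predictive_value(0.8, 0.25, 0.05):.2f}")               # 0.80
print(f"{positive_predictive_value(0.2, 0.25, 0.05):.2f}")               # 0.50
```

With 20 independent tests at p < 0.05, a "significant" result is more likely than not even when every null hypothesis is true; and dropping power from 80% to 20% cuts the share of significant findings that are true from 80% to 50% under these assumed inputs.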

Other general points

  1. We use the heuristic that mistakes are generally made due to incompetence rather than malice (Bloch, 2003), although the latter is certainly possible.
  2. When comparing arguments, the number of points doesn’t necessarily matter.


88 references
  1. (Adam, 2019):

    Adam, D. (2019). How a data detective exposed suspicious medical trials. Nature, 571(7766), 462-465. DOI: 10.1038/d41586-019-02241-z.

  2. (Aldrich, 1995):

    Aldrich, J. (1995). Correlations genuine and spurious in Pearson and Yule. Statistical science, 364-376. DOI: 10.1214/ss/1177009870.

  3. (Altman et al., 2017):

    Altman, D. G., Moher, D., & Schulz, K. F. (2017). Harms of outcome switching in reports of randomised trials: CONSORT perspective. Bmj, 356. DOI: 10.1136/bmj.j396.

  4. (Angell, 2009):

    Angell, M. (2009). Drug companies & doctors: A story of corruption. The New York Review of Books, 56(1), 8-12. Retrieved August, 2022 from

  5. (Anglemyer et al., 2014):

    Anglemyer, A., Horvath, H. T., & Bero, L. (2014). Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database of Systematic Reviews, (4). DOI: 10.1002/14651858.MR000034.pub2.

  6. (Bartos et al., 2022):

    Bartoš, F., Maier, M., Wagenmakers, E. J., Nippold, F., Doucouliagos, H., Ioannidis, J., … & Stanley, T. D. (2022). Footprint of publication selection bias on meta-analyses in medicine, economics, and psychology. arXiv preprint arXiv:2208.12334. DOI: 10.48550/arXiv.2208.12334.

  7. (Bennett et al., 2009):

    Bennett, C. M., Baird, A. A., Miller, M. B., & Wolford, G. L. (2009). Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction. Poster presented at Human Brain Mapping conference.

  8. (Bloch, 2003):

    Bloch, A. (2003). Murphy’s law. Penguin.

  9. (Blunt, 2015):

    Blunt, C. (2015). Hierarchies of evidence in evidence-based medicine (Doctoral dissertation, London School of Economics and Political Science). Retrieved July, 2022, from

  10. (Brodeur et al., 2022):

    Brodeur, A., Cook, N., Hartley, J., & Heyes, A. (2022). Do Pre-Registration and Pre-analysis Plans Reduce p-Hacking and Publication Bias?. Available at SSRN. DOI: 10.2139/ssrn.4180594.

  11. (Brown & Heathers, 2017):

    Brown, N. J., & Heathers, J. A. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363-369. DOI: 10.1177/1948550616673876.

  12. (Bryan et al., 2021):

    Bryan, C. J., Tipton, E., & Yeager, D. S. (2021). Behavioural science is unlikely to change the world without a heterogeneity revolution. Nature human behaviour, 5(8), 980-989. DOI: 10.1038/s41562-021-01143-3.

  13. (Button et al., 2013):

    Button, K. S., Ioannidis, J., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature reviews neuroscience, 14(5), 365-376. DOI: 10.1038/nrn3475.

  14. (Carlisle, 2017):

    Carlisle, J. B. (2017). Data fabrication and other reasons for non‐random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia, 72(8), 944-952. DOI: 10.1111/anae.13938.

  15. (Charlton, 2009):

    Charlton, B. G. (2009). The Zombie science of evidence‐based medicine: a personal retrospective. A commentary on Djulbegovic, B., Guyatt, GH & Ashcroft, RE (2009). Cancer Control, 16, 158–168. Journal of Evaluation in Clinical Practice, 15(6), 930-934. DOI: 10.1111/j.1365-2753.2009.01267.x.

  16. (Charlton & Miles, 1998):

    Charlton, B. G., & Miles, A. (1998). The rise and fall of EBM. QJM: monthly journal of the Association of Physicians, 91(5), 371-374. DOI: 10.1093/qjmed/91.5.371.

  17. (Coyne et al., 2010):

    Coyne, J. C., Thombs, B. D., & Hagedoorn, M. (2010). Ain’t necessarily so: review and critique of recent meta-analyses of behavioral medicine interventions in health psychology. Health Psychology, 29(2), 107.

  18. (Craver et al., 2019):

    Craver, C., Tabery, J., & Zalta, E. (Ed.) (2019). Mechanisms in Science. The Stanford Encyclopedia of Philosophy (Summer 2019 Edition).

  19. (Crocker, 2011):

    Crocker, J. (2011). The road to fraud starts with a single step. Nature, 479(7372), 151-151. DOI: 10.1038/479151a.

  20. (Dutilh Novaes & Zalta, 2021):

    Dutilh Novaes, C., & Zalta, E. (Ed.) (2021). Argument and Argumentation. The Stanford Encyclopedia of Philosophy (Fall 2021 Edition).

  21. (Euser et al., 2009):

    Euser, A. M., Zoccali, C., Jager, K. J., & Dekker, F. W. (2009). Cohort studies: prospective versus retrospective. Nephron Clinical Practice, 113(3), c214-c217. DOI: 10.1159/000235241.

  22. (Fanelli, 2009):

    Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PloS one, 4(5), e5738. DOI: 10.1371/journal.pone.0005738.

  23. (Fang et al., 2012):

    Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences, 109(42), 17028-17033. DOI: 10.1073/pnas.1212247109.

  24. (Ferguson et al., 2014):

    Ferguson, C., Marcus, A., & Oransky, I. (2014). The peer-review scam. Nature, 515(7528), 480. DOI: 10.1038/515480a.

  25. (Franklin et al., 2021):

    Franklin, A., Perovic, S., & Zalta, E. (Ed.) (2021). Experiment in Physics. The Stanford Encyclopedia of Philosophy (Summer 2021 Edition).

  26. (Frieden, 2017):

    Frieden, T. R. (2017). Evidence for health decision making—beyond randomized, controlled trials. New England Journal of Medicine, 377(5), 465-475. DOI: 10.1056/NEJMra1614394.

  27. (Frigg et al., 2020):

    Frigg, R., Hartmann, S., & Zalta, E. (Ed.) (2020). Models in Science. The Stanford Encyclopedia of Philosophy (Spring 2020 Edition).

  28. (Gabelica et al., 2022):

    Gabelica, M., Bojčić, R., & Puljak, L. (2022). Many researchers were not compliant with their published data sharing statement: a mixed-methods study. Journal of Clinical Epidemiology, 150, 33-41. DOI: 10.1016/j.jclinepi.2022.05.019.

  29. (Gallow, 2022):

    Gallow, D. (2022). The Metaphysics of Causation. The Stanford Encyclopedia of Philosophy (Fall 2022 Edition).

  30. (Gelman, 2017):

    Gelman, A. (2017). Ethics and statistics: Honesty and transparency are not enough. Chance, 30(1), 37-39. DOI: 10.1080/09332480.2017.1302720.

  31. (Gelman & Loken, 2013):

    “P-values are a method of protecting researchers from declaring truth based on patterns in noise, and so it is ironic that, by way of data-dependent analyses, p-values are often used to lend credence to noisy claims based on small samples. To put it another way: without modern statistics, we find it unlikely that people would take seriously a claim about the general population of women, based on two survey questions asked to 100 volunteers on the internet and 24 college students. But with the p-value, a result can be declared significant and deemed worth publishing in a leading journal in psychology.”

    “absent pre-registration, our data analysis choices will be data-dependent, even when they are motivated directly from theoretical concerns. When pre-registered replication is difficult or impossible (as in much research in social science and public health), we believe the best strategy is to move toward an analysis of all the data rather than a focus on a single comparison or small set of comparisons”

    “In fields where new data can readily be gathered (such as in all four of the examples discussed above), perhaps the two-part structure of Nosek et al. (2013) will be a standard for future research. Instead of the current norm in which several different studies are performed, each with statistical significance but each with analyses that are contingent on data, perhaps researchers can perform half as many original experiments in each paper and just pair each new experiment with a pre-registered replication.”


    Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University, 348, 1-17.

  32. (Glasziou et al., 2007):

    Glasziou, P., Chalmers, I., Rawlins, M., & McCulloch, P. (2007). When are randomised trials unnecessary? Picking signal from noise. Bmj, 334(7589), 349-351. DOI: 10.1136/bmj.39070.527986.68.

  33. (Hardwicke et al., 2022):

    Hardwicke, T. E., Thibault, R. T., Kosie, J. E., Tzavella, L., Bendixen, T., Handcock, S. A., … & Ioannidis, J. P. (2022). Post-publication critique at top-ranked journals across scientific disciplines: a cross-sectional assessment of policies and practice. Royal Society Open Science, 9(8), 220139. DOI: 10.1098/rsos.220139.

  34. (Hardwicke et al., 2022b):

    Hardwicke, T. E., Thibault, R. T., Kosie, J. E., Wallach, J. D., Kidwell, M. C., & Ioannidis, J. P. (2022). Estimating the prevalence of transparency and reproducibility-related research practices in psychology (2014–2017). Perspectives on Psychological Science, 17(1), 239-251. DOI: 10.1177/1745691620979806.

  35. (Haslam et al., 2021):

    Haslam, A., Gill, J., Crain, T., Herrera-Perez, D., Chen, E. Y., Hilal, T., … & Prasad, V. (2021). The frequency of medical reversals in a cross-sectional analysis of high-impact oncology journals, 2009–2018. BMC cancer, 21, 1-9. DOI: 10.1186/s12885-021-08632-8.

  36. (Hepburn et al., 2021):

    Hepburn, B., Andersen, H., & Zalta, E. (Ed.) (2021). Scientific Method. The Stanford Encyclopedia of Philosophy (Summer 2021 Edition).

  37. (Herrera-Perez et al., 2019):

    Herrera-Perez, D., Haslam, A., Crain, T., Gill, J., Livingston, C., Kaestner, V., … & Prasad, V. (2019). A comprehensive review of randomized clinical trials in three medical journals reveals 396 medical reversals. Elife, 8, e45183. DOI: 10.7554/eLife.45183.

  38. (Horton, 2015):

    Horton, R. (2015). Offline: What is medicine’s 5 sigma. Lancet, 385(9976), 1380. DOI: 10.1016/S0140-6736(15)60696-1.

  39. (Howick et al., 2022):

    Howick, J., Koletsi, D., Ioannidis, J. P., Madigan, C., Pandis, N., Loef, M., … & Schmidt, S. (2022). Most healthcare interventions tested in Cochrane Reviews are not effective according to high quality evidence: a systematic review and meta-analysis. Journal of clinical epidemiology. DOI: 10.1016/j.jclinepi.2022.04.017.

  40. (Huber et al., 2022):

    Huber, J., Inoua, S., Kerschbamer, R., König-Kersting, C., Palan, S., & Smith, V. L. (2022). Nobel and novice: Author prominence affects peer review. Proceedings of the National Academy of Sciences, 119(41), e2205779119. DOI: 10.1073/pnas.2205779119.

  41. (Humphreys et al., 2013):

    Humphreys, M., De la Sierra, R. S., & Van der Windt, P. (2013). Fishing, commitment, and communication: A proposal for comprehensive nonbinding research registration. Political Analysis, 21(1), 1-20. DOI: 10.1093/pan/mps021.

  42. (Hwang et al., 2016):

    Hwang, T. J., Carpenter, D., Lauffenburger, J. C., Wang, B., Franklin, J. M., & Kesselheim, A. S. (2016). Failure of investigational drugs in late-stage clinical development and publication of trial results. JAMA internal medicine, 176(12), 1826-1833. DOI: 10.1001/jamainternmed.2016.6008.

  43. (Ioannidis, 2005):

    Ioannidis, J. P. (2005). Why most published research findings are false. PLoS medicine, 2(8), e124. DOI: 10.1371/journal.pmed.0020124.

  44. (Ioannidis, 2009):

    Ioannidis, J. P. (2009). Integration of evidence from multiple meta-analyses: a primer on umbrella reviews, treatment networks and multiple treatments meta-analyses. Cmaj, 181(8), 488-493. DOI: 10.1503/cmaj.081086.

  45. (Ioannidis, 2016):

    Ioannidis, J. P. (2016). The mass production of redundant, misleading, and conflicted systematic reviews and meta‐analyses. The Milbank Quarterly, 94(3), 485-514. DOI: 10.1111/1468-0009.12210.

  46. (Ioannidis & Trikalinos, 2007):

    Ioannidis, J. P., & Trikalinos, T. A. (2007). An exploratory test for an excess of significant findings. Clinical trials, 4(3), 245-253. DOI: 10.1177/1740774507079441.

  47. (Jørgensen et al., 2018):

    Jørgensen, L., Gøtzsche, P. C., & Jefferson, T. (2018). The Cochrane HPV vaccine review was incomplete and ignored important evidence of bias. BMJ evidence-based medicine, 23(5), 165-168.

  48. (Jureidini & McHenry, 2022):

    Jureidini, J., & McHenry, L. B. (2022). The illusion of evidence based medicine. BMJ, 376. DOI: 10.1136/bmj.o702.

  49. (Kelly & Zalta, 2016):

    Kelly, T., & Zalta, E. (Ed.) (2016). Evidence. The Stanford Encyclopedia of Philosophy (Winter 2016 Edition).

  50. (Kendall, 2003):

    Kendall, J. (2003). Designing a research project: randomised controlled trials and their principles. Emergency medicine journal: EMJ, 20(2), 164. DOI: 10.1136/emj.20.2.164.

  51. (Lasserson et al., 2022):

    Lasserson, T. J., Thomas, J., & Higgins, J. P. T. (2022). Cochrane handbook for systematic reviews of interventions. Cochrane. Retrieved July, 2022, from

  52. (Lexchin et al., 2003):

    Lexchin, J., Bero, L. A., Djulbegovic, B., & Clark, O. (2003). Pharmaceutical industry sponsorship and research outcome and quality: systematic review. bmj, 326(7400), 1167-1170. DOI: 10.1136/bmj.326.7400.1167.

  53. (Lu, 2009):

    Lu, C. Y. (2009). Observational studies: a review of study designs, challenges and strategies to reduce confounding. International journal of clinical practice, 63(5), 691-697. DOI: 10.1111/j.1742-1241.2009.02056.x.

  54. (Lundh et al., 2010):

    Lundh, A., Barbateskovic, M., Hróbjartsson, A., & Gøtzsche, P. C. (2010). Conflicts of interest at medical journals: the influence of industry-supported randomised trials on journal impact factors and revenue–cohort study. PLoS medicine, 7(10), e1000354. DOI: 10.1371/annotation/7e5c299c-2db7-4ddf-8eff-ab793511eccd.

  55. (Markie et al., 2021):

    Markie, P., Folescu, M., & Zalta, E. (Ed.) (2021). Rationalism vs. Empiricism. The Stanford Encyclopedia of Philosophy (Fall 2021 Edition).

  56. (Murad et al., 2016):

    Murad, M. H., Asi, N., Alsawas, M., & Alahdab, F. (2016). New evidence pyramid. BMJ Evidence-Based Medicine, 21(4), 125-127.

  57. (Nosek et al., 2012):

    Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6), 615-631. DOI: 10.1177/1745691612459058.

  58. (Pearson, 1897):

    Pearson, K. (1897). Mathematical contributions to the theory of evolution.—on a form of spurious correlation which may arise when indices are used in the measurement of organs. Proceedings of the royal society of london, 60(359-367), 489-498. DOI: 10.1098/rspl.1896.0076.

  59. (Peto, 2011):

    Peto, R. (2011). Current misconception 3: that subgroup-specific trial mortality results often provide a good basis for individualising patient care. British journal of cancer, 104(7), 1057-1058. DOI: 10.1038/bjc.2011.79.

  60. (Powell & Prasad, 2022):

    Powell, K., & Prasad, V. (2022). Where are randomized trials necessary: Are smoking and parachutes good counterexamples?. European journal of clinical investigation, 52(5), e13730. DOI: 10.1111/eci.13730.

  61. (Prasad et al., 2011):

    Prasad, V., Gall, V., & Cifu, A. (2011). The frequency of medical reversal. Archives of internal medicine, 171(18), 1675-1676. DOI: 10.1001/archinternmed.2011.295.

  62. (Prasad et al., 2013):

    Prasad, V., Vandross, A., Toomey, C., Cheung, M., Rho, J., Quinn, S., … & Cifu, A. (2013). A decade of reversal: an analysis of 146 contradicted medical practices. In Mayo Clinic Proceedings (Vol. 88, No. 8, pp. 790-798). Elsevier. DOI: 10.1016/j.mayocp.2013.05.012.

  63. (Prasad & Jena, 2013):

    Prasad, V., & Jena, A. B. (2013). Prespecified falsification end points: can they validate true observational associations?. Jama, 309(3), 241-242. DOI: 10.1001/jama.2012.96867.

  64. (Reiss et al., 2022):

    Reiss, J., Ankeny, R., & Zalta, E. (Ed.) (2022). Philosophy of Medicine. The Stanford Encyclopedia of Philosophy (Summer 2022 Edition).

  65. (Rigas et al., 1999):

    Rigas, B., Feretis, C., & Papavassiliou, E. D. (1999). John Lykoudis: an unappreciated discoverer of the cause and treatment of peptic ulcer disease. The Lancet, 354(9190), 1634-1635. DOI: 10.1016/S0140-6736(99)06034-1.

  66. (Rossouw et al., 2002):

    Rossouw, J. E., Anderson, G. L., Prentice, R. L., LaCroix, A. Z., Kooperberg, C., Stefanick, M. L., … & Writing Group for the Women’s Health Initiative Investigators. (2002). Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results From the Women’s Health Initiative randomized controlled trial. Jama, 288(3), 321-333. DOI: 10.1001/jama.288.3.321.

  67. (Sackett, 2000):

    Sackett, D. L. (2000). The sins of expertness and a proposal for redemption. BMJ, 320(7244), 1283. DOI: 10.1136/bmj.320.7244.1283.

  68. (Schünemann et al., 2022):

    “Not downgrading [Non-randomized Studies of Interventions] from high to low certainty needs transparent and detailed justification for what mitigates concerns about confounding and selection bias (Schünemann et al 2018). Very few examples of where not rating down by two levels is appropriate currently exist.”


    Schünemann, H. J., Higgins, J. P. T., Vist, G. E., Glasziou, P., Akl, E. A., Skoetz, N., & Guyatt, G. H. (2022). Cochrane handbook for systematic reviews of interventions. Cochrane. Retrieved July, 2022, from

  69. (Schlitz et al., 2006):

    Schlitz, M., Wiseman, R., Watt, C., & Radin, D. (2006). Of two minds: Sceptic‐proponent collaboration within parapsychology. British Journal of Psychology, 97(3), 313-322. DOI: 10.1348/000712605X80704.

  70. (Siler et al., 2015):

    Siler, K., Lee, K., & Bero, L. (2015). Measuring the effectiveness of scientific gatekeeping. Proceedings of the National Academy of Sciences, 112(2), 360-365. DOI: 10.1073/pnas.1418218112.

  71. (Simmons et al., 2016):

    Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2016). False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant.

  72. (Smith, 2021):

    “Stephen Lock, my predecessor as editor of The BMJ, became worried about research fraud in the 1980s, but people thought his concerns eccentric. Research authorities insisted that fraud was rare, didn’t matter because science was self-correcting, and that no patients had suffered because of scientific fraud. All those reasons for not taking research fraud seriously have proved to be false, and, 40 years on from Lock’s concerns, we are realising that the problem is huge, the system encourages fraud, and we have no adequate way to respond. It may be time to move from assuming that research has been honestly conducted and reported to assuming it to be untrustworthy until there is some evidence to the contrary.

    Richard Smith was the editor of The BMJ until 2004.”


    Smith, R. (2021). Time to assume that health research is fraudulent until proven otherwise. The BMJ Opinion. Retrieved August, 2022, from

  73. (Smith & Pell, 2003):

    Smith, G. C., & Pell, J. P. (2003). Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials. BMJ, 327(7429), 1459-1461. DOI: 10.1136/bmj.327.7429.1459.

  74. (Sorge et al., 2014):

    Sorge, R. E., Martin, L. J., Isbester, K. A., Sotocinal, S. G., Rosen, S., Tuttle, A. H., … & Mogil, J. S. (2014). Olfactory exposure to males, including men, causes stress and related analgesia in rodents. Nature methods, 11(6), 629-632. DOI: 10.1038/nmeth.2935.

  75. (Sprenger et al., 2021):

    Sprenger, J., Weinberger, N., & Zalta, E. (Ed.) (2021). Simpson’s Paradox. The Stanford Encyclopedia of Philosophy (Summer 2021 Edition).

  76. (Stanley et al., 2022):

    Stanley, T. D., Doucouliagos, H., & Ioannidis, J. P. (2022). Beyond Random Effects: When Small-Study Findings Are More Heterogeneous. Advances in Methods and Practices in Psychological Science, 5(4), 25152459221120427. DOI: 10.1177/25152459221120427.

  77. (Stegenga, 2018):

    Stegenga, J. (2018). Medical nihilism. Oxford University Press. DOI: 10.1093/oso/9780198747048.001.0001.

  78. (Tang & Liu, 2000):

    Tang, J. L., & Liu, J. L. (2000). Misleading funnel plot for detection of bias in meta-analysis. Journal of clinical epidemiology, 53(5), 477-484. DOI: 10.1016/S0895-4356(99)00204-8.

  79. (Thacker, 2021):

    Thacker, P. D. (2021). Covid-19: Researcher blows the whistle on data integrity issues in Pfizer’s vaccine trial. bmj, 375. DOI: 10.1136/bmj.n2635.

  80. (Van Noorden, 2022):

    Van Noorden, R. (2022). Exclusive: investigators found plagiarism and data falsification in work from prominent cancer lab. Nature, 607(7920), 650-652. DOI: 10.1038/d41586-022-02002-5.

  81. (Vinkers et al., 2021):

    Vinkers, C. H., Lamberink, H. J., Tijdink, J. K., Heus, P., Bouter, L., Glasziou, P., … & Otte, W. M. (2021). The methodological quality of 176,620 randomized controlled trials published between 1966 and 2018 reveals a positive trend but also an urgent need for improvement. PLoS Biology, 19(4), e3001162. DOI: 10.1371/journal.pbio.3001162.

  82. (Vul et al., 2009):

    Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on psychological science, 4(3), 274-290. DOI: 10.1111/j.1745-6924.2009.01125.x.

  83. (Walton, 1988):

    “This description of reasoned dialogue as a process of deepened insight into one’s own position on a controversial issue is consistent with the Socratic view of dialogue as a means to attain self-knowledge. For Socrates, the process of learning was an ascent from the depths of the cave towards the clearer light of self-knowledge through the process of reasoned, and primarily verbal, dialogue with another discussant, on controversial issues. What Socrates emphasized as a most important benefit or gain of dialogue was self-knowledge. It was somehow through the process of articulation and testing of one’s best arguments against an able opponent in dialogue that real knowledge was to be gained.

    This Socratic point of view draws our attention to the more hidden and subtle benefit of good, reasoned dialogue. Not only does it enable one to rationally persuade an opponent or co-participant in discussion, but it is also the vehicle that enables one to come to better understand one’s own position on important issues, one’s own reasoned basis behind one’s deeply held convictions. It is the concept of burden of proof that makes such shifts of rational persuasion possible, and thereby enables dialogue to contribute to knowledge.”


    Walton, D. N. (1988). Burden of proof. Argumentation, 2(2), 233-254. DOI: 10.1007/BF00178024.

  84. (Walton, 1988b):

    “One of the most trenchant and fundamental criticisms of reasoned dialogue as a method of arriving at a conclusion is that argument on a controversial issue can go on and on, back and forth, without a decisive conclusion ever being determined by the argument. The only defence against this criticism lies in the use of the concept of the burden of proof within reasoned dialogue. Once a burden of proof is set externally, then it can be determined, after a finite number of moves in the dialogue, whether the burden has been met or not. Only by this device can we forestall an argument from going on indefinitely, and thereby arrive at a definite conclusion for or against the thesis at issue.”


    Walton, D. N. (1988b). Burden of proof. Argumentation, 2(2), 233-254. DOI: 10.1007/BF00178024.

  85. (Wasserstein et al., 2019):

    Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p< 0.05”. The American Statistician, 73(sup1), 1-19. DOI: 10.1080/00031305.2019.1583913.

  86. (Wasserstein & Lazar, 2016):

    Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: context, process, and purpose. The American Statistician, 70(2), 129-133. DOI: 10.1080/00031305.2016.1154108.

  87. (Weinerova et al., 2022):

    Weinerová, J., Szűcs, D., & Ioannidis, J. P. (2022). Published correlational effect sizes in social and developmental psychology. Royal Society Open Science, 9(12), 220311. DOI: 10.1098/rsos.220311.

  88. (Westfall & Yarkoni, 2016):

    Westfall, J., & Yarkoni, T. (2016). Statistically controlling for confounding constructs is harder than you think. PloS one, 11(3), e0152719. DOI: 10.1371/journal.pone.0152719.

