There has been a replication crisis in medical and psychological research in recent years. This NYT piece documents a prime example from social psychology. At the center of the "drama" are Amy Cuddy, whose TED talk on the "power pose" has attracted tens of millions of views on YouTube, and several statistically savvy researchers, including Andrew Gelman.
For many empirical scientists, statistical analysis of a random sample is the most reliable way to draw inferences about the entire population and reveal hidden patterns. The exception is when a "pattern" already exists in the researcher's mind before the study even begins; all the researcher then needs is an ideal "path" through the data to confirm that preexisting pattern. This practice is known as "p-hacking" (see also here, here, and here).

Conducting research has never been easier. Scientists are "blessed" with so much data, but with the blessing comes a daunting task: correctly detecting signals in an expanding sea of noise. We empirical scientists therefore have an obligation to keep educating ourselves about the latest advances in statistical methods, and to keep reminding ourselves to set aside our biases and wishful thinking while conducting research.

Action is needed not only at the individual level but also at the collective level. Too much hype surrounds so-called statistically significant findings; the glory of "statistical significance" has permeated the entire culture of science. Scientists are first and foremost humans, driven by desires for success, fame, status, and respect, and no one pays much attention to a study that produces largely insignificant results. Perhaps incentives could be offered to researchers who, after painstaking research design, meticulous data collection, and rigorous statistical testing, end up with statistically insignificant results. I remember an in-class discussion in graduate school in which the professor told us that insignificant results can sometimes be meaningful too. Journal editors may want to give equal consideration to rigorous studies that fail to produce significant results. And while everyone is busy conducting original studies, incentives are also needed to encourage replications.
Only when we put checks and balances in place can we make this system more transparent and healthier.
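The mechanics of p-hacking are easy to demonstrate with a quick simulation. The sketch below is purely illustrative (the sample sizes, number of tests, and the crude z-test are my own choices, not drawn from any study cited here): each simulated "study" runs twenty comparisons on pure noise and keeps whichever result looks significant at the 0.05 level.

```python
import math
import random
import statistics

random.seed(1)

def two_sample_p(a, b):
    """Crude two-sample z-test p-value (adequate for n = 100 noise samples)."""
    na, nb = len(a), len(b)
    se = math.sqrt(statistics.variance(a) / na + statistics.variance(b) / nb)
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # two-sided p-value from the standard normal CDF, via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def experiment(n_tests=20, n=100):
    """One 'study': n_tests comparisons where BOTH groups are the same
    distribution, so every 'effect' is noise. Return the smallest p-value,
    mimicking a researcher who reports only the best-looking test."""
    pvals = []
    for _ in range(n_tests):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        pvals.append(two_sample_p(a, b))
    return min(pvals)

# How often does a noise-only study yield at least one "significant" finding?
hits = sum(experiment() < 0.05 for _ in range(200))
print(f"Studies with at least one 'significant' result: {hits}/200")
```

With twenty independent tests at the 0.05 level, theory says roughly 1 − 0.95²⁰ ≈ 64% of these noise-only studies will produce at least one "finding" — which is exactly why a significant result cherry-picked from many unreported tests proves so little.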
Scientists have started brainstorming a path forward in the wake of the replication crisis. Some ask: if p-hacking is so rampant, why not make statistical significance harder to achieve? This Nature Human Behaviour article proposes raising the bar by lowering the p-value threshold from 0.05 to 0.005.
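A back-of-the-envelope calculation shows how much that stricter threshold would help in the multiple-testing scenario above (the twenty-test setup is my illustrative assumption, not part of the article's proposal):

```python
# Family-wise false-positive rate for n_tests independent tests on pure noise:
# P(at least one p-value below alpha) = 1 - (1 - alpha)^n_tests.
n_tests = 20
for alpha in (0.05, 0.005):
    family_wise = 1 - (1 - alpha) ** n_tests
    print(f"alpha = {alpha}: P(at least one false positive) = {family_wise:.1%}")
# alpha = 0.05:  64.2%
# alpha = 0.005:  9.5%
```

Lowering the threshold does not eliminate the problem, but it shrinks the payoff from fishing through many tests by a large factor.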