Artificial Intelligence Inherits Human Cognitive Errors Around Statistical Significance
In a recent study, researchers led by Blake McShane, a professor of marketing at Kellogg, found that AI models, like their human counterparts, rigidly adhere to the 0.05 "statistical significance" threshold when interpreting statistical results. The findings, published in Nature, show that these models make the same cognitive errors around statistical significance that have long been documented in human researchers.
The researchers conducted experiments involving survival rates among terminal-cancer patients and drug efficacy. They found that when the P-value was 0.049, the AI models almost always responded that Group A lived longer or Drug A was more effective. Conversely, when the P-value was 0.051, the models responded much less frequently with such conclusions.
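A minimal sketch of how such a probe could be set up is below, assuming a placeholder query_model() wrapper around whatever chat API is under test; the prompt wording and function names are illustrative, not the authors' actual materials.

```python
# Hypothetical probe: two prompts identical except for a P-value straddling
# the 0.05 convention. query_model() is a placeholder for whichever
# chat-completion API is being evaluated.

PROMPT = (
    "In a study, patients in Group A who received Drug A lived longer on average "
    "than patients in Group B who received Drug B (P = {p:.3f}). "
    "Did the patients who received Drug A live longer than those who received Drug B?"
)

def query_model(prompt: str) -> str:
    """Placeholder: wire this to the model under test before running."""
    raise NotImplementedError

def affirmative_rate(p_value: float, n_trials: int = 100) -> float:
    """Fraction of runs in which the model answers with an unqualified 'yes'."""
    answers = [query_model(PROMPT.format(p=p_value)) for _ in range(n_trials)]
    return sum(a.strip().lower().startswith("yes") for a in answers) / n_trials

if __name__ == "__main__":
    # The pattern reported in the study: near-universal "yes" at P = 0.049,
    # far fewer at P = 0.051, despite nearly identical evidence.
    for p in (0.049, 0.051):
        print(p, affirmative_rate(p))
```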
This dichotomous thinking mirrors the responses of human researchers in prior studies. The AI models invoked "statistical significance" even in the absence of a P-value, demonstrating that they are influenced by the same conventional practices that human researchers often employ.
Even when fed the American Statistical Association's explicit guidance warning against reliance on P-value thresholds, the AI models still responded dichotomously. That persistence is troubling because the 0.05 threshold has become a kind of gatekeeper for publishing research, a practice that produces a biased literature.
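The statistical point behind that guidance is that P-values just above and just below 0.05 carry essentially the same evidence. A quick illustration (not from the study) using scipy makes this concrete: the two-sided z-statistics implied by P = 0.049 and P = 0.051 differ only in the second decimal place.

```python
from scipy.stats import norm

# Two-sided P-values on either side of the 0.05 convention.
for p in (0.049, 0.050, 0.051):
    # z-statistic implied by a two-sided P-value: z = Phi^{-1}(1 - p/2)
    z = norm.isf(p / 2)
    print(f"P = {p:.3f}  ->  |z| ≈ {z:.3f}")

# Output (approximately):
# P = 0.049  ->  |z| ≈ 1.969
# P = 0.050  ->  |z| ≈ 1.960
# P = 0.051  ->  |z| ≈ 1.951
```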
The more powerful and recent AI models were particularly susceptible. The researchers attribute this to the models' training on human-generated text and data, which contain the same common misinterpretations and errors regarding statistical significance.
McShane raised red flags about integrating AI, with ever greater autonomy, into more dimensions of research work. Researchers already use AI to summarize papers, conduct literature reviews, perform statistical analyses, and pursue novel scientific discoveries. As AI becomes more embedded in these areas, McShane warns, it may perpetuate the same biases and errors that have plagued human research for years.
To mitigate these issues, the researchers suggest enhanced explainability and targeted training. These approaches aim to clarify how models generate their conclusions and where they might go wrong, improving the reliability of AI's statistical reasoning.
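One concrete form such targeted checking could take is a simple audit that flags threshold-based language in model answers before they feed into downstream analysis. A rough sketch follows, with an illustrative (not validated) list of phrases.

```python
import re

# Phrases that signal threshold-based, dichotomous reporting rather than a
# graded statement about the strength of evidence. Illustrative only.
DICHOTOMOUS_PATTERNS = [
    r"statistically significant",
    r"not significant",
    r"fails? to reach significance",
    r"(above|below) the 0\.05 threshold",
]

def flags_dichotomous_reasoning(response: str) -> bool:
    """Return True if a model's answer leans on significance-threshold language."""
    text = response.lower()
    return any(re.search(pattern, text) for pattern in DICHOTOMOUS_PATTERNS)

# Example: an answer of the kind the study reports.
answer = ("Because P = 0.051 is above the 0.05 threshold, the result is not "
          "significant, so we cannot conclude Drug A is more effective.")
print(flags_dichotomous_reasoning(answer))  # True
```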
- The study indicates that AI models, like humans, are swayed by conventional practice when interpreting statistical significance, reinforcing the 0.05 threshold as a gatekeeper for published research.
- Because AI models are trained on human-generated data, they may perpetuate the same biases and errors that have long pervaded human research, underscoring the need for enhanced explainability and targeted training to improve reliability.