Researcher degrees of freedom in statistical software contribute to unreliable results: A comparison of nonparametric analyses conducted in SPSS, SAS, Stata, and R

Nonparametric procedures, Reproducibility, Researcher degrees of freedom, Statistical conclusion validity, Statistical software


Researcher degrees of freedom can affect the results of hypothesis tests and, consequently, the conclusions drawn from the data. Previous research has documented variability in accuracy, speed, and documentation of output across various statistical software packages. In the current investigation, we conducted Pearson’s chi-square test of independence, Spearman’s rank-ordered correlation, Kruskal–Wallis one-way analysis of variance, Wilcoxon Mann–Whitney U rank-sum tests, and Wilcoxon signed-rank tests, along with estimates of skewness and kurtosis, on large, medium, and small samples of real and simulated data in SPSS, SAS, Stata, and R and compared the results with those obtained through hand calculation using the raw computational formulas. Multiple inconsistencies were found in the results produced by the different statistical packages due to algorithmic variation, computational error, and statistical output. The most notable inconsistencies were due to algorithmic variations in the computation of Pearson’s chi-square test conducted on 2 × 2 tables, where differences in p-values reported by different software packages ranged from .005 to .162, largely as a function of sample size. We discuss how such inconsistencies may influence the conclusions drawn from the results of statistical analyses depending on the statistical software used, and we urge researchers to analyze their data across multiple packages to check for inconsistencies and report details regarding the statistical procedure used for data analysis.
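
To make the 2 × 2 finding concrete, the following is a minimal Python sketch of how one algorithmic choice can change a reported p-value. The abstract does not name the specific variations involved, so the choice shown here (applying or omitting the Yates continuity correction) is an assumption used for illustration, as are the function name chi2_2x2 and the cell counts, which are hypothetical and not taken from the study.

```python
import math


def chi2_2x2(table, yates=False):
    """Pearson chi-square test of independence for a 2 x 2 table.

    Returns (statistic, p_value). With yates=True, each |observed - expected|
    is reduced by 0.5 (but not below 0) before squaring.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(0.0, diff - 0.5)
            stat += diff ** 2 / expected

    # With 1 degree of freedom, the chi-square upper tail probability
    # equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts, chosen only for illustration.
table = [[12, 5], [6, 11]]

stat_plain, p_plain = chi2_2x2(table, yates=False)
stat_yates, p_yates = chi2_2x2(table, yates=True)

print(f"uncorrected: chi2 = {stat_plain:.3f}, p = {p_plain:.4f}")
print(f"corrected:   chi2 = {stat_yates:.3f}, p = {p_yates:.4f}")
```

For these hypothetical counts, the uncorrected p-value falls below .05 and the corrected one falls above it, so the same data could lead to different conclusions depending on which variant is computed. Because the correction subtracts a fixed 0.5 per cell, its relative effect shrinks as cell counts grow, which is consistent with the abstract's remark that the differences were largely a function of sample size.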

Journal Title

Behavior Research Methods




