Scheibehenne, Jamil, and Wagenmakers (2016; SJW) recently introduced Bayesian evidence synthesis (BES). They used it to combine evidence from seven published studies that examined the influence of social-norm messages on hotel towel reuse rates. Although most of the original studies reported non-significant results (p > .05), BES provided strong support for the effect (Bayes factor = 37). We argue that this conclusion is wrong. We demonstrate that BES is inherently flawed because it pools data in a way that is vulnerable to Simpson’s paradox, and that a Bayesian meta-analysis that avoids this problem produces weaker evidence for the effect.
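The pooling concern can be illustrated with a minimal sketch. The counts below are hypothetical and chosen only to show how Simpson's paradox arises; they are not the towel-reuse data analyzed by SJW, and summing raw counts across studies is used here as a simplified stand-in for pooling, not as a reproduction of the BES computation. The names `studies`, `rate`, and `pooled_rate` are illustrative.

```python
# Hypothetical (reused, total) counts for two studies; NOT data from SJW.
# Study A has a high baseline reuse rate and mostly treatment guests;
# study B has a low baseline reuse rate and mostly control guests.
studies = {
    "A": {"control": (85, 100), "treatment": (320, 400)},
    "B": {"control": (180, 400), "treatment": (40, 100)},
}


def rate(reused, total):
    """Proportion of guests who reused their towels."""
    return reused / total


def pooled_rate(arm):
    """Reuse rate for one arm after summing raw counts across studies."""
    reused = sum(study[arm][0] for study in studies.values())
    total = sum(study[arm][1] for study in studies.values())
    return reused / total


# Treatment-minus-control difference within each study.
per_study_diff = {
    name: rate(*study["treatment"]) - rate(*study["control"])
    for name, study in studies.items()
}

# Treatment-minus-control difference after pooling the raw counts.
pooled_diff = pooled_rate("treatment") - pooled_rate("control")

for name, diff in per_study_diff.items():
    print(f"Study {name}: treatment - control = {diff:+.2f}")
print(f"Pooled:  treatment - control = {pooled_diff:+.2f}")
```

In this constructed case the treatment arm has the lower reuse rate within each study, yet the pooled counts favor the treatment, because the treatment guests are concentrated in the study with the higher baseline rate. An analysis that keeps study as a separate level, as a meta-analysis does, is not affected by this imbalance in the same way.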