► We compared nine crop simulation models for spring barley at seven sites in Europe.
► Applying crop models with restricted calibration leads to high uncertainties.
► Multi-crop model mean yield estimates were in good agreement with observations.
► The degree of uncertainty for simulated grain yield of barley was similar to that for winter wheat.
► We need more suitable data enabling us to verify different processes in the models.

In this study, the performance of nine widely used and accessible crop growth simulation models (APES-ACE, CROPSYST, DAISY, DSSAT-CERES, FASSET, HERMES, MONICA, STICS and WOFOST) was compared over 44 growing seasons of spring barley (Hordeum vulgare L.) at seven sites in Northern and Central Europe. The aims of this model comparison were to examine how different process-based crop models perform at multiple sites across Europe when applied at field scale with minimal information for calibrating spring barley, whether individual models perform better than the multi-model mean, and what the uncertainty ranges in simulated grain yields are. The reasons for differences among the models, and how the results for barley compare with those for winter wheat, are discussed. For grain yield estimation, the best-performing models by root mean square error (RMSE) were HERMES, MONICA and WOFOST, with the lowest values of 1124, 1282 and 1325 kg ha⁻¹, respectively. By the index of agreement (IA), WOFOST, DAISY and HERMES scored best, with the highest values (0.632, 0.631 and 0.585, respectively). Most models systematically underestimated yields; CROPSYST showed the largest deviation, with a mean bias error (MBE) of -1159 kg ha⁻¹. While the wide range of simulated yields across all sites and years shows the high uncertainty in model estimates made with only restricted calibration, the mean predictions of the nine models agreed well with observations.
The results also show that the models that predicted phenology more accurately were not necessarily the ones that estimated grain yield better. Total above-ground biomass estimates often did not follow the patterns of the grain yield estimates, so harvest indices also differed. Estimates of soil moisture dynamics varied greatly. In comparison, even though the growing cycle of winter wheat is several months longer than that of spring barley, the models performed slightly, but not significantly, better in predicting wheat yields, as judged by RMSE and IA. Errors in reproducing crop phenology were similar for both crops; combined with the shorter growth cycle of barley, such errors have a larger effect on the accuracy of yield prediction for barley.
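
The three evaluation statistics named above, and the multi-model mean, can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: IA is taken to be Willmott's index of agreement, MBE is taken as simulated minus observed (so negative values indicate underestimation, consistent with the sign reported for CROPSYST), and the yields and model names are invented for illustration only, not taken from the study.

```python
import math

def rmse(sim, obs):
    """Root mean square error between simulated and observed yields."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

def mbe(sim, obs):
    """Mean bias error, assumed here as simulated minus observed."""
    return sum(s - o for s, o in zip(sim, obs)) / len(obs)

def index_of_agreement(sim, obs):
    """Willmott's index of agreement (assumed definition of IA); 1.0 is a perfect match."""
    obs_mean = sum(obs) / len(obs)
    squared_error = sum((s - o) ** 2 for s, o in zip(sim, obs))
    potential_error = sum(
        (abs(s - obs_mean) + abs(o - obs_mean)) ** 2 for s, o in zip(sim, obs)
    )
    return 1.0 - squared_error / potential_error

def multi_model_mean(sims_by_model):
    """Average the simulated yields of all models, season by season."""
    runs = list(sims_by_model.values())
    return [sum(values) / len(values) for values in zip(*runs)]

# Hypothetical yields in kg/ha for three site-years (illustrative values only).
observed = [5000.0, 6000.0, 4000.0]
simulated = {
    "model_a": [4000.0, 5000.0, 3000.0],  # hypothetical model that underestimates
    "model_b": [6000.0, 6500.0, 5000.0],  # hypothetical model that overestimates
}

ensemble = multi_model_mean(simulated)
for name, sim in {**simulated, "multi-model mean": ensemble}.items():
    print(
        f"{name}: RMSE={rmse(sim, observed):.0f} "
        f"MBE={mbe(sim, observed):.0f} IA={index_of_agreement(sim, observed):.3f}"
    )
```

In this toy case the two hypothetical models err in opposite directions, so their mean lies closer to the observations than either model alone. That is one way a multi-model mean can agree well with observations even when individual models show large errors.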