• We simulate biomass, soil water content (SWC) and temperature (ST) in grasslands.
• We compare nine models to the multi-model median (MMM) at nine sites.
• With model calibration, we obtain satisfactory estimates of ST, less of SWC and biomass.
• We observe discrepancies across models in the simulation of grassland processes.
• We improve performance with multi-model approach.

This study presents results from a major grassland model intercomparison exercise, and highlights the main challenges faced in the implementation of a multi-model ensemble prediction system in grasslands. Nine independently developed simulation models linking climate, soil, vegetation and management to grassland biogeochemical cycles and production were compared in a simulation of soil water content (SWC) and soil temperature (ST) in the topsoil, and of biomass production. The results were assessed against SWC and ST data from five observational grassland sites representing a range of conditions – Grillenburg in Germany, Laqueuille in France with both extensive and intensive management, Monte Bondone in Italy and Oensingen in Switzerland – and against yield measurements from the same sites and other experimental grassland sites in Europe and Israel. We present a comparison of model estimates from individual models to the multi-model ensemble (represented by the multi-model median: MMM). With calibration (seven out of nine models), the performances were acceptable for weekly-aggregated ST (R² > 0.7 with individual models and > 0.8–0.9 with MMM), but less satisfactory with SWC (R² < 0.6 with individual models and < ∼0.5 with MMM) and biomass (R² < ∼0.3 with both individual models and MMM). With individual models, maximum biases of about −5 °C for ST, −0.3 m³ m⁻³ for SWC and 360 g DM m⁻² for yield, as well as negative modelling efficiencies and some high relative root mean square errors, indicate low model performance, especially for biomass.
We also found substantial discrepancies across different models, indicating considerable uncertainties regarding the simulation of grassland processes. The multi-model approach allowed for improved performance, but further progress is strongly needed in the way models represent processes in managed grassland systems.
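As an illustration of the evaluation described above, the following minimal sketch shows how a multi-model median and the reported performance metrics (R², bias, modelling efficiency, relative root mean square error) could be computed for one output variable. It is not the code used in the study. The function names, the sign convention for bias (simulated minus observed), the expression of the relative RMSE as a percentage of the observed mean, and the example values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, median


def multi_model_median(simulations):
    """Median across models at each time step.

    `simulations` is a list of equally long series, one per model.
    """
    return [median(values) for values in zip(*simulations)]


def bias(obs, sim):
    """Mean difference, assumed here to be simulated minus observed."""
    return mean([s - o for o, s in zip(obs, sim)])


def r_squared(obs, sim):
    """Squared Pearson correlation between observations and simulations."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def modelling_efficiency(obs, sim):
    """1 minus the ratio of squared errors to the variance of observations.

    Negative values mean the model performs worse than the observed mean.
    """
    mo = mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1 - sse / sum((o - mo) ** 2 for o in obs)


def relative_rmse(obs, sim):
    """RMSE as a percentage of the observed mean (assumed definition)."""
    rmse = sqrt(mean([(s - o) ** 2 for o, s in zip(obs, sim)]))
    return 100 * rmse / mean(obs)


# Hypothetical weekly soil temperatures (°C): observations and three models.
observed = [10.0, 12.0, 14.0, 16.0]
model_a = [9.0, 11.0, 13.0, 15.0]
model_b = [10.0, 12.0, 14.0, 16.0]
model_c = [14.0, 15.0, 13.0, 20.0]

mmm = multi_model_median([model_a, model_b, model_c])

for name, sim in [("A", model_a), ("B", model_b), ("C", model_c), ("MMM", mmm)]:
    print(name, r_squared(observed, sim), bias(observed, sim),
          modelling_efficiency(observed, sim), relative_rmse(observed, sim))
```

The median is taken per time step rather than per model, so a single model with a large error at a given date (model C in this hypothetical example) has limited influence on the ensemble estimate.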