%0 Dataset
%T Accurately estimating global wheat yields at 4-km resolution during 1982-2020 (GlobalWheatYield4km)
%J National Cryosphere Desert Data Center
%I National Cryosphere Desert Data Center (www.ncdc.ac.cn)
%U http://www.ncdc.ac.cn/portal/metadata/f61f04f0-2495-4490-837c-4a689ebaf54a
%W NCDC
%R 10.6084/m9.figshare.10025006.v1
%A Zhang Zhao
%K Wheat;4 kilometers;global;global wheat production mapping system;simulated agricultural production system
%X Accurate and spatially explicit information on global crop yield is paramount for guiding policy-making and ensuring food security. However, most public datasets are at coarse resolution in both space and time. Here, we used data-driven models to develop a 4-km dataset of global wheat yield (GlobalWheatYield4km) from 1982 to 2020. First, we proposed a phenology-based approach to map the spatial distributions of spring and winter wheat. Then we determined the optimal grid-scale yield estimation model by comparing the performance of two data-driven models, Random Forest (RF) and Long Short-Term Memory (LSTM), using publicly available data: satellite and climatic data from the Google Earth Engine (GEE) platform, soil properties, and subnational-level census data covering ~11,000 political units. The results showed that GlobalWheatYield4km captured 82% of yield variations, with an RMSE of 619.8 kg/ha, across all subnational regions and years. In addition, our dataset had a higher accuracy (R² ~ 0.71) than the Spatial Production Allocation Model (SPAM) (R² ~ 0.49) across all subnational regions and three years. The GlobalWheatYield4km dataset may play an important role in modelling crop systems and assessing climate impacts over large areas.