Objective: Characterizing the spatial distribution of the regional groundwater table is critical for effective groundwater management and pollution control. However, observation wells are few and unevenly distributed in many regions, including Jiangsu Province, China, which makes it difficult for traditional interpolation methods or physically based numerical models to provide reliable predictions. Interpolation methods such as Kriging depend heavily on well coverage, which restricts their applicability in data-scarce areas, while numerical models require many hydrogeological parameters and boundary conditions that are often unavailable in practice.
Methods: To address these limitations, this study develops a machine learning-based framework that integrates multi-source data, including elevation, vegetation cover, rainfall, distance to surface water, land surface temperature, and soil moisture. A dataset of 953 groundwater observations collected during the dry season, complemented by surface water levels and published measurements, was compiled and standardized. A deep neural network (DNN) was trained on 80% of the data, validated on 10%, and tested on the remaining 10%.
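The 80/10/10 partition and the standardization step can be sketched as follows. This is a minimal illustration, not the study's pipeline: the random shuffling, the choice to fit the scaling statistics on the training subset only, the function names `split_indices` and `standardize`, and the synthetic feature matrix are all assumptions made for the example.

```python
import numpy as np

def split_indices(n_samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts.

    Assumption: a simple random split; the abstract does not say how the
    observations were assigned to the three subsets.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(train_frac * n_samples)
    n_val = int(val_frac * n_samples)
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]  # whatever remains goes to the test set
    return train_idx, val_idx, test_idx

def standardize(features, train_idx):
    """Scale each column to zero mean and unit variance.

    Assumption: statistics come from the training rows only, so that the
    validation and test rows do not leak into the scaling.
    """
    mean = features[train_idx].mean(axis=0)
    std = features[train_idx].std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (features - mean) / std

# Placeholder data: 953 rows (the dataset size given in the abstract) and
# 6 columns standing in for the six predictors. The values are random noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(953, 6))

train_idx, val_idx, test_idx = split_indices(len(X))
X_std = standardize(X, train_idx)
print(len(train_idx), len(val_idx), len(test_idx))  # 762 95 96
```

With 953 observations, truncating the 80% and 10% shares leaves 762 training, 95 validation, and 96 test samples; the rounding remainder is assigned to the test set in this sketch.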
Results: The model achieved a coefficient of determination (R²) of 0.91 on the test dataset, substantially outperforming ordinary Kriging (R² = 0.63). The predicted groundwater table maps revealed clear large-scale patterns consistent with hydrogeological understanding, including a west-to-east flow gradient and discharge into the Yangtze River, Taihu Lake, and the East China Sea. Compared with nationwide groundwater models, the proposed approach provided finer spatial resolution and captured more local flow features. Validation in three representative demonstration areas (a coastal industrial park, a riverside development zone, and a cross-hydrogeological unit) confirmed that predicted groundwater flow directions matched observations, even where monitoring wells were sparse. To enhance interpretability, Shapley additive explanations (SHAP) analysis was applied. It revealed that land surface temperature, vegetation cover, and distance to surface water exerted the dominant influences at the provincial scale, while site-specific analyses emphasized the importance of local hydrological connectivity.
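Two quantities mentioned above can be illustrated with a short sketch: the coefficient of determination (R²) used to compare predictions with observations, and a flow direction derived from a gridded water table. The abstract does not state how flow directions were obtained, so the second function assumes the common simplification that flow in an isotropic aquifer points down the hydraulic gradient. The function names and the toy grid are hypothetical.

```python
import numpy as np

def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def flow_vectors(head, dx=1.0, dy=1.0):
    """Return (flow_x, flow_y) as the negative gradient of the head grid.

    Assumptions: head[row, col] is the water table elevation on a regular
    grid, columns increase eastward (x), rows increase along y, and the
    aquifer is isotropic so flow follows the steepest descent of head.
    """
    dh_dy, dh_dx = np.gradient(head, dy, dx)  # axis 0 is y, axis 1 is x
    return -dh_dx, -dh_dy

# Toy grid: head falls by 1 m per 100 m toward the east, constant along y.
x = np.arange(5) * 100.0
head = np.tile(10.0 - 0.01 * x, (4, 1))
flow_x, flow_y = flow_vectors(head, dx=100.0, dy=100.0)

print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0 (perfect fit)
print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0 (no better than the mean)
```

On the toy grid, `flow_x` is positive everywhere and `flow_y` is zero, which corresponds to a purely west-to-east flow direction.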
Conclusion: Overall, the machine learning framework developed in this study provides an efficient and scalable tool for estimating groundwater table distributions in data-limited regions. By integrating diverse environmental factors, this approach improves predictive accuracy, enhances the spatial resolution of groundwater flow mapping, and offers insights into governing factors. The results highlight the potential of combining big data and artificial intelligence methods to support groundwater monitoring optimization, regional environmental impact assessments, and sustainable water resource management.