Abstract:
[Objective] In urban underground space development, multi-source heterogeneous borehole data suffer from confusing stratum descriptions and inconsistent coding, which seriously limits the accuracy of 3D geological modeling. Traditional manual standardization is inefficient, and existing models struggle with missing data and long-range dependencies. This study therefore aims to establish an efficient, data-driven method for automatic strata standardization. [Methods] Taking 2,980 engineering boreholes in the Xiamen area as the research object, a deep learning standardization model based on SparseTransformer is proposed. First, 12 key discriminative features, such as water content and compression modulus, are selected based on relevant engineering specifications and Pearson correlation analysis. Second, a sparse masking mechanism is designed to dynamically mask missing values during attention calculation, and a combined augmentation strategy of class-aware resampling and structured feature masking is introduced together with the Focal Loss function to address sample imbalance. Finally, Bayesian optimization and related strategies are used for hyperparameter tuning. [Results] On the test set, the model achieves a precision, recall, and F1-score of 0.85, 0.84, and 0.85, respectively, a substantial improvement over Random Forest (F1 = 0.62) and LSTM (F1 = 0.55). The confusion matrix indicates that the model effectively captures the stratigraphic sedimentary rhythm, with classification accuracy exceeding 80% for dominant categories such as cohesive soil and silt. [Conclusion] The method overcomes the "forgetting" defect of traditional models in long-sequence geological data modeling and mitigates the long-tail distribution of engineering data through data augmentation.
The results validate the effectiveness of deep learning for geological data standardization and provide an intelligent data processing paradigm for building high-precision, city-scale 3D geological models.
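
For illustration only, the two mechanisms named in the Methods (masking missing values during attention calculation, and the Focal Loss) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the SparseTransformer implementation itself: the function names `masked_attention` and `focal_loss`, the single-head attention without learned projections, and the default `gamma = 2.0` are all hypothetical choices made here, not details taken from the study.

```python
import numpy as np

def masked_attention(x, observed):
    """Single-head self-attention in which missing tokens receive zero weight.

    x        : (n_tokens, d) token embeddings; rows of missing tokens may be NaN.
    observed : (n_tokens,) boolean array, True where the token was measured.
    Assumes at least one token is observed. No learned projections are used
    (an illustrative simplification).
    """
    # Replace missing rows with zeros so NaN cannot propagate through the products.
    x = np.where(observed[:, None], x, 0.0)
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                          # (n_tokens, n_tokens)
    # Mask key columns of missing tokens: exp(-inf) = 0 after the softmax.
    scores = np.where(observed[None, :], scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

def focal_loss(probs, labels, gamma=2.0, alpha=None):
    """Multi-class focal loss: mean of -alpha_t * (1 - p_t)^gamma * log(p_t).

    probs  : (n_samples, n_classes) predicted class probabilities.
    labels : (n_samples,) integer class indices.
    alpha  : optional (n_classes,) per-class weights (hypothetical here).
    """
    p_t = probs[np.arange(len(labels)), labels]            # probability of the true class
    loss = -((1.0 - p_t) ** gamma) * np.log(np.clip(p_t, 1e-12, 1.0))
    if alpha is not None:
        loss = np.asarray(alpha)[labels] * loss
    return loss.mean()
```

With `gamma = 0` and no `alpha`, the focal loss reduces to ordinary cross-entropy; larger `gamma` down-weights well-classified samples, which is why it is commonly paired with resampling for long-tail class distributions.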