Climate change is degrading permafrost on the Qinghai-Tibet Plateau (QTP), triggering thermokarst hazards and affecting the environment. Despite their ecological importance, the distribution and risks of thermokarst lakes are not well understood because of the complex factors that influence them. In this study, we introduced a new interpretable ensemble learning method designed to improve the global and local interpretation of susceptibility assessments for thermokarst lakes. Our primary aim was to offer scientific support for precisely evaluating areas prone to thermokarst lake formation. For the susceptibility assessment, we identified ten conditioning factors related to the formation and distribution of thermokarst lakes. In the stacking model, the primary learning units were the random forest (RF), extremely randomized trees (EXTs), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost) algorithms, and gradient boosted decision trees (GBDTs) were employed as the secondary learning unit. Based on the stacking model, we assessed thermokarst lake susceptibility and validated its accuracy with six evaluation indices. We examined the interpretability of the stacking model using three interpretation methods: accumulated local effects (ALE), local interpretable model-agnostic explanations (LIME), and Shapley additive explanations (SHAP). The results showed that the ensemble learning stacking model demonstrated superior performance and the highest prediction accuracy. Approximately 91.20% of the total thermokarst hazard points fell within the high and very high susceptibility areas, which cover 20.08% of the permafrost area of the QTP. The results further revealed that slope, elevation, the topographic wetness index (TWI), and precipitation were the primary factors influencing thermokarst lake susceptibility.
This comprehensive analysis extends to the broader impacts of thermokarst hazards, with the identified high and very high susceptibility zones affecting significant stretches of railway and highway infrastructure, substantial soil organic carbon reserves, and vast alpine grasslands. This interpretable ensemble learning model, which exhibits high accuracy, offers substantial practical significance for project route selection, construction, and operation in the QTP.
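The two-level stacking structure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn is available, uses synthetic data in place of the ten conditioning factors, and includes only RF and extremely randomized trees as primary learners, since XGBoost and CatBoost would require extra packages. The function names, hyperparameters, and the use of AUC as the example metric are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def build_stacking_model(random_state=0):
    """Two-level stack: tree ensembles as primary learners, GBDT as secondary."""
    primary = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ("ext", ExtraTreesClassifier(n_estimators=50, random_state=random_state)),
        # XGBoost and CatBoost classifiers would be appended here in the same
        # (name, estimator) form; they are omitted to avoid extra dependencies.
    ]
    secondary = GradientBoostingClassifier(random_state=random_state)
    return StackingClassifier(
        estimators=primary,
        final_estimator=secondary,
        stack_method="predict_proba",  # secondary learner sees class probabilities
        cv=5,  # out-of-fold predictions are used to train the secondary learner
    )


def run_demo(random_state=0):
    # Synthetic stand-in data: 10 columns mimic the ten conditioning factors.
    X, y = make_classification(
        n_samples=400, n_features=10, n_informative=5, random_state=random_state
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )
    model = build_stacking_model(random_state).fit(X_train, y_train)
    # Probability of the positive class, read here as a susceptibility score.
    susceptibility = model.predict_proba(X_test)[:, 1]
    return susceptibility, roc_auc_score(y_test, susceptibility)
```

Calling `run_demo()` returns one susceptibility score per held-out sample together with the AUC on that held-out set. In a real assessment the scores would be computed per map cell and then binned into susceptibility classes.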
Recently, ensembling multiple deep learning (DL) classifiers has been reported to be an effective way to improve remote sensing classification accuracy. However, these approaches still follow the conventional pattern of inputting instance features and outputting the corresponding classes, and they often overlook the intrinsic relationships between pixels beyond their spatial features. As a result, the diversity of the ensemble classification results relies primarily on the use of different DL models. Training a DL model is time-consuming, and training multiple networks incurs additional time costs and reduces overall efficiency. To address this, this paper proposes a new approach that exploits the relationships between pixels and their combinations to generate diverse classification results. The resulting ensemble classification framework, termed the Doublet-Based Ensemble Classification Framework (DBECF), eliminates the need for multiple classifiers. The DBECF first combines different samples from the training set to generate doublets. Features are then assigned to these doublets through an exponentiation operation, yielding a doublet training set. The DBECF is trained on both the original training set and the derived doublet datasets. For each input pixel, the DBECF produces multiple classification results, which are then integrated to obtain a more accurate output. To validate the proposed approach, experiments were conducted on three datasets: multispectral images, hyperspectral images, and time series images. The maximum accuracies achieved by the DBECF on the three datasets are 87.80%, 97.71%, and 83.51%, respectively. Compared with the comparison methods, the accuracy improvements are 3.73%, 7.66%, and 9.16%, respectively.
The experimental results indicate that, whether DL or non-DL classifiers are used for training, the proposed framework improves accuracy and outperforms the comparison approaches based on single instances. This research provides a new perspective on the combination of DL and ensemble learning, highlighting its important implications and practical value in enhancing classification accuracy and efficiency.
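The doublet idea in this abstract (pair up training samples, derive features for each pair, then integrate several per-pixel results) can be illustrated with a small sketch. The abstract does not specify the exact exponentiation rule or the integration scheme, so both are assumptions here: `elementwise_power` is a hypothetical stand-in for the feature rule, and `majority_vote` is one plausible way to integrate results. The sample values and class names are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def elementwise_power(a, b):
    # Hypothetical doublet feature rule (an assumption, not the paper's rule):
    # raise each feature of sample a to the matching feature of sample b.
    return [x ** y for x, y in zip(a, b)]


def make_doublets(samples, combine=elementwise_power):
    """Pair every two training samples into a doublet.

    samples: list of (features, label) tuples.
    Returns a list of (doublet_features, (label_a, label_b)) tuples.
    """
    return [
        (combine(fa, fb), (la, lb))
        for (fa, la), (fb, lb) in combinations(samples, 2)
    ]


def majority_vote(predictions):
    """Integrate several class predictions for one pixel by majority vote.

    Ties resolve to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]
```

For example, three training samples yield three doublets, and `majority_vote(["water", "forest", "water"])` selects `"water"`. Because the diversity comes from the pairings rather than from separate networks, a single classifier can supply all the votes.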
The thawing of permafrost on the Qinghai-Tibet Plateau (QTP) leads to more frequent occurrences of thaw slumps (TS), which have significant impacts on local ecosystems, carbon cycles, and infrastructure development. Accurate recognition of TS would help in understanding their occurrence and evolution. Machine learning capabilities for TS recognition are still not fully exploited. We systematically evaluate the performance of machine learning models for TS recognition from unmanned aerial vehicle (UAV) imagery and propose an ensemble learning object-based model for TS recognition (EOTSR). The EOTSR has the following advantages: 1) it pioneers the introduction of spatial information to assist recognition; 2) it reduces the misclassification of recognition models through object-based technology; and 3) it integrates the strengths of different machine learning models to achieve a recognition accuracy no lower than that of commonly used deep learning models. The results show that object-based technology is more suitable for TS recognition than pixel-based technology. Recursive feature elimination (RFE)-based feature selection shows that texture and geometry features are effective complements for TS recognition. Among the improved object-based machine learning models, the support vector machine (SVM) has the highest recognition accuracy, with an overall accuracy of 93.06%. McNemar's test shows that the EOTSR significantly improves TS recognition compared to a single model, achieving an overall accuracy of 97.32%. The EOTSR model provides an effective recognition method for the increasingly frequent TS events in the permafrost regions of the QTP, and can produce label data for deep learning models based on satellite imagery.
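McNemar's test, used above to compare the ensemble with a single model, works on the samples where the two models disagree. The sketch below shows the continuity-corrected chi-squared form of the test using only the standard library. The function names are illustrative, and whether the study used this corrected form or an exact variant is not stated in the abstract.

```python
import math


def discordant_counts(truth, pred_a, pred_b):
    """Count the samples on which exactly one of two models is correct.

    Returns (b, c): b = A correct and B wrong, c = A wrong and B correct.
    """
    b = sum(1 for t, a, m in zip(truth, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(truth, pred_a, pred_b) if a != t and m == t)
    return b, c


def mcnemar(b, c):
    """Continuity-corrected McNemar test on the discordant counts.

    Returns (chi2, p_value) for a chi-squared distribution with 1 degree
    of freedom. With no discordant samples the models are indistinguishable.
    """
    n = b + c
    if n == 0:
        return 0.0, 1.0
    # Clamp at zero so the correction never inflates the statistic when b == c.
    chi2 = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A small p-value indicates that the two models' error patterns differ by more than chance would explain. The test by itself does not say which model is better, so it is read alongside the overall accuracies.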
Although detailed spatial and temporal distribution of soil moisture is crucial for numerous applications, current global soil moisture products generally have low spatial resolutions (25-50 km), which largely limit their application at local scales. In this study, we developed a high-resolution soil moisture retrieval framework based on ensemble learning by integrating Landsat 8 optical and thermal observations with multi-source datasets, including in-situ measurements from 1,154 stations in the International Soil Moisture Network, the Soil Moisture Active Passive (SMAP) soil moisture product, the ERA5-Land reanalysis dataset, and auxiliary datasets (terrain, soil texture, and precipitation). Two widely used ensemble learning models were explored and compared using ten-fold cross-validation. The extreme gradient boosting (XGBoost) model performed slightly better than the random forest (RF) model, with a root mean square error (RMSE) of 0.047 m³/m³ and a correlation coefficient (R) of 0.952. Further validation using data from four independent soil moisture networks demonstrated that the prediction accuracy of the XGBoost model was comparable to that of the SMAP soil moisture product, but with a much higher spatial resolution. The model was finally used to map soil moisture over the high-altitude Tibetan Plateau, which is especially sensitive to climate change, from May to September of 2015. The comparison between our fine-scale soil moisture map at 30 m resolution and the coarse-scale SMAP soil moisture product (36 km) revealed high spatial consistency. These results suggest that there is potential to generate accurate soil moisture products globally at 30 m spatial resolution from the long-term Landsat archive. This finding has practical implications in scenarios requiring fine-scale soil moisture maps, such as climate change and permafrost modeling, hydrological and land surface modeling, and agriculture monitoring.
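The two scores reported above, RMSE and the correlation coefficient R, together with the ten-fold split, can be computed as in the following standard-library sketch. The function names and the simple interleaved fold assignment are illustrative assumptions; the abstract does not describe how the folds were drawn, and R is taken here to be the Pearson correlation coefficient.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def kfold_indices(n_samples, k=10):
    """Split sample indices into k interleaved folds for cross-validation.

    Each fold serves once as the validation set while the remaining
    folds are used for training.
    """
    return [list(range(i, n_samples, k)) for i in range(k)]
```

In a ten-fold run, the model would be fitted ten times, and the held-out predictions from all folds would be pooled before computing `rmse` and `pearson_r` against the in-situ measurements.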