Research on filling methods of missing data in cultivated land quality evaluation
Main authors: | , , , |
---|---|
Format: | article |
Language: | ZH |
Published: | Agro-Environmental Protection Institute, Ministry of Agriculture, 2021 |
Subjects: | |
Online access: | https://doaj.org/article/349a7eaa1425435c846571df3a02b94a |
Summary: | Missing data arise during the investigation and collection of cultivated land quality data because of human, environmental, and other factors, yet current methods for filling missing data are not sufficiently applicable. To improve the cultivated land quality database and evaluation accuracy, it is important to explore missing-data filling methods for cultivated land quality evaluation. In this study, the cultivated land quality database of Conghua District, Guangzhou City, was used as the sample set. According to spatial correlation and spatial distribution, the dataset was divided into a spatial correlation dataset and a non-spatial correlation dataset. Various filling methods were used to simulate missing-data filling, and a cross method was used to verify the accuracy. The results indicated that the proportion of total outliers was less than 1.2% and that 25 factors, such as elevation, temperature, and available zinc, showed spatial correlation. For the spatial correlation dataset, the four-image nearest neighbor algorithm presented the highest filling accuracy, reaching as high as 80% when the missing rate was less than 20%. The accuracy decreased as the missing rate increased. The four-image nearest neighbor algorithm was followed by the K-nearest neighbor (KNN) algorithm, the expectation maximization algorithm, the multiple interpolation algorithm, and the regression model algorithm, and it was more accurate than the KNN algorithm when the data were dense. For the non-spatial correlation dataset, the similar aggregation filling algorithm had the highest filling accuracy, maintaining more than 80% accuracy at missing rates of up to 25%, followed by the expectation maximization algorithm, the multiple interpolation algorithm, and the regression model algorithm. In summary, the four-image nearest neighbor algorithm and the similar aggregation filling algorithm proposed in this study show higher accuracy, more stable performance, and wider practicability than the other algorithms for filling missing data in cultivated land quality evaluation. |
---|
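
The simulation described in the summary (hide a share of known values, fill them, and compare against the truth) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a plain K-nearest neighbor fill on coordinates rather than the four-image nearest neighbor algorithm, the 20 x 20 grid and its smooth surface are hypothetical stand-ins for the Conghua District data, and "accuracy" is assumed here to mean the share of filled values within 10% of the true value, a definition the record does not give.

```python
import math
import random


def knn_fill(points, k=4):
    """Fill missing values with the mean of the k nearest observed points.

    points: list of (x, y, value) tuples; value is None when missing.
    """
    observed = [(x, y, v) for x, y, v in points if v is not None]
    filled = []
    for x, y, v in points:
        if v is None:
            # Rank observed points by Euclidean distance to the gap.
            nearest = sorted(
                observed, key=lambda p: math.hypot(p[0] - x, p[1] - y)
            )[:k]
            v = sum(p[2] for p in nearest) / len(nearest)
        filled.append((x, y, v))
    return filled


def simulate(missing_rate, k=4, tol=0.10, seed=0):
    """Hide a fraction of values, fill them, and return the hit rate.

    A fill counts as a hit when it lies within `tol` (relative) of the
    true value; this accuracy definition is an assumption.
    """
    rng = random.Random(seed)
    # Hypothetical spatially smooth factor on a 20 x 20 grid.
    truth = [(x, y, 100.0 + 2.0 * x + 3.0 * y)
             for x in range(20) for y in range(20)]
    n_missing = int(len(truth) * missing_rate)
    hidden = set(rng.sample(range(len(truth)), n_missing))
    masked = [(x, y, None if i in hidden else v)
              for i, (x, y, v) in enumerate(truth)]
    filled = knn_fill(masked, k)
    hits = sum(
        1 for i in hidden
        if abs(filled[i][2] - truth[i][2]) <= tol * abs(truth[i][2])
    )
    return hits / n_missing


if __name__ == "__main__":
    for rate in (0.05, 0.10, 0.20, 0.30):
        print(f"missing rate {rate:.0%}: accuracy {simulate(rate):.2f}")
```

Running the script prints one accuracy figure per missing rate, which is the kind of comparison the summary reports across algorithms. The synthetic surface is far smoother than real soil or terrain factors, so the numbers it produces say nothing about the accuracies reported in the study.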