Article Information
Corresponding Author: Sheng Miao, smiao@qut.edu.cn
Chao Liu, School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao, China, liuchao@qut.edu.cn
Ruolan Mu, School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao, China, muruolan@hotmail.com
Chuanlong Wang, School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China, Wcl20000322@outlook.com
Xiuhe Yuan, School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao, China, YuanxiuheZS711@outlook.com
Sheng Miao, School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China, smiao@qut.edu.cn
Received: February 5 2024
Revision received: August 8 2024
Accepted: August 27 2024
Published (Print): February 28 2025
Published (Electronic): February 28 2025
1. Introduction
Wetlands host biodiverse ecological landscapes that stabilize the environment and protect species. The Yellow River Delta (YRD) is a relatively young natural delta in China that plays a role in regional climate regulation and biodiversity conservation. However, its special geographic location and fragile ecosystem make it susceptible to external factors. Analyzing the landscape evolution of the YRD can therefore provide experience and references for future ecological development, based on the patterns of past change [1]. With advances in deep learning, remote sensing combined with deep learning has been widely used to identify wetland landscape types [2], but such methods do not visualize the spatial and temporal evolution of those types. The optical flow algorithm associates an optical flow field with an image and describes the movement of an object through the flow of image luminance information. To identify the changes in wetland landscapes between years, this paper introduces the optical flow algorithm to visualize them, which supports the analysis of wetland landscape evolution and facilitates the future management and development of wetlands.
This paper classifies the landscape types of the YRD wetlands and then uses the optical flow algorithm to visualize their spatial and temporal evolution. The results are significant for ecological environmental protection and ecological restoration in the YRD region. The main research contributions are as follows:
· Presents a semantic segmentation network based on the encoder and decoder structure of ResNet-18, which identifies and classifies the wetland landscape types in the YRD in 2000, 2013, and 2020.
· Uses the FlowNet2.0 model, which combines a convolutional neural network (CNN) with the optical flow algorithm, to visualize and analyze the spatial and temporal evolution of wetland landscape types in different years.
· Shows how the YRD has changed over the past 20 years: farmland as a whole has moved to the northwest and has been replaced by building land; natural wetland has mainly moved to the northeast, and its area has decreased; artificial wetland has mainly changed in the southeast, and its area has increased.
The remainder of this paper is structured as follows: Section 2 presents the related work. Section 3 presents the data and modeling for the study area. Section 4 presents the classification results for wetland landscapes and the identification and change analysis of the classification results by the optical flow algorithm. Section 5 summarizes the conclusions of this study. The research workflow is shown in Fig. 1.
2. Related Works
At present, the remote sensing data sources available for wetland information extraction are mainly multispectral, hyperspectral, and radar data. For large-scale, long-term dynamic monitoring of wetlands, Landsat, MODIS, and other satellite image series provide continuous long-term data support. Dou et al. [1] analyzed the spatial and temporal changes in land use, wetland migration, and landscape patterns in the YRD over the last decade using satellite remote sensing data such as Landsat-8 and Sentinel-1, and identified the changes in wetland landscapes. Wetlands are severely degraded under the influence of climate change and human interference. Land use classification of wetlands is usually based on pixel-level interpretation of remote sensing images. Martinez Prentice et al. [2] demonstrated that deep learning algorithms are better at classifying high-resolution pixel targets and have been widely used for classification and change detection in wetlands. Wang et al. [3] used a deep learning algorithm to classify the land cover of wetland cities. The delineation of coastal wetland landscape categories needs to be strengthened to support better planning and protection. Piaser and Villa [4] compared the performance of selected machine learning classifiers for detailed wetland vegetation type mapping. The integrated machine learning approach provides better recognition of wetland landscapes.
Remote sensing image change detection obtains change information by comparing remote sensing images acquired at different times. Deep learning can extract a richer representation of the features in the data through multilayer nonlinear transformations and thus better capture the various patterns and structures of the data. Zhao et al. [5] presented a fully convolutional network with an attention mechanism based on balanced sampling for remote sensing image change detection. Fu et al. [6] used a deep learning algorithm to construct an optimal marsh vegetation recognition model and identified the best features for recognizing wetland vegetation types. Deep learning automatically learns useful features of the input data through neural network models. CNNs are typical supervised deep network models that outperform most machine learning algorithms and have become the main architecture for image recognition and classification. In recent years, CNNs have been improved and optimized and have been widely used in image classification tasks such as vegetation classification, water body extraction, building extraction [7], and road extraction. Helaly et al. [8] used a transfer learning approach to build a high-accuracy deep CNN architecture for human facial emotion recognition. Their results show that the ResNet-18 model has the highest recognition accuracy.
The optical flow algorithm was proposed in the field of computer vision to estimate pixel-level motion. Because the changes of pixels in remote sensing images acquired at different times resemble the motion of objects in consecutive frames, the optical flow algorithm can also be applied to remote sensing images. However, it is not yet commonly used in remote sensing; currently, only the sparse optical flow algorithm is used for multimodal remote sensing image alignment in a single scene. It is more common to combine target detection with deep learning as an efficient detection method. Hamza et al. [9] proposed an efficient vehicle target detection method utilizing a CNN and the Cartesian product. Delibasoglu et al. [10] proposed a combination of a background modelling approach and an optical flow approach; by applying FlowNet to detect moving targets, the target pixels can be better detected.
At present, many studies use deep learning to classify the landscapes in wetland remote sensing images, but few use target detection techniques to identify the spatial and temporal changes in wetland landscape types, so the direction and degree of change of each landscape category cannot be seen intuitively. To address these problems, this paper adopts a semantic segmentation network based on the encoder and decoder structure of ResNet-18 to classify wetland landscapes in the YRD. In addition, this paper uses the optical flow algorithm to visualize and analyze the landscape classification results of different years and to represent the trend of wetland landscape types.
3. Materials and Methods
The YRD wetlands belong to the estuarine wetland ecosystem, which is also the most comprehensive wetland ecosystem in China. The YRD is located in northeastern Shandong Province, spread across Dongying City and close to the Bohai Sea. The study area of this paper is the YRD wetland, which mainly includes four coastal districts and counties in Dongying City, namely Dongying District, Kenli District, Lijin County, and Hekou District. It lies between 37°16'-38°13'N and 117°55'-119°20'E, with a land area of about 7,128.30 km². The geographic location is shown in Fig. 2.
Fig. 2. Location of the study area.
3.1 Data Collection and Pre-processing
In this study, long time series of remote sensing images were used to identify wetland types in the study area. The data sources were Landsat 5 TM and Landsat 8 OLI/TIRS, obtained from the open data supplied by the United States Geological Survey (USGS). These datasets offer a spatial resolution of 30 m, allowing for detailed analysis and interpretation. Landsat series satellites cover a long time span and provide rich bands at a high spatial resolution, so they can supply long time series of remote sensing images for multidisciplinary analysis and have been widely used in vegetation identification and water body identification. Details of the data are shown in Table 1.
Table 1. Landsat data information used in this study
In this paper, the Environment for Visualizing Images 5.6 (ENVI 5.6) is used for pre-processing. First, image fusion between the panchromatic and multispectral bands is performed, and the fused images are then subjected to radiometric, atmospheric, and geometric corrections. This pre-processing avoids image distortion or aberration caused by light refraction and noise during satellite imaging and ensures the accuracy of subsequent classification. Because the remote sensing images are too large, they are cropped to integer multiples of 256 pixels and cut into 256×256×3 tiles. The dataset is then divided, with 80% of the data selected as the training set and 20% as the validation set. The wetland type data in the training and validation sets are labelled using LabelMe (https://labelme.io/). After this processing, a semantic segmentation network based on the ResNet-18 encoder and decoder structure is used for wetland type identification, and the optical flow algorithm is used to represent the trend of wetland types.
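The tiling and the 80/20 split described above can be sketched as follows. This is a minimal illustration with NumPy, not the authors' pipeline: the function names, the choice to crop the ragged border instead of padding it, the random seed, and the scene size are all assumptions.

```python
import numpy as np

TILE = 256  # tile edge length in pixels, as stated in the text


def tile_image(image: np.ndarray, tile: int = TILE) -> np.ndarray:
    """Crop an H x W x 3 image to a multiple of `tile` and cut it into tiles.

    Cropping the ragged border is an assumption; padding is an equally valid choice.
    """
    h, w, c = image.shape
    rows, cols = h // tile, w // tile
    cropped = image[: rows * tile, : cols * tile]
    # (rows, tile, cols, tile, c) -> (rows, cols, tile, tile, c) -> (N, tile, tile, c)
    tiles = cropped.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, tile, tile, c)


def split_train_val(tiles: np.ndarray, train_fraction: float = 0.8, seed: int = 0):
    """Shuffle tile indices and split them into training and validation subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(tiles))
    n_train = int(round(train_fraction * len(tiles)))
    return tiles[order[:n_train]], tiles[order[n_train:]]


# Hypothetical scene size; real Landsat scenes are much larger.
scene = np.zeros((1100, 1300, 3), dtype=np.uint8)
tiles = tile_image(scene)            # 4 rows x 5 columns of tiles
train, val = split_train_val(tiles)  # 16 training tiles, 4 validation tiles
```

Splitting at the tile level keeps the example short; splitting by scene or by region would reduce spatial leakage between the two sets, at the cost of a less even ratio.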
3.2 Modeling
The deep CNN model chosen in this study uses ResNet-18 as the backbone network of the encoder to extract image features. The decoder upsamples the feature maps using transposed convolution to achieve semantic segmentation of the remote sensing images. ResNet-18 has a network depth of 18 weighted layers (17 convolutional layers and one fully connected layer). Compared with plain CNNs, ResNet-18 adds a shortcut connection across every two convolutional layers. This shortcut allows the CNN model to perform residual learning between the two layers when extracting image features. The process is shown in Eq. (1):
where $\alpha_l$ is the input to the first layer of the residual unit, $\alpha_{l+2}$ is the output of the second layer of the residual unit, $W_{l+1}$ and $W_{l+2}$ are the convolution kernel weights of the first and second layers of the residual unit, $b_{l+1}$ and $b_{l+2}$ are the biases of the first and second layers of the residual unit, $F(n_l)$ is the output obtained from $n_l$ after a complete residual unit, and Relu(·) is the activation function.
When several residual units are stacked, let $n_L$ denote the output of the residual unit at the L-th layer. The relationship is shown in Eq. (2):
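For orientation, a two-layer residual unit consistent with these symbol definitions can be written as below. This is a sketch of the standard ResNet basic block, not a transcription of the paper's Eq. (1), and it assumes that $n_l$ and $\alpha_l$ denote the same unit input.

```latex
\begin{aligned}
F(\alpha_l) &= W_{l+2}\,\mathrm{Relu}\!\left(W_{l+1}\,\alpha_l + b_{l+1}\right) + b_{l+2},\\
\alpha_{l+2} &= \mathrm{Relu}\!\left(F(\alpha_l) + \alpha_l\right).
\end{aligned}
```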
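When residual units with identity shortcuts are stacked, the usual result is that the deep output equals the shallow input plus the sum of all residual branches, n_L = n_l + Σ F(n_i) for i = l, ..., L-1. The sketch below checks this numerically. It assumes no activation after the addition and indexes whole units instead of layers. Dense matrices stand in for convolutions, and all names and sizes are hypothetical.

```python
import numpy as np


def residual_branch(n: np.ndarray, w1, b1, w2, b2) -> np.ndarray:
    """F(n): two affine layers with a ReLU in between (dense stand-in for convolutions)."""
    hidden = np.maximum(w1 @ n + b1, 0.0)
    return w2 @ hidden + b2


def stack_units(n_l: np.ndarray, params):
    """Apply residual units n_{i+1} = n_i + F(n_i); return the final output and every F(n_i)."""
    n, branches = n_l, []
    for w1, b1, w2, b2 in params:
        f = residual_branch(n, w1, b1, w2, b2)
        branches.append(f)
        n = n + f  # identity shortcut, no activation after the addition (assumption)
    return n, branches


rng = np.random.default_rng(0)
dim, units = 4, 3  # hypothetical sizes chosen only for illustration
params = [
    (rng.normal(size=(dim, dim)), rng.normal(size=dim),
     rng.normal(size=(dim, dim)), rng.normal(size=dim))
    for _ in range(units)
]
n_l = rng.normal(size=dim)
n_L, branches = stack_units(n_l, params)

# The deep output equals the shallow input plus the sum of all residual branches.
identity_holds = np.allclose(n_L, n_l + sum(branches))
```

If a ReLU is applied after each addition, as in the original ResNet block, the identity holds only approximately, which is why the assumption is stated explicitly.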
The model network architecture is shown in Fig. 3. It consists of three parts: an encoder, a decoder, and a convolutional block attention module (CBAM). The input to the model is a Landsat remote sensing image of size 256×256×3. On entering the encoding path, the image first passes through a standalone convolutional layer with a kernel size of 7×7. The resulting feature maps are then processed by batch normalization, an activation function, and subsampling in the max-pooling layer.
Convolution modules with a residual structure are then used to extract features of the remote sensing images from shallow to deep. The shallow and deep feature information of the image is fused to enhance the feature expression ability of the model. The formula for the convolution process is shown in Eq. (3):
where X is the input feature map, W is the convolution kernel, (i, j) is the location of the convolution result, and m and n range over the convolution kernel size.
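The effect of this stem on the spatial size of a 256×256 input can be worked out with the usual output-size formula. The strides and paddings below follow the standard ResNet-18 stem and are assumptions, since the text states only the 7×7 kernel size. The helper name is hypothetical.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# Strides and paddings follow the standard ResNet-18 stem; they are assumptions here,
# since the text only states the 7x7 kernel size.
after_conv = conv_out_size(256, kernel=7, stride=2, padding=3)         # 7x7 convolution
after_pool = conv_out_size(after_conv, kernel=3, stride=2, padding=1)  # 3x3 max pooling
```

Under these assumptions the stem reduces each 256×256 tile to 128×128 and then to 64×64 before the residual stages.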
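With these symbols, a discrete 2D convolution as used in deep learning frameworks is commonly written as Y(i, j) = Σ_m Σ_n X(i + m, j + n) W(m, n). The following single-channel sketch implements that sum directly. The output name Y, the absence of padding, and the stride of 1 are assumptions made for illustration.

```python
import numpy as np


def conv2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Single-channel 2D convolution without padding, stride 1.

    Implements Y[i, j] = sum_m sum_n X[i + m, j + n] * W[m, n].
    """
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            y[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return y


x = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 feature map
w = np.ones((3, 3)) / 9.0                     # 3x3 mean filter
y = conv2d_valid(x, w)                        # 2x2 output of local means
```

The explicit loops mirror the formula term by term. A trained network uses learned kernels and many channels, but the per-location sum is the same.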
The image is then upsampled, and a CBAM attention mechanism is added after the encoding layers to enhance the performance of the segmentation network. CBAM incorporates the spatial attention mechanism (SAM) and the channel attention mechanism (CAM) to assign different weights to feature channels and pixel locations, giving more weight to features with easily segmentable edges. The input to CBAM is an intermediate feature map, which is first passed to the CAM to obtain attention weights for the different channels. The formula is shown in Eq. (4):
where σ is the sigmoid activation function, and AvgPool() and MaxPool() are global average pooling and global max pooling over the spatial dimensions.
Then, the feature map with channel attention applied is input to the SAM, which produces a spatial attention map that is applied to the feature map. The formula is shown in Eq. (5):
where σ is the sigmoid activation function, AvgPool() and MaxPool() are average pooling and max pooling along the channel axis, and $f^{7 \times 7}$ is a 7×7 convolution operation.
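In the original CBAM formulation, channel attention is M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))), where the two pooled vectors share one small MLP. A minimal NumPy sketch under that assumption is given below. The reduction ratio, the random weights, and all names are hypothetical, and the sigmoid for σ follows CBAM, not anything stated in the text.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """CBAM-style channel attention for a feature map of shape (C, H, W).

    w1 (C/r x C) and w2 (C x C/r) form the shared two-layer MLP; returns one weight per channel.
    """
    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))


rng = np.random.default_rng(0)
channels, reduction = 8, 4  # hypothetical sizes
feat = rng.normal(size=(channels, 16, 16))
w1 = rng.normal(size=(channels // reduction, channels))
w2 = rng.normal(size=(channels, channels // reduction))
weights = channel_attention(feat, w1, w2)
refined = feat * weights[:, None, None]  # reweight each channel
```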
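In the original CBAM formulation, spatial attention is M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)])), where both pooling operations run along the channel axis and the two pooled maps are stacked before the 7×7 convolution. The sketch below follows that form. Zero padding, the random filter, and all names are assumptions for illustration.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def spatial_attention(feat: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """CBAM-style spatial attention for a feature map of shape (C, H, W).

    kernel has shape (2, 7, 7): one 7x7 filter slice per pooled map. Returns an (H, W) mask.
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H x W
    h, w = feat.shape[1:]
    response = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            response[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(response)


rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16, 16))        # hypothetical feature map
kernel = rng.normal(size=(2, 7, 7)) * 0.1  # hypothetical learned 7x7 filter
mask = spatial_attention(feat, kernel)
refined = feat * mask[None, :, :]          # reweight each pixel location
```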
Next, the extracted features are decoded. Each decoding layer contains two convolutional layers and one transposed convolution layer for upsampling. The feature map is progressively reduced and linked to the output of the corresponding encoder layer. Finally, a series of convolutions and transposed convolutions produces the classification output. As the network deepens, the shallow features are gradually weakened, and feature dilution occurs during the upsampling process of the decoder. Therefore, a feature fusion module takes the encoder feature map of the corresponding resolution, concatenates it with the decoder input (Concat operation), and then applies transposed convolution for upsampling.
Fig. 3. Semantic segmentation network based on the encoder and decoder structure of ResNet-18.
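The upsampling and Concat steps can be illustrated with simple shape arithmetic. The output-size formula below is the standard one for a transposed convolution with dilation 1. The 2×2 kernel, the stride of 2, the 8×8 starting resolution, and the channel counts are hypothetical values, not settings reported in the paper.

```python
def transposed_conv_out_size(size: int, kernel: int, stride: int,
                             padding: int = 0, output_padding: int = 0) -> int:
    """Spatial output size of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Hypothetical decoder settings: a 2x2 kernel with stride 2 doubles the resolution.
sizes = [8]
while sizes[-1] < 256:
    sizes.append(transposed_conv_out_size(sizes[-1], kernel=2, stride=2))
# sizes == [8, 16, 32, 64, 128, 256]

# Concatenating encoder and decoder maps of equal resolution adds their channel counts.
encoder_channels, decoder_channels = 64, 64  # hypothetical values
fused_channels = encoder_channels + decoder_channels
```

Because concatenation requires matching spatial sizes, each skip connection pairs a decoder stage with the encoder stage of the same resolution.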
3.3 Visualization of Wetland Types Change
The optical flow algorithm is used for moving target detection: consecutive images are analyzed to establish the positional relationship between them and to obtain motion trajectories. The FlowNet network combines a CNN with the optical flow algorithm and can therefore obtain more accurate results than traditional optical flow methods. FlowNet2.0 is an improved version of FlowNet and consists of FlowNet-Simple and FlowNet-Corr. Motion detection of the target is achieved by converting the motion field into an optical flow field. The optical flow field is associated with an image and describes the motion of the object in terms of the flow of image luminance information. The optical flow diagram contains information about the speed and direction of the target object and its relationship with the surrounding environment. Different colors in the optical flow diagram represent different directions, while the depth of the color represents the instantaneous speed of the moving object. The process of applying the optical flow algorithm is shown in Fig. 4.
Fig. 4. Application of optical flow algorithm.
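The color coding described above, with hue for direction and color depth for speed, can be sketched as a mapping from a flow field to an RGB image. The specific mapping below (angle to hue, normalized magnitude to saturation) is one common convention and an assumption here. It is not necessarily the color wheel used by FlowNet2.0, and the function name is hypothetical.

```python
import colorsys
import numpy as np


def flow_to_rgb(flow: np.ndarray) -> np.ndarray:
    """Color-code a flow field of shape (H, W, 2) holding (u, v) displacements.

    Hue encodes direction and saturation encodes speed relative to the fastest pixel.
    """
    u, v = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(u, v)
    hue = (np.arctan2(v, u) + np.pi) / (2.0 * np.pi)  # direction mapped to [0, 1]
    max_mag = magnitude.max()
    saturation = magnitude / max_mag if max_mag > 0 else np.zeros_like(magnitude)

    rgb = np.empty(flow.shape[:2] + (3,))
    for idx in np.ndindex(*flow.shape[:2]):
        rgb[idx] = colorsys.hsv_to_rgb(hue[idx], saturation[idx], 1.0)
    return rgb


# Toy field: the left half moves right, the right half is static.
flow = np.zeros((4, 4, 2))
flow[:, :2, 0] = 1.0
rgb = flow_to_rgb(flow)
```

In this toy field the moving pixels receive a fully saturated color and the static pixels stay white, so both the direction and the extent of change are visible at a glance.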
3.4 Related Software and Code Descriptions
This study uses a semantic segmentation network based on the encoder and decoder structure of ResNet-18 (https://github.com/storydd/mrl_seg) and the FlowNet2.0 model, which combines a CNN with the optical flow algorithm (https://github.com/L1213822/flownet2). The deep learning framework PyTorch 2.0.1 with Python 3.9 was used, and computation was accelerated on a GPU by moving the tensors and model parameters to the GPU, which can increase the training speed and model performance. The models were trained on a workstation whose configuration is shown in Table 2.
Table 2. Configuration of the training environment
4. Results and Discussion
4.1 Modeling Results
The loss function curves of the three models, each labelled with the year it represents, are shown in Fig. 5. Because the remote sensing images span many years and come from different satellite series, their bands are not the same; therefore, a separate model is constructed for each year. The loss curves decrease rapidly during the first 25 iterations. After 50 iterations, the loss fluctuates but remains at a low level, indicating that all three models have converged well and show no overfitting. After 125 iterations, the losses of the three models reach a steady state with only small fluctuations, indicating that the models are stable. The training and validation accuracies for the different landscape classes under the three models are shown in Table 3.
The wetland types in the YRD are complex, with numerous types of natural and artificial wetlands, and different wetland types play different roles in ecological environment evaluation. In this paper, with reference to the International Convention on Wetlands and China's standard wetland classification system, the wetland landscape system in the study area is classified into five types: natural wetland, artificial wetland, farmland, building land, and grassland. Natural wetlands include shallow marine waters, marshes, mudflats, silty mudflats, and rivers. Artificial wetlands include saltpans, paddy fields, reservoir ponds, and aquaculture ponds. The landscape classification results for the three years are shown in Fig. 6.
Fig. 5. Loss function curves for the three models: (a) 2000, (b) 2013, and (c) 2020.
Table 3. Accuracy of different landscape classification under the three models
4.2 Visualization Results of the Optical Flow Algorithm
This paper selected three of the five landscape types (farmland, natural wetland, and artificial wetland) for optical flow analysis. The resulting optical flow diagrams are shown in Fig. 7.
Fig. 6. Classification results for the study area of three years: (a) 2000, (b) 2013, and (c) 2020.
The optical flow diagrams show that farmland decreased by 591 km² from 2000 to 2013, with the main changes occurring at the edges of the study area. Farmland decreased by a further 217 km² from 2013 to 2020, with a large shift to the south. Over 2000 to 2020, farmland shifted mainly to the west and was replaced by building land. Natural wetlands decreased mainly from 2000 to 2013, by 273 km², with natural wetlands in the north moving to the northeast and those in the east moving to the north. Natural wetlands changed little from 2013 to 2020, with a decrease of 172 km². Artificial wetlands increased by 917 km² from 2000 to 2013, changing mainly in the northern part of the study area: artificial wetlands in the northwest expanded to the south, and others expanded to the northeast, both replacing previous grassland. Artificial wetlands increased by 104 km² from 2013 to 2020, changing mainly in the northeast and southwest. Over 2000 to 2020, artificial wetlands changed mainly in the southeast.
Fig. 7. (a–i) Optical flow diagrams for the study area over the study periods.
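The reported changes can be combined with simple arithmetic. The sketch below sums the two periods for each class and shows how a pixel count converts to area at the 30 m resolution stated for the Landsat data. The net totals are plain sums of the values above and are not figures stated in the paper. The sketch reads the 273 km² figure as the 2000 to 2013 change, and deriving areas from pixel counts is an assumption about the workflow.

```python
PIXEL_AREA_KM2 = (30 * 30) / 1_000_000  # one 30 m Landsat pixel covers 0.0009 km^2


def class_area_km2(pixel_count: int) -> float:
    """Area covered by a landscape class, given its pixel count in a classified map."""
    return pixel_count * PIXEL_AREA_KM2


# Changes in km^2 as reported in the text: (2000 to 2013, 2013 to 2020).
reported_changes = {
    "farmland": (-591, -217),
    "natural wetland": (-273, -172),
    "artificial wetland": (917, 104),
}
net_change = {name: sum(steps) for name, steps in reported_changes.items()}
# net_change == {"farmland": -808, "natural wetland": -445, "artificial wetland": 1021}
```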
4.3 Discussion
From 2000 to 2020, farmland in the YRD decreased, mainly due to accelerated urbanization and changes induced by human activities. Natural wetlands also decreased, mainly due to wetland destruction caused by human activities and changes in river water quality and quantity. Artificial wetlands increased, mainly due to ecological restoration and conservation needs.
Therefore, the YRD should strengthen land planning and management and formulate strict land use policies; promote ecological restoration and protection by implementing wetland ecological restoration projects, restoring the area and water quality of natural wetlands, and improving the ecological functions and biodiversity protection that wetlands provide; and guide agricultural restructuring, promote sustainable agricultural development, and protect limited farmland resources.
5. Conclusion
This paper uses a semantic segmentation network based on the encoder and decoder structure of ResNet-18 to classify the wetland landscape types in the YRD, and the classification accuracy of each wetland type exceeds 90%. The FlowNet2.0 model, which combines a CNN with the optical flow algorithm, is used to represent the direction and degree of change of the landscape types. The methods in this paper can effectively classify wetland landscape types and visualize their spatial and temporal evolution. By analyzing the changing characteristics of wetland landscape types in the study area, the areas in need of ecological protection can be identified and protection measures can be implemented as soon as possible.
Future work will extend landscape identification and visualization analysis to more years in the study area to explore the evolution pattern of the wetland landscapes in the YRD over a continuous time series. Precise identification of vegetation in the YRD wetlands will also be carried out to analyze the evolution pattern of vegetation community structure. Future work will also comprehensively assess the ecological protection status of the YRD and offer a theoretical basis for subsequent development.
Conflict of Interest
The authors declare that they have no competing interests.
Acknowledgement
This paper is the extended version of “Spatial and temporal evolution of vegetation based on optical flow algorithms,” presented at the 15th International Conference on Computer Science and its Applications (CSA2023), held in Nha Trang, Vietnam, on December 18-20, 2023.