Data-Driven Airflow Prediction for Wastewater Treatment Plant Aeration System

Xuefei Li , Changqing Liu , Shuqi Liu and Sheng Miao

Abstract

A wastewater treatment plant is an intricate system with a wealth of information, where the aeration system of the activated sludge process is designed to provide oxygen to microorganisms. Owing to the time delay in biochemical reactions, adjustments made by operational staff to the airflow often lead to delayed changes in dissolved oxygen concentration, frequently causing overaeration. This paper introduces a machine learning model that utilizes water quality indicators and air blower indicators to predict the current airflow. By leveraging the airflow predicted by this model, the dissolved oxygen concentration for the next hour is successfully maintained within the optimal range of 2 mg/L to 4 mg/L. For airflow prediction, the Transformer model proved more effective than the random forest and long short-term memory models, owing to its self-attention architecture. In conclusion, the study demonstrates that machine learning models can successfully predict airflow while maintaining dissolved oxygen stability. These findings present a data-driven approach to guarantee the steady operation of wastewater treatment plants.

Keywords: Airflow Prediction , Decision Support , Machine Learning , Wastewater Treatment Plant

1. Introduction

Wastewater treatment plants (WWTPs) employ diverse equipment to purify wastewater, functioning as intricate information systems that gather monitoring data from various devices. Among these devices, the aeration system plays a crucial role by supplying oxygen to facilitate the growth of microorganisms in the activated sludge process. Decisions regarding the operation of aeration systems should be adjusted according to real-time influent and effluent quality [1]. However, a time lag persists in water quality monitoring data due to limitations imposed by monitoring technology and the intricate nature of biochemical reactions. In actual WWTPs, operational staff cannot adjust the airflow in time to track water quality changes. Within limits, increasing the oxygen supply can enhance the treatment effect; however, excessive aeration may cause the effluent quality to exceed standards, as it disperses the activated sludge and reduces processing capacity. Essentially, varying operational decisions directly impact treatment effectiveness, as illustrated in Fig. 1. Presently, most studies rely on mathematical simulation models, such as the activated sludge model and the benchmark simulation model, to analyze changes in pollutant concentrations. However, the simulation process involves numerous assumptions, leading to results that may deviate from actual reactions [2,3]. This is particularly critical for anaerobic-anoxic-oxic (AAO) processes, where the aeration system is specifically designed to supply oxygen in the aerobic reactor. Approximately 75% of the overall energy consumption in WWTPs is attributed to aeration systems and pumps [4]. Notably, air blowers alone account for half of this energy consumption. Consequently, there is a pressing need for a more precise method of predicting water quality and airflow, one that ensures the effluent consistently adheres to relevant standards.
This paper introduces a real-time airflow prediction method, leveraging real-time influent and effluent quality data to overcome the challenges associated with time lag. Guided by the insights derived from these predictions, this data-driven decision support system offers the potential to reduce wastewater treatment costs.

Fig. 1.
Different decision support affecting wastewater treatment effect.

When the influent organism concentration is high, more oxygen is needed to meet effluent quality standards. While higher airflow is then essential, it can adversely impact the quality of the activated sludge and consequently raise treatment costs, an undesirable outcome for WWTPs. In essence, the relationship between influent quality, effluent quality, and airflow constitutes a complex dynamic. Traditional data analysis methods, like linear regression, encounter difficulties in handling the uncertainty and nonlinear nature of these relationships [5]. Machine learning (ML) technology has gained widespread adoption across various domains due to its capacity to handle extensive datasets and intricate relationships [6,7]. Water quality soft sensors have already found applications in WWTPs. Despite the potential benefits, the broader implementation of ML in this context faces challenges due to a shortage of wastewater treatment professionals. Notably, models like XGBoost have proven effective in predicting sludge yield [8]. Additionally, deep-learning recurrent neural networks have shown promise in predicting concentrations of ammonia and nitrate [9]. Artificial neural networks remain a dominant algorithm for water quality prediction [10]. Moreover, genetic algorithms and expert systems play a crucial role in optimizing various aspects of WWTPs, including air blower flow, carbon source addition, and the dissolved oxygen setpoint.

The primary objective of this paper is to offer airflow decision support that accounts for the time lag in WWTPs and steers away from the current “trial and error” approach to operation. To achieve this goal, the paper employs random forest (RF), long short-term memory (LSTM), and Transformer models to predict the current airflow based on indicators related to influent and effluent quality, as well as air blower indicators. The structure of the paper is outlined as follows: Section 2 introduces the WWTP process and the question definition; the three ML algorithms, along with model performance evaluation metrics, are also described. Section 3 presents the experimental results and engages in a discussion of the findings. Finally, Section 4 encapsulates the conclusion of this study.

2. Materials and Methods

2.1 Treatment Process and Question Definition

The data was collected from a WWTP located in Rizhao City, Shandong Province, China, where the economy is growing rapidly. An increasing volume of municipal wastewater arrives at this WWTP through the drainage network, excluding industrial wastewater that exceeds influent standards. The WWTP design capacity is 50,000 m3/day; the main process includes the primary clarifier, the biochemical reactor, the secondary clarifier, and the advanced treatment process. The nitrogen and phosphorus removal method is the AAO process. The main wastewater treatment process and monitoring indicators are shown in Fig. 2.

Fig. 2.
Treatment process and monitoring parameters of wastewater treatment plant.

This paper gathered diverse monitoring data spanning from June 2020 to June 2021, comprising two primary categories: water quality indicators and air blower indicators. Water quality indicators encompass dissolved oxygen (DO), mixed liquor suspended solids (MLSS), nitrate concentration (NO3-N), inflow chemical oxygen demand into the AAO reactor (CODin), inflow total nitrogen into the AAO reactor (TNin), outflow COD from the AAO reactor (CODef), and outflow total nitrogen from the AAO reactor (TNef). Air blower indicators include air blower power (p), rotational speed (n), and airflow (Qair), totaling 10 indicators. Distinguishing itself from mathematical simulation methods, this study utilizes a dataset obtained from the real-time operation of devices and control systems. This real-world data offers a more accurate representation of the actual treatment process than data derived from simulation models. The values for CODin and CODef are calculated from influent and effluent quality data obtained from an online monitoring website provided by the government. The removal rate for the primary clarifier is set at 30%, while the secondary clarifier and advanced treatment process exhibit a removal rate of 10%. TNin and TNef are identical to the influent and effluent quality in the WWTP monitoring data, since nitrogen removal primarily occurs in the AAO reactor.

Defining the question is a critical initial step in selecting an appropriate analysis method to address the complexity of the problem at hand. As highlighted in Section 1, the intricate relationship between airflow and the DO concentration poses a significant challenge. A notable characteristic of these processes is the presence of time lag, encompassing delays in aeration effects and feedback from effluent quality monitoring—essential references for operational decisions. Achieving stable DO levels necessitates adjusting airflow in response to influent quality. The introduction of airflow into wastewater raises DO concentration, but the subsequent oxygen consumption by microorganisms causes a decrease. The relationship between oxygen dissolution rate and consumption rate is unequal and subject to time lag, varying with different microorganism states. Consequently, changes in airflow may not be immediately reflected in DO concentration values, leading to notable fluctuations in operations relying on current DO levels. Ensuring that every effluent quality indicator meets the standard is paramount for WWTPs. However, operational staff often maintain airflow at higher levels—resulting in overaeration—to ensure effluent compliance with relevant standards. In essence, WWTPs exhibit complex and nonlinear relationships among factors such as airflow, DO, and effluent quality. Consequently, there is a need for a data-driven airflow prediction method based on advanced ML algorithms. This method should output data-supported airflow decisions to maintain DO stability. In this paper, three ML algorithms—RF, LSTM, and Transformer—are employed to predict the present airflow, with the objective of sustaining DO stability within the range of 2 mg/L to 4 mg/L in the subsequent hour and preventing overaeration phenomena.

It is essential to recognize that the impact of airflow is not continuous: its effect dissipates as the batch of water enters the subsequent treatment process. Within the wastewater treatment domain, the hydraulic retention time (HRT) serves as a critical indicator, reflecting the duration that wastewater resides in the reactors. When the time step exceeds the HRT, the influence of the present batch airflow wanes. These considerations offer valuable insights for determining parameters in ML models. In this study, aligned with the actual treatment process, the HRT is established at 8 hours. This implies that the most recent 8 hours of monitoring data are utilized as model inputs. The inputs encompass the current moment and the preceding 8 hours of parameter data, including water quality and air blower data. The model output is the present airflow that can maintain the stability of DO in the subsequent hour. This approach ensures a comprehensive consideration of relevant factors within the given timeframe.
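As a minimal sketch, the HRT-based windowing described above can be expressed as a sliding-window dataset builder. The array shapes, the random data, and the airflow column index are illustrative assumptions, not the plant's actual data layout.

```python
import numpy as np

def make_windows(data, airflow_col, window=8):
    """Build (inputs, targets) pairs from hourly monitoring data.

    data: array of shape (T, n_features) with hourly records.
    Each input is the most recent `window` hours of all indicators;
    the target is the airflow at the current hour.
    """
    X, y = [], []
    for t in range(window, len(data)):
        X.append(data[t - window:t])   # preceding 8 hours, all indicators
        y.append(data[t, airflow_col]) # present airflow to predict
    return np.asarray(X), np.asarray(y)

# Illustrative: one year of hourly data, 10 indicators, airflow in the last column
rng = np.random.default_rng(0)
data = rng.random((8760, 10))
X, y = make_windows(data, airflow_col=9, window=8)
print(X.shape, y.shape)  # (8752, 8, 10) (8752,)
```

With an 8-hour window, the first 8 records serve only as history, so one year of hourly data yields 8,760 − 8 = 8,752 training samples.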

2.2 Machine Learning Algorithm

The RF model, a commonly used method for regression and classification, was introduced in 2001 and involves the construction of multiple decision trees. This model relies on creating subsamples with replacement from the dataset, ensuring they share the same distribution. Each decision tree within the forest independently evaluates each new input sample. For classification, the RF model takes the category with the highest frequency among the trees' outputs as the final outcome; for regression, it averages the trees' predictions. Currently, RF models have found application in predicting influent water quality and membrane performance. The procedure for implementing the RF model is as follows: suppose [TeX:] $$\begin{equation} x_1, x_2, \ldots, x_n \in \mathrm{D} \end{equation}$$, and the number of estimators (t), maximum depth (L), and minimum depth (p) are screened and set. The algorithm selects bootstrap datasets, for which m candidate features are chosen at random at each split. Once a node is split, the node prediction y(x) is calculated and saved. The error is calculated by Eq. (1), and the result of the RF model is the average value. In this paper, the number of decision trees is set to 50, with a maximum depth of 25 for each tree.

(1)
[TeX:] $$\begin{equation} E=\frac{1}{N} \sum_{i=1}^N\left(y_i-y_m\right)^2 \end{equation}$$

where E is the error of the node, N is the sample number of the node, yi is the true value, and ym is the predicted value.
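A hedged sketch of this RF configuration using scikit-learn (the paper does not state its implementation; the synthetic data and the flattened 8-hour-window feature layout here are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in for the plant dataset: 8 hours x 10 indicators,
# flattened to one feature vector per sample (RF takes 2-D input).
rng = np.random.default_rng(42)
X = rng.random((1000, 8 * 10))
y = X[:, :10].sum(axis=1) + 0.1 * rng.standard_normal(1000)

# Hyperparameters from the paper: 50 trees, maximum depth 25,
# bootstrap resampling with replacement.
rf = RandomForestRegressor(n_estimators=50, max_depth=25,
                           bootstrap=True, random_state=0)
rf.fit(X, y)
pred = rf.predict(X[:5])  # forest output = average of the trees' predictions

def node_error(y_true, y_node_pred):
    """Node error of Eq. (1): mean squared deviation from the node prediction."""
    return np.mean((np.asarray(y_true) - y_node_pred) ** 2)
```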

The LSTM model, introduced in 1997, brought forth a distinctive mechanism known as a “gate” to regulate the flow of information. The primary goal of the LSTM model was to address challenges related to vanishing and exploding gradients. As a type of gated neural network, its fundamental structure includes an input gate, an output gate, a forget gate, and a cell state. This unique architecture enables the LSTM model to effectively handle time-series data, taking into account not only the present input but also the historical dynamics. Over time, the LSTM model has found applications in various fields, including water quality prediction, text recognition, and other tasks involving time-series data. Correlations in historical water quality and airflow data are learned by the model to extract hidden features h. The hidden feature h is updated step by step as time advances and new x inputs arrive at the recurrent neuron. The hidden features h for each time step, once the computation of the current layer is completed, are passed on to the next layer. Afterward, the h of the last neuron in the last LSTM layer is passed to the fully connected layer. The equations are given in Eqs. (2)–(9):

(2)
[TeX:] $$\begin{equation} f_t=\sigma\left(W_f \cdot\left[h_{t-1}, x_t\right]+b_f\right), \end{equation}$$

(3)
[TeX:] $$\begin{equation} i_t=\sigma\left(W_i \cdot\left[h_{t-1}, x_t\right]+b_i\right), \end{equation}$$

(4)
[TeX:] $$\begin{equation} \sigma(z)=\frac{1}{1+e^{(-z)}} \end{equation}$$

(5)
[TeX:] $$\begin{equation} C_t^{\prime}=\tanh \left(W_c \cdot\left[h_{t-1}, x_t\right]+b_c\right), \end{equation}$$

(6)
[TeX:] $$\begin{equation} \tanh (x)=\frac{\sinh (x)}{\cosh (x)}=\frac{e^x-e^{-x}}{e^x+e^{-x}}, \end{equation}$$

(7)
[TeX:] $$\begin{equation} C_t=f_t * C_{t-1}+i_t * C_t^{\prime}, \end{equation}$$

(8)
[TeX:] $$\begin{equation} o_t=\sigma\left(W_o \cdot\left[h_{t-1}, x_t\right]+b_o\right), \end{equation}$$

(9)
[TeX:] $$\begin{equation} h_t=o_t * \tanh \left(C_t\right), \end{equation}$$

where t denotes the current timestamp of the model input, corresponding to all indicators entered at the current moment of the data, and t-1 indicates the previous timestamp. In this study, the temporal resolution is 1 hour. The initial inputs to the model are randomly initialized values of h and C, also known as h0 and C0, and the input x0 at the initial moment. Each subsequent step receives a new x input together with the h and C passed in from the previous moment's memory cell. Wf, Wi, WC, and Wo denote the different weights that the model needs to learn, and bf, bi, bC, and bo indicate the bias terms.

In this paper, the model details of the LSTM and Transformer are shown in Fig. 3. The LSTM model's minimum input unit is an 8×13 matrix. The loss function used in the model is the mean squared error and the optimizer is Adam with a learning rate of 1e-4. The batch size is 32, and each batch has a 32×8×13 tensor as input. The model uses 2 hidden layers, and the dropout layer has a dropout rate of 20%.
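Under the stated settings (8×13 inputs, batch size 32, MSE loss, Adam at 1e-4, 2 hidden layers, 20% dropout), a minimal PyTorch sketch might look as follows. The hidden-state width of 64 and the use of the last timestep's output are assumptions, since the paper does not specify them.

```python
import torch
import torch.nn as nn

class AirflowLSTM(nn.Module):
    """Sketch of the LSTM predictor: 8 timesteps x 13 features in, one airflow value out."""
    def __init__(self, n_features=13, hidden=64, n_layers=2, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=n_layers,
                            batch_first=True, dropout=dropout)
        self.fc = nn.Linear(hidden, 1)  # fully connected output layer

    def forward(self, x):               # x: (batch, 8, 13)
        out, _ = self.lstm(x)           # hidden state h_t for every timestep
        return self.fc(out[:, -1, :])   # use the last timestep's hidden state

model = AirflowLSTM()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(32, 8, 13)              # one batch, as in the paper
target = torch.randn(32, 1)
loss = loss_fn(model(x), target)        # one training step
loss.backward()
optimizer.step()
```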

Fig. 3.
Timestep, input, and output of LSTM and Transformer model.

The Transformer model, introduced in 2017, has found wide applications in natural language processing, presenting a novel attention mechanism architecture. This model employs the network structure of the multi-head attention mechanism, allowing it to access historical information across sequences during training. This ability enables the model to extract longer temporal correlation features, allowing neural networks to prioritize important features during training while minimizing the impact of less relevant ones. Additionally, the integration of mechanisms such as residual connections effectively addresses the potential issue of vanishing or exploding gradients. The model structure is shown in Fig. 4, and the main calculation equations are as follows.

(10)
[TeX:] $$\begin{equation} P E_{(\text {time }, 2 i)}=\sin \left(\text { time } / 10000^{2 i / d_{\text {model }}}\right), \end{equation}$$

(11)
[TeX:] $$\begin{equation} P E_{(\text {time }, 2 i+1)}=\cos \left(\text { time } / 10000^{2 i / d_{\text {model }}}\right), \end{equation}$$

(12)
[TeX:] $$\begin{equation} \left[\begin{array}{l} Q \\ K \\ V \end{array}\right]=h_t\left[\begin{array}{l} W^q \\ W^k \\ W^v \end{array}\right], \end{equation}$$

(13)
[TeX:] $$\begin{equation} \text { Attention }(Q, K, V)=\operatorname{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right) V, \end{equation}$$

where PE(time,2i) and PE(time,2i+1) are the sine and cosine functions, respectively, which inject temporal sequence (location) information into the hidden data. Here, time represents the list of location numbers for all time windows containing water quality indicator data, i is the dimension index, and dmodel represents the mapping dimension, consistent with the feature embedding dimension of the data. Time-location coding is accomplished by performing the above operations on the positions of the water quality data on the time grid. The attention mechanism involves two objects, the query sequence Q and the key-value sequence (K,V): [TeX:] $$\begin{equation} Q=\left[q_1, q_2, \ldots, q_n\right] \in \mathbb{R}^{d \times n}, K=\left[k_1, k_2, \ldots, k_m\right] \in \mathbb{R}^{d \times m}, V=\left[v_1, v_2, \ldots, v_m\right] \in \mathbb{R}^{d \times m} \end{equation}$$. Wq, Wk, and Wv are the corresponding weights, and [TeX:] $$\begin{equation} \sqrt{d_k} \end{equation}$$ is the normalization factor, where dk is the length of K. The output of the hidden layer [TeX:] $$\begin{equation} h_t=\left[h_1, h_2, \ldots, h_n\right] \in \mathbb{R}^{d \times n} \end{equation}$$ preserves the hidden state sequence of historical state data features. The softmax activation function performs the normalization, and the resulting weights are applied to the value vectors V to obtain Attention(Q,K,V).
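Eqs. (10)–(13) can be sketched directly in PyTorch. The tensor sizes are illustrative only (an 8-hour window and an assumed key/value dimension of 64):

```python
import math
import torch

def positional_encoding(seq_len, d_model):
    """Sinusoidal time-position encoding of Eqs. (10)-(11)."""
    pe = torch.zeros(seq_len, d_model)
    time = torch.arange(seq_len).unsqueeze(1).float()
    # 1 / 10000^(2i/d_model), computed in log space for stability
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(time * div)   # even dimensions: sin
    pe[:, 1::2] = torch.cos(time * div)   # odd dimensions: cos
    return pe

def attention(Q, K, V):
    """Scaled dot-product attention of Eq. (13)."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # normalization factor
    return torch.softmax(scores, dim=-1) @ V

pe = positional_encoding(seq_len=8, d_model=512)  # one 8-hour window
Q = K = V = torch.randn(8, 64)
out = attention(Q, K, V)
```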

Fig. 4.
The structure of the Transformer model.

In this paper, an encoder structure, augmented with expert knowledge, is utilized to enhance the Transformer model. The output of the Transformer-encoder model is the current airflow, which is adjusted to simultaneously meet the DO concentration requirements. The training process involves segmenting the timesteps into batches, which are subsequently input into the model for training.

The model initially assigns a high-dimensional hidden representation to each indicator through feature embedding, while simultaneously incorporating positional information in time via timestep encoding. Subsequently, the hidden representation of water quality is fed into the encoder’s hidden layers to grasp the intricate relationship between water quality and airflow in a time series. The principal components of the encoder encompass a multi-head attention mechanism, a residual structure, layer normalization, and a feed-forward neural network. The multi-head attention mechanism deduces the correlation between each time step and other indicators in the water quality data, assigning a hidden representation through scoring that emphasizes crucial relationships and diminishes less significant ones. Essential features in the water quality information are further extracted utilizing residual networks, layer normalization, and feed-forward neural networks. Lastly, a fully connected neural network is employed to condense dimensionality and generate the predicted value for the current fan airflow. The Transformer employs the encoder component for data processing. It incorporates positional embedding into the time series data, using an embedding dimension of 512, indicating that the input layer’s feed-forward network comprises 512 neurons. The encoder is structured with 2 layers and utilizes 8 heads in the multi-head attention mechanism. The architecture concludes with two fully connected layers, with one of them functioning as the output layer.
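The architecture above can be approximated with PyTorch's stock modules using the stated settings (d_model=512, 8 heads, 2 encoder layers, two final fully connected layers). This is a sketch, not the authors' code: the input feature count, the 64-unit intermediate layer, and the use of the last position's representation are assumptions, and the timestep encoding is omitted for brevity.

```python
import torch
import torch.nn as nn

class AirflowTransformer(nn.Module):
    """Encoder-only Transformer regressor for current airflow."""
    def __init__(self, n_features=13, d_model=512, nhead=8, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)  # feature embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # two fully connected layers, the second acting as the output layer
        self.head = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, x):                 # x: (batch, timesteps, n_features)
        h = self.encoder(self.embed(x))   # multi-head attention + FFN layers
        return self.head(h[:, -1, :])     # condense to the predicted airflow

model = AirflowTransformer()
out = model(torch.randn(32, 8, 13))       # one batch of 8-hour windows
```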

Before feeding the data into the model, a pre-processing step was undertaken to address null values and incorrect values in the monitoring data. This paper employed two methods for this purpose: k-nearest neighbors (KNN) and linear interpolation. KNN is a well-established algorithm that estimates a missing value from the values of its k nearest neighbors in the feature space. Linear interpolation, on the other hand, estimates a missing value from the values of neighboring data points, assigning different weights to these points based on their distance from the interpolation point. This approach helps in mitigating the impact of missing or incorrect values in the dataset.
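Both imputation strategies can be sketched with pandas and scikit-learn on a toy series; the columns and values below are illustrative, not plant data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy hourly series with gaps, standing in for the monitoring data.
df = pd.DataFrame({"DO": [2.8, np.nan, 3.1, 2.9, np.nan, 3.0],
                   "Qair": [4100.0, 4150.0, np.nan, 4200.0, 4180.0, 4160.0]})

# Option 1: linear interpolation between neighbouring timestamps.
df_lin = df.interpolate(method="linear")

# Option 2: KNN imputation using the k nearest rows in feature space.
df_knn = pd.DataFrame(KNNImputer(n_neighbors=2).fit_transform(df),
                      columns=df.columns)

assert not df_lin.isna().any().any()
assert not df_knn.isna().any().any()
```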

Evaluating model performance is essential for accurately assessing prediction accuracy. Numerous scholars have employed various performance metrics to compare predicted values with actual values. In this study, three metrics—mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE)—were utilized for error analysis. The formulas for calculating these metrics are presented in Eqs. (14) through (16):

(14)
[TeX:] $$\begin{equation} \mathrm{MAE}=\frac{1}{n} \sum_{i=1}^n\left|y_i-\hat{y}_i\right|, \end{equation}$$

(15)
[TeX:] $$\begin{equation} \mathrm{RMSE}=\sqrt{\frac{1}{n} \sum_{i=1}^n\left(y_i-\hat{y}_i\right)^2}, \end{equation}$$

(16)
[TeX:] $$\begin{equation} \mathrm{MAPE}=\frac{1}{n} \sum_{i=1}^n\left|\frac{y_i-\hat{y}_i}{y_i}\right| \times 100 \%, \end{equation}$$

where yi is the true value and [TeX:] $$\begin{equation} \hat{y}_i \end{equation}$$ is the predicted value.
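The three metrics of Eqs. (14)–(16) translate directly into NumPy; the sample values below are illustrative only.

```python
import numpy as np

def mae(y, y_hat):
    """Eq. (14): mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Eq. (15): root mean square error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Eq. (16): mean absolute percentage error."""
    return np.mean(np.abs((y - y_hat) / y)) * 100

# Illustrative airflow values in m3/min
y = np.array([4100.0, 4200.0, 4300.0])
y_hat = np.array([4150.0, 4180.0, 4320.0])
print(mae(y, y_hat), rmse(y, y_hat), mape(y, y_hat))
```

RMSE penalizes large errors more heavily than MAE because the residuals are squared before averaging, which is why the two metrics are reported together in Table 2.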

3. Results and Discussion

The WWTP encompasses numerous treatment units, including primary treatment units, AAO reactor, and advanced treatment units, generating a substantial amount of data that constitutes a large-scale information system. This study utilized hourly monitoring data collected over 1 year, resulting in a dataset comprising 8,760 samples. The statistical characteristics of the dataset are detailed in Table 1.

Table 1.
Dataset statistical properties

The model code in this study was implemented using the PyTorch ML library, and computational tasks were executed on an NVIDIA GeForce RTX 3090 graphics card. Fig. 5 illustrates the training loss variation for both the LSTM and Transformer models. The decreasing loss over training indicates that both models effectively handled and predicted the time-series data. Optimal performance is characterized by training and validation losses gradually diminishing until reaching a constant state, with minimal disparity between the final loss values. Specifically, the LSTM model was trained for 25 epochs, achieving a rapid decrease in loss; the training loss stabilized around 0.05, and the validation loss likewise remained at about 0.05. The Transformer model underwent training for 100 epochs, with training and validation losses stabilizing at around 0.01.

Fig. 5.
The loss variance during the training epochs: (a) LSTM model and (b) Transformer model.

The model prediction performance was evaluated using three metrics: MAPE, RMSE, and MAE. The results, presented in Table 2, highlight the most accurate prediction model. The RF, LSTM, and Transformer models demonstrated good prediction performance. In general, MAPE primarily focuses on the percentage of prediction errors, while RMSE and MAE consider the absolute values of prediction errors. Additionally, RMSE assigns a higher weight to larger errors, whereas MAE calculates the average absolute difference between predicted values and true values, disregarding the direction of the errors. Specifically, the MAPE values for the three models RF, LSTM, and Transformer are 3.36%, 2.21%, and 1.91%, respectively. Furthermore, in the error comparison between the two metrics, RMSE and MAE, the Transformer model exhibited the smallest error.

Table 2.
Model error comparison

The models were employed to predict the present airflow based on the promising results from the prediction performance analysis. Fig. 6 illustrates the distribution of predicted values and true values. Combined with the numerical error analysis, it is evident that the Transformer model's predictions are more accurate than those of the RF and LSTM models. This is further supported by the distribution curves in Fig. 6, where a smaller distance between each point and the line (y=x) indicates closer proximity of predicted values to the true values. The red distribution curve represents the true values, while the blue distribution curve represents the predicted values; the closer the trends of the two curves, the more precise the prediction. In Fig. 6, the distribution curve of the Transformer's predicted values matches the true values most closely among the three model results. Besides, according to the distribution curve of the true values, the majority of the data is clustered around 5,500 m3/min, implying that 5,500 m3/min might be the empirical operating value for the aeration system of this WWTP.

Fig. 6.
The data distributions of predicted values vs. true values by three models: (a) RF, (b) LSTM, and (c) Transformer.

Recent literature has focused on crucial parameters in WWTPs, including influent quality and quantity, and DO as part of the aeration system control parameters. While there have been studies on DO prediction, airflow prediction has received less attention. DO prediction, being an indicator directly reflecting the treatment state, has been a research hotspot. However, calculating DO values involves air diffusion methodologies and mathematical formulas, leading to larger errors when using DO as the model output. In practical WWTPs, airflow is the control variable. In this paper, airflow is set as the model output, aiming to provide primary decision support while ensuring stable DO levels. Considering model characteristics, the Transformer model exhibits advantages over the RF model in handling time-series data, learning capabilities, and the trainability of model parameters. The Transformer model, based on a self-attention mechanism, is a deep neural network with strong capabilities in handling time-series data. In contrast, LSTM models may encounter issues such as vanishing or exploding gradients when dealing with long-time series data, limiting their ability to capture long-term dependencies. The Transformer model typically has more trainable parameters and can be trained end-to-end using large-scale datasets. Conversely, LSTM models have fewer parameters and may be more prone to overfitting with smaller datasets. Therefore, in the airflow prediction task of this study, the Transformer model outperformed the RF and LSTM models in terms of prediction accuracy. The predicted results offer data-based decision support for operations, and with the guidance of data-driven analysis, the overaeration phenomenon can be avoided while ensuring stability in wastewater treatment effectiveness.

4. Conclusion

This paper employed three ML models to predict the current airflow while ensuring DO stability within the range of 2 mg/L to 4 mg/L. Leveraging professional knowledge, the study selected ten parameters as model inputs, comprising seven water quality parameters and three air blower parameters. The RF model, LSTM model, and Transformer model demonstrated varying degrees of accuracy in predicting airflow. Due to its attention mechanism and unique structure, the Transformer model outperformed the other two models. The model outputs, serving as data-driven decision support, have the potential to decrease wastewater treatment costs by addressing overaeration.

In future research, the aim is to refine the DO range based on effluent quality and microorganism growth states, to further optimize and reduce airflow.

Conflict of Interest

The authors declare that they have no competing interests.

Funding

None.

Biography

Xuefei Li
https://orcid.org/0000-0003-0575-5106

She received an M.S. degree in the School of Environmental and Municipal Engineering from the Qingdao University of Technology in 2021. Since September 2023, she has been a Ph.D. candidate at Qingdao University of Technology. Her current research interests include wastewater treatment problems addressed with artificial intelligence methods.

Biography

Changqing Liu
https://orcid.org/0000-0003-0706-216X

He received the Ph.D. degree from Tongji University in 2007. He is a full professor in the School of Environmental and Municipal Engineering at the Qingdao University of Technology. His research interests include biological nitrogen and phosphorus removal of wastewater and sludge resource treatment and disposal.

Biography

Shuqi Liu
https://orcid.org/0009-0005-2802-1115

He received a Bachelor of Engineering degree from Qingdao University of Technology in 2018. He is currently studying for a master's degree in electronic information at Qingdao University of Technology. His current research interest is digital twins.

Biography

Sheng Miao
https://orcid.org/0000-0001-6176-3624

He is an associate professor at the School of Information and Control Engineering, Qingdao University of Technology. He received his Ph.D. degree from Towson University, Maryland, USA in 2017. His research interests include machine learning, smart healthcare, and intelligence systems. He has published multiple high-quality research papers in journals and conferences in recent years.

References

  • 1 Y. Zhang, C. Li, H. Duan, K. Yan, J. Wang, and W. Wang, "Deep learning based data-driven model for detecting time-delay water quality indicators of wastewater treatment plant influent," Chemical Engineering Journal, vol. 467, article no. 143483, 2023. https://doi.org/10.1016/j.cej.2023.143483
  • 2 Y. Liu, J. Yuan, B. Cai, H. Chen, Y. Li, and D. Huang, "Multi-step and multi-task learning to predict quality-related variables in wastewater treatment processes," Process Safety and Environmental Protection, vol. 180, pp. 404-416, 2023. https://doi.org/10.1016/j.psep.2023.10.015
  • 3 J. Wang, K. Wan, X. Gao, X. Cheng, Y. Shen, Z. Wen, U. Tariq, and M. J. Piran, "Energy and materials-saving management via deep learning for wastewater treatment plants," IEEE Access, vol. 8, pp. 191694-191705, 2020. https://doi.org/10.1109/ACCESS.2020.3032531
  • 4 A. S. Qambar and M. M. Al Khalidy, "Optimizing dissolved oxygen requirement and energy consumption in wastewater treatment plant aeration tanks using machine learning," Journal of Water Process Engineering, vol. 50, article no. 103237, 2022. https://doi.org/10.1016/j.jwpe.2022.103237
  • 5 O. E. L. Castro, X. Deng, and J. H. Park, "Comprehensive survey on AI-based technologies for enhancing IoT privacy and security: trends, challenges, and solutions," Human-centric Computing and Information Sciences, vol. 13, article no. 39, 2023. https://doi.org/10.22967/HCIS.2023.13.039
  • 6 D. Zhao, Y. Liu, G. Zeng, X. Wang, S. Miao, and W. Gao, "A knowledge-based human-computer interaction system for the building design evaluation using artificial neural network," Human-centric Computing and Information Sciences, vol. 13, article no. 2, 2023. https://doi.org/10.22967/HCIS.2023.13.002
  • 7 B. Seo, S. Jang, J. Bong, K. Park, I. Lee, and M. Jeong, "Application of deep learning to the production of sub-divided land cover maps," Human-centric Computing and Information Sciences, vol. 12, article no. 58, 2022. https://doi.org/10.22967/HCIS.2022.12.058
  • 8 S. Shao, D. Fu, T. Yang, H. Mu, Q. Gao, and Y. Zhang, "Analysis of machine learning models for wastewater treatment plant sludge output prediction," Sustainability, vol. 15, no. 18, article no. 13380, 2023. https://doi.org/10.3390/su151813380
  • 9 N. Farhi, E. Kohen, H. Mamane, and Y. Shavitt, "Prediction of wastewater treatment quality using LSTM neural network," Environmental Technology & Innovation, vol. 23, article no. 101632, 2021. https://doi.org/10.1016/j.eti.2021.101632
  • 10 S. Miao, C. Zhou, S. A. AlQahtani, M. Alrashoud, A. Ghoneim, and Z. Lv, "Applying machine learning in intelligent sewage treatment: a case study of chemical plant in sustainable cities," Sustainable Cities and Society, vol. 72, article no. 103009, 2021. https://doi.org/10.1016/j.scs.2021.103009

Table 1.

Dataset statistical properties
Indicators (abbreviation) | Unit | Min | Max | Average | Std.
Category 1. Water quality indicators
Dissolved oxygen (DO) | mg/L | 0.70 | 4.49 | 2.71 | 0.91
Mixed liquor suspended solids (MLSS) | mg/L | 4305.56 | 9114.58 | 6089.93 | 809.12
Nitrate concentration (NO3-N) | mg/L | 0.51 | 14.21 | 3.29 | 1.29
Inflow COD into AAO reactor (CODin) | mg/L | 14.00 | 278.60 | 85.23 | 50.00
Inflow total nitrogen into AAO reactor (TNin) | mg/L | 2.00 | 79.98 | 35.56 | 20.71
Outflow COD from AAO reactor (CODef) | mg/L | 11.00 | 49.97 | 15.74 | 3.45
Outflow total nitrogen from AAO reactor (TNef) | mg/L | 1.85 | 25.00 | 9.10 | 1.89
Category 2. Air blower indicators
Air blower power (p) | kW | 62.02 | 143.33 | 81.64 | 11.41
Air blower rotational speed (n) | r/min | 24043.48 | 29997.60 | 25784.36 | 600.28
Airflow (Qair) | m3/min | 2763.96 | 6216.44 | 4166.10 | 810.57
Min, Max, and Std. represent the minimum values, the maximum values, and the standard deviation of the dataset, respectively.

Table 2.

Model error comparison
Model | MAPE (%) | RMSE (m3/min) | MAE (m3/min)
RF | 3.36 | 244.15 | 175.49
LSTM | 2.21 | 144.21 | 114.18
Transformer | 1.91 | 138.36 | 97.36
Bold font indicates the best performance for each metric.