## Yanping Shen*, Kangfeng Zheng and Chunhua Wu

Features | $C$ | $\gamma$ |
---|---|---|
$F_{1} \ldots F_{n}$ | $x_{1}$ | $x_{2}$ |

The fitness function is used to improve the accuracy and reduce the number of features used. It can be defined as

where acc denotes the accuracy; $n_{F}$ denotes the total number of features; $f_{i}$ is the state of the $i$-th feature, with $f_{i}=1$ if the $i$-th feature is used and $f_{i}=0$ otherwise; and $\omega_{1}$ and $\omega_{2}$ represent the weights of the corresponding measures.
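A minimal Python sketch of the fitness evaluation follows. It assumes the weighted-sum form $\text{fitness} = \omega_{1} \cdot acc + \omega_{2} \cdot \left(1 - \sum_{i=1}^{n_F} f_i / n_F\right)$, one common way to combine the two measures defined above; this form, and the function name `fitness`, are assumptions for illustration and may differ from the authors' exact expression. The default weights are the values $\omega_{1}=0.95$ and $\omega_{2}=0.05$ chosen in the experiments.

```python
from typing import Sequence


def fitness(acc: float, mask: Sequence[int], w1: float = 0.95, w2: float = 0.05) -> float:
    """Assumed weighted-sum fitness: w1 * acc + w2 * (1 - sum(f_i) / n_F).

    acc    -- classification accuracy in [0, 1]
    mask   -- feature states f_i (1 = feature used, 0 = feature dropped)
    w1, w2 -- weights of the accuracy and feature-count terms
    """
    n_f = len(mask)                       # n_F: total number of features
    used = sum(mask)                      # number of selected features
    return w1 * acc + w2 * (1.0 - used / n_f)


# Same accuracy, fewer features: the second particle should score higher.
all_41 = fitness(0.98, [1] * 41)
only_14 = fitness(0.98, [1] * 14 + [0] * 27)
print(all_41 < only_14)
```

Under this assumed form, accuracy dominates as long as $\omega_{1} \gg \omega_{2}$, and raising $\omega_{2}$ increasingly rewards smaller feature subsets.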

Accuracy, detection rate, and false positive rate are the most common evaluation criteria for intrusion detection, so the proposed hybrid model is evaluated with these three measures, defined in Eqs. (13), (14), and (15).

The accuracy (Acc) is the proportion of correctly classified samples:

$$\mathrm{Acc}=\frac{TP+TN}{TP+TN+FP+FN} \qquad (13)$$

The detection rate (DR) is the proportion of attack instances correctly detected by the model:

$$\mathrm{DR}=\frac{TP}{TP+FN} \qquad (14)$$

The false positive rate (FPR) is the proportion of normal instances misclassified as attack instances:

$$\mathrm{FPR}=\frac{FP}{FP+TN} \qquad (15)$$

where TP denotes the number of attack instances correctly judged, FP denotes the number of normal instances misjudged to be attacks, FN indicates the number of attacks classified as normal, and TN denotes the number of correctly judged normal instances.
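For concreteness, a minimal Python sketch of Eqs. (13)–(15) computed from confusion-matrix counts. The function name `ids_metrics` and the example counts are hypothetical.

```python
from typing import Dict


def ids_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, detection rate and false positive rate from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # Eq. (13): share of all samples judged correctly
    dr = tp / (tp + fn)                     # Eq. (14): share of attacks that are detected
    fpr = fp / (fp + tn)                    # Eq. (15): share of normal traffic flagged as attack
    return {"Acc": acc, "DR": dr, "FPR": fpr}


# Hypothetical counts: 100 attacks (90 caught) and 100 normal records (5 false alarms).
print(ids_metrics(tp=90, fp=5, fn=10, tn=95))
```

DR and FPR are computed only over the attack and normal subsets respectively, so they remain informative even when the classes are imbalanced, unlike Acc.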

The KDD99 [33], NSL [34], and Kyoto [35] datasets were used to validate the proposed model. KDD99 is the most widely used dataset for intrusion detection [36]. NSL was derived from KDD99 by removing a large number of duplicate records, and it has the same features as KDD99. A more recent labeled dataset, Kyoto, is also used. The Kyoto dataset is collected from diverse types of honeypots and consists of 24 features, of which 14 statistical features are derived from KDD99 and 10 are newly added. We selected the 17 features shown in Table 2 [37] for the experiments in this paper. Due to space limitations, the features of KDD99 are omitted here.

Table 2.

Representation | Feature | Representation | Feature |
---|---|---|---|
$P_{1}$ | duration | $P_{10}$ | dst_host_srv_count |
$P_{2}$ | service | $P_{11}$ | dst_host_same_src_port_rate |
$P_{3}$ | src_bytes | $P_{12}$ | dst_host_serror_rate |
$P_{4}$ | dst_bytes | $P_{13}$ | dst_host_srv_serror_rate |
$P_{5}$ | count | $P_{14}$ | flag |
$P_{6}$ | same_srv_rate | $P_{15}$ | IDS_detection |
$P_{7}$ | serror_rate | $P_{16}$ | Malware_detection |
$P_{8}$ | srv_serror_rate | $P_{17}$ | Ashula_detection |
$P_{9}$ | dst_host_count | - | - |

The value ranges of $C$ and $\gamma$ are $[2^{-10}, 2^{3}]$ and $[2^{-3}, 2^{10}]$, respectively. The maximum velocity is set to approximately 20% of each variable's range, so the velocity of $C$ is restricted to $[-1.6, 1.6]$ and that of $\gamma$ to $[-204.8, 204.8]$. For the discrete particles of the binary PSO, the velocity is restricted to $[-4, 4]$. The personal and social learning factors $(c_{1}, c_{2})$ are set to $(2, 2)$, and the inertia weight is set to 0.72. The population size and the number of iterations are 20 and 80, respectively. We randomly selected 50 groups of data from the original datasets for the experiments and recorded the average results as follows.
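A minimal Python sketch of one update step for a hybrid particle with the settings above. It assumes the standard inertia-weight PSO velocity rule for $(C, \gamma)$ and the Kennedy and Eberhart binary PSO rule with a sigmoid transfer function for the feature bits [32]; applying the inertia weight to the binary part as well, and clipping positions to the search range, are assumptions. All helper names are hypothetical.

```python
import math
import random

# Search ranges from the text: C in [2^-10, 2^3], gamma in [2^-3, 2^10].
LOW = (2.0 ** -10, 2.0 ** -3)
HIGH = (2.0 ** 3, 2.0 ** 10)
VMAX = tuple(0.2 * (hi - lo) for lo, hi in zip(LOW, HIGH))  # about (1.6, 204.8)
VMAX_BIN = 4.0              # velocity limit for the binary (feature) dimensions
W, C1, C2 = 0.72, 2.0, 2.0  # inertia weight, personal and social learning factors


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def step_continuous(x, v, pbest, gbest, rng):
    """One inertia-weight PSO step for the (C, gamma) part of a particle."""
    new_x, new_v = [], []
    for d in range(len(x)):
        vd = (W * v[d]
              + C1 * rng.random() * (pbest[d] - x[d])
              + C2 * rng.random() * (gbest[d] - x[d]))
        vd = clamp(vd, -VMAX[d], VMAX[d])
        new_v.append(vd)
        new_x.append(clamp(x[d] + vd, LOW[d], HIGH[d]))  # stay inside the search range
    return new_x, new_v


def step_binary(bits, v, pbest, gbest, rng):
    """One binary-PSO step for the feature mask, using a sigmoid transfer function."""
    new_bits, new_v = [], []
    for d in range(len(bits)):
        vd = (W * v[d]
              + C1 * rng.random() * (pbest[d] - bits[d])
              + C2 * rng.random() * (gbest[d] - bits[d]))
        vd = clamp(vd, -VMAX_BIN, VMAX_BIN)
        new_v.append(vd)
        prob_one = 1.0 / (1.0 + math.exp(-vd))   # probability that feature d is selected
        new_bits.append(1 if rng.random() < prob_one else 0)
    return new_bits, new_v


rng = random.Random(0)
params, params_v = step_continuous([1.0, 10.0], [0.0, 0.0], [2.0, 500.0], [8.0, 0.125], rng)
mask, mask_v = step_binary([1, 0, 1, 0], [0.0] * 4, [1, 1, 0, 0], [1, 0, 0, 1], rng)
print(params, mask)
```

The binary velocity limit of 4 keeps the sigmoid output between roughly 0.018 and 0.982, so no feature bit ever becomes completely frozen.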

First, as discussed in Section 4.2, suitable values for $\omega_{1}$ and $\omega_{2}$ need to be determined. To select proper values for the proposed hybrid PSO-KELM, different values of $\omega_{1}$ and $\omega_{2}$ are analyzed on the three datasets, as shown in Figs. 2–4. Because the fitness is mainly determined by the accuracy rather than by the number of selected features, only a few sets of weight parameters were tested. With $\omega_{1}$ set to 0.80, 0.85, 0.90, 0.95, and 0.99 (a step of 0.05, with a final step of 0.04), the training and test accuracies are shown in Figs. 2(a), 3(a), and 4(a), which reveal that the test accuracy remains almost unchanged once $\omega_{1}$ exceeds 0.95. Furthermore, the number of selected features for different values of $\omega_{2}$ is presented in Figs. 2(b), 3(b), and 4(b). The number of selected features decreases as $\omega_{2}$ increases, which means that the number of chosen features depends on $\omega_{2}$. We chose $\omega_{1}=0.95$ and $\omega_{2}=0.05$ for further testing.

The experimental results of the Grid-KELM, continuous PSO-KELM, hybrid PSO-KELM, and GA-KELM methods on the three datasets are shown in Tables 3–5. Grid-KELM and continuous PSO-KELM can only optimize the parameters; they cannot perform feature selection. The hybrid PSO-KELM carries out parameter optimization and feature selection simultaneously, and GA can also optimize the parameters and select features simultaneously, so GA-KELM is listed both with and without feature selection (FS) to show the difference. In Table 3, the average test accuracy of Grid-KELM is 98.2997%, and that of continuous PSO-KELM is 98.3165%. For the hybrid PSO-KELM, the average test accuracy is 98.2492%, the average number of selected features is 14, and the testing time is 0.7791 seconds. These results show that the hybrid PSO-KELM method can determine the parameters and the features to use at the same time. Compared with the continuous PSO-KELM method, the testing time of the hybrid PSO-KELM decreases by 1.6%. Although the dimension is reduced from 41 to 14, the reduction in testing time is small. This is understandable because the dimension of the input space has no direct effect on the size of the kernel matrix. However, the cost of computing the kernel matrix generally depends on both the number of samples and the dimension of the input features.
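The following minimal Python sketch illustrates this point on hypothetical toy data, assuming the Gaussian kernel is parameterized as $\exp(-\gamma\lVert x_i-x_j\rVert^2)$ (the exact parameterization is an assumption); the function name is hypothetical. The kernel matrix $\Omega$ is $N \times N$ whatever the feature dimension $d$; $d$ only affects the cost of each entry, so building $\Omega$ takes $O(N^2 d)$ work. Reducing $d$ from 41 to 14 therefore shrinks the per-entry work but not the size of the matrix.

```python
import math
from typing import List, Sequence, Tuple


def rbf_kernel_matrix(X: Sequence[Sequence[float]], gamma: float) -> Tuple[List[List[float]], int]:
    """Gaussian kernel matrix Omega[i][j] = exp(-gamma * ||x_i - x_j||^2).

    Also returns how many per-feature difference terms were evaluated,
    to show where the feature dimension d enters the cost.
    """
    n = len(X)
    omega = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(i, n):                 # symmetric: fill the upper triangle and mirror it
            sq_dist = 0.0
            for a, b in zip(X[i], X[j]):      # O(d) work per entry
                sq_dist += (a - b) ** 2
                ops += 1
            omega[i][j] = omega[j][i] = math.exp(-gamma * sq_dist)
    return omega, ops


# Hypothetical toy data: 10 samples with 41 features, then the first 14 features only.
X41 = [[((i * 7 + k) % 5) / 5.0 for k in range(41)] for i in range(10)]
X14 = [row[:14] for row in X41]
K41, ops41 = rbf_kernel_matrix(X41, gamma=0.125)
K14, ops14 = rbf_kernel_matrix(X14, gamma=0.125)
print(len(K41), len(K41[0]), ops41)   # matrix is N x N either way
print(len(K14), len(K14[0]), ops14)   # only the per-entry work shrinks
```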

Table 3.

Techniques | Parameters $[C, \gamma]$ | Test time (s) | Test Acc (%) | Feature size |
---|---|---|---|---|
Grid-KELM | [8, 0.125] | 0.7922 | 98.2997 | 41 |
Continuous PSO-KELM | [7.6925, 0.125] | 0.7916 | 98.3165 | 41 |
Hybrid PSO-KELM | [4.2364, 0.125] | 0.7791 | 98.2492 | 14 |
GA-KELM without FS | [7.5878, 0.125] | 0.7936 | 98.2828 | 41 |
GA-KELM with FS | [7.6059, 0.1302] | 0.7780 | 98.1987 | 12 |

Table 4.

Techniques | Parameters $[C, \gamma]$ | Test time (s) | Test Acc (%) | Feature size |
---|---|---|---|---|
Grid-KELM | [8, 0.125] | 0.7997 | 95.7071 | 41 |
Continuous PSO-KELM | [8, 0.125] | 0.7943 | 95.7071 | 41 |
Hybrid PSO-KELM | [7.8414, 0.125] | 0.7884 | 96.9697 | 14 |
GA-KELM without FS | [7.7509, 0.1485] | 0.7938 | 95.5892 | 41 |
GA-KELM with FS | [7.8405, 0.1345] | 0.7775 | 95.1515 | 13 |

Table 5.

Techniques | Parameters $[C, \gamma]$ | Test time (s) | Test Acc (%) | Feature size |
---|---|---|---|---|
Grid-KELM | [8, 0.125] | 0.7785 | 98.4007 | 17 |
Continuous PSO-KELM | [3.1310, 0.125] | 0.7736 | 97.6936 | 17 |
Hybrid PSO-KELM | [8, 0.125] | 0.7664 | 98.2828 | 4 |
GA-KELM without FS | [7.9871, 0.2711] | 0.7701 | 97.5084 | 17 |
GA-KELM with FS | [7.9878, 0.2533] | 0.7636 | 97.4747 | 3 |

In other words, feature selection is still important for reducing the computational cost of the kernel mapping. Table 4 shows that NSL is more challenging for the method: the test accuracy on NSL is lower than that on KDD99. The proposed approach is also compared with GA-KELM. Overall, although GA can perform parameter optimization and feature selection, its results are slightly worse than those of the hybrid PSO. Tables 3–5 show that competitive or better accuracy can be achieved with fewer features, which indicates that some features are not critical to the performance of the classifier.

For the hybrid PSO-KELM, the frequencies with which features were selected in ten runs are listed in Tables 6 and 7. A feature selected at least four times is considered significant. The KDD99 and NSL datasets contain 41 features, denoted $M_{1}, M_{2}, \ldots, M_{41}$. The significant features are $M_{2}, M_{3}, M_{16}, M_{23}, M_{32}, M_{33}, M_{35}, M_{36}$, and $M_{40}$. The significant features of the Kyoto dataset are $P_{4}$, $P_{10}$, and $P_{14}$. These features are important for detecting intrusions. For instance, $M_{23}$ represents the number of connections to the same target host as the current connection in the past 2 seconds, and it is closely related to DoS attacks. In contrast, the features that are never chosen, represented by “others” in the last column, are considered redundant.
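The selection rule can be checked directly against the counts in Tables 6 and 7. In this minimal Python sketch (helper names and the `threshold` parameter are hypothetical), features selected in at least four of the ten runs are flagged as significant, and features that never appear in the tables correspond to the “others” column with frequency 0.

```python
from typing import Dict, List

# Selection counts over ten runs, copied from Table 6 (KDD99/NSL) and Table 7 (Kyoto).
KDD_NSL_FREQ = {
    "M2": 4, "M3": 9, "M5": 1, "M6": 2, "M9": 1, "M10": 2, "M11": 2, "M12": 3,
    "M13": 3, "M14": 1, "M15": 2, "M16": 4, "M17": 1, "M18": 2, "M20": 1, "M22": 2,
    "M23": 6, "M25": 2, "M26": 1, "M29": 3, "M30": 1, "M31": 1, "M32": 5, "M33": 8,
    "M34": 1, "M35": 7, "M36": 4, "M37": 3, "M38": 2, "M40": 4,
}
KYOTO_FREQ = {"P1": 1, "P2": 3, "P4": 10, "P5": 1, "P9": 1, "P10": 10, "P14": 8}


def significant(freq: Dict[str, int], threshold: int = 4) -> List[str]:
    """Features selected in at least `threshold` of the runs."""
    return [name for name, count in freq.items() if count >= threshold]


def never_selected(freq: Dict[str, int], prefix: str, n_features: int) -> List[str]:
    """Features absent from the table, i.e. the 'others' column with frequency 0."""
    return [f"{prefix}{i}" for i in range(1, n_features + 1) if f"{prefix}{i}" not in freq]


print(significant(KDD_NSL_FREQ))
print(significant(KYOTO_FREQ))
print(never_selected(KDD_NSL_FREQ, "M", 41))
```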

Table 6.

Feature | $M_{2}$ | $M_{3}$ | $M_{5}$ | $M_{6}$ | $M_{9}$ | $M_{10}$ | $M_{11}$ | $M_{12}$ | $M_{13}$ | $M_{14}$ | $M_{15}$ |
---|---|---|---|---|---|---|---|---|---|---|---|
Frequency | 4 | 9 | 1 | 2 | 1 | 2 | 2 | 3 | 3 | 1 | 2 |
Feature | $M_{16}$ | $M_{17}$ | $M_{18}$ | $M_{20}$ | $M_{22}$ | $M_{23}$ | $M_{25}$ | $M_{26}$ | $M_{29}$ | $M_{30}$ | $M_{31}$ |
Frequency | 4 | 1 | 2 | 1 | 2 | 6 | 2 | 1 | 3 | 1 | 1 |
Feature | $M_{32}$ | $M_{33}$ | $M_{34}$ | $M_{35}$ | $M_{36}$ | $M_{37}$ | $M_{38}$ | $M_{40}$ | others | | |
Frequency | 5 | 8 | 1 | 7 | 4 | 3 | 2 | 4 | 0 | | |

Table 7.

Feature | $P_{1}$ | $P_{2}$ | $P_{4}$ | $P_{5}$ | $P_{9}$ | $P_{10}$ | $P_{14}$ | others |
---|---|---|---|---|---|---|---|---|
Frequency | 1 | 3 | 10 | 1 | 1 | 10 | 8 | 0 |

Finally, the performance of different methods on KDD99 was compared. The average results for each measure are shown in Table 8 [7,38]. Feature selection is performed, and the three methods use the same selected features. The proposed model has the best overall performance among the popular machine learning methods compared. Among the activation functions tested for ELM, the sigmoid (sig) function gives the best performance; ELM performance also depends on the number of hidden neurons, which is set to 80 here. Table 8 shows that the hybrid PSO-KELM achieves higher accuracy than ELM. This is because KELM replaces the random hidden-layer mapping of ELM with a deterministic kernel mapping, so its output weights are stable. KELM therefore avoids the random fluctuations in model output caused by random weight assignment in ELM, which improves the stability and generalization ability of the model. Table 8 also shows that SVM performs relatively stably, but its false positive rate is higher than that of the proposed approach.
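The stability argument can be illustrated with a minimal Python sketch on hypothetical toy data. It assumes ELM hidden weights and biases drawn uniformly from $[-1, 1]$ (a common choice, assumed here), the sigmoid activation, and 80 hidden neurons as stated above, and a Gaussian kernel of the assumed form $\exp(-\gamma\lVert x_i-x_j\rVert^2)$ for KELM; the function names are hypothetical. Two ELM runs with different random seeds produce different hidden-layer outputs, so the learned output weights differ as well, whereas the KELM kernel matrix is fully determined by the data and $\gamma$.

```python
import math
import random
from typing import List, Sequence


def elm_hidden_output(X: Sequence[Sequence[float]], n_hidden: int, seed: int) -> List[List[float]]:
    """ELM hidden layer H = sigmoid(X W + b) with randomly drawn W and b."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_hidden)] for _ in range(d)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = []
    for x in X:
        row = []
        for j in range(n_hidden):
            z = sum(x[k] * W[k][j] for k in range(d)) + b[j]
            row.append(1.0 / (1.0 + math.exp(-z)))   # sigmoid activation
        H.append(row)
    return H


def kelm_kernel_matrix(X: Sequence[Sequence[float]], gamma: float) -> List[List[float]]:
    """Kernel matrix used by KELM in place of a random feature map; depends only on the data."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
            for xi in X]


X = [[0.1, 0.5, 0.9], [0.4, 0.2, 0.7], [0.8, 0.6, 0.3]]   # hypothetical toy samples
H_run1 = elm_hidden_output(X, n_hidden=80, seed=1)
H_run2 = elm_hidden_output(X, n_hidden=80, seed=2)
print(H_run1 == H_run2)   # different random weights give a different mapping
print(kelm_kernel_matrix(X, 0.125) == kelm_kernel_matrix(X, 0.125))  # same data, same matrix
```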

A hybrid PSO-BPSO based KELM model is proposed and applied to intrusion detection. The standard PSO and the binary PSO are adopted to optimize the parameter combination and the input features, respectively, and a fitness function is designed for the hybrid PSO. Experiments show that the method can determine the parameters and the appropriate features at the same time, and that it outperforms the GA-KELM model. This work uses the Gaussian kernel; other kernels or multiple kernel learning can be studied in future work, and other metaheuristic methods can also be developed to optimize the model.

She received a Ph.D. degree in computer application technology from Beijing University of Posts and Telecommunications in 2021. She is an associate professor in the School of Information Engineering, Institute of Disaster Prevention, Sanhe, China. Her research interests center on network security.

He received a Ph.D. degree in computer application technology from Beijing University of Posts and Telecommunications in 2006, where he is currently a professor with the School of Cyberspace Security. His research interests are network security, artificial intelligence security applications, and cognitive security.

- [1] S. W. Lin, K. C. Ying, C. Y. Lee, Z. J. Lee, "An intelligent algorithm with feature selection and decision rules applied to anomaly intrusion detection," *Applied Soft Computing*, vol. 12, no. 10, pp. 3285-3290, 2012. doi: 10.1016/j.asoc.2012.05.004
- [2] S. Elhag, A. Fernandez, A. Bawakid, S. Alshomrani, F. Herrera, "On the combination of genetic fuzzy systems and pairwise learning for improving detection rates on intrusion detection systems," *Expert Systems with Applications*, vol. 42, no. 1, pp. 193-202, 2015. doi: 10.1016/j.eswa.2014.08.002
- [3] L. M. Ibrahim, D. T. Basheer, M. S. Mahmod, "A comparison study for intrusion database (Kdd99, Nsl-Kdd) based on self organization map (SOM) artificial neural network," *Journal of Engineering Science and Technology*, vol. 8, no. 1, pp. 107-119, 2013.
- [4] W. Hu, J. Gao, Y. Wang, O. Wu, S. Maybank, "Online Adaboost-based parameterized methods for dynamic distributed network intrusion detection," *IEEE Transactions on Cybernetics*, vol. 44, no. 1, pp. 66-82, 2013. doi: 10.1109/TCYB.2013.2247592
- [5] W. Feng, Q. Zhang, G. Hu, J. X. Huang, "Mining network data for intrusion detection through combining SVMs with ant colony networks," *Future Generation Computer Systems*, vol. 37, pp. 127-140, 2014. doi: 10.1016/j.future.2013.06.027
- [6] G. B. Huang, Q. Y. Zhu, C. K. Siew, "Extreme learning machine: theory and applications," *Neurocomputing*, vol. 70, no. 1-3, pp. 489-501, 2006. doi: 10.1016/j.neucom.2005.12.126
- [7] C. Cheng, W. P. Tay, G. B. Huang, "Extreme learning machines for intrusion detection," in *Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN)*, Brisbane, Australia, 2012, pp. 1-8.
- [8] Z. Ye, Y. Yu, "Network intrusion classification based on extreme learning machine," in *Proceedings of 2015 IEEE International Conference on Information and Automation*, Lijiang, China, 2015, pp. 1642-1647.
- [9] R. Singh, H. Kumar, R. K. Singla, "An intrusion detection system using network traffic profiling and online sequential extreme learning machine," *Expert Systems with Applications*, vol. 42, no. 22, pp. 8609-8624, 2015. doi: 10.1016/j.eswa.2015.07.015
- [10] J. M. Fossaceca, T. A. Mazzuchi, S. Sarkani, "MARK-ELM: application of a novel multiple kernel learning framework for improving the robustness of network intrusion detection," *Expert Systems with Applications*, vol. 42, no. 8, pp. 4062-4080, 2015. doi: 10.1016/j.eswa.2014.12.040
- [11] S. Huang, B. Wang, J. Qiu, J. Yao, G. Wang, G. Yu, *Neurocomputing*, vol. 174, pp. 352-367, 2016.
- [12] L. J. Pan, W. Jin, J. Wu, "A novel intrusion detection approach using multi-kernel functions," *Telkomnika*, vol. 12, no. 4, pp. 1088-1095, 2014.
- [13] R. Jayaprakash, S. Murugappan, "Intrusion detection based on KELM with Levenberg-Marquardt optimization," in *Proceedings of 2015 International Conference on Communications and Signal Processing (ICCSP)*, Melmaruvathur, India, 2015, pp. 0154-0156.
- [14] V. Jaiganesh, P. Sumathi, "Kernelized extreme learning machine with Levenberg-Marquardt learning approach towards intrusion detection," *International Journal of Computer Applications*, vol. 54, no. 14, pp. 38-44, 2012.
- [15] J. Kennedy, R. Eberhart, "Particle swarm optimization," in *Proceedings of International Conference on Neural Networks (ICNN)*, Perth, Australia, 1995, pp. 1942-1948.
- [16] C. Lazar, J. Taminau, S. Meganck, D. Steenhoff, A. Coletta, C. Molter, et al., "A survey on filter techniques for feature selection in gene expression microarray analysis," *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 9, no. 4, pp. 1106-1119, 2012. doi: 10.1109/TCBB.2012.33
- [17] Z. Zhu, Y. S. Ong, M. Dash, "Wrapper–filter feature selection algorithm using a memetic framework," *IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)*, vol. 37, no. 1, pp. 70-76, 2007.
- [18] G. B. Huang, H. Zhou, X. Ding, R. Zhang, "Extreme learning machine for regression and multiclass classification," *IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)*, vol. 42, no. 2, pp. 513-529, 2012. doi: 10.1109/TSMCB.2011.2168604
- [19] S. W. Lin, K. C. Ying, S. C. Chen, Z. J. Lee, "Particle swarm optimization for parameter determination and feature selection of support vector machines," *Expert Systems with Applications*, vol. 35, no. 4, pp. 1817-1824, 2008. doi: 10.1016/j.eswa.2007.08.088
- [20] D. Mladenic, J. Brank, M. Grobelnik, N. Milic-Frayling, "Feature selection using linear classifier weights: interaction with classification models," in *Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval*, Sheffield, UK, 2004, pp. 234-241.
- [21] C. L. Huang, C. J. Wang, "A GA-based feature selection and parameters optimization for support vector machines," *Expert Systems with Applications*, vol. 31, no. 2, pp. 231-240, 2006.
- [22] H. Frohlich, O. Chapelle, B. Scholkopf, "Feature selection for support vector machines by means of genetic algorithm," in *Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence*, Sacramento, CA, 2003, pp. 142-148.
- [23] F. Kuang, W. Xu, S. Zhang, "A novel hybrid KPCA and SVM with GA model for intrusion detection," *Applied Soft Computing*, vol. 18, pp. 178-184, 2014. doi: 10.1016/j.asoc.2014.01.028
- [24] A. Onan, S. Korukoglu, H. Bulut, "A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification," *Expert Systems with Applications*, vol. 62, pp. 1-16, 2016. doi: 10.1016/j.eswa.2016.06.005
- [25] X. Zhang, X. Chen, Z. He, "An ACO-based algorithm for parameter optimization of support vector machines," *Expert Systems with Applications*, vol. 37, no. 9, pp. 6618-6628, 2010. doi: 10.1016/j.eswa.2010.03.067
- [26] C. L. Huang, J. F. Dun, "A distributed PSO–SVM hybrid system with feature selection and parameter optimization," *Applied Soft Computing*, vol. 8, no. 4, pp. 1381-1391, 2008.
- [27] Y. Shen, K. Zheng, C. Wu, M. Zhang, X. Niu, Y. Yang, "An ensemble method based on selection using bat algorithm for intrusion detection," *The Computer Journal*, vol. 61, no. 4, pp. 526-538, 2018. doi: 10.1093/comjnl/bxx101
- [28] Y. Bao, Z. Hu, T. Xiong, "A PSO and pattern search based memetic algorithm for SVMs parameters optimization," *Neurocomputing*, vol. 117, pp. 98-106, 2013. doi: 10.1016/j.neucom.2013.01.027
- [29] R. Ahila, V. Sadasivam, K. Manimala, "An integrated PSO for parameter determination and feature selection of ELM and its application in classification of power system disturbances," *Applied Soft Computing*, vol. 32, pp. 23-37, 2015. doi: 10.1016/j.asoc.2015.03.036
- [30] C. Ma, J. Ouyang, H. L. Chen, J. C. Ji, "A novel kernel extreme learning machine algorithm based on self-adaptive artificial bee colony optimisation strategy," *International Journal of Systems Science*, vol. 47, no. 6, pp. 1342-1357, 2016. doi: 10.1080/00207721.2014.924602
- [31] C. R. Rao, S. K. Mitra, "Further contributions to the theory of generalized inverse of matrices and its applications," *Sankhyā: The Indian Journal of Statistics, Series A*, vol. 33, no. 3, pp. 289-300, 1971.
- [32] J. Kennedy, R. C. Eberhart, "A discrete binary version of the particle swarm algorithm," in *Proceedings of 1997 IEEE International Conference on Systems, Man, and Cybernetics: Computational Cybernetics and Simulation*, Orlando, FL, 1997, pp. 4104-4108.
- [33] The UCI KDD Archive, 1999 [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
- [34] University of New Brunswick, 2006 [Online]. Available: https://www.unb.ca/cic/datasets/nsl.html
- [35] Kyoto University, 2016 [Online]. Available: https://www.takakura.com/Kyoto_data/
- [36] H. G. Kayacik, A. N. Zincir-Heywood, M. I. Heywood, "Selecting features for intrusion detection: a feature relevance analysis on KDD 99 intrusion detection datasets," in *Proceedings of the 3rd Annual Conference on Privacy, Security and Trust*, St. Andrews, Canada, 2005, pp. 1723-1722.
- [37] M. M. Najafabadi, T. M. Khoshgoftaar, N. Seliya, "Evaluating feature selection methods for network intrusion detection with Kyoto data," *International Journal of Reliability, Quality and Safety Engineering*, vol. 23, no. 1, 2185, 2016. doi: 10.1142/S039316500017
- [38] D. S. Kim, J. S. Park, in *Information Networking*, Heidelberg, Germany: Springer, 2003, pp. 747-756.