## Miaomiao Liu* , Jingfeng Guo** and Jing Chen**## |

Dataset | [TeX:] $$|V| /|E|$$ | weighted CN | weighted AA | weighted RA | CRMA | IEM |
---|---|---|---|---|---|---|

Karate Club | 34 / 78 | 2 / 0.4547 | 2 / 0.4547 | 2 / 0.4547 | 2 / 0.4547 | 4 / 0.4950 |

Les Misérables | 77 / 254 | 1 / 0.0350 | 3 / 0.4185 | 3 / 0.4577 | 9 / 0.5222 | 5 / 0.5427 |

Train Bombing | 64 / 243 | 1 / 0.0303 | 4 / 0.3626 | 4 / 0.3604 | 4 / 0.4420 | 5 / 0.4579 |

US Airport | 332/ 2,126 | 2 / 0.0174 | 3 / 0.0987 | 3 / 0.1039 | 4 / 0.1347 | 4 / 0.1932 |

Net Science | 379 / 914 | 8 / 0.6045 | 19 / 0.8453 | 18 / 0.8499 | 21 / 0.8430 | 19 / 0.8512 |

(1) As to Karate club network, division results of the first four algorithms were the same where the network was divided into 2 communities as shown in Fig. 1; while IEM algorithm divided this network into 4 communities as shown in Fig. 2. The modularity was improved by 11.11% compared with the other four algorithms. Here, it should be noticed that in all figures of this paper, nodes in different communities were represented by different colors according to division results so as to express the results clearly. Additionally, it should be emphasized that the value of the network modularity and the number of communities of division results will be different in terms of different algorithms and different implementation methods. In general, the larger modularity means the relatively accurate number of communities and the better community structure that is closer to the real network.

(2) As to Les Misérables network and Train Bombing network, division results of these five algorithms were all different. Among them, the division result of the weighted CN algorithm is the poorest because the weighted CN index only considered the influence of the weight of the edge connecting two nodes and their neighbors on the similarity. However, in the Les Misérables network, the weight represents the times of the two characters’ appearance in the same scene simultaneously. And in the Train Bombing network, the weight represents the frequency of the contact of terrorists. Thus, most weights of edges in these two networks are 1, which led to the weighted CN similarity are 1 all the same. So in the clustering, one neighbor would be randomly selected and clustered into the community, which resulted in the deviation of community division. Overall, there are relatively small differences in division results of the latter four algorithms, and the CRMA and IEM algorithm have the relatively better performance. For these two networks, community division results of CRMA and IEM algorithms were shown in Figs. 3–6.

(3) As to the US Airport network, division results of these five algorithms were all poor. In this network, 59.7% node pairs have no common neighbors. Moreover, among all node pairs with CNs, 46.5% node pairs have only one CN, which led to the lower modularity of each algorithm and poor quality of community division of weighted CN, weighted AA and weighted RA. Because these three similarity indexes just take into account of the influence of the degree or the strength of CNs on the similarity. However, IEM algorithm used the edge weight strength of the node pair to deal with the situation of having no CNs, so its community division quality is the highest. For this network, community division results of CRMA and IEM algorithms were shown in Figs. 7 and 8.

(4) As to the Net Science network, although it is sparse, the average weighted degree of nodes is 2.583, and the average clustering coefficient is about 0.798. So these five algorithms all have better performance on this network. Among them, the performance of the weighted CN is the worst, which had a large difference in the number of communities and the modularity compared with other four algorithms. While the division results of the latter four algorithms have small differences. Of these, community division results of CRMA and IEM algorithms were shown in Figs. 9 and 10. In general, the community division quality of IEM algorithm is better than the other four algorithms. This further verified the correctness of IEM algorithm which defined the weighted similarity extensively combining with the edge weight, the degree, the intensity and the common neighbors of the two nodes.

From experimental results we know that IEM outperforms the other three weighted similarity indices, which could further verify its correctness and higher division quality. Additionally, IEM is better than CRMA algorithm. CRMA was the improvement of AGMA algorithm which took into account of the sign of the edge in order to get better division results in signed networks. Though it is still applicable to traditional networks that only have positive links, each algorithm was designed to achieve its underlying goal, which naturally brought different results with regard to different networks. In terms of community division in traditional weighted networks, IEM is more reasonable and effective than CRMA algorithm.

Overall, though community division results of IEM for all these datasets are different from the other four algorithms, the modularity of its divisions are all the highest, which can show its superiority. Moreover, division results of IEM for networks of Karate club, US Airport and Net Science are basically consistent with the fast greedy, betweenness and other classical algorithms described in [13] in the number of communities and the modularity. This can also further verify the correctness of IEM algorithm.

For the network [TeX:] $$G=(V, E, W) \text { where }|V|=n,$$ if we directly use the method based on the optimization of the modularity to cluster the nodes, the complexity would be very high. Therefore, the algorithm proposed in this paper first realized the fast clustering of nodes by defining the weighted similarity index, and then the initial division result can be got which consists of p communities. After that, it optimized the division result by merging communities to maximize the modularity of the network so as to get the best performance.

In IEM algorithm, we first calculate the similarity of any nodes pairs and save these values into the matrix, so the computational complexity is O(m). Secondly, we traverse the matrix and use list (i, arraylist) to store the label of the node that has the highest similarity with node [TeX:] $$v_{\mathrm{i}},$$ so the computational complexity is O(nlog2n). Lastly, we calculate the modularity of the network after merging any two communities among p communities, so the computational complexity is [TeX:] $$O\left(p^{2}\right).$$ Compared with other algorithms, the computational complexity of IEM is slightly increased because of the additional computation of merging communities. However, for large scale networks, p is far less than n. So it can also guarantee the feasibility and effectiveness of the time on the premise of achieving a higher accuracy.

The existing weighted similarity indices only considered the influence of the weight information of common neighbors on the similarity, which may lead to poor community division results for some special networks. In view of this, a new algorithm IEM was proposed so as to achieve a more reasonable division of weighted networks. The algorithm consisted of three stages, namely, forming the initial community, expanding communities and merging them. In the first two stages, we focused on the influence of common neighbors to the similarity of the two nodes, and the weighted similarity of the two nodes based on the degree, the strength and the weight information of their common neighbors was defined. Moreover, the situation of the two nodes having no CNs was also taken into account, and the edge weight strength was defined as their similarity. Then the most closely related nodes were clustered fast according to their similarity to form the initial community and expand it. In the third stage, for those small communities consisting of only two nodes that may emerge in the first two stages, we merged these communities by maximizing the weighted modularity of the network, thus the more reasonable and accurate community division results could be got. The weighted similarity index proposed improved weighted CN, weighted AA, and weighted RA. In addition, for the traditional weighted network containing only positive links, the IEM algorithm was more efficient than CRMA algorithm. The experimental results showed its effectiveness and high quality for community division of weighted networks. For large scale networks, how to reduce the computational complexity so as to improve the efficiency of the algorithm is the further research.

She was born in 1982 and she is currently an associate professor of Northeast Petroleum University in China. She received the master’s degree from Ocean University of China in 2006 and got her doctorate at Yanshan University in 2017. Her main research interests include community discovery and link prediction in social networks.

- 1 M. E. J. Newman, "Analysis of weighted networks,"
*Physical Review E*, vol. 70, no. 5, 2004.custom:[[[-]]] - 2 K. Subramani, A. V elkov, I. Ntoutsi, P. Kroger, H. P. Kriegel, "Density-based community detection in social networks," in
*Proceedings of 2011 IEEE 5th International Conference on Internet Multimedia Systems Architecture and Application*, Bangalore, India, 2011;pp. 1-8. custom:[[[-]]] - 3 R. Liu, S. Feng, R. Shi, W. Guo, "Weighted graph clustering for community detection of large social networks,"
*Procedia Computer Science*, vol. 31, pp. 85-94, 2014.doi:[[[10.1016/j.procs.2014.05.248]]] - 4 T. Sharma, "Finding communities in weighted signed social networks," in
*Proceedings of 2012 IEEE/ACM International Conference on Advances Social Networks Analysis and Mining*, Istanbul, T urkey, 2012;pp. 978-982. custom:[[[-]]] - 5 Z. Lu, Y. Wen, G. Cao, "Community detection in weighted networks: algorithms and applications," in
*Proceedings of 2013 IEEE International Conference on Pervasive Computing and Communications (PerCom)*, San Diego, CA, 2013;pp. 179-184. custom:[[[-]]] - 6 K. W ang, G. H. Lv, Z. W. Liang, M. Y. Y e, "Detecting community in weighted complex network based on similarities,"
*Journal of Sichuan University (Natural Science Edition)*, vol. 51, no. 6, pp. 1170-1176, 2014.custom:[[[-]]] - 7 W. Q. Lin, F. S. Lu, Z. Y. Ding, Q. Y. Wu, B. Zhou, Y. Jia, "Parallel computing hierarchical community approach based on weighted-graph,"
*Journal of Software*, vol. 6, no. 23, pp. 1517-1530, 2012.custom:[[[-]]] - 8 S. W ang, "Community detection based on the interaction modularity on weighted graphs,"
*Y unnan UniversityKunming, China*, 2014.custom:[[[-]]] - 9 P. Zhan, "Implementation of parallelized method for local community detection in weighted complex networks,"
*South China University of T echnologyGuangzhou, China*, 2013.custom:[[[-]]] - 10 J. Zhao, J. An, "Community detection algorithm for directed and weighted network,"
*Application Research of Computers*, vol. 31, no. 12, pp. 3795-3799, 2014.custom:[[[-]]] - 11 Z. Yao, "The analysis and prediction of weighted complex networks,"
*Qingdao Technological UniversityQingdao, China*, 2012.custom:[[[-]]] - 12 J. Guo, M. Liu, L. Liu, X. Chen, "An improved community discovery algorithm in weighted social networks,"
*ICIC Express Letters*, vol. 10, no. 1, pp. 35-41, 2016.custom:[[[-]]] - 13 X. Liu, "Community structure detection in complex networks via objective function optimization,"
*National University of Defense T echnologyChangsha, China*, 2012.custom:[[[-]]]