Article Information
Corresponding Author: Doo-Soon Park , parkds@sch.ac.kr
Yixuan Yang, Dept. of Software Convergence, Soonchunhyang University, Asan, Korea, yangyxuan621@gmail.com
Doo-Soon Park, School of Information Science and Engineering, Hebei North University, Zhangjiakou, China, parkds@sch.ac.kr
Fei Hao, School of Computer Science, Shaanxi Normal University, Xi’an, China, feehao@gmail.com
Sony Peng, Dept. of Software Convergence, Soonchunhyang University, Asan, Korea, peng.sony61@gmail.com
Hyejung Lee, Dept. of Innovation and Convergence, Hoseo University, Cheonan, Korea, hjlee100@sch.ac.kr
Min-Pyo Hong, Dept. of Computer Sciences and Engineering, Soonchunhyang University, Asan, Korea, hmp4321@naver.com
Received: November 22 2021
Revision received: March 14 2022
Revision received: September 7 2022
Accepted: October 1 2022
Published (Print): April 30 2023
Published (Electronic): April 30 2023
1. Introduction
In response to the demands of the Fourth Industrial Revolution era, many researchers are directing their attention to social network structures analysis in computer science, with the cliques emerging as a very popular research direction in social network structure mining. Vilakone et al. [1,2] used clique detection and standardized cumulative gains to solve some problems in the movie recommendation system. However, detecting the maximal clique and balanced cliques in social networks, are not only applicable to social media software, but also can be applied to other domains, such as the Internet of Things (IoT), healthcare, biology, and more.
Meanwhile, maximal clique detection can be used to detect the structure of proteins, which can prevent or control diseases. For this reason, we can find a group of the most suitable medical specialists for a target patient to provide the best medical treatment through remote consultation.
In many existing works, cliques are typically detected in undirected and unweight graphs [3,4]. However, some social networks contain positive and negative relationships between users, and are called signed social networks. Significantly, there are some studies about mining cliques from the social network [5,6], while several studies have been conducted using signed social networks [7-9]. However, few studies have used formal concept analysis (FCA) to do the maximal balanced clique detection in the signed social networks.
To solve the maximal balanced clique detection problem (MBCP), it is necessary to propose some more efficient methods; therefore, this paper focuses on designing the modified three-way concept algorithms, which are more suitable for the signed social networks.
The organization of our paper is described as follows. We introduce the research background and our motivation in Section 1, with the related works and some terminologies introduced in Section 2, and then the maximal balanced clique detection algorithms for signed social networks are presented in Section 3. To show its validness and efficiency, we have applied it to some real datasets and synthetic datasets in Section 4; meanwhile, we also compare the experimental results with other methods. Finally, the conclusion and future works are provided in Section 5.
2. Basic Notions
In this section, we introduce several basic concepts of the signed networks, maximal balanced cliques, and three-way concept lattice.
A signed social network G is characterized as a graph structure, and the graph composed of nodes and edges. The nodes represent the users in social network, and edges have weights, which are positive and negative values and represent positive and negative relationships, respectively. For example, as shown in Fig. 1, this is a signed social network, and there are five nodes in a signed social network G, between these nodes, the solid lines represent the positive edges, and the dotted lines represent the negative edges.
To explain the definitions of the maximal balanced network [7], FCA [10], and three-way concept [11,12], we present here an example that lets us easily understand the original FCA and three-way concept.
Definition 1 ([TeX:] $$\text { Operator } ^\leq \text { and } ^\geq$$ [11]). Let F = (O, A, I) be a formal context. For any [TeX:] $${\mathrm{X}, \mathrm{M}} \subseteq \mathrm{O}, \mathrm{Y}, \mathrm{N} \subseteq \mathrm{A},$$ a pair of three-way operators are defined mathematically below:
Based on Definition 1, three-way concepts are defined in detail as follows.
Definition 2 (Three-way concepts [11]). The three-way concepts including two parts: AE-concept and OE-concept.
AE-concept: Let F = (O, A, I) be a formal context. For any [TeX:] $$\mathrm{X}, \mathrm{M} \subseteq \mathrm{O}, \mathrm{Y} \subseteq \mathrm{A},((\mathrm{X}, {\mathrm{M}}), \mathrm{Y})$$ is called an attribute-induced three-way concept (AE-concept), if and only if [TeX:] $$(\mathrm{X}, \mathrm{M})^{\geq}=\mathrm{Y} \text { and } \mathrm{Y}^{\leq}=(\mathrm{X}, \mathrm{M}) \text {, }$$ where (X, M ) is called the extent and Y is called the intent of ((X, M ), Y). The set of all AE-concepts of F = (O, A, I) can form a lattice by some partial order structure.
OE-concept: Let F = (O, A, I) be a formal context. For any [TeX:] $$\mathrm{X} \subseteq \mathrm{O}, {\mathrm{Y}, \mathrm{N}} \subseteq \mathrm{A},(\mathrm{X},(\mathrm{Y}, \mathrm{N}))$$ is called an object-induced three-way concept (OE-concept), if and only if [TeX:] $$\mathrm{X}^{\leq}=(\mathrm{Y}, \mathrm{N}) \text { and }(\mathrm{Y}, \mathrm{N})^{\geq}=\mathrm{X}$$, where X is called the extent and (Y, N) is called the intent of (X, (Y, N)). The set of all OE-concepts of F = (O, A, I) can form a lattice by some partial order structure, and this lattice is called OE lattice.
Example 1. Table 1 is a formal context F. Table 2 is a supplemental formal context [TeX:] $$\mathrm{F}^{\mathrm{c}}$$ for Table 1. Among them, O represents the object of the form context, and A represents the attribute corresponding to object O. Through the traditional FCA method mentioned above, using the formal context of Tables 1 and 2 and the operations in Definition 1 above, we can obtain all object sets with the same attributes (i.e., concepts, which are represented by nodes in Fig. 2), and form a concept lattice of the formal context [TeX:] $$\mathrm{F}^{\mathrm{c}}$$, which is represented as Fig. 2.
Supplemental formal context [TeX:] $$\mathrm{F}^{\mathrm{c}}$$
Concept lattice of the supplement context [TeX:] $$\mathrm{F}^{\mathrm{c}}$$.
Similar to the acquisition process in Fig. 2, the concept lattice of the formal context F can be obtained. Based on the formal contexts F and [TeX:] $$\mathrm{F}^{\mathrm{c}}$$, the OE-concept lattice is obtained by Definition 2, which is shown in Fig. 3.
OE lattice of contexts F and [TeX:] $$\mathrm{F}^{\mathrm{c}}$$.
3. Proposed Maximal Cliques Detection Approach
This section proposes two novel methods for maximal balanced clique detection in the signed networks. The first method obtains the modified OE lattice through the existing three-concept lattice algorithm using the improved three-way formal context and modified formal context. We have detected all maximal balanced cliques in the OE lattice. The second method is to optimize the first method to improve the detection of the maximal balanced cliques using the traditional FCA along with the modified formal context.
In the signed social networks, we applied the formal context and supplemental formal context for representing the positive and negative relationships between nodes, which is denoted as modified formal context.
Definition 3 (Modified formal context). If a signed network g has nodes {v1,v2,...,vn}, each node vi represents an object and its attribute, and edge e = (vi, vj) is the relationship between the objects and attributes, [TeX:] $$\text { I or } \mathrm{I}^{\mathrm{c}}$$. If [TeX:] $$\text { e } \in \mathrm{E}+$$, notes I(vi, vj) = 1, if [TeX:] $$\text { e } \in \mathrm{E}-$$, notes [TeX:] $$\text {I}^\mathrm{c}(\mathrm{vi}, \mathrm{vj})=1.$$ Then, if F = (O, A, I) and [TeX:] $$\mathrm{F}^{\mathrm{c}}=\left(\mathrm{O}, \mathrm{A}, \mathrm{I}^{\mathrm{c}}\right)$$ are modified formal contexts, they can be represented as two adjacency matrixes containing only 0 and 1.
Definition 4 (Modified OE and AE). Taking the modified formal context and supplemental formal context for generating OE-concept and AE-concept in a three-way concept algorithm, we can get a modified OE lattice (denoted MOE) and a modified AE lattice (denoted MAE).
Maximal cliques detection approach 1 (MCDA1): The modified formal context F = (O, A, I) and supplemental formal context [TeX:] $$\mathrm{F}^{\mathrm{c}}=\left(\mathrm{O}, \mathrm{A}, \mathrm{I}^{\mathrm{c}}\right)$$ are used to obtain modified three-way concept lattices, MOE and MAE. When there is a concept in MOE and X = Y, i.e., (X, (X, N)), X and N form the maximal balanced clique represented by {X, N}, and (X, (X, N)) is the maximal balanced OE-concept.
Theorem 1. Let MOE be the OE-concept lattice of modified formal contexts [TeX:] $$\mathrm{F} \text { and } \mathrm{F}^c \text {. }$$ For any [TeX:] $$(\mathrm{X}, (\mathrm{Y}, \mathrm{N})) \in \mathrm{MOE}$$, if X = Y, it is denoted as (X, (X, N)), {X, N} is the maximal balanced clique, such that [TeX:] $$\forall \mathrm{u},\mathrm{v} \in \mathrm{X} \text { or } \mathrm{u}, \mathrm{v} \in \mathrm{N} \rightarrow(\mathrm{u}, \mathrm{v}) \in \mathrm{E}^{+}$$, and [TeX:] $$\forall \mathrm{u} \in X, \mathrm{v} \in \mathrm{N} \text { or } \mathrm{u} \in \mathrm{N}, \mathrm{v} \in X \rightarrow(\mathrm{u}, \mathrm{v}) \in \mathrm{E}^{-}$$.
Proof. There is [TeX:] $$(\mathrm{X}, \mathrm{Y}) \in \mathrm{FC}(\mathrm{F}) \text { and }(\mathrm{M}, \mathrm{N}) \in \mathrm{FC}\left(\mathrm{F}^c\right) \text {, and }(\mathrm{X},(\mathrm{Y}, \mathrm{N})) \in \mathrm{MOE} \text {. }$$
1) If X = Y, we can get [TeX:] $$\mathrm{(X, X)}=\mathrm{(X, Y)} \in \mathrm{F C(F)}$$, while (X, X) is an equi-concept in FC(F) and (X, X) is a maximal clique and [TeX:] $$\forall \mathrm{u}, \mathrm{v} \in \mathrm{X},(\mathrm{u}, \mathrm{v}) \in \mathrm{E}^{+}$$.
2) Because of [TeX:] $$(\mathrm{X},(\mathrm{Y}, \mathrm{N})) \in \mathrm{MOE}$$, if X = Y, we can get [TeX:] $$(\mathrm{X},(\mathrm{Y}, \mathrm{N}))=(\mathrm{X},(\mathrm{X}, \mathrm{N})) \in \mathrm{MOE}$$. According to Definition 4, for a modified OE-concept [TeX:] $$(\mathrm{X},(\mathrm{X}, \mathrm{N})),\left(\mathrm{X}, {\mathrm{N})^{\geq}}=\mathrm{X}^{\downarrow} \cap \mathrm{N}^{\downarrow *}=\mathrm{X}\right. \text {. }$$ Based on the definitions of concept, [TeX:] $$\mathrm{X}^{\downarrow}=\mathrm{Y}=\mathrm{X}, \mathrm{N}^{\downarrow^*}=\mathrm{M},$$, then [TeX:] $$\mathrm{X}^{\downarrow} \cap \mathrm{N}^{\downarrow *}=\mathrm{X} \cap \mathrm{M}=\mathrm{X}$$. Then, [TeX:] $$\mathrm{X} \subseteq \mathrm{M}.$$ Because of [TeX:] $$(\mathrm{M}, \mathrm{N}) \in \mathrm{FC}\left(\mathrm{F}^{\mathrm{c}}\right)$$, [TeX:] $$\forall \mathrm{u} \in \mathrm{M}, \mathrm{v} \in \mathrm{N} \text { or } \mathrm{u} \in \mathrm{N}$$, [TeX:] $$\mathrm{v} \in \mathrm{M} \rightarrow(\mathrm{u}, \mathrm{v}) \in \mathrm{E}^{-}$$, we can get that [TeX:] $$\forall \mathrm{u} \in \mathrm{X} \subseteq \mathrm{M}, \mathrm{v} \in \mathrm{N} \text { or } \mathrm{u} \in \mathrm{N}, \mathrm{v} \in \mathrm{X} \subseteq \mathrm{M} \rightarrow(\mathrm{u}, \mathrm{v}) \in \mathrm{E}^{-}$$.
The working process of MCDA1 is described in Algorithm 1. In Algorithm 1, the sets of maximal balanced cliques and modified OE-concept are initialized in Line 1. Line 3 constructs formal contexts F and Fc for the signed social network G. The Construct _MFC is given by Algorithm 2. Line 4 follows the rules of the traditional three-way concept algorithm according to the modified contexts [TeX:] $$\mathrm{F} \text { and } \mathrm{F}^{\mathrm{c}}$$ to get the modified OE lattice. Lines 5-8 use Theorem 1 to detect the maximal balanced cliques from MOE. Finally, set the MBC returns from Line 9.
The detection of Maximal Balanced Cliques algorithm 1 ( [TeX:] $$\left(G=\left(V, E^{+}, E^{-}\right)\right)$$)
Construct_MFC ( [TeX:] $$\left(G=\left(V, E^{+}, E^{-}\right)\right)$$)
In Line 1 of Algorithm 2, we initialized the matrices of both contexts [TeX:] $$\mathrm{F} \text { and } \mathrm{F}^{\mathrm{c}}$$ to 0. Lines 3-10 construct two matrices, the three-way formal context F, and the supplemental formal context [TeX:] $$\mathrm{F}^{\mathrm{c}}$$, according to Definition 12. Line 11 returns the three-way contexts [TeX:] $$\mathrm{F} \text { and } \mathrm{F}^{\mathrm{c}}$$.
Here, we present an example that lets us easily understand how to use an MCDA1 for the maximal balanced clique detection in the signed social networks.
Example 2. Fig. 4 shows a signed social network graph, which includes 11 nodes, eight positive edges, and 13 negative edges. The modified OE lattice of the signed social network G is obtained using Algorithms 1 and 2, meanwhile, Fig. 5 lists all OE-concepts of the modified OE lattice. The extents and intents of each concept are separated by "#"". According to Theorem 1, it is easy to find four suitable OE-concepts in the following list: (3, 4, 6) # ((3, 4, 6), (2, 10, 11)); (5, 8) # ((5, 8), (7, 9)); (2, 10, 11) # ((2, 10, 11), (3, 4, 6)); (7, 9) # ((7, 9), (5, 8)). Moreover, we obtained two maximal balanced cliques: {{3, 4, 6}, {2, 10, 11}}; {{5, 8}, {7, 9}}.
The signed social network G.
Modified OE-concepts list.
In Fig. 6, the purple and orange parts are two maximal balanced cliques with vertex label numbers consistent with the above results.
Due to the three-way concept method being a NP-completed problem, so the MCDA1 also suffered from a high time complexity problem because if there are massive nodes in a signed social network, this method cannot quickly obtain the modified OE lattice, making it unable to quickly obtain the maximal balanced cliques. Therefore, we analyzed the characteristics of OE-concepts based on the three-way concept method, and proposed another detection algorithm of maximal balanced cliques that can be quickly obtained, starting directly from the traditional concept analysis with the formal context and supplemental formal context, which is described as MCDA2.
Maximal balanced cliques in signed social network G.
Maximal cliques detection approach 2 (MCDA2). The modified formal context F = (O, A, I) and supplement context [TeX:] $$\mathrm{F}^{\mathrm{c}}=\left(\mathrm{O}, \mathrm{A}, \mathrm{I}^{\mathrm{c}}\right)$$ are used as two input formal contexts for the traditional FCA method. According to the input contexts and formal context analysis, we obtained two modified formal concept lattices, [TeX:] $$\Omega(\mathrm{F}) \text { and } \Omega\left(\mathrm{F}^{\mathrm{c}}\right)$$.
Principle 1. Compare with each concept in [TeX:] $$\Omega(\mathrm{F}) \text { and } \Omega\left(\mathrm{F}^{\mathrm{c}}\right)$$. When there are two concepts (X, Y) and (C, D) in the [TeX:] $$\Omega(\mathrm{F}) \text { and } \mathrm{X=Y} \text { and }\mathrm{ C=D} \text {, }$$ they are equi-concepts. Concurrently, if there is (X, D) in the [TeX:] $$\Omega\left(\mathrm{F}^{\mathrm{c}}\right)$$, X is the extent of (X, Y) and (X, D), D is the intent of (C, D) and (X, D), then X and D form a maximal balanced clique, which is denoted as {X, D}.
The detection of Maximal Balanced Cliques algorithm 2 ( [TeX:] $$\left(G=\left(V, E^{+}, E^{-}\right)\right)$$)
Algorithm 3 explains the above processes. In Algorithm 3, Line 1 initializes the sets of maximal balanced cliques, equi-concept, concept lattice [TeX:] $$\Omega(\mathrm{F}), \Omega\left(\mathrm{F}^\mathrm{c}\right)$$. Line 3 constructs the formal contexts [TeX:] $$\mathrm{F} \text { and } \mathrm{F}^{\mathrm{c}}$$ for the signed social network G, while the construct-MFC is given in Algorithm 2. Lines 4-5 obtain the concept lattice of formal contexts [TeX:] $$\mathrm{F} \text { and } \mathrm{F}^{\mathrm{c}}$$. Lines 6-8 obtain the set of equi-concept. Lines 9-12 detect maximal balanced cliques through Approach 2 based on Principle 1. Finally, the set of maximal balanced clique will return in Line 9.
Example 3. Fig. 7 shows the concept lattice of contexts [TeX:] $$\mathrm{F} \text { and } \mathrm{F}^{\mathrm{c}}$$. If we follow Approach 2, we can get four equi-concepts (the number of extent or intent is more than one): (3, 4, 6)#(3, 4, 6); (2, 10, 11)#(2, 10, 11), (5, 8)#(5, 8), (7, 9)#(7, 9). We used the previous four equi-concepts to obtain the four concepts in lattice [TeX:] $$\Omega\left(\mathrm{F}^\mathrm{c}\right)$$, and followed the rules in Approach 2: ((3, 4, 6) # (2, 10, 11)); ((5, 8) # (7, 9)); ((2, 10, 11) # (3, 4, 6)); ((7, 9) # (5, 8)). Then, we obtained two maximal balanced cliques: {{3, 4, 6}, {2, 10, 11}}; {{5, 8}, {7, 9}}. As shown in Fig. 6, the purple and orange parts are two maximal balanced cliques with vertex labels numbers consistent with the above results.
Maximal Balanced Cliques in Signed Social Network G.
4. Experiment and Performance Analysis
This section presents experimental datasets and execution times. The experimental environment consists of the macOS Catalina operating system and two Intel 2.3 GHz Quad-Core Intel Core i5, RAM 16G, and uses the programming language Python 3.8. In this section, we used two approaches proposed in Section 3 to perform experiments.
The modified three-way concept lattice method can deal with symmetric and asymmetric formal contexts, but the modified FCA can only handle the symmetric formal context of maximal balanced detection. Therefore, we first tested approach 1 (modified three-way concept) using three different real-life asymmetric datasets. We used large symmetric real-life datasets to compare approach 1 to approach 2 (modified FCA).
4.1 Time Efficiency Analysis
4.1.1 Asymmetric dataset
In this part, we tested the run-time of three asymmetric datasets by MCDA1. There are three real-life datasets (Table 3), and the first dataset is 11 nodes from AdjWordNet [13], which involves synonyms and antonyms of certain words. The second dataset is a network of intra-organization [14], which is a dataset of a consulting company (46 employees) and records the requests a frequent quantity of information or advice among employees. The third dataset is Freeman’s EIES networks [15], which includes some personal relationships among 46 researchers.
As shown in Fig. 8, comparing the run-time among different datasets, the first dataset is less than that of the other two datasets. Thus, although we can see that MCDA1 is highly sensitive to the number of nodes and edges, it has the advantage of being able to compute the asymmetric formal context that is not limited to social network datasets.
Running time of modified three-way concept (unit: ms).
4.1.2 Symmetric dataset
In this part, we tested the run-times of MCDA1 and MCDA2 by using the Slashdot dataset [16]. Slashdot is known as a specific user community, whose website has the ability to allow users to submit the latest news on the site and for editors to evaluate the content. Additionally, the Slashdot dataset entails friends and relationships between Slashdot users. Slashdot includes 77,350 nodes and 516,575 edges, but in this paper, we take only the first 2,000 nodes of the dataset and experiment in groups of different sizes.
We transform the relationships of a dataset into undirected relationships without considering the positive or negative relationship in a single direction to obtain a symmetric matrix that simulates an undirected weighted social network (Table 4). The edges number of each dataset is counted in Fig. 9.
Fig. 10 shows the comparison of the run-time on datasets 1-7 with the MCDA1, MCDA2 and FCA-based methods (which was proposed in [17]). Note that the algorithm only works for detecting a k-balance clique in the signed social network to make this algorithm comparable with our algorithms, and we made a small improvement on the algorithm. The blue line shows the run-time by the MCDA1, the red line shows the run-time by the MCDA2, and the green line shows the run-time by the FCA-based method. According to Fig. 11, it is observed that the execution time of the MCDA2 method significantly outperforms the other two methods.
Edges number of datasets 1-14.
Execution time of MCDA1, MCDA2, and FCA-based (datasets 1-7).
Execution time of MCDA1, MCDA2, and FCA-based (datasets 8-14).
4.2 Case Study of Detecting Synonyms in Vocabulary
In Section 4.1, we used three datasets to compare the time efficiency of the approaches and demonstrate the utility of our methods. However, we not only focused on the time efficiency of our method, but also cared about the utility. Hence, in the case study, we applied the AdjWordNet dataset to our method, and the results illustrate the utility of our method. AdjWordNet is downloaded from WordNet (www://wor dnet.priceton.edu). This dataset contains 117,000 synonyms and antonyms of certain words. Among them, the synonyms are linked by positive edges, while the antonyms are linked by negative edges, without the presences of edges between unrelated words.
In our case study, to observe the flow and effect of our method more clearly, we selected 25 words randomly from AdjWordNet. The graph of the 25 words is shown in Fig. 12 and Table 5.
Twenty-five random words of AdjWordNet data
Twenty-five words in AdjWordNet.
Fig. 12 shows the 25 words and their relationships, and among them, the orange solid line shows that the two words are synonyms, while the blue dotted line represents that the two words are antonyms, with the words without a line bearing no relationship. Table 5 contains the vocabulary list of the nodes in Fig. 12.
The results which are obtained by our method is shown in Table 6. In this table, the words in [TeX:] $$\mathrm{C}_1 \text { and } \mathrm{C}_2$$ are denoted as synonyms for each line, respectively and each column [TeX:] $$\mathrm{C}_1 \text { and } \mathrm{C}_2$$ represents a group of two antonyms.
We visualized the three maximal balanced cliques in Table 6, as shown in Fig. 13, which denotes the following: the green nodes are the first maximal balanced cliques, the orange nodes are the second maximal balanced cliques, the gray nodes are the third maximal balanced cliques, and the remaining purple nodes are the vocabulary with no synonyms and antonyms in this dataset.
Result of maximal balanced cliques in AdjWordNet
Node visualization of three maximal balanced cliques.
5. Conclusion
In this paper, we proposed an improved three-way concept algorithm MCDA1 and an improved formal concept algorithm MCDA2 for detecting the maximal balanced clique in signed social networks. To demonstrate the feasibility of the proposed method, in the performance evaluation, four real-life signed social networks were used to detect maximal balanced cliques, and synonyms and antonyms were used on the AdjWorldNet dataset. Also, we found that the two methods provide both advantages and dis-advantages. Accordingly, MCDA1 method has a wider application range on asymmetric datasets, whereas the MCDA2 method has a faster computation speed and higher efficiency. This proposed method can be applied in detecting synonyms and antonyms in a vocabulary list, detecting social network user groups, and analyzing the protein molecular structure. Moreover, it can be applied to more scenarios in the future. Going forward, we plan to solve the problem of long run-times in the MCDA1.