## Wei Jia, Qingyi Hua, Minjun Zhang, Rui Chen, Xiang Ji and Bo Wang

Table 1.

Entropy measure | A_{1} | A_{2} | A_{3} | A_{4} | A_{5} | A_{6} | A_{7} | A_{8} | A_{9} | A_{10} |
---|---|---|---|---|---|---|---|---|---|---|
[TeX:] $$E_{\text {Wang }}(A)$$ | 1 | 1 | 0.8333 | 0.8889 | 1 | 1 | 0.875 | 0.8571 | 0.5 | 0.5 |
[TeX:] $$E_{\text {Zeng }}(A)$$ | 1 | 1 | 0.9 | 0.9 | 1 | 1 | 0.9 | 0.9 | 0.6 | 0.5 |
[TeX:] $$E_{L i}(A)$$ | 1 | 1 | 0.9495 | 0.9495 | 1 | 1 | 0.9495 | 0.9495 | 0.768 | 0.6875 |
[TeX:] $$E_{\text {Verma }}(A)$$ | 1 | 1 | 0.9905 | 0.9905 | 1 | 1 | 0.9905 | 0.9905 | 0.8463 | 0.7588 |
[TeX:] $$E_{\text {Wei }}(A)$$ | 1 | 1 | 0.9898 | 0.9957 | 1 | 1 | 0.9945 | 0.9927 | 0.8662 | 0.8662 |
[TeX:] $$E_{\text {Gao }}(A)$$ | 1 | 0.5 | 0.5 | 0.74 | 0.58 | 0.52 | 0.62 | 0.54 | 0.44 | 0.5 |
[TeX:] $$E_{\text {Liu }}(A)$$ | 1 | 1 | 1 | 0.9999 | 1 | 1 | 0.9999 | 0.9999 | 0.9995 | 0.998 |
[TeX:] $$E(A)$$ | 1 | 0.5 | 0.4525 | 0.6735 | 0.6 | 0.5368 | 0.5694 | 0.4974 | 0.2952 | 0.3125 |

From Table 1, we find that [TeX:] $$E_{\text {Zeng }}(A), E_{L i}(A) \text { and } E_{\text {Verma }}(A)$$ cannot distinguish the fuzziness degrees of [TeX:] $$A_{3}, A_{4}, A_{7} \text { and } A_{8}.$$ Moreover, their calculation results show that the fuzziness degrees of [TeX:] $$A_{2}, A_{5} \text { and } A_{6}$$ are equal to 1, which is obviously unreasonable. The reason is that they omit the intuitionistic index and consider only the difference between the membership degree and the non-membership degree. Although [TeX:] $$E_{\text {Wang }}(A), E_{\text {Wei }}(A), E_{\text {Gao }}(A) \text { and } E_{\text {Liu }}(A)$$ consider the intuitionistic index, they do not take full account of the relationship between known and unknown information. For this reason, their calculation results do not clearly show the differences between these IFSs in terms of fuzziness and intuitionism.

By comparing the calculation results listed in Table 1, we see that the proposed entropy measure [TeX:] $$E(A)$$ can clearly distinguish the fuzziness degrees of these IFSs. Further, for [TeX:] $$A_{3}, A_{4}, A_{7} \text { and } A_{8},$$ the differences between the membership degrees and the non-membership degrees are the same. In this case, the greater the value of [TeX:] $$\pi_{A}\left(x_{i}\right),$$ the fuzzier the corresponding element [TeX:] $$A_{i}.$$ Thus the uncertain information of [TeX:] $$A_{4}$$ is the greatest, the uncertain information of [TeX:] $$A_{7}$$ is greater than that of [TeX:] $$A_{8},$$ and the uncertain information of [TeX:] $$A_{3}$$ is the least. The calculation results of the proposed entropy measure show that [TeX:] $$E\left(A_{3}\right)<E\left(A_{8}\right)<E\left(A_{7}\right)<E\left(A_{4}\right),$$ as we would expect. Hence, the proposed entropy measure is not only mathematically correct but also reasonable.

In this study, EODPSO is proposed to optimize the clustering parameters. To improve the performance of the PSO algorithm, we use a hybrid strategy to avoid local minima and accelerate convergence. The hybrid strategy covers the following three aspects.

(1) Chaotic opposition-based learning (COL) [39] is used to obtain a better initial population.

(2) CA is combined with the PSO algorithm to enhance local search ability.

(3) We present a new population search strategy that combines OL and DE to enhance global search performance.

A reasonable initial population plays a vital role in avoiding premature convergence to local optima and in accelerating convergence. COL is a population initialization method that combines a chaotic system with OL to obtain the initial population. COL is therefore employed in the EODPSO algorithm to generate the initial population.

In [39], the authors used the Sine map in their chaotic system. However, the Sine map has two problems: its chaotic range is limited to a finite interval, and the data distribution of the output chaotic sequences is non-uniform. To avoid these problems, we use an improved one-dimensional chaotic system [40], called the Tent-Sine system, in COL to obtain better results. The Tent-Sine system uses the Tent and Sine maps as seed maps and is defined as follows:

[TeX:] $$Q_{i+1}=\left\{\begin{array}{ll}{\left(\frac{r^{\prime} Q_{i}}{2}+\frac{\left(4-r^{\prime}\right) \sin \left(\pi Q_{i}\right)}{4}\right) \bmod 1,} & {Q_{i}<0.5} \\ {\left(\frac{r^{\prime}\left(1-Q_{i}\right)}{2}+\frac{\left(4-r^{\prime}\right) \sin \left(\pi Q_{i}\right)}{4}\right) \bmod 1,} & {Q_{i} \geq 0.5}\end{array}\right.$$

where [TeX:] $$Q_{i}$$ is the output chaotic sequence, [TeX:] $$r^{\prime}$$ is a parameter with range [TeX:] $$(0,4],$$ mod is the modulo operation, i is the iteration number, [TeX:] $$\left\{\begin{array}{cc}{\frac{r^{\prime} Q_{i}}{2}} & {Q_{i}<0.5} \\ {\frac{r^{\prime}\left(1-Q_{i}\right)}{2}} & {Q_{i} \geq 0.5}\end{array}\right.$$ is the Tent map, and [TeX:] $$\frac{\left(4-r^{\prime}\right)\left(\sin \left(\pi Q_{i}\right)\right)}{4}$$ is the Sine map.
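To make the map concrete, the following is a minimal Python sketch of the Tent-Sine iteration as reconstructed above; the function name and parameter defaults are ours, not the authors'.

```python
import math

def tent_sine_sequence(q0, r, n):
    """Iterate the Tent-Sine system n times from q0 in (0, 1).

    r is the control parameter r' in (0, 4]; each step adds the Tent and
    Sine seed-map outputs and takes the result modulo 1.
    """
    q, seq = q0, []
    for _ in range(n):
        tent = r * q / 2.0 if q < 0.5 else r * (1.0 - q) / 2.0
        sine = (4.0 - r) * math.sin(math.pi * q) / 4.0
        q = (tent + sine) % 1.0
        seq.append(q)
    return seq
```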

At iteration t, the opposite search of particle i is carried out as follows:

[TeX:] $$\overline{y}_{i, j}(t)=y_{\min , j}(t)+y_{\max , j}(t)-y_{i, j}(t)$$

where [TeX:] $$y_{i, j}(t)$$ is the current position of particle [TeX:] $$i, \overline{y}_{i, j}(t)$$ is the result of the reverse search of particle i, and [TeX:] $$y_{\min , j}(t) \text { and } y_{\max , j}(t)$$ are the lower and upper bounds of dimension j, respectively.

The COL algorithm consists of two main parts. In the first part, we use the Tent-Sine system to generate an initial population. In the second part, we use OL to generate an opposite population. These two populations are then merged, and a new initial population is generated.

Suppose all particles fly in D-dimensional space. The steps of COL are described by Algorithm 1.
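Algorithm 1 itself is not reproduced here, so the following Python sketch shows one plausible reading of COL under the stated assumptions; it reuses tent_sine_sequence from the sketch above, and the fitness callable (greater is better, as with the fitness functions defined later in the paper) is an assumed interface.

```python
import random

def col_initialize(swarm_size, dim, lower, upper, fitness, r=2.0):
    """COL sketch: chaotic population + OL opposite population, keep fittest.

    lower/upper are per-dimension bounds; fitness maps a particle to a
    float, where greater is better."""
    chaos = tent_sine_sequence(random.random(), r, swarm_size * dim)
    # Part 1: scale the chaotic sequence into the search space.
    population = [[lower[j] + chaos[i * dim + j] * (upper[j] - lower[j])
                   for j in range(dim)] for i in range(swarm_size)]
    # Part 2: opposition-based learning inside the same bounds.
    opposite = [[lower[j] + upper[j] - p[j] for j in range(dim)]
                for p in population]
    # Merge the two populations and keep the swarm_size fittest particles.
    merged = sorted(population + opposite, key=fitness, reverse=True)
    return merged[:swarm_size]
```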

CA is a space- and time-discrete dynamical system that evolves according to a set of local rules and neighboring cells. It is composed of five basic components: cell, cell space, cell state, neighborhood, and transition rule. As shown in Fig. 4, the Moore model is one of the well-known neighborhoods in 2-dimensional CA. The black cell is the center one, and the gray cells are its neighbors. At each discrete time step, the updated state of a cell is obtained by taking into account the states of the cells in its local neighborhood. All cells communicate and change states synchronously.

Existing studies have shown that CA is a powerful method for enhancing the local search ability of the PSO algorithm [24]. In the EODPSO algorithm, CA is integrated into the particle update to avoid local minima: each particle is defined as a single cell and is distributed in 2-dimensional space. The CA model for the EODPSO algorithm is described as follows (a sketch of the neighborhood transition is given after this list):

(1) Cell: each particle is defined as a single cell;

(2) Cell space: the set of all cells;

(3) Cell state: cell state is defined as the particle’s position. The state of cell i is defined as [TeX:] $$s_{i}(t)=y_{i}(t);$$

(4) Neighborhood: the neighborhood of particle i is defined as [TeX:] $$H(i)=\{i+1, i+2, \cdots, i+h\},$$ where h is the number of neighbors;

(5) Transition rule: [TeX:] $$s_{i}(t+1)=f\left(s_{i}(t), s_{i+1}(t), s_{i+2}(t), \cdots, s_{i+h}(t)\right).$$
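The transition rule f is left abstract in the model above; one common instantiation, shown in this hedged sketch, is to let each cell take the state of the fittest cell in its ring neighborhood H(i) as its local guide. This is our illustrative choice, not necessarily the authors' exact rule.

```python
def ring_neighborhood(i, h, swarm_size):
    """H(i) = {i+1, ..., i+h}, with indices wrapping around the swarm."""
    return [(i + k) % swarm_size for k in range(1, h + 1)]

def transition(states, fitness_values, i, h):
    """Illustrative transition rule f: return the state of the fittest
    neighbor of cell i, to be used as its local guide in the update."""
    neighbors = ring_neighborhood(i, h, len(states))
    best = max(neighbors, key=lambda k: fitness_values[k])
    return states[best]
```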

In order to accurately judge when the population search strategy should be invoked, we use the entropy of IFSs as a population diversity index and utilize the proposed entropy measure to assess the state of the particle swarm in the EODPSO algorithm. According to Theorem 1, the proposed intuitionistic fuzzy entropy measure in the PSO algorithm is defined as follows:

**DEFINITION 7:** Let [TeX:] $$A^{\prime}$$ be a particle swarm in a finite set [TeX:] $$X=\left\{x_{i} | i=1,2, \cdots, n\right\}.$$ At iteration t, the intuitionistic fuzzy entropy measure in the PSO algorithm is defined as:

where [TeX:] $$E^{t}\left(A^{\prime}\right)$$ lies in the range [TeX:] $$[0,1].$$

It is known from Definition 7 that when the entropy value equals 0, all particles converge to one point, and when the entropy value equals 1, all particles reach maximum dispersion. In addition, β is used as a threshold to decide whether the population search strategy is invoked. When the entropy value is smaller than β, the EODPSO algorithm has fallen into a local optimum. In this case, we use the proposed population search strategy to enhance global search ability.

The DE algorithm is an evolutionary algorithm based on real-number coding and population evolution. It guides the population towards the vicinity of the global optimum by repeating cycles of mutation, crossover, and selection. The DE algorithm starts with a population of NP D-dimensional search variable vectors. During iteration t+1, for each individual [TeX:] $$y_{i}(t)=\left\{y_{i, 1}(t), y_{i, 2}(t), \cdots, y_{i, D}(t)\right\}(i \in\{1,2, \cdots, N P\}),$$ a mutation vector [TeX:] $$\overline{v}_{i}(t)$$ is computed by using the following mutation operator:

[TeX:] $$\overline{v}_{i}(t)=y_{\alpha_{1}}(t)+F\left(y_{\alpha_{2}}(t)-y_{\alpha_{3}}(t)\right)$$

where [TeX:] $$\alpha_{1}, \alpha_{2} \text { and } \alpha_{3}$$ are distinct random integers within the set [TeX:] $$\{1, \cdots, i-1, i+1, \cdots, N P\},$$ and F is the mutation control parameter.

The trial vector [TeX:] $$y_{i, j}^{\prime}(t+1)$$ is generated by using the following crossover operation:

[TeX:] $$y_{i, j}^{\prime}(t+1)=\left\{\begin{array}{ll}{\overline{v}_{i, j}(t),} & {\text { if } \operatorname{rand}_{j} \leq C R \text { or } j=j_{\text {rand }}} \\ {y_{i, j}(t),} & {\text { otherwise }}\end{array}\right.$$

where [TeX:] $$j=1,2, \cdots, D, \text { and } y_{i, j}^{\prime}(t+1), \overline{v}_{i, j}(t) \text { and } y_{i, j}(t)$$ are the [TeX:] $$j^{\text {th }}$$ parameters of the trial, mutant, and target vectors, respectively. [TeX:] $$\operatorname{rand}_{j}$$ is a uniformly distributed random number within [TeX:] $$[0,1].$$ CR is a crossover control parameter, usually selected from [TeX:] $$(0,1];$$ it controls the diversity of the population and helps the DE algorithm escape from local optima. [TeX:] $$j_{\text {rand }}$$ is a randomly chosen index within [TeX:] $$\{1, \cdots, D\};$$ it ensures that [TeX:] $$y_{i, j}^{\prime}(t+1)$$ gets at least one parameter from [TeX:] $$\overline{v}_{i}(t).$$
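The two DE operators translate directly into code. The following is a generic DE/rand/1 sketch matching the equations above, with particles represented as plain Python lists; names and defaults are ours.

```python
import random

def de_mutation(population, i, F=0.5):
    """Mutation: v_i = y_a1 + F * (y_a2 - y_a3), where a1, a2, a3 are
    distinct random indices different from i."""
    candidates = [k for k in range(len(population)) if k != i]
    a1, a2, a3 = random.sample(candidates, 3)
    return [population[a1][j] + F * (population[a2][j] - population[a3][j])
            for j in range(len(population[i]))]

def de_crossover(target, mutant, CR=1.0):
    """Binomial crossover: take the mutant component when rand_j <= CR or
    j == j_rand, otherwise keep the target component."""
    dim = len(target)
    j_rand = random.randrange(dim)
    return [mutant[j] if (random.random() <= CR or j == j_rand) else target[j]
            for j in range(dim)]
```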

Moreover, as discussed in Section 3.2.1, OL is also an effective population search method. The main idea of OL is to consider an estimate and its corresponding opposite estimate simultaneously. Our new population search strategy makes use of the merits of the DE and OL algorithms: it not only helps the PSO algorithm escape from local optima but also accelerates its convergence. In the new population search strategy, OL and DE are applied to obtain a new population. Specifically, we first use OL to generate an opposite population, and then use mutation and crossover to generate a mutation population. Finally, we select the S fittest particles from the current population, the opposite population, and the mutation population as the new population.
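Putting OL and the DE operators together, the population search strategy can be sketched as follows; it builds on de_mutation and de_crossover above, and again assumes a fitness callable where greater is better.

```python
def population_search(population, fitness, lower, upper, S, F=0.5, CR=1.0):
    """Sketch of the OL+DE population search strategy: generate an opposite
    population and a mutation population, then keep the S fittest particles
    from the union of all three populations."""
    dim = len(population[0])
    opposite = [[lower[j] + upper[j] - p[j] for j in range(dim)]
                for p in population]
    mutated = [de_crossover(population[i], de_mutation(population, i, F), CR)
               for i in range(len(population))]
    pool = population + opposite + mutated
    return sorted(pool, key=fitness, reverse=True)[:S]
```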

In the EODPSO algorithm, each particle is defined as a cell and updates its current state using its neighbors' states. Based on the evolution mechanism of CA, COL is used to generate a reasonable initial population. Then, at each iteration, each particle's velocity and position are updated by using the evolution mechanism of CA. When the entropy value of the population is smaller than the threshold β, OL and DE are applied to generate an opposite population and a mutation population, respectively. A new population is obtained by selecting the S fittest particles from the current population, the opposite population, and the mutation population. At the next iteration, the new population is taken as the initial population. To accelerate convergence and improve optimization quality, the inertia weight W is decreased linearly, as follows:

[TeX:] $$W=w_{\max }-\frac{\left(w_{\max }-w_{\min }\right) t}{T_{\max }}$$

where [TeX:] $$w_{\max } \text { and } w_{\min }$$ represent the maximum and minimum of W, respectively, t is the current iteration number, and [TeX:] $$T_{\max }$$ is the maximum number of iterations.

The EODPSO algorithm is summarized in Algorithm 2.
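As a concrete complement to Algorithm 2, the sketch below gives one hedged reading of the EODPSO main loop built from the pieces above. The swarm_entropy callable stands in for the paper's entropy measure E^t(A'), whose exact formula is not reproduced here, and using the fittest CA neighbor as the social guide is our interpretation of the CA-based update, not necessarily the authors' exact rule.

```python
import random

def eodpso(fitness, lower, upper, swarm_entropy, S=50, T_max=500, h=8,
           w_max=0.9, w_min=0.1, c1=2.0, c2=2.0, beta=0.7):
    """Hedged sketch of EODPSO: COL initialization, CA-guided PSO updates,
    a linearly decreasing inertia weight, and the OL+DE population search
    when the swarm entropy drops below beta. Reuses col_initialize,
    transition, and population_search from the sketches above."""
    dim = len(lower)
    X = col_initialize(S, dim, lower, upper, fitness)
    V = [[0.0] * dim for _ in range(S)]
    pbest = [x[:] for x in X]
    gbest = max(pbest, key=fitness)
    for t in range(T_max):
        w = w_max - (w_max - w_min) * t / T_max   # linear decrease of W
        fit = [fitness(x) for x in X]
        for i in range(S):
            guide = transition(X, fit, i, h)      # CA neighborhood guide
            for j in range(dim):
                V[i][j] = (w * V[i][j]
                           + c1 * random.random() * (pbest[i][j] - X[i][j])
                           + c2 * random.random() * (guide[j] - X[i][j]))
                X[i][j] = min(max(X[i][j] + V[i][j], lower[j]), upper[j])
            if fitness(X[i]) > fitness(pbest[i]):
                pbest[i] = X[i][:]
        gbest = max(pbest + [gbest], key=fitness)
        if swarm_entropy(X) < beta:               # low diversity detected
            X = population_search(X, fitness, lower, upper, S)
    return gbest
```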

In this section, we propose an SSKFCM clustering method to cluster the MUIP data. In the framework of MUIP clustering, the EODPSO algorithm is used to optimize the clustering parameters.

First of all, the EODPSO algorithm is exploited to optimize the kernel parameter σ of the Gaussian kernel function. During the optimization of σ, a fitness function is defined to assess the generated solutions. A clustering evaluation metric, the Davies-Bouldin index (DBI) [41], is used in this fitness function to measure the influence of σ on clustering quality; a lower DBI value signifies a better clustering result. The fitness function is defined as follows:

where [TeX:] $$\overline{R}$$ is the DBI.

The DBI is defined as follows:

[TeX:] $$\overline{R}=\frac{1}{e^{\prime}} \sum_{i=1}^{e^{\prime}} R_{i}, \quad R_{i}=\max _{j \neq i} R_{i, j}$$

where [TeX:] $$e^{\prime}$$ is the number of clusters and [TeX:] $$R_{i}$$ is the maximum of [TeX:] $$R_{i, j}$$ over [TeX:] $$j \neq i.$$ The pairwise measure is

[TeX:] $$R_{i, j}=\frac{d_{i}+d_{j}}{M_{i, j}}$$

where [TeX:] $$d_{i} \text { and } d_{j}$$ are the scatters of the [TeX:] $$i^{\text {th }} \text { and } j^{\text {th }}$$ clusters, respectively, and [TeX:] $$M_{i, j}$$ is the distance between the [TeX:] $$i^{\text {th }}$$ cluster center and the [TeX:] $$j^{\text {th }}$$ cluster center.

We can see that the smaller the value of [TeX:] $$\overline{R},$$ the greater the fitness value [TeX:] $$f_{1}(x)$$ and the better the value of the parameter σ.
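For reference, a direct Python implementation of the DBI as reconstructed above follows. The exact closed form of f_1(x) is not given in the text, so the inverse form used here is an assumption that merely preserves the stated monotonicity (smaller R̄, larger f_1).

```python
import math

def davies_bouldin(data, labels, centers):
    """DBI: mean over clusters of max_{j != i} (d_i + d_j) / M_ij, where
    d_i is the average distance of cluster-i points to their center and
    M_ij is the distance between centers i and j (needs >= 2 clusters)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    e = len(centers)
    d = []
    for i in range(e):
        members = [p for p, l in zip(data, labels) if l == i]
        d.append(sum(dist(p, centers[i]) for p in members) / max(len(members), 1))
    total = sum(max((d[i] + d[j]) / dist(centers[i], centers[j])
                    for j in range(e) if j != i) for i in range(e))
    return total / e

def f1(data, labels, centers):
    """Assumed fitness form: decreasing in DBI, as the text requires."""
    return 1.0 / (1.0 + davies_bouldin(data, labels, centers))
```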

Then, considering that the cluster centers and the number of clusters also directly influence the clustering result, the EODPSO algorithm is reused to initialize the cluster centers, and DBI is used as a judgment index to determine the optimal number of clusters. The authors of [42] discussed the optimal number of clusters; their results show that the optimal number of clusters lies in the range [TeX:] $$[0, \sqrt{n}],$$ where n is the size of the data set [TeX:] $$Z=\left\{z_{1}, z_{2}, \cdots, z_{n}\right\}.$$

A fitness function is defined to assess the generated cluster centers during the initialization phase. The definition of the fitness function is expressed below:

where [TeX:] $$J(U, B)$$ is the objective function of KFCM, as shown in Eq. (5).

Clearly, the smaller the value of [TeX:] $$J(U, B),$$ the better the clustering result and the greater the fitness value [TeX:] $$f_{2}(x).$$

Fig. 5 illustrates the framework of the MUIP clustering method. MUIP clustering includes two stages. In the first stage, the initial cluster centers are generated by the EODPSO algorithm. In the second stage, the kernel parameter σ is optimized by the EODPSO algorithm, and then the fuzzy partition matrix and cluster centers are updated. If the number of clusters determined by the DBI is greater than the maximum number of clusters [TeX:] $$e_{\max },$$ the clustering method stops and outputs the fuzzy partition matrix and cluster centers. Otherwise, the clustering method continues to optimize the kernel parameter and update the fuzzy partition matrix and cluster centers.

Algorithm 3 shows the specific steps of MUIP clustering. MUIP clustering consists of two loops. The inner loop is used to optimize the parameter σ and update [TeX:] $$u_{i, j} \text { and } b_{j}.$$ The outer loop is used to determine the fuzzy partition matrix and cluster centers. At each iteration of the inner loop, the clustering method first optimizes the parameter σ by using the EODPSO algorithm and the fitness function [TeX:] $$f_{1}(x);$$ then, the fuzzy partition matrix [TeX:] $$u_{i, j} \text { and cluster centers } b_{j}$$ are updated. At each iteration of the outer loop, the value of DBI is calculated and used to determine the number of clusters.

In general, developers select MUIPs according to functional goals. Therefore, we exploit the MUIP clustering method to cluster MUIPs based on functional goals. Our data set is constructed by collecting 5,397 MUIPs from [43,44] and seven MUIP websites (http://www.mobile-patterns.com/, http://inspired-ui.com/, http://ui-patterns.com/, http://pttrns.com/, https://www.cocoacontrols.com/, http://ios-patterns.com/, http://mobiledesign.com/). To facilitate testing, we define 51 types of functional goals based on our previous study on MUIP data collection [11].

In our experiments, DBI, accuracy [45], the adjusted Rand index (ARI) [46], and normalized mutual information (NMI) [47] are employed to evaluate the performance of the tested methods. DBI is used to measure the within-cluster scatter and the inter-cluster separation. Since ARI is not sensitive to the number of clusters, it is used to compare two partitions with different numbers of clusters. Accuracy and NMI are used to measure the quality and efficiency of a clustering method, respectively. Note that greater values of accuracy, ARI, and NMI signify better clustering results. DBI is defined above, and the other three metrics are defined as follows:

[TeX:] $$\text { Accuracy }=\frac{1}{n} \sum_{i=1}^{n} \delta\left(a_{i}, \operatorname{map}\left(\overline{a}_{i}\right)\right)$$

where n is the total number of data points, [TeX:] $$a_{i} \text { and } \overline{a}_{i}$$ are the true class label and the obtained cluster label, respectively, and [TeX:] $$\delta(a, \overline{a})$$ is a function that equals 1 if [TeX:] $$a=\overline{a}$$ and 0 otherwise. The function [TeX:] $$\operatorname{map}(\cdot)$$ is a permutation function that maps each cluster label to a class label; the optimal matching can be found with the Hungarian algorithm [48].
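The accuracy metric can be computed exactly as defined, with the optimal map(.) found by the Hungarian algorithm. This sketch uses scipy.optimize.linear_sum_assignment and assumes integer labels starting at 0; the function name is ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, cluster_labels):
    """Accuracy = (1/n) * sum_i delta(a_i, map(a_bar_i)); map(.) is the
    cluster-to-class permutation maximizing the number of matched points."""
    a = np.asarray(true_labels)
    b = np.asarray(cluster_labels)
    k = int(max(a.max(), b.max())) + 1
    # Contingency matrix: count[c, t] = points in cluster c with true class t.
    count = np.zeros((k, k), dtype=np.int64)
    for c, t in zip(b, a):
        count[c, t] += 1
    rows, cols = linear_sum_assignment(-count)  # maximize total matches
    return count[rows, cols].sum() / len(a)
```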

[TeX:] $$A R I=\frac{\sum_{i=1}^{e_{1}} \sum_{j=1}^{e_{2}}\left(\begin{array}{c}{\eta_{i, j}} \\ {2}\end{array}\right)-\left[\sum_{i=1}^{e_{1}}\left(\begin{array}{c}{n_{i}} \\ {2}\end{array}\right) \sum_{j=1}^{e_{2}}\left(\begin{array}{c}{n_{j}} \\ {2}\end{array}\right)\right] /\left(\begin{array}{c}{n} \\ {2}\end{array}\right)}{\frac{1}{2}\left[\sum_{i=1}^{e_{1}}\left(\begin{array}{c}{n_{i}} \\ {2}\end{array}\right)+\sum_{j=1}^{e_{2}}\left(\begin{array}{c}{n_{j}} \\ {2}\end{array}\right)\right]-\left[\sum_{i=1}^{e_{1}}\left(\begin{array}{c}{n_{i}} \\ {2}\end{array}\right) \sum_{j=1}^{e_{2}}\left(\begin{array}{c}{n_{j}} \\ {2}\end{array}\right)\right] /\left(\begin{array}{c}{n} \\ {2}\end{array}\right)}$$

where n is the total number of data points, [TeX:] $$\eta_{i, j}$$ is the number of data points that belong to cluster i of the obtained partition and cluster j of the prior partition, [TeX:] $$n_{i}$$ denotes the number of data points that belong to cluster [TeX:] $$i, n_{j}$$ denotes the number of data points that belong to cluster [TeX:] $$j, e_{1}$$ is the number of clusters obtained after running a clustering method, and [TeX:] $$e_{2}$$ is the number of clusters that come from the prior partition.

[TeX:] $$N M I=\frac{\sum_{i=1}^{e^{\prime}} \sum_{j=1}^{e^{\prime}} n_{i, j} \log \left(\frac{n \cdot n_{i, j}}{n_{i} n_{j}}\right)}{\sqrt{\left(\sum_{i=1}^{e^{\prime}} n_{i} \log \frac{n_{i}}{n}\right)\left(\sum_{j=1}^{e^{\prime}} n_{j} \log \frac{n_{j}}{n}\right)}}$$

where [TeX:] $$e^{\prime}$$ is the number of clusters, n is the total number of data points, [TeX:] $$n_{i, j}$$ denotes the number of data points in cluster i as well as in cluster [TeX:] $$j,$$ and [TeX:] $$n_{i} \text { and } n_{j}$$ denote the number of data points in cluster i and cluster j, respectively.
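Since ARI and NMI above follow their standard definitions from [46] and [47], in practice both can be obtained directly from scikit-learn rather than implemented by hand; the label vectors below are purely illustrative.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

true_labels = [0, 0, 1, 1, 2, 2]       # illustrative class labels
cluster_labels = [0, 0, 1, 2, 2, 2]    # illustrative clustering result

print(adjusted_rand_score(true_labels, cluster_labels))
print(normalized_mutual_info_score(true_labels, cluster_labels))
```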

Before applying our method to the MUIP data, several parameters need to be set. The parameters of our method are set as: [TeX:] $$S=50, \quad K_{\max }=300, \quad w_{\max }=0.9, \quad w_{\min }=0.1, \quad c_{1}=c_{2}=2, \quad T_{\max }=500, \quad h=8, \eta=0.0001, m=2, q_{\max }=800, F=0.5, C R=1.$$ Considering that the entropy threshold affects the performance of the EODPSO algorithm, and thereby the clustering result of the proposed clustering method, we evaluate the critical value of β by using the above evaluation metrics. We increase β gradually from 0.1 to 0.9, and the proposed clustering method is run 20 times for each value of β. Each time, we randomly select 10% of the data as labeled data and the rest as unlabeled data. Fig. 6 reports the mean values and standard deviations of the four evaluation metrics for different values of β. It should be noted that "Mean" indicates the mean value and "Std" indicates the standard deviation throughout this paper. It is clear that the four evaluation metrics change significantly as β increases from 0.1 to 0.7, but change only slightly as β increases from 0.7 to 0.9. At the same time, we can see from the results that the values of accuracy, ARI, and NMI are close to their maximums, and the value of DBI is close to its minimum, once [TeX:] $$\beta=0.7.$$ This means that when [TeX:] $$\beta<0.7,$$ the EODPSO algorithm falls into local optima; in this case, the population search strategy is required to jump out of them. On the contrary, when [TeX:] $$\beta \geq 0.7,$$ the EODPSO algorithm does not fall into local optima; in this case, the population diversity is not much affected by increasing β. All these results indicate that the critical value of β is 0.7. Therefore, we set [TeX:] $$\beta=0.7$$ in our clustering method.

In this paper, the experimental studies focus on the clustering effect and the convergence of the proposed MUIP clustering method. To clearly show the performance of the EODPSO clustering method, other relevant clustering methods are used for comparison. Specifically, DBSCAN, GDBSCAN, BIRCH, and CLARANS are used as semi-supervised clustering methods to cluster the MUIP data. Additionally, GA, SA, PSO, and the four variant PSO algorithms mentioned in this paper, namely DEPSO, CPSO, CAPSO, and EPSO, are used to replace the EODPSO algorithm in the framework of MUIP clustering. The seven resulting alternative SSKFCM clustering methods are then applied to cluster the MUIP data.

The first experiment compares the clustering effect of these clustering methods for different ratios of labeled data. All methods are run 20 times independently on the MUIP data. Fig. 7 shows the mean values of the four evaluation metrics of each clustering method, accompanied by standard deviations. The mean value indicates the clustering effect of a method, and the standard deviation indicates its stability. The mean values of the four evaluation metrics clearly show that PSO-based SSKFCM clustering methods usually outperform the other semi-supervised clustering methods. The reasons are as follows. Firstly, density-based spatial clustering methods cannot discover overlapping clusters. Secondly, the BIRCH and CLARANS clustering methods only work well for convex or spherical clusters. Thirdly, GA and SA easily fall into local optima and have slow convergence. Moreover, it can be seen that the EODPSO clustering method surpasses all other PSO-based clustering methods. Compared to the other PSO algorithms, the EODPSO algorithm has better global search ability because it utilizes a hybrid strategy to avoid local minima and accelerate convergence. Thus the EODPSO clustering method achieves better results than the other PSO-based clustering methods.

It is worth noting that the performances of all methods become less dependent on labeled data as the ratio of labeled data increases. The reason is that once the amount of labeled data reaches a certain ratio, the whole MUIP data space is represented well enough that the influence of additional labeled data on the clustering effect decreases gradually.

Tables 2–7 and Fig. 8 show more details on the performance of the clustering methods. The best results are highlighted in bold in the tables. As can be seen from Tables 2–5, the overall standard deviations of the four evaluation metrics of the EODPSO clustering method are lower than those of the other methods. This means that the EODPSO clustering method has not only a better clustering effect but also better stability than all other clustering methods.

Table 2.

Clustering method | | Ratio of labeled data (%) | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
 | | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
EODPSO | Mean | 0.3985 | 0.3546 | 0.3373 | 0.3305 | 0.3215 | 0.3182 | 0.3143 | 0.3117 | 0.3094 | 0.3076 |
 | Std | 0.0025 | 0.0041 | 0.0063 | 0.0124 | 0.0039 | 0.0105 | 0.0085 | 0.0068 | 0.0042 | 0.0038 |
PSO | Mean | 0.6681 | 0.6174 | 0.5911 | 0.5855 | 0.5676 | 0.5456 | 0.5335 | 0.5309 | 0.5248 | 0.5185 |
 | Std | 0.0125 | 0.0091 | 0.0108 | 0.0174 | 0.0098 | 0.0116 | 0.0145 | 0.0078 | 0.0137 | 0.0074 |
DEPSO | Mean | 0.6052 | 0.5477 | 0.5234 | 0.5086 | 0.5028 | 0.4847 | 0.4813 | 0.4739 | 0.4672 | 0.4655 |
 | Std | 0.0115 | 0.0085 | 0.00145 | 0.0095 | 0.0135 | 0.0074 | 0.0095 | 0.0068 | 0.0116 | 0.0061 |
CPSO | Mean | 0.4968 | 0.4419 | 0.4182 | 0.4047 | 0.4021 | 0.3917 | 0.3878 | 0.3806 | 0.3772 | 0.3737 |
 | Std | 0.0059 | 0.0133 | 0.0093 | 0.0085 | 0.0105 | 0.0093 | 0.0089 | 0.0145 | 0.0062 | 0.0041 |
CAPSO | Mean | 0.4765 | 0.4363 | 0.4238 | 0.4114 | 0.3933 | 0.3771 | 0.3727 | 0.3683 | 0.3607 | 0.3579 |
 | Std | 0.0146 | 0.0139 | 0.0082 | 0.0058 | 0.0092 | 0.0047 | 0.0106 | 0.0095 | 0.0087 | 0.0069 |
EPSO | Mean | 0.4557 | 0.4033 | 0.3801 | 0.3754 | 0.3609 | 0.3577 | 0.3517 | 0.3452 | 0.3414 | 0.3375 |
 | Std | 0.0052 | 0.0066 | 0.0085 | 0.0077 | 0.0117 | 0.0106 | 0.0099 | 0.0078 | 0.0119 | 0.0101 |
DBSCAN | Mean | 0.6423 | 0.6152 | 0.6067 | 0.5823 | 0.5758 | 0.5518 | 0.5425 | 0.5204 | 0.5101 | 0.5025 |
 | Std | 0.0132 | 0.0105 | 0.0082 | 0.0097 | 0.0068 | 0.0081 | 0.0087 | 0.0105 | 0.0058 | 0.0059 |
GDBSCAN | Mean | 0.6258 | 0.5905 | 0.5703 | 0.5571 | 0.5409 | 0.5225 | 0.5108 | 0.4905 | 0.4792 | 0.4757 |
 | Std | 0.0115 | 0.0079 | 0.0087 | 0.0051 | 0.0059 | 0.0068 | 0.0091 | 0.0062 | 0.0084 | 0.0044 |
BIRCH | Mean | 0.7177 | 0.6905 | 0.6852 | 0.6751 | 0.6658 | 0.6419 | 0.6312 | 0.6218 | 0.6102 | 0.6057 |
 | Std | 0.0125 | 0.0105 | 0.0112 | 0.0095 | 0.0071 | 0.0078 | 0.0083 | 0.0092 | 0.0085 | 0.0068 |
CLARANS | Mean | 0.7081 | 0.6849 | 0.6705 | 0.6618 | 0.6603 | 0.6517 | 0.6389 | 0.6154 | 0.6058 | 0.5962 |
 | Std | 0.0097 | 0.0095 | 0.0118 | 0.0068 | 0.0074 | 0.0066 | 0.0087 | 0.0059 | 0.0093 | 0.0082 |
GA | Mean | 0.6788 | 0.6525 | 0.6409 | 0.6385 | 0.6197 | 0.5986 | 0.5884 | 0.5728 | 0.5621 | 0.5584 |
 | Std | 0.0095 | 0.0115 | 0.0093 | 0.0085 | 0.0068 | 0.0087 | 0.0094 | 0.0121 | 0.0075 | 0.0103 |
SA | Mean | 0.6518 | 0.6347 | 0.6214 | 0.6156 | 0.6033 | 0.5909 | 0.5817 | 0.5604 | 0.5536 | 0.5477 |
 | Std | 0.0067 | 0.0095 | 0.0094 | 0.0072 | 0.0117 | 0.0102 | 0.0047 | 0.0061 | 0.0093 | 0.0088 |

The best results are highlighted in bold.

Table 3.

Clustering method | | Ratio of labeled data (%) | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
 | | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
EODPSO | Mean | 0.8194 | 0.8686 | 0.8895 | 0.9043 | 0.9195 | 0.9256 | 0.9307 | 0.9365 | 0.9419 | 0.9448 |
 | Std | 0.0035 | 0.0052 | 0.0074 | 0.0072 | 0.0115 | 0.0041 | 0.0053 | 0.0107 | 0.0038 | 0.0056 |
PSO | Mean | 0.6057 | 0.6409 | 0.6628 | 0.6799 | 0.6852 | 0.6994 | 0.7008 | 0.7095 | 0.7132 | 0.7176 |
 | Std | 0.0081 | 0.0057 | 0.0087 | 0.0093 | 0.0137 | 0.0142 | 0.0096 | 0.0075 | 0.0124 | 0.0088 |
DEPSO | Mean | 0.6389 | 0.6774 | 0.6998 | 0.7153 | 0.7289 | 0.7347 | 0.7419 | 0.7493 | 0.7559 | 0.7587 |
 | Std | 0.0079 | 0.0103 | 0.0114 | 0.0096 | 0.0103 | 0.0062 | 0.0081 | 0.0066 | 0.0103 | 0.0065 |
CPSO | Mean | 0.6731 | 0.7144 | 0.7262 | 0.7337 | 0.7404 | 0.7578 | 0.7642 | 0.7707 | 0.7798 | 0.7843 |
 | Std | 0.0136 | 0.0107 | 0.0098 | 0.0077 | 0.0096 | 0.0058 | 0.0072 | 0.0073 | 0.0076 | 0.0086 |
CAPSO | Mean | 0.6896 | 0.7304 | 0.7453 | 0.7585 | 0.7622 | 0.7776 | 0.7815 | 0.7978 | 0.8029 | 0.8067 |
 | Std | 0.0087 | 0.0108 | 0.0097 | 0.0083 | 0.0107 | 0.0069 | 0.0059 | 0.0061 | 0.0065 | 0.0072 |
EPSO | Mean | 0.7498 | 0.7884 | 0.8108 | 0.8143 | 0.8198 | 0.8289 | 0.8347 | 0.8425 | 0.8471 | 0.8502 |
 | Std | 0.0085 | 0.0077 | 0.0083 | 0.0079 | 0.0049 | 0.0102 | 0.0066 | 0.0047 | 0.0041 | 0.0059 |
DBSCAN | Mean | 0.6059 | 0.6307 | 0.6442 | 0.6537 | 0.6605 | 0.6825 | 0.6919 | 0.7039 | 0.7156 | 0.7288 |
 | Std | 0.0112 | 0.0105 | 0.0078 | 0.0082 | 0.0093 | 0.0097 | 0.0065 | 0.0104 | 0.0085 | 0.0069 |
GDBSCAN | Mean | 0.6205 | 0.6524 | 0.6693 | 0.6723 | 0.6899 | 0.6954 | 0.7072 | 0.7161 | 0.7224 | 0.7361 |
 | Std | 0.0071 | 0.0065 | 0.0061 | 0.0058 | 0.0074 | 0.0096 | 0.0124 | 0.0085 | 0.0049 | 0.0064 |
BIRCH | Mean | 0.5201 | 0.5525 | 0.5609 | 0.5725 | 0.5905 | 0.6026 | 0.6122 | 0.6281 | 0.6303 | 0.6457 |
 | Std | 0.0128 | 0.0099 | 0.0113 | 0.0042 | 0.0071 | 0.0085 | 0.0095 | 0.0059 | 0.0067 | 0.0078 |
CLARANS | Mean | 0.5187 | 0.5454 | 0.5584 | 0.5619 | 0.5767 | 0.5851 | 0.5905 | 0.6048 | 0.6133 | 0.6297 |
 | Std | 0.0078 | 0.0106 | 0.0092 | 0.0067 | 0.0069 | 0.0075 | 0.0081 | 0.0058 | 0.0095 | 0.0108 |
GA | Mean | 0.5504 | 0.5891 | 0.5957 | 0.6097 | 0.6147 | 0.6255 | 0.6339 | 0.6587 | 0.6631 | 0.6709 |
 | Std | 0.0134 | 0.0077 | 0.0089 | 0.0065 | 0.0117 | 0.0095 | 0.0058 | 0.0075 | 0.0071 | 0.0109 |
SA | Mean | 0.5357 | 0.5625 | 0.5823 | 0.5944 | 0.6036 | 0.6154 | 0.6208 | 0.6374 | 0.6452 | 0.6509 |
 | Std | 0.0089 | 0.0067 | 0.0084 | 0.0057 | 0.0104 | 0.0108 | 0.0074 | 0.0082 | 0.0065 | 0.0077 |

The best results are highlighted in bold.

Table 4.

Clustering method | | Ratio of labeled data (%) | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
 | | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
EODPSO | Mean | 0.7911 | 0.8479 | 0.8795 | 0.8846 | 0.8954 | 0.9037 | 0.9138 | 0.9198 | 0.9237 | 0.9296 |
 | Std | 0.0042 | 0.0107 | 0.0042 | 0.0045 | 0.0066 | 0.0106 | 0.0054 | 0.0069 | 0.0074 | 0.0038 |
PSO | Mean | 0.4022 | 0.4781 | 0.5028 | 0.5229 | 0.5238 | 0.5284 | 0.5292 | 0.5367 | 0.5399 | 0.5426 |
 | Std | 0.0155 | 0.0074 | 0.0084 | 0.0069 | 0.0138 | 0.0117 | 0.0133 | 0.0112 | 0.0093 | 0.0136 |
DEPSO | Mean | 0.4871 | 0.5479 | 0.5595 | 0.5986 | 0.6117 | 0.6208 | 0.6215 | 0.6393 | 0.6432 | 0.6451 |
 | Std | 0.0106 | 0.0096 | 0.0097 | 0.0057 | 0.0117 | 0.0101 | 0.0067 | 0.0097 | 0.0088 | 0.0072 |
CPSO | Mean | 0.5569 | 0.6188 | 0.6438 | 0.6767 | 0.6881 | 0.6917 | 0.6991 | 0.7247 | 0.7358 | 0.7404 |
 | Std | 0.0091 | 0.0078 | 0.0075 | 0.0107 | 0.0114 | 0.0063 | 0.0057 | 0.0092 | 0.0087 | 0.0069 |
CAPSO | Mean | 0.5855 | 0.6504 | 0.6893 | 0.6935 | 0.7039 | 0.7233 | 0.7386 | 0.7438 | 0.7489 | 0.7517 |
 | Std | 0.0058 | 0.0053 | 0.0044 | 0.0094 | 0.0063 | 0.0086 | 0.0112 | 0.0075 | 0.0105 | 0.0093 |
EPSO | Mean | 0.6994 | 0.7539 | 0.7648 | 0.7931 | 0.8087 | 0.8125 | 0.8208 | 0.8367 | 0.8392 | 0.8433 |
 | Std | 0.0124 | 0.0083 | 0.0046 | 0.0086 | 0.0059 | 0.0058 | 0.0116 | 0.0071 | 0.0076 | 0.0115 |
DBSCAN | Mean | 0.4261 | 0.4652 | 0.5104 | 0.5289 | 0.5317 | 0.5425 | 0.5596 | 0.5663 | 0.5782 | 0.5809 |
 | Std | 0.0112 | 0.0095 | 0.0068 | 0.0074 | 0.0105 | 0.0088 | 0.0062 | 0.0069 | 0.0081 | 0.0085 |
GDBSCAN | Mean | 0.4585 | 0.5017 | 0.5305 | 0.5528 | 0.5607 | 0.5725 | 0.5831 | 0.5966 | 0.6065 | 0.6171 |
 | Std | 0.0069 | 0.0075 | 0.0058 | 0.0091 | 0.0107 | 0.0074 | 0.0083 | 0.0062 | 0.0079 | 0.0082 |
BIRCH | Mean | 0.3582 | 0.4028 | 0.4387 | 0.4519 | 0.4611 | 0.4725 | 0.4868 | 0.4993 | 0.5177 | 0.5239 |
 | Std | 0.0115 | 0.0131 | 0.0085 | 0.0096 | 0.0091 | 0.0082 | 0.0079 | 0.0098 | 0.0106 | 0.0072 |
CLARANS | Mean | 0.3717 | 0.4359 | 0.4812 | 0.4869 | 0.4927 | 0.5036 | 0.5108 | 0.5193 | 0.5276 | 0.5369 |
 | Std | 0.0073 | 0.0084 | 0.0109 | 0.0056 | 0.0078 | 0.0092 | 0.0064 | 0.0085 | 0.0105 | 0.0117 |
GA | Mean | 0.3851 | 0.4125 | 0.4329 | 0.4441 | 0.4525 | 0.4688 | 0.4726 | 0.4886 | 0.4969 | 0.5098 |
 | Std | 0.0085 | 0.0071 | 0.0105 | 0.0069 | 0.0073 | 0.0098 | 0.0134 | 0.0128 | 0.0076 | 0.0094 |
SA | Mean | 0.3925 | 0.4426 | 0.4592 | 0.4609 | 0.4722 | 0.4814 | 0.4961 | 0.5005 | 0.5174 | 0.5204 |
 | Std | 0.0139 | 0.0072 | 0.0103 | 0.0065 | 0.0082 | 0.0087 | 0.0073 | 0.0102 | 0.0095 | 0.0073 |

The best results are highlighted in bold.

Table 5.

Clustering method | | Ratio of labeled data (%) | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
 | | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
EODPSO | Mean | 0.8103 | 0.8587 | 0.8795 | 0.8812 | 0.8891 | 0.8925 | 0.8996 | 0.9129 | 0.9153 | 0.9213 |
 | Std | 0.0067 | 0.0072 | 0.0107 | 0.0059 | 0.0074 | 0.0035 | 0.0113 | 0.0059 | 0.0043 | 0.0041 |
PSO | Mean | 0.5951 | 0.6422 | 0.6678 | 0.6691 | 0.6899 | 0.7072 | 0.7068 | 0.7155 | 0.7193 | 0.7232 |
 | Std | 0.0085 | 0.0128 | 0.0105 | 0.0113 | 0.0084 | 0.0079 | 0.0096 | 0.0071 | 0.0062 | 0.0117 |
DEPSO | Mean | 0.6494 | 0.6895 | 0.7034 | 0.7388 | 0.7568 | 0.7603 | 0.7731 | 0.7771 | 0.7836 | 0.7892 |
 | Std | 0.0095 | 0.0091 | 0.0129 | 0.0013 | 0.0081 | 0.0069 | 0.0118 | 0.0093 | 0.0072 | 0.0101 |
CPSO | Mean | 0.6958 | 0.7296 | 0.7481 | 0.7576 | 0.7782 | 0.7903 | 0.8069 | 0.8108 | 0.8182 | 0.8229 |
 | Std | 0.0131 | 0.0082 | 0.0076 | 0.0075 | 0.0109 | 0.0073 | 0.0047 | 0.0067 | 0.0058 | 0.0102 |
CAPSO | Mean | 0.6866 | 0.7423 | 0.7622 | 0.7698 | 0.7799 | 0.7873 | 0.8093 | 0.8206 | 0.8278 | 0.8305 |
 | Std | 0.0087 | 0.0075 | 0.0115 | 0.0097 | 0.0108 | 0.0076 | 0.0085 | 0.0077 | 0.0083 | 0.0073 |
EPSO | Mean | 0.6866 | 0.7423 | 0.7622 | 0.7698 | 0.7799 | 0.7873 | 0.8093 | 0.8206 | 0.8278 | 0.8305 |
 | Std | 0.0087 | 0.0075 | 0.0115 | 0.0097 | 0.0108 | 0.0076 | 0.0085 | 0.0077 | 0.0083 | 0.0073 |
DBSCAN | Mean | 0.6109 | 0.6528 | 0.6792 | 0.6877 | 0.7031 | 0.7185 | 0.7225 | 0.7298 | 0.7346 | 0.7388 |
 | Std | 0.0102 | 0.0108 | 0.0078 | 0.0082 | 0.0065 | 0.0083 | 0.0095 | 0.0115 | 0.0086 | 0.0069 |
GDBSCAN | Mean | 0.6232 | 0.6755 | 0.6947 | 0.6994 | 0.7196 | 0.7226 | 0.7359 | 0.7431 | 0.7568 | 0.7605 |
 | Std | 0.0098 | 0.0076 | 0.0071 | 0.0062 | 0.0098 | 0.0057 | 0.0094 | 0.0108 | 0.0074 | 0.0068 |
BIRCH | Mean | 0.5389 | 0.5705 | 0.5963 | 0.6193 | 0.6258 | 0.6377 | 0.6425 | 0.6499 | 0.6526 | 0.6615 |
 | Std | 0.0124 | 0.0098 | 0.0074 | 0.0126 | 0.0109 | 0.0073 | 0.0068 | 0.0077 | 0.0071 | 0.0105 |
CLARANS | Mean | 0.5402 | 0.5917 | 0.6158 | 0.6351 | 0.6408 | 0.6479 | 0.6504 | 0.6682 | 0.6794 | 0.6823 |
 | Std | 0.0081 | 0.0095 | 0.0077 | 0.0054 | 0.0061 | 0.0093 | 0.0117 | 0.0079 | 0.0083 | 0.0092 |
GA | Mean | 0.5525 | 0.6004 | 0.6221 | 0.6387 | 0.6429 | 0.6554 | 0.6627 | 0.6763 | 0.6841 | 0.6909 |
 | Std | 0.0068 | 0.0074 | 0.0052 | 0.0059 | 0.0091 | 0.0127 | 0.0082 | 0.0088 | 0.0065 | 0.0112 |
SA | Mean | 0.5769 | 0.6211 | 0.6387 | 0.6401 | 0.6535 | 0.6592 | 0.6675 | 0.6852 | 0.6968 | 0.7055 |
 | Std | 0.0133 | 0.0114 | 0.0083 | 0.0064 | 0.0081 | 0.0095 | 0.0086 | 0.0105 | 0.0074 | 0.0108 |

The best results are highlighted in bold.

Tables 6 and 7 and Fig. 8 compare the clustering parameters calculated by the different optimization algorithms in the framework of MUIP clustering. Table 6 lists the optimal values of σ. We can see clearly that the EODPSO clustering method gives the best optimal values and has better stability than the other clustering methods. Furthermore, we note that since different ratios of labeled data influence the values of the fuzzy partition matrix and cluster centers, the values of the optimized σ vary with the amount of labeled data.

Table 6.

Clustering method | | Ratio of labeled data (%) | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
 | | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
EODPSO | Mean | 1.3347 | 1.3339 | 1.3327 | 1.3331 | 1.3325 | 1.3323 | 1.3319 | 1.3315 | 1.3312 | 1.3309 |
 | Std | 0.0014 | 0.0017 | 0.0035 | 0.0024 | 0.0022 | 0.0019 | 0.0015 | 0.0023 | 0.0032 | 0.0021 |
PSO | Mean | 2.1367 | 2.1362 | 2.1355 | 2.1348 | 2.1343 | 2.1341 | 2.1335 | 2.1332 | 2.1329 | 2.1324 |
 | Std | 0.0053 | 0.0059 | 0.0042 | 0.0035 | 0.0027 | 0.0031 | 0.0022 | 0.0059 | 0.0026 | 0.0034 |
DEPSO | Mean | 1.9621 | 1.9615 | 1.9576 | 1.9497 | 1.9471 | 1.9425 | 1.9417 | 1.9415 | 1.9411 | 1.9408 |
 | Std | 0.0035 | 0.0024 | 0.0035 | 0.0047 | 0.0024 | 0.0039 | 0.0026 | 0.0025 | 0.0058 | 0.0047 |
CPSO | Mean | 1.7553 | 1.7544 | 1.7541 | 1.7537 | 1.7532 | 1.7524 | 1.7519 | 1.7514 | 1.7512 | 1.7508 |
 | Std | 0.0032 | 0.0019 | 0.0042 | 0.0023 | 0.0029 | 0.0024 | 0.0067 | 0.0053 | 0.0022 | 0.0032 |
CAPSO | Mean | 1.6455 | 1.6349 | 1.6338 | 1.6317 | 1.6234 | 1.6225 | 1.6221 | 1.6319 | 1.6217 | 1.6216 |
 | Std | 0.0027 | 0.0024 | 0.0073 | 0.0062 | 0.0033 | 0.0025 | 0.0023 | 0.0036 | 0.0054 | 0.0043 |
EPSO | Mean | 1.4718 | 1.4632 | 1.4621 | 1.4583 | 1.4565 | 1.4539 | 1.4533 | 1.4527 | 1.4523 | 1.4521 |
 | Std | 0.0018 | 0.0047 | 0.0024 | 0.0068 | 0.0032 | 0.0051 | 0.0039 | 0.0028 | 0.0037 | 0.0026 |
GA | Mean | 2.4657 | 2.4633 | 2.4601 | 2.4575 | 2.4557 | 2.4538 | 2.4527 | 2.4518 | 2.4515 | 2.4513 |
 | Std | 0.0021 | 0.0019 | 0.0035 | 0.0032 | 0.0058 | 0.0042 | 0.0039 | 0.0026 | 0.0027 | 0.0034 |
SA | Mean | 2.2796 | 2.2771 | 2.2753 | 2.2695 | 2.2649 | 2.2573 | 2.2569 | 2.2557 | 2.2553 | 2.2551 |
 | Std | 0.0034 | 0.0057 | 0.0051 | 0.0048 | 0.0052 | 0.0026 | 0.0029 | 0.0035 | 0.0031 | 0.0027 |

The best results are highlighted in bold.

Table 7.

Clustering method | | Ratio of labeled data (%) | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
 | | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
EODPSO | Mean | 46.23 | 47.35 | 47.41 | 48.34 | 52.2 | 50.13 | 51.00 | 50.67 | 51.00 | 51.00 |
 | Std | 0.58 | 0.41 | 0.26 | 0.35 | 0.27 | 0.35 | 0.00 | 0.23 | 0.00 | 0.00 |
PSO | Mean | 37.63 | 40.51 | 39.32 | 41.39 | 42.42 | 41.36 | 44.2 | 45.36 | 46.76 | 45.58 |
 | Std | 0.95 | 0.59 | 0.35 | 0.52 | 0.45 | 0.86 | 0.39 | 0.27 | 0.69 | 0.34 |
DEPSO | Mean | 39.35 | 41.16 | 41.26 | 43.72 | 41.32 | 44.75 | 43.32 | 46.63 | 46.57 | 48.55 |
 | Std | 0.74 | 0.52 | 0.29 | 0.37 | 0.67 | 0.47 | 0.42 | 0.52 | 0.47 | 0.49 |
CPSO | Mean | 40.5 | 41.37 | 43.39 | 42.05 | 43.24 | 45.17 | 47.35 | 47.26 | 45.82 | 52.14 |
 | Std | 0.67 | 0.55 | 0.34 | 0.41 | 0.84 | 0.38 | 0.68 | 0.49 | 0.58 | 0.68 |
CAPSO | Mean | 43.15 | 45.84 | 42.35 | 45.41 | 47.19 | 47.63 | 48.29 | 53.74 | 53.06 | 49.3 |
 | Std | 0.54 | 0.48 | 0.38 | 0.56 | 0.33 | 0.46 | 0.35 | 0.38 | 0.31 | 0.36 |
EPSO | Mean | 43.35 | 41.24 | 45.52 | 56.42 | 46.36 | 47.52 | 53.29 | 49.41 | 47.37 | 52.76 |
 | Std | 0.71 | 0.43 | 0.27 | 0.29 | 0.41 | 0.33 | 0.51 | 0.68 | 0.55 | 0.25 |
GA | Mean | 35.96 | 31.18 | 32.26 | 38.69 | 40.82 | 34.4 | 34.25 | 35.57 | 33.68 | 38.36 |
 | Std | 0.89 | 0.75 | 0.45 | 0.39 | 0.32 | 0.59 | 0.73 | 0.54 | 0.59 | 0.73 |
SA | Mean | 37.05 | 37.36 | 35.7 | 36.96 | 38.54 | 35.18 | 36.59 | 37.42 | 37.68 | 39.32 |
 | Std | 0.86 | 0.71 | 0.44 | 0.64 | 0.57 | 0.36 | 0.68 | 0.97 | 0.62 | 0.58 |

The best results are highlighted in bold.

Table 7 shows the comparison of the numbers of clusters. In our experiments, the true number of clusters in the MUIP data collection is 51. It is observed that, as the amount of labeled data increases, the numbers of clusters found by all clustering methods tend towards the true number of clusters. In particular, the numbers of clusters obtained by the EODPSO clustering method are the closest to the true number of clusters. Moreover, the EODPSO clustering method shows better stability than the other clustering methods.

The results of the experiments mentioned above indicate that the overall performance of the SSKFCM clustering method increases with an increasing amount of labeled data, while the influence of the labeled data decreases. Thus, in order to compare the resulting initial cluster centers, we randomly select 10% of the data as labeled data and the rest as unlabeled data in our experiments. Fig. 8 shows the comparison of initial cluster centers computed using the different optimization algorithms in the framework of MUIP clustering. Two optimal initial cluster centers are generated by each optimization algorithm. We find that the optimal initial cluster centers from the EODPSO clustering method are the closest to the true cluster centers, compared to the other clustering methods.

From Tables 6 and 7 and Fig. 8, we find that the EODPSO algorithm produces the best results in terms of optimizing the clustering parameters, compared to the other optimization algorithms. This is because, first, the EODPSO algorithm enhances both local and global search ability through CA and the new population search strategy. Second, COL is used to initialize the population of PSO, which avoids the randomness of the initial solutions and makes the EODPSO algorithm more stable than the other optimization algorithms.

To further analyze the performance of the EODPSO algorithm in the framework of MUIP clustering, we compare the convergence of the EODPSO, PSO, DEPSO, CPSO, CAPSO, EPSO, GA, and SA clustering methods in the second experiment. Following the analysis of the first experiment, we concentrate on the convergence of these clustering methods with a small amount of labeled data. Thus, we randomly select 10% labeled data and 90% unlabeled data, and all clustering methods are run 20 times independently on the MUIP data. The mean and standard deviation of the optimal values of the objective function are shown in Table 8, where the best results are highlighted in bold. It can be seen from Table 8 that the EODPSO clustering method produces the best results in finding the global optimum and has the best stability, compared to the other clustering methods. This is expected, because the EODPSO algorithm has better global search ability for optimizing the clustering parameters.

Table 8.

Clustering method | EODPSO | PSO | DEPSO | CPSO | CAPSO | EPSO | GA | SA |
---|---|---|---|---|---|---|---|---|
Mean | 4.6980 | 5.0784 | 4.9578 | 4.9156 | 4.8455 | 4.7566 | 5.2536 | 5.3352 |
Std | 0.0043 | 0.0074 | 0.0066 | 0.0054 | 0.0059 | 0.0046 | 0.0072 | 0.0081 |

The convergence processes of all methods are shown in Fig. 9. The EODPSO clustering method reaches the global optimal solution, with a best fitness value of 4.698, at around 40 iterations. By contrast, the other clustering methods do not reach global optimal solutions within 40 iterations, which indicates that the convergence speed of the EODPSO clustering method is faster. In the EODPSO algorithm, we use a hybrid strategy to avoid local minima and accelerate convergence; therefore, when falling into a local optimum, only the EODPSO clustering method can jump out and continue searching for the global optimum.

From the above comparison, it can be concluded that the proposed clustering method has the best overall performance in terms of clustering effect and convergence. This is because the EODPSO clustering method has the following advantages over the other clustering methods:

(1) Compared to other clustering methods, the SSKFCM clustering method is more suitable for clustering MUIP data.

(2) In the framework of MUIP clustering, the EODPSO algorithm performs better than the PSO, DEPSO, CPSO, CAPSO, EPSO, GA, and SA algorithms when optimizing the clustering parameters.

In this paper, we propose a MUIP clustering method for the purpose of MUIP data analysis. After studying the advantages and drawbacks of SSKFCM clustering, we apply SSKFCM clustering to the MUIP data and utilize a PSO algorithm to optimize the SSKFCM clustering parameters. Motivated by recent research on variant PSO algorithms, we present a hybrid PSO algorithm, namely EODPSO, to improve on the PSO algorithm. In the EODPSO algorithm, COL is used to generate the initial population, and CA is employed to enhance local search ability. Additionally, to further improve global search ability, the EODPSO algorithm makes two improvements. Firstly, an improved intuitionistic fuzzy entropy measure is proposed to accurately measure the diversity of the particle swarm. Secondly, a new population search strategy combining OL and DE is proposed to strengthen global convergence. The experiments compare the proposed clustering method with other relevant clustering methods on the MUIP data set. The results show that our clustering method achieves better results than the other clustering methods in terms of clustering effect and convergence.

In future work, in order to further improve the performance of our clustering method, we will collect more MUIP data to enrich the data set and strengthen the theoretical analysis of the relationship between labeled and unlabeled data. We will use other evaluation metrics to evaluate the performance of our clustering method from different perspectives. Moreover, we will investigate existing retrieval methods and intend to incorporate our clustering method into MUIP selection.

This work is supported by the National Natural Science Foundation of China (No. 61272286), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20126101110006), Shaanxi Science and Technology Project (No. 2016GY-123), and the Science Research Foundation of Northwest University (No. 15NW31).

He is an associate professor at the Xinhua College, Ningxia University, China. He is currently working towards his PhD degree in the School of Information Science and Technology from Northwest University, China. His research interests include human-computer interaction and user interface engineering.

He is currently working towards his PhD degree in the School of Information Science and Technology from Northwest University, China. His research interests include human-computer interaction and user interface engineering.

She received her B.S. degree in Computer Application from Northwest University in 2001, received her M.S. degree in Computer Application Technology from Xidian University in 2005. She now is studying for a PhD in Northwest University. Her main research interests include wireless sensor network and human-computer interaction.

He is a lecturer at the School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, China. He is currently working towards his PhD degree in the School of Information Science and Technology from Northwest University, China. His research interests include human-computer interaction and user interface engineering.

- 1 T. Wetchakorn, N. Prompoon, "Method for mobile user interface design patterns creation for iOS platform," in Proceedings of the 2015 12th International Joint Conference on Computer Science and Software Engineering (JCSSE), Songkhla, Thailand, 2015, pp. 150-155.
- 2 T. D. Nguyen, J. Vanderdonckt, "User interface master detail pattern on Android," in Proceedings of the 4th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, Copenhagen, Denmark, 2012, pp. 299-304.
- 3 G. Thongmool, M. Phankokkruad, "Analysis of interaction user interface patterns and usability study in computer assisted instruction for Tablet PC," in Proceedings of the 2014 IEEE International Conference on Control System, Computing and Engineering (ICCSCE), Batu Ferringhi, Malaysia, 2014, pp. 472-477.
- 4 P. Gomes, F. C. Pereira, P. Paiva, N. Seco, P. Carreiro, J. L. Ferreira, C. Bento, in Advances in Case-Based Reasoning. Heidelberg: Springer, 2002, pp. 534-548.
- 5 L. Pavlic, V. Podgorelec, M. Hericko, "A question-based design pattern advisement approach," Computer Science and Information Systems, vol. 11, no. 2, pp. 645-664, 2014. doi:10.2298/CSIS130824025P
- 6 S. M. H. Hasheminejad, S. Jalili, "Design patterns selection: an automatic two-phase method," Journal of Systems and Software, vol. 85, no. 2, pp. 408-424, 2012. doi:10.1016/j.jss.2011.08.031
- 7 M. Ester, H. P. Kriegel, J. Sander, X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, 1996, pp. 226-231.
- 8 G. Andrade, G. Ramos, D. Madeira, R. Sachetto, R. Ferreira, L. Rocha, "G-DBSCAN: a GPU accelerated algorithm for density-based clustering," Procedia Computer Science, vol. 18, pp. 369-378, 2013. doi:10.1016/j.procs.2013.05.200
- 9 T. Zhang, R. Ramakrishnan, M. Livny, "BIRCH: an efficient data clustering method for very large databases," in Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Canada, 1996, pp. 103-114.
- 10 R. T. Ng, J. Han, "CLARANS: a method for clustering objects for spatial data mining," IEEE Transactions on Knowledge & Data Engineering, vol. 14, no. 5, pp. 1003-1016, 2002. doi:10.1109/TKDE.2002.1033770
- 11 W. Jia, Q. Hua, M. Zhang, R. Chen, X. Ji, "User interface pattern language based on category theory," Journal of Computer Aided Design & Computer Graphics, vol. 29, no. 1, pp. 79-89, 2017.
- 12 D. Q. Zhang, S. C. Chen, "Clustering incomplete data using kernel-based fuzzy c-means algorithm," Neural Processing Letters, vol. 18, no. 3, pp. 155-162, 2003. doi:10.1023/B:NEPL.0000011135.19145.1b
- 13 D. Zhang, K. Tan, S. Chen, in Neural Information Processing. Heidelberg: Springer, 2004, pp. 1229-1234.
- 14 D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, 1989.
- 15 D. Bertsimas, J. Tsitsiklis, "Simulated annealing," Statistical Science, vol. 8, no. 1, pp. 10-15, 1993.
- 16 J. Kennedy, R. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks (ICNN), Perth, Australia, 1995, pp. 1942-1948.
- 17 S. Yu, Y. M. Wei, J. Fan, X. Zhang, K. Wang, "Exploring the regional characteristics of inter-provincial CO2 emissions in China: an improved fuzzy clustering analysis based on particle swarm optimization," Applied Energy, vol. 92, pp. 552-562, 2012. doi:10.1016/j.apenergy.2011.11.068
- 18 A. Mekhmoukh, K. Mokrani, "Improved fuzzy c-means based particle swarm optimization (PSO) initialization and outlier rejection with level set methods for MR brain image segmentation," Computer Methods and Programs in Biomedicine, vol. 122, no. 2, pp. 266-281, 2015. doi:10.1016/j.cmpb.2015.08.001
- 19 Y. Liu, F. Liu, T. Hou, X. Zhang, "Kernel-based fuzzy C-means clustering method based on parameter optimization," Journal of Jilin University (Engineering and Technology Edition), vol. 46, no. 1, pp. 246-251, 2016. doi:10.13229/j.cnki.jdxbgxb201601037
- 20 A. Sedki, D. Ouazar, "Hybrid particle swarm optimization and differential evolution for optimal design of water distribution systems," Advanced Engineering Informatics, vol. 26, no. 3, pp. 582-591, 2012. doi:10.1016/j.aei.2012.03.007
- 21 R. Storn, K. Price, "Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, no. 4, pp. 341-359, 1997.
- 22 F. Liu, Z. Zhou, "A new data classification method based on chaotic particle swarm optimization and least square-support vector machine," Chemometrics and Intelligent Laboratory Systems, vol. 147, pp. 147-156, 2015. doi:10.1016/j.chemolab.2015.08.015
- 23 H. J. Lu, H. M. Zhang, L. H. Ma, "A new optimization algorithm based on chaos," Journal of Zhejiang University-Science A, vol. 7, no. 4, pp. 539-542, 2006. doi:10.1631/jzus.2006.a0539
- 24 Y. Dai, L. Liu, Y. Li, J. Song, "An improved particle swarm optimisation based on cellular automata," International Journal of Computing Science and Mathematics, vol. 5, no. 1, pp. 94-106, 2014. doi:10.1504/IJCSM.2014.059385
- 25 J. L. Schiff, Cellular Automata: A Discrete View of the World. Hoboken, NJ: John Wiley & Sons, 2007.
- 26 Y. Z. Wang, Y. J. Lei, L. Zhou, R. L. Li, "Intuitionistic fuzzy discrete particle swarm algorithm," Control and Decision, vol. 27, no. 11, pp. 1735-1739, 2012.
- 27 Y. Wang, Y. J. Lei, "A technique for constructing intuitionistic fuzzy entropy," Control and Decision, vol. 22, no. 12, pp. 1390-1394, 2007.
- 28 H. R. Tizhoosh, "Opposition-based learning: a new scheme for machine intelligence," in Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'06), Vienna, Austria, 2005, pp. 695-701.
- 29 J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York, NY: Plenum, 1981.
- 30 K. T. Atanassov, "Intuitionistic fuzzy sets," Fuzzy Sets and Systems, vol. 20, no. 1, pp. 87-96, 1986. doi:10.1007/978-3-7908-1870-3_1
- 31 W. Zeng, H. Li, "Relationship between similarity measure and entropy of interval valued fuzzy sets," Fuzzy Sets and Systems, vol. 157, no. 11, pp. 1477-1484, 2006. doi:10.1016/j.fss.2005.11.020
- 32 J. Li, G. Deng, H. Li, W. Zeng, "The relationship between similarity measure and entropy of intuitionistic fuzzy sets," Information Sciences, vol. 188, pp. 314-321, 2012. doi:10.1016/j.ins.2011.11.021
- 33 R. Verma, B. D. Sharma, "Exponential entropy on intuitionistic fuzzy sets," Kybernetika, vol. 49, no. 1, pp. 114-127, 2013.
- 34 C. P. Wei, Z. H. Gao, T. T. Guo, "An intuitionistic fuzzy entropy measure based on trigonometric function," Control and Decision, vol. 27, no. 4, pp. 571-574, 2012.
- 35 M. M. Gao, T. Sun, J. J. Zhu, "A revised axiomatic definition and structural formula of intuitionistic fuzzy entropy," Control and Decision, vol. 29, no. 3, pp. 470-474, 2014.
- 36 M. F. Liu, H. P. Ren, "A study of multi-attribute decision making based on a new intuitionistic fuzzy entropy measure," System Engineering Theory and Practice, vol. 35, no. 11, pp. 2909-2916, 2015.
- 37 E. Szmidt, J. Kacprzyk, "Entropy for intuitionistic fuzzy sets," Fuzzy Sets and Systems, vol. 118, no. 3, pp. 467-477, 2001. doi:10.1016/S0165-0114(98)00402-3
- 38 K. Guo, Q. Song, "On the entropy for Atanassov's intuitionistic fuzzy sets: an interpretation from the perspective of amount of knowledge," Applied Soft Computing, vol. 24, pp. 328-340, 2014. doi:10.1016/j.asoc.2014.07.006
- 39 W. F. Gao, S. Y. Liu, "A modified artificial bee colony algorithm," Computers & Operations Research, vol. 39, no. 3, pp. 687-697, 2012. doi:10.1016/j.cor.2011.06.007
- 40 Y. Zhou, L. Bao, C. P. Chen, "A new 1D chaotic system for image encryption," Signal Processing, vol. 97, pp. 172-182, 2014. doi:10.1016/j.sigpro.2013.10.034
- 41 D. L. Davies, D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224-227, 1979. doi:10.1109/TPAMI.1979.4766909
- 42 M. R. Rezaee, B. P. Lelieveldt, J. H. Reiber, "A new cluster validity index for the fuzzy c-mean," Pattern Recognition Letters, vol. 19, no. 3-4, pp. 237-246, 1998. doi:10.1016/S0167-8655(97)00168-2
- 43 T. Neil, Mobile Design Pattern Gallery: UI Patterns for Smartphone Apps. Sebastopol, CA: O'Reilly, 2012.
- 44 S. Hoober, E. Berkman, Designing Mobile Interfaces. Sebastopol, CA: O'Reilly, 2012.
- 45 W. Y. Chen, Y. Song, H. Bai, C. J. Lin, E. Y. Chang, "Parallel spectral clustering in distributed systems," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 3, pp. 568-586, 2010. doi:10.1109/TPAMI.2010.88
- 46 L. Hubert, P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193-218, 1985. doi:10.1007/BF01908075
- 47 A. Strehl, J. Ghosh, "Cluster ensembles: a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, pp. 583-617, 2002.
- 48 C. H. Papadimitriou, K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs, NJ: Prentice-Hall, 1982.