Mobile User Interface Pattern Clustering Using Improved Semi-Supervised Kernel Fuzzy Clustering Method

Wei Jia* **; Qingyi Hua*; Minjun Zhang*; Rui Chen*; Xiang Ji*; Bo Wang* ***

doi:10.3745/JIPS.04.0131

ISSN: 2092-805X

Volume 15, No 4 (2019), pp. 986 - 1016

10.3745/JIPS.04.0131

Wei Jia* ** , Qingyi Hua* , Minjun Zhang* , Rui Chen* , Xiang Ji* and Bo Wang* ***

Mobile User Interface Pattern Clustering Using Improved Semi-Supervised Kernel Fuzzy Clustering Method

Abstract: Mobile user interface pattern (MUIP) is a kind of structured representation of interaction design knowledge. Several studies have suggested that MUIPs are a proven solution for recurring mobile interface design problems. To facilitate MUIP selection, an effective clustering method is required to discover hidden knowledge of pattern data set. In this paper, we employ the semi-supervised kernel fuzzy c-means clustering (SSKFCM) method to cluster MUIP data. In order to improve the performance of clustering, clustering parameters are optimized by utilizing the global optimization capability of particle swarm optimization (PSO) algorithm. Since the PSO algorithm is easily trapped in local optima, a novel PSO algorithm is presented in this paper. It combines an improved intuitionistic fuzzy entropy measure and a new population search strategy to enhance the population search capability and accelerate the convergence speed. Experimental results show the effectiveness and superiority of the proposed clustering method.

Keywords: Intuitionistic Fuzzy Entropy Measure , Mobile User Interface Pattern , Particle Swarm Optimization , Population Search Strategy , Semi-Supervised Kernel Fuzzy C-Means

1. Introduction

A mobile user interface pattern (MUIP) is a proven solution to a common recurring mobile interface development problem. Meaningful design information such as design problem, functional goal and solution, included in the MUIP, can increase reusability, efficiency, quality and modularization in interface development. It is more useful for developers to design high usability interface with short time and share their knowledge with others [1-3]. In recent years, many attempts have been made toward the pattern selection. Gomes et al. [4] proposed a case-based reasoning (CBR) method for selecting patterns. In [5], the ontology-based technique was employed to represent knowledge on patterns and a set of question-answer pairs was used for reasoning suitable patterns. Hasheminejad and Jalili [6] presented a pattern selection method based on the text classification approach. But to our knowledge, little work has been made for the MUIP data analysis. In fact, data analysis plays a significant role in determining the applicability of a MUIP, because it provides useful information for pattern selection. Especially when there is a large supply of labeled data, unlabeled data and overlapping functional goals between MUIPs, data analysis can effectively extract hidden knowledge from the MUIP data set. Thus an effective data analysis method is needed that facilitates the MUIP selection in the design phase.

Semi-supervised clustering, as an important data analysis method, is widely used in many applications to solve real-world problems. It uses a few labeled samples to guide the clustering process. Various methods have been proposed in the semi-supervised clustering area. However, some clustering methods such as density-based spatial clustering of applications with noise (DBSCAN) [7] and graphics processing unit accelerated DBSCAN (GDBSCAN) [8] fail to discover overlapping clusters because they belong to non-overlapping clustering. Some other clustering methods such as balanced iterative reducing and clustering using hierarchies (BIRCH) [9] and clustering algorithm based on randomized search (CLARANS) [10] only work well for convex or spherical clusters of uniform size.

Fig. 1.

Several overview plus data patterns.

Fig. 2.

MUIP clustering result.

In general, most MUIPs contain multiple sub functional goals that belong to different clusters. For instance, Fig. 1 shows several overview plus data patterns that use charts to summarize the most important information, and a table below with the detailed data. Two sub-functional goals are included in the overview plus data pattern. One is to browse the details of data and the other is to provide visual data. When information browsing and visual information presentation are selected as cluster centers, we can see from Fig. 2 that cluster 1 and cluster 2 cannot be reasonably distinguished due to overlapping area between clusters. Furthermore, researches on the functional goals of MUIPs show that there are a large number of non-linear relationships between MUIPs [11]. In this case, MUIP data are non-linear and the distribution of MUIP data is irregular in space. Due to the above reasons, MUIP data are not linearly separable in input space and it is difficult for the above clustering methods to identify the arbitrary shaped clusters.

Recently, kernel fuzzy c-means (KFCM) clustering [12] has been received much attention in the semisupervised clustering area [13]. The kernel function is utilized to perform a non-linear mapping from the original feature space to a high dimensional feature space. Due to the idea of kernel substitution, semisupervised kernel fuzzy c-means (SSKFCM) clustering can identify non-spherical clusters and has a powerful ability to deal with non-linear data. In addition, fuzzy logic is used in SSKFCM clustering to determine the association of a data point to a cluster with different membership degree. Thus, if a MUIP simultaneously lies in more than one cluster, SSKFCM clustering can detect overlapping clusters. With the advantages above, the application of SSKFCM clustering to MUIP clustering seems natural and appropriate.

However, SSKFCM clustering is sensitive to the kernel parameter, the initial cluster centers and the number of clusters. Due to these defects, it is easily trapped in local minima. Many researchers treat this problem as an optimization problem and metaheuristic optimization algorithms such as genetic algorithm (GA) [14], simulated annealing (SA) [15], and particle swarm optimization (PSO) [16] are used to improve the clustering performance. Compared with other optimization algorithms, the PSO algorithm is more simple and easy to implement. Therefore, we pay more attention to the PSO algorithm in this study. In recent years, many PSO-based clustering methods have been proposed. Yu et al. [17] used the PSO algorithm to obtain the global optimal number of clusters. Mekhmoukh and Mokrani [18] improved the performance of clustering by utilizing the PSO algorithm to initialize the cluster centers. Liu et al. [19] presented an improved KFCM clustering method where, in the original KFCM clustering method, they employed the PSO algorithm for kernel parameter optimization.

The major drawback of PSO algorithm is that it easily falls into local optima and has slow convergence speed. Many variant PSO algorithms have been proposed in order to improve the performance of the PSO algorithm. Sedki and Ouazar [20] proposed a hybrid algorithm based on differential evolution (DE) [21] and PSO algorithm (DEPSO) which integrated DE and PSO algorithm at specified interval of iterations. In [22], a hybrid of chaotic optimization algorithm [23] and PSO algorithm (CPSO) was proposed to enhance the global optimal solution of the PSO algorithm. In [24], the authors proposed a PSO algorithm combined cellular automata (CA) [25] with PSO algorithm (CAPSO) to improve local search ability and global optimization ability. Each particle was considered as a cell and updated its current status with neighbors’ states in the algorithm. Wang et al. [26] proposed an intuitionistic fuzzy entropy-based PSO algorithm (EPSO) which utilized intuitionistic fuzzy sets (IFSs) to describe particles and attempted to take advantage of intuitionistic fuzzy entropy [27] as an index to measure the state of the particle swarm. Their work had shown some preliminary results in improving the performance of the PSO algorithm through the entropy of IFSs. However, existing improved PSO algorithms still have some shortcomings in terms of measuring and maintaining the diversity of the population.

Inspired by the above discussions, we propose an improved SSKFCM clustering method for MUIP clustering. Firstly, a hybrid PSO algorithm combining intuitionistic fuzzy entropy, opposition-based learning (OL) [28], DE and PSO algorithm (EODPSO) is presented that aims to optimize clustering parameters. This work is motivated by the variant PSO algorithms. In EODPSO algorithm, an improved intuitionistic fuzzy entropy measure and a new population search strategy are proposed to obtain better optimization result. Then, we propose a MUIP clustering method and use the EODPSO algorithm to optimize the clustering parameters. Finally, experiments using MUIP data set are reported and the results are compared to other relevant clustering methods.

In summary, the contributions of our work are summarized as follows:

(1) In order to reasonably measure the diversity of particle swarm, we present an improved entropy measure of IFSs that based on geometric interpretation of IFSs.

(2) We propose an improved PSO algorithm to enhance the exploration of the PSO algorithm. Specifically, the improved entropy measure is used to measure the diversity of particle swarm and a new population search strategy is used to strengthen the global convergence.

(3) We propose a SSKFCM clustering method for MUIP clustering where kernel parameter, initialize clustering centers and number of clusters are optimized through the improved PSO algorithm.

The remainder of this paper is organized as follows: Section 2 briefly introduces the SSKFCM clustering method and the PSO algorithm. Section 3 presents the details of the proposed method. We use numerical experiments to verify the performance of the proposed method in Section 4. Finally, a conclusion and future works are given in Section 5.

2. Preliminaries

2.1 Semi-Supervised Kernel Fuzzy C-Means Clustering

Fuzzy c-means (FCM) clustering [29] is a typical clustering method which allows points to belong to two or more clusters with different membership values, ranging from[TeX:] $$[0,1] . \text { Let } Z=\left\{z_{1}, z_{2}, \cdots, z_{n}\right\},$$ where [TeX:] $$z_{i} \in \mathbb{R}^{D}$$ is a data set of size n in D-dimensional space. FCM clustering partitions this data set into e' subsets. The aim of the FCM clustering is to minimize the objective function which is the generalized form of the least-squared errors function:

(1)

[TeX:] $$J(U, B)=\sum_{i=1}^{n} \sum_{j=1}^{e^{\prime}} u_{i, j}^{m}\left\|z_{i}-b_{j}\right\|^{2}$$

where [TeX:] $$\left\|z_{i}-b_{j}\right\|$$ denotes the Euclidean distance between [TeX:] $$z_{i} \text { and } b_{j},$$ m is the fuzzy controlling parameter and is generally chosen as [TeX:] $$2, u_{i, j}$$ is the membership value of [TeX:] $$z_{i}$$ with the respect to cluster [TeX:] $$j \text { and } u_{i, j}$$ must satisfy the following conditions:

(1) The membership value ranges between zero and one, that is, [TeX:] $$u_{i, j} \in[0,1].$$

(2) The sum of the membership values for each data point must be one, that is,[TeX:] $$\sum_{j=1}^{e^{\prime}} u_{i, j}=1.$$

(3) The sum of the all membership values in a cluster must be smaller than the number of data n, that is, [TeX:] $$0<\sum_{i=1}^{n} u_{i, j}<n.$$

In KFCM clustering, a kernel function is used to transfer the original feature space to a kernel-induced distance, and the Euclidean distance of FCM is replaced with kernel-induced distance. The KFCM clustering minimizes the following objective:

(2)

[TeX:] $$J(U, B)=\sum_{i=1}^{n} \sum_{j=1}^{e^{\prime}} u_{i, j}^{m}\left\|\Phi\left(z_{i}\right)-\Phi\left(b_{j}\right)\right\|^{2}$$

where [TeX:] $$\Phi$$ is a non-linear function mapping [TeX:] $$z_{i}$$ from the input space Z to a new space F with higher or even infinite dimensions. The kernel-induced distance is calculated as:

(3)

[TeX:] $$\left\|\Phi\left(z_{i}\right)-\Phi\left(b_{j}\right)\right\|=K\left(z_{i}, z_{i}\right)-K\left(z_{i}, b_{j}\right)+K\left(b_{j}, b_{j}\right)$$

where [TeX:] $$K\left(z_{i}, b_{j}\right)$$ is the kernel function which is defined as the inner product in the new space F with: [TeX:] $$K\left(z_{i}, b_{j}\right)=\Phi\left(z_{i}\right)^{T} \Phi\left(b_{j}\right), \text { for } z_{i}, b_{j}$$ in input space Z. We can see that kernel function can be directly constructed in original input space without knowing the concrete form of [TeX:] $$\Phi.$$ Among the various kernel functions, Gaussian kernel is a typical kernel function which is expressed as follows:

(4)

[TeX:] $$K\left(z_{i}, b_{j}\right)=\exp \left(-\frac{\left\|z_{i}-b_{j}\right\|^{2}}{2 \sigma^{2}}\right)$$

where [TeX:] $$\sigma>0$$ is the parameter of Gaussian kernel.

For Gaussian kernel, we have [TeX:] $$K\left(z_{i}, z_{i}\right)=1.$$ Thus, it is obvious that objective function of KFCM could be defined as:

(5)

[TeX:] $$J(U, B)=\sum_{i=1}^{n} \sum_{j=1}^{e^{\prime}} u_{i, j}^{m}\left\|2-2 K\left(z_{i}, b_{j}\right)\right\|^{2}$$

By optimizing Eq. (5), the fuzzy partition matrix and cluster centers can be obtained as follows:

(6)

[TeX:] $$u_{i, j}=\frac{\left(1 /\left(1-K\left(z_{i}, b_{j}\right)\right)\right)^{1 /(m-1)}}{\sum_{k=1}^{e^{\prime}}\left(1 /\left(1-K\left(z_{i}, b_{k}\right)\right)\right)^{1 /(m-1)}}$$

(7)

[TeX:] $$b_{j}=\frac{\sum_{i=1}^{n} u_{i, j}^{m} K\left(z_{i}, b_{j}\right) x_{i}}{\sum_{i=1}^{n} u_{i, j}^{m} K\left(z_{i}, b_{j}\right)}$$

By integrating supervised and unsupervised information into the objective function of KFCM, SSKFCM clustering method expands KFCM clustering into semi-supervised domain. This method takes advantage of supervised information, such as labeled data or pair-wise constraints, to guide the clustering process. The labeled and unlabeled data can be denoted in a whole matrix form as follows:

(8)

[TeX:] $$Z=\left\{z_{1}^{l^{\prime}}, \cdots, z_{n_{r}}^{l^{\prime}} | z_{1}^{u^{\prime}}, \cdots, z_{n_{u}^{\prime}}^{u^{\prime}}\right\}=Z^{\prime} \cup Z^{u^{\prime}}$$

where the superscript [TeX:] $$l^{\prime} \text { and } u^{\prime}$$ are the labeled and unlabeled data respectively, [TeX:] $$n_{l^{\prime}} \text { and } n_{u^{\prime}}$$ are the number of the labeled and unlabeled data respectively. The total number of data is [TeX:] $$n_{l^{\prime}}+n_{u^{\prime}}.$$

A matrix of the fuzzy c-partition of Z in Eq.(8) is expressed as:

(9)

[TeX:] $$U=\left\{U^{l^{\prime}}=\left\{u_{i, j}^{l^{\prime}}\right\} | U^{u^{\prime}}=\left\{u_{i, j}^{u^{\prime}}\right\}\right\}$$

where the value of [TeX:] $$u_{i, j}^{l^{\prime}}$$ is known before and typically is set to 1 if the data [TeX:] $$z_{i}$$ is labeled with class j, and 0 otherwise.

The initial of set of cluster centers is obtained from [TeX:] $$U^{I^{\prime}}$$ as follows:

(10)

[TeX:] $$b_{j}^{0}=\frac{\sum_{i=1}^{n_{r}}\left(u_{i, j}^{l^{\prime}}\right)^{m} z_{i}^{l^{\prime}}}{\sum_{i=1}^{n_{l}}\left(u_{i, j}^{l^{\prime}}\right)^{m}}$$

Then, the membership [TeX:] $$u_{i j}^{u^{\prime}} \text { in } U^{u^{\prime}}$$ is calculated:

(11)

[TeX:] $$u_{i, j}^{u^{\prime}}=\frac{\left(1 /\left(1-K\left(z_{i}^{u^{\prime}}, b_{j}\right)\right)\right)^{1 /(m-1)}}{\sum_{k=1}^{e^{\prime}}\left(1 /\left(1-K\left(z_{i}^{u^{\prime}}, b_{k}\right)\right)\right)^{1 /(m-1)}}$$

Finally, the cluster centers are written:

(12)

[TeX:] $$b_{j}=\frac{\sum_{i=1}^{n_{l^{\prime}}}\left(\boldsymbol{u}_{i, j}^{l^{\prime}}\right)^{m}K\left(z_{i}^{l^{\prime}}, b_{j}\right) x_{i}^{l^{\prime}}+\sum_{i=1}^{n_{u^{\prime}}}\left(u_{i, j}^{u^{\prime}}\right)^{m}K\left(z_{i}^{u^{\prime}}, b_{j}\right) x_{i}^{u^{\prime}}}{\sum_{i=1}^{n_{l}}\left(u_{i, j}^{l^{\prime}}\right)^{m}K\left(z_{i}^{l^{\prime}}, b_{j}\right)+\sum_{i=1}^{n_{u^{\prime}}}\left(u_{i, j}^{u^{\prime}}\right)^{m}K\left(z_{i}^{u^{\prime}}, b_{j}\right)}$$

2.2 Particle Swarm Optimization

The aim of PSO algorithm is to solve optimization problems and avoid trapping into local minima. In PSO algorithm, a set of potential solutions is represented as particles that fly in D-dimensional space. For a search problem in a D-dimensional space, [TeX:] $$p_{i}(t)$$ is the best position that the [TeX:] $$i^{\text {th }}$$ particle has ever visited, [TeX:] $$p_{g}(t)$$ is the best position that the whole population have visited. The velocity of a particle is updated:

(13)

[TeX:] $$v_{i}(t+1)=w v_{i}(t)+c_{1} r_{1}\left(p_{i}(t)-y_{i}(t)\right)+c_{2} r_{2}\left(p_{g}(t)-y_{i}(t)\right)$$

where [TeX:] $$y_{i}(t) \text { and } v_{i}(t)$$ are the position and the velocity of particle at iteration , respectively, W is inertia weight. It represents the inheritance of the next flight velocity from the current flight velocity. [TeX:] $$c_{1} \text { and } c_{2}$$ are acceleration coefficients. They represent particle’s self-learning capability and capability of learning from excellent companion respectively. In general, [TeX:] $$c_{1}=c_{2}=2 \cdot r_{1} \text { and } r_{2}$$ are two independently uniformly distributed random variables with range [0,1].

The position of a particle is updated as follows:

(14)

[TeX:] $$y_{i}(t+1)=y_{i}(t)+v_{i}(t)$$

Entropy of an IFS describes the fuzziness degree of the IFS. Atanassov [30] generalized the concept of fuzzy sets and defined the IFS as follows:

DEFINITION 1: An IFS A in a finite set [TeX:] $$X=\left\{x_{i} | i=1,2, \cdots, n\right\}$$ is given by:

(15)

[TeX:] $$A=\left\{\left\langle x, \mu_{A}(x), \gamma_{A}(x)\right\rangle | x \in X\right\}$$

where [TeX:] $$\mu_{A} : X \rightarrow[0,1]$$ is the membership function of A and [TeX:] $$\gamma_{A} : X \rightarrow[0,1]$$ s the non-membership function of A, with the condition [TeX:] $$0 \leq \mu_{A}(x)+\gamma_{A}(x) \leq 1, \forall x \in X.$$ The numbers [TeX:] $$\mu_{A}(x) \text { and } \gamma_{A}(x)$$ represent the membership degree and non-membership degree of the element x to the set A, respectively.

DEFINITION 2: For each IFS A in X, if

(16)

[TeX:] $$\pi_{A}(x)=1-\mu_{A}(x)-\gamma_{A}(x)$$

then [TeX:] $$\pi_{A}(x)$$ is called intuitionistic index of x in A. It is a hesitancy degree of x to A , and [TeX:] $$0 \leq \pi_{A}(x) \leq 1$$ for [TeX:] $$\forall x \in X.$$

DEFINITION 3: The complementary set [TeX:] $$A^{c} \text { of } A$$ is defined as:

(17)

[TeX:] $$A^{c}=\left\{\left\langle x, \gamma_{A}(x), \mu_{A}(x)\right\rangle | x \in X\right\}$$

Wang et al. [26] utilized IFS to describe particles and taken the entropy of the IFS as the measurement of the state of particle swarm and the mutation parameter. Its main idea is that there is an effective radius [TeX:] $$R_{i}^{\prime}$$ for the best position [TeX:] $$p_{i}(t)$$ of each particle [TeX:] $$x_{i}$$ at iteration . If a particle is in the range of [TeX:] $$R_{i}^{\prime},$$ then it gather near [TeX:] $$p_{i}(t),$$ otherwise, it does not gather near [TeX:] $$p_{i}(t)$$ and is considered as an isolated point. The related definition of entropy of IFSs in PSO algorithm is defined as follows:

DEFINITION 4: At iteration , the best position [TeX:] $$p_{i}(t)$$ of each particle is selected as the aggregation point. [TeX:] $$F_{k, i}^{\prime}$$ is the distance function which calculates the distance from each particle to aggregation point [TeX:] $$p_{i}(t).$$ The effective radius of [TeX:] $$p_{i}(t) \text { is } R_{i}^{\prime}=m^{\prime} \max \left(F_{k, i}^{\prime}\right),$$ where m' is random value in range [TeX:] $$[0,0.5].$$

DEFINITION 5: Let A' Let [TeX:] $$X=\left\{x_{i} | i=1,2, \cdots, n\right\}.$$ At iteration t [TeX:] $$t, \text { if } F_{k, i}^{\prime} \leq R_{i}^{\prime},$$ then the particle k belongs to the effective radius of [TeX:] $$p_{i}(t) \text { and } T_{k, i}^{\prime}=T_{k, i}^{\prime}+1 . \text { If } F_{k, i}^{\prime}>R_{i}^{\prime},$$ then the particle k is an isolated point and [TeX:] $$T_{x_{i}}^{\prime \prime}=T_{x_{i}}^{\prime \prime}+1.$$ Then membership degree [TeX:] $$\mu_{A i j}^{t}\left(x_{i}\right)=\frac{T_{k, i}^{\prime}}{n},$$ intuitionistic index [TeX:] $$\pi_{A i j}^{t}\left(x_{i}\right)=\frac{T_{x}^{\prime \prime}}{n}$$ and non-membership degree [TeX:] $$\gamma_{A_{k, i}^{\dagger}}^{t}\left(x_{i}\right)=1-\mu_{A_{k, i}^{t}}^{t}\left(x_{i}\right) \quad-\pi_{A_{k, i}^{t}}^{t}\left(x_{i}\right),$$ where [TeX:] $$\mu_{A'_{i}, i}^{\prime}\left(x_{i}\right)$$ s the membership degree of [TeX:] $$p_{i}(t)$$ at iteration t. [TeX:] $$\pi_{A_{k, i}^{\prime}}^{t}\left(x_{i}\right) \text { is }$$ is the intuitionistic index of an isolated point. [TeX:] $$\gamma_{A_{k, i}^{\prime}}^{t}\left(x_{i}\right)$$ is the non-membership degree of [TeX:] $$p_{i}(t)$$ at iteration t.

3. The Proposed Method for Mobile User Interface Pattern Clustering

3.1 Improved Intuitionistic Fuzzy Entropy Measure

Many recent studies have been done on intuitionistic fuzzy entropy measure from different points of view. Zeng and Li [31] discussed the relationship between similarity measure and entropy of IFSs. Their study showed that similarity measures and entropies of IFSs can be transformed to each other. Based on this point of view, they defined the entropy measure as:

(18)

[TeX:] $$E_{z e n g}(A)=\frac{1}{n} \sum_{i=1}^{n}\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|$$

Li et al. [32] investigated the systematic transformation of the entropy into the similarity measure for IFSs and discussed the sufficient conditions of this transformation. Then, the entropy measure was defined as:

(19)

[TeX:] $$E_{L i}(A)=1-\frac{1}{2 n} \sum_{i=1}^{n}\left(\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|^{3}+\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|\right)$$

Verma and Sharma [33] proposed an exponential entropy of IFSs based on the concept of exponential fuzzy entropy. The entropy measure was given by:

(20)

[TeX:] $$E_{V e r m a}(A)=\frac{1}{n(\sqrt{e}-1)} \sum_{i=1}^{n}\left[\theta e^{1-\theta}+(1-\theta) e^{\theta}-1\right]$$

in which [TeX:] $$\theta=\left(\frac{\mu_{A}\left(x_{i}\right)+1-\gamma_{A}\left(x_{i}\right)}{2}\right).$$

In [27], an entropy measure for IFSs based on geometric interpretation was shown as:

(21)

[TeX:] $$E_{\text {Warg }}(A)=\frac{1}{n} \sum_{i=1}^{n} \frac{\min \left(\mu\left(x_{i}\right), \gamma\left(x_{i}\right)\right)+\pi_{A}\left(x_{i}\right)}{\max \left(\mu\left(x_{i}\right), \gamma\left(x_{i}\right)\right)+\pi_{A}\left(x_{i}\right)}$$

In [34], an entropy measure based on a trigonometric function was given by:

(22)

[TeX:] $$E_{W e i}(A)=\frac{1}{n} \sum_{i=1}^{n} \cos \frac{\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)}{2\left(1+\pi_{A}\left(x_{i}\right)\right)} \pi$$

A different axiomatic definition of entropy for IFSs was given in [35], and the entropy measure was defined as:

(23)

[TeX:] $$E_{G o o}(A)=\frac{1}{n} \sum_{i=1}^{n} \frac{1-\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|^{2}+\pi_{A}\left(x_{i}\right)}{2}$$

Liu and Ren [36] discussed the hesitation information of IFSs and given the entropy measure for IFSs as:

(24)

[TeX:] $$E_{L i u}(A)=\frac{1}{n} \sum_{i=1}^{n} \cos \frac{\left(\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right)\left(1-\pi_{A}\left(x_{i}\right)\right)}{2} \pi$$

However, the existing entropy measures cannot reasonably measure the fuzziness degree of IFSs in some cases. To solve this problem, it is necessary to introduce an improved intuitionistic fuzzy entropy measure. In [37], the authors presented a geometric interpretation of an IFS and defined the distances between intuitionistic fuzzy sets. Inspired by this study, we propose an effective intuitionistic fuzzy entropy measure based on geometric interpretation to overcome the disadvantages of existing entropy measures. The proposed entropy measure takes fully into account many sides, including intuitionistic index, membership degree, non-membership degree and relationships among them.

A geometric interpretation of an IFS is shown in Fig. 3, each element of an IFS X is mapped into a point in an equilateral triangle [TeX:] $$A^{\prime} B^{\prime} D^{\prime}$$ in their opinion. The triangle [TeX:] $$A^{\prime} B^{\prime} C^{\prime}$$ s an orthogonal projection of the triangle [TeX:] $$A^{\prime} B^{\prime} D^{\prime}.$$ Segment AB represents a fuzzy set described by two parameters [TeX:] $$\mu \text { and } \gamma.$$ All points which are above the segment [TeX:] $$A^{\prime} B^{\prime}$$ A^{\prime} B^{\prime} [TeX:] $$D^{\prime}.$$ When the hesitancy margin for [TeX:] $$D^{\prime}$$ is equal to 1, it is difficult to judge if this point belongs or does not belong to the set. When the distance from [TeX:] $$D^{\prime} \text { to } A^{\prime}$$ is equal to the distance to [TeX:] $$B^{\prime},$$ the fuzziness degree for [TeX:] $$D^{\prime}$$ is equal to 100%.

Fig. 3.

A 3-dimensional representation of an IFS.

In Fig. 3, we draw a perpendicular line from point [TeX:] $$D^{\prime}$$ on a line segment [TeX:] $$A^{\prime} B^{\prime},$$ two line intersect at point [TeX:] $$M^{\prime},$$ and a perpendicular bisector [TeX:] $$D^{\prime} M^{\prime}$$ is obtained. Obviously, [TeX:] $$\mu_{A}(x) \text { equals } \gamma_{A}(x)$$ on any point of [TeX:] $$D^{\prime} M^{\prime},$$ and triangle [TeX:] $$D^{\prime} B^{\prime} M^{\prime}$$ is complementary with triangle [TeX:] $$D^{\prime} A^{\prime} M^{\prime}.$$ It is worth noticing that E(A) is higher with higher [TeX:] $$\pi_{A}(x)$$ and is lower with higher [TeX:] $$\mu_{A}(x) \text { or } \gamma_{A}(x).$$ From Fig. 3, we can safely come to the following conclusions about entropy of IFSs E(A):

(1) [TeX:] $$E(A)=1, \text { iff } \mu_{A}(x)=\gamma_{A}(x)=0, \pi_{A}(x)=1 \text { for } \forall x \in X;$$

(2) [TeX:] $$E(A)=0, \text { if } \mu_{A}(x)=1, \gamma_{A}(x)=\pi_{A}(x)=0 \text { or } \gamma_{A}(x)=1, \mu_{A}(x)=\pi_{A}(x)=0 \text { for } \forall x \in X;$$

(3) [TeX:] $$E(A) \text { is only equal to } E\left(A^{c}\right) \text { on } A^{\prime} B^{\prime} D^{\prime};$$

(4) [TeX:] $$E(A) \text { becomes higher with the point of } D^{\prime} M^{\prime} \text { approaches } D^{\prime}$$

Based on the existing studies and conclusions above [37,38], we give an axiomatic definition of entropy measure of IFSs as follows:

DEFINITION 6: Let [TeX:] $$X=\left\{x_{i} | i=1,2, \cdots, n\right\}$$ be a finite universe of discourse. A real function E : [TeX:] $$\operatorname{IFS}(X) \rightarrow[0,1]$$ is called an entropy measure of an IFS(X), if E has the following properties:

(1) [TeX:] $$E(A)=0 \text { iff } A \text { is a crisp set };$$

(2) [TeX:] $$E(A)=0 \text { iff } \mu_{A}\left(x_{i}\right)=\gamma_{A}\left(x_{i}\right)=0 \text { for } \forall x_{i} \in X;$$

(3) [TeX:] $$E(A) \leq E(G) \text { if } A \text { is less fuzzy than } G \quad, \quad \text { i.e., when } \quad \pi_{A}\left(x_{i}\right)=\pi_{G}\left(x_{i}\right), \\ \mu_{A}\left(x_{i}\right) \leq \mu_{G}\left(x_{i}\right) \leq \gamma_{G}\left(x_{i}\right) \leq \gamma_{A}\left(x_{i}\right) \quad \text { or } \quad \mu_{A}\left(x_{i}\right) \geq \mu_{G}\left(x_{i}\right) \geq \gamma_{G}\left(x_{i}\right) \geq \gamma_{A}\left(x_{i}\right) \quad, \quad \text { or } \quad \text { when } \\ \pi_{A}\left(x_{i}\right) \leq \pi_{G}\left(x_{i}\right),\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|=\left|\mu_{G}\left(x_{i}\right)-\gamma_{G}\left(x_{i}\right)\right| \text { for } \forall x_{i} \in X;$$

(4) [TeX:] $$E(A)=E\left(A^{c}\right).$$

Through above analysis, we develop an effective entropy measure as follows:

(25)

[TeX:] $$E(A)=\frac{1}{n} \sum_{i=1}^{n} \frac{2-2\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|+\pi_{A}^{2}\left(x_{i}\right)}{3+\mu_{A}\left(x_{i}\right)+\gamma_{A}\left(x_{i}\right)+\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|}$$

THEOREM 1: Let [TeX:] $$X=\left\{x_{i} | i=1,2, \cdots, n\right\}$$ be a finite universe of discourse. A real function E : [TeX:] $$\operatorname{IFS}(X) \rightarrow[0,1]$$ defined by the Eq. (25) is an entropy measure of the IFS.

Proof: As an intuitionistic fuzzy entropy measure, it should satisfy the axioms 1–4 in Definition 6.

(1) When A is a crisp set, we have [TeX:] $$\mu_{A}\left(x_{i}\right)=1, \quad \gamma_{A}\left(x_{i}\right)=\pi_{A}\left(x_{i}\right)=0 \quad \text { or } \quad \gamma_{A}\left(x_{i}\right)=1,\mu_{A}\left(x_{i}\right)=\pi_{A}\left(x_{i}\right)=0.$$ Hence E(A)=0. On the other hand, suppose E(A)=0. Then [TeX:] $$\frac{2-2\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|+\pi_{A}^{2}\left(x_{i}\right)}{3+\mu_{A}\left(x_{i}\right)+\gamma_{A}\left(x_{i}\right)+\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|}=0.$$ So we have [TeX:] $$2-2\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|+\pi_{A}^{2}\left(x_{i}\right)=0.$$ Since [TeX:] $$0 \leq 2-2\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right| \leq 2 \text { and } 0 \leq \pi_{A}^{2}\left(x_{i}\right) \leq 1,$$ we then get [TeX:] $$0 \leq 2-2\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|=0$$ and [TeX:] $$\pi_{A}^{2}\left(x_{i}\right)=0.$$ Thus we obtain [TeX:] $$\mu_{A}\left(x_{i}\right)=1, \gamma_{A}\left(x_{i}\right)=\pi_{A}\left(x_{i}\right)=0 \text { or } \gamma_{A}\left(x_{i}\right)=1, \mu_{A}\left(x_{i}\right)=\pi_{A}\left(x_{i}\right)=0.$$ Therefore A is a crisp set.

(2) We now suppose E(A)=1 Then we get [TeX:] $$\frac{2-2\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|+\pi_{A}^{2}\left(x_{i}\right)}{3+\mu_{A}\left(x_{i}\right)+\gamma_{A}\left(x_{i}\right)+\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|}=1.$$ Thus we can obtain [TeX:] $$1+\mu_{A}\left(x_{i}\right)+\gamma_{A}\left(x_{i}\right)+3\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|=\pi_{A}^{2}\left(x_{i}\right).$$ Since [TeX:] $$0 \leq \pi_{A}^{2}\left(x_{i}\right) \leq 1 \text { and } 1 \leq 1+\mu_{A}\left(x_{i}\right)+\gamma_{A}\left(x_{i}\right)+3\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|,,$$ we then get [TeX:] $$\mu_{A}\left(x_{i}\right)=\gamma_{A}\left(x_{i}\right)=0.$$ Let us suppose [TeX:] $$\mu_{A}\left(x_{i}\right)=\gamma_{A}\left(x_{i}\right)=0,$$ we have E(A)=1.

(3) When [TeX:] $$\pi_{A}\left(x_{i}\right)=\pi_{G}\left(x_{i}\right), \text { and } \mu_{A}\left(x_{i}\right) \leq \mu_{G}\left(x_{i}\right) \leq \gamma_{G}\left(x_{i}\right) \leq \gamma_{A}\left(x_{i}\right) \text { or } \mu_{A}\left(x_{i}\right) \geq \mu_{G}\left(x_{i}\right) \geq \gamma_{G}\left(x_{i}\right) \geq\gamma_{A}\left(x_{i}\right).$$ So we get [TeX:] $$1-\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right| \leq 1-\left|\mu_{G}\left(x_{i}\right)-\gamma_{G}\left(x_{i}\right)\right| \text { and } \mu_{A}\left(x_{i}\right)=\mu_{G}\left(x_{i}\right)+\gamma_{G}\left(x_{i}\right).$$ Then we can have [TeX:] $$2-2\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|+\pi_{A}^{2}\left(x_{i}\right) \leq 2-2\left|\mu_{G}\left(x_{i}\right)-\gamma_{G}\left(x_{i}\right)\right|+\pi_{G}^{2}\left(x_{i}\right) \quad \text { and }3+\mu_{A}\left(x_{i}\right)+\gamma_{A}\left(x_{i}\right)+\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right| \geq 3+\mu_{G}\left(x_{i}\right)+\gamma_{G}\left(x_{i}\right)+\left|\mu_{G}\left(x_{i}\right)-\gamma_{G}\left(x_{i}\right)\right|.$$ Thus [TeX:] $$E(A) \leq E(G).$$ When [TeX:] $$\pi_{A}\left(x_{i}\right) \leq \pi_{G}\left(x_{i}\right) \text { and }\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|=\left|\mu_{G}\left(x_{i}\right)-\gamma_{G}\left(x_{i}\right)\right|,$$ we optain [TeX:] $$\mu_{A}\left(x_{i}\right)+\gamma_{A}\left(x_{i}\right) \geq \mu_{G}\left(x_{i}\right)+\gamma_{G}\left(x_{i}\right).$$ Then we can get [TeX:] $$2-2\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right|+\pi_{A}^{2}\left(x_{i}\right) \leq 2-2\left|\mu_{G}\left(x_{i}\right)-\gamma_{G}\left(x_{i}\right)\right|+\pi_{G}^{2}\left(x_{i}\right) \text { and } 3+\mu_{A}\left(x_{i}\right)+\gamma_{A}\left(x_{i}\right)+\left|\mu_{A}\left(x_{i}\right)-\gamma_{A}\left(x_{i}\right)\right| \geq 3+\mu_{G}\left(x_{i}\right)+\gamma_{G}\left(x_{i}\right)+\left|\mu_{G}\left(x_{i}\right)-\gamma_{G}\left(x_{i}\right)\right| \cdot \operatorname{So} E(A) \leq E(G).$$

(4) It is clear that [TeX:] $$A^{c}=\left\{\left\langle x_{i}, \gamma_{A}\left(x_{i}\right), \mu_{A}\left(x_{i}\right)\right\rangle | x_{i} \in X\right\} \quad \text { and } \quad \pi_{A^{c}}\left(x_{i}\right)=\pi_{A}\left(x_{i}\right).$$ then [TeX:] $$E\left(A^{c}\right)=\frac{1}{n} \sum_{i=1}^{n} \frac{2-2\left|\gamma_{A}\left(x_{i}\right)-\mu_{A}\left(x_{i}\right)\right|+\pi_{A}^{2}\left(x_{i}\right)}{3+\gamma_{A}\left(x_{i}\right)+\mu_{A}\left(x_{i}\right)+\left|\gamma_{A}\left(x_{i}\right)-\mu_{A}\left(x_{i}\right)\right|}.$$ Thus, we have [TeX:] $$E\left(A^{c}\right)=E(A).$$

Hence, we complete the proof of Theorem 1.

In the following, we demonstrate the performance of E(A) by the comparison of E(A) and the existing intuitionistic fuzzy entropy measures.

Let us calculate the entropy for IFSs: [TeX:] $$A_{1}=\left\{\left\langle x_{i}, 0,0\right\rangle | x_{i} \in X\right\}, A_{2}=\left\{\left\langle x_{i}, 0.5,0.5\right\rangle | x_{i} \in X\right\}, A_{3}=\left\{\left\langle x_{i}, 0.5,0.4\right\rangle | x_{i} \in X\right\}, A_{4}=\left\{\left\langle x_{i}, 0.2,0.1\right\rangle | x_{i} \in X\right\},A_{5}=\left\{\left\langle x_{i}, 0.3,0.3\right\rangle | x_{i} \in X\right\}, A_{6}=\left\{\left\langle x_{i}, 0.4,0.4\right\rangle | x_{i}\right.\in X\}, A_{7}=\left\{\left\langle x_{i}, 0.3,0.2\right\rangle | x_{i} \in X\right\}, A_{8}=\left\{\left\langle x_{i}, 0.4,0.3\right\rangle | x_{i} \in X\right\}, A_{9}=\left\{\left\langle x_{i}, 0.2,0.6\right\rangle | x_{i} \in X\right\} \text { and } A_{10}=\left\{\left\langle x_{i}, 0,0.5\right\rangle | x_{i} \in X\right\}.$$ According to the geometric interpretation of IFSs, it is clear that the fuzziness degrees of these IFSs are different. Table 1 presents the calculation results of different entropy measures.

Table 1.

Comparison of the different entropy measures

Entropy measure	A₁	A₂	A₃	A₄	A₅	A₆	A₇	A₈	A₉	A₁₀
[TeX:] $$E_{\text {Warg }}(A)$$	1	1	0.8333	0.8889	1	1	0.875	0.8571	0.5	0.5
[TeX:] $$E_{Z e n g}(A)$$	1	1	0.9	0.9	1	1	0.9	0.9	0.6	0.5
[TeX:] $$E_{L i}(A)$$	1	1	0.9495	0.9495	1	1	0.9495	0.9495	0.768	0.6875
[TeX:] $$E_{V e r m a}(A)$$	1	1	0.9905	0.9905	1	1	0.9905	0.9905	0.8463	0.7588
[TeX:] $$E_{W e i}(A)$$	1	1	0.9898	0.9957	1	1	0.9945	0.9927	0.8662	0.8662
[TeX:] $$E_{G a o}(A)$$	1	0.5	0.5	0.74	0.58	0.52	0.62	0.54	0.44	0.5
[TeX:] $$E_{L i n}(A)$$	1	1	1	0.9999	1	1	0.9999	0.9999	0.9995	0.998
[TeX:] $$E(A)$$	1	0.5	0.4525	0.6735	0.6	0.5368	0.5694	0.4974	0.2952	0.3125

From Table 1, we find that [TeX:] $$E_{\text {zeng }}(A), E_{L i}(A) \text { and } E_{\text {Verma}}(A)$$ cannot distinguish the fuzziness degrees of [TeX:] $$A_{3}, A_{4}, A_{7} \text { and } A_{8}.$$ Moreover, their calculation results show that the fuzziness degrees of [TeX:] $$A_{2}, A_{5}\text { and } A_{6}$$ are equal to 1, which is obviously unreasonable. The reason is that they omit intuitionistic index and only consider the difference between membership degree and non-membership degree. Although [TeX:] $$E_{\text {Warg }}(A), E_{\text {Wei }}(A), E_{\text {Gao }}(A) \text { and } E_{\text {Liu }}(A)$$ consider the intuitionistic index, they do not take full account of the relationship between known information and unknown information. For this reason, their calculation results do not clearly show the differences between these IFSs in terms of fuzziness and intuitionism.

By comparing the calculation results as listed in Table 1, the proposed entropy measure [TeX:] $$E(A)$$ can definitely distinguish the fuzziness degrees of these IFSs. Further, for [TeX:] $$A_{3}, A_{4}, A_{7} \text { and } A_{8},$$ we can see that differences between the membership degrees and the non-membership degrees are the same. In this case, the greater value of [TeX:] $$\pi_{A}\left(x_{i}\right),$$ the fuzzier the corresponding element [TeX:] $$A_{i}.$$ Thus uncertain information of [TeX:] $$A_{4}$$ is the biggest, uncertain information of [TeX:] $$A_{7}$$ is more than that of [TeX:] $$A_{8}.$$ and uncertain information of [TeX:] $$A_{3}$$ is the least. The calculation results of the proposed entropy measure show that [TeX:] $$E\left(A_{3}\right)<E\left(A_{8}\right)<E\left(A_{7}\right)<E\left(A_{4}\right),$$ as we would expect. Hence, the proposed entropy measure is not only mathematically correct but also reasonable.

3.2 Improved Particle Swarm Optimization Algorithm

In this study, EODPSO is proposed in this study to optimize clustering parameters. To improve the performance of the PSO algorithm, we use a hybrid strategy to avoid local minima and accelerate convergence speed. The hybrid strategy contains the following three aspects.

(1) Chaotic opposition-based learning (COL) [39] is used to obtain a better initial population.

(2) CA is combined with PSO algorithm to enhance local search ability.

(3) We present a new population search strategy that combines OL and DE to extend the performance of global search.

3.2.1 Generation of initial population

A reasonable initial population plays a vital role in helping to avoid premature local optimization and accelerate convergence speed. COL is a population initialization method which combines chaotic systems and OL to obtain initial population. Thus COL is employed in EODPSO algorithm to generate an initial population.

In [39], the authors used the Sine map in their chaotic system. Nevertheless, the Sine map has two problems. Firstly, its chaotic range is limited to a finite interval. Secondly, the data distribution of output chaotic sequences is non-uniform. To avoid these problems, we use an improved 1-dimensional chaotic system [40], called Tent-Sine system, in COL to obtain better result. The Tent-Sine system uses the Tent and Sine maps as seed maps. The Tent-Sine system is defined as follows:

(26)

[TeX:] $$Q_{i+1}=\left\{\begin{aligned} \bmod \left(\frac{r^{\prime} Q_{i}}{2}+\frac{\left(4-r^{\prime}\right)\left(\sin \left(\pi Q_{i}\right)\right)}{4}, 1\right) & Q_{i}<0.5 \\ \bmod \left(\frac{r^{\prime}\left(1-Q_{i}\right)}{2}+\frac{\left(4-r^{\prime}\right)\left(\sin \left(\pi Q_{i}\right)\right)}{4}, 1\right) & Q_{i} \geq 0.5 \end{aligned}\right.$$

where [TeX:] $$Q_{i}$$ is the output chaotic sequence, [TeX:] $$r^{\prime}$$ is a parameter with range of [TeX:] $$(0,4]$$ mod is the modulo operation, is the iteration number, [TeX:] $$\left\{\begin{array}{cc}{\frac{r^{\prime} Q_{i}}{2}} & {Q_{i}<0.5} \\ {\frac{r^{\prime}\left(1-Q_{i}\right)}{2}} & {Q_{i} \geq 0.5}\end{array}\right.$$ is Tent map, [TeX:] $$\frac{\left(4-r^{\prime}\right)\left(\sin \left(\pi Q_{i}\right)\right)}{4}$$ is Sine map. At iteration , the opposite search of particle is carried out as follows:

(27)

[TeX:] $$\overline{y}_{i, j}(t)=y_{\min , j}(t)+y_{\max , j}(t)-y_{i, j}(t)$$

where [TeX:] $$y_{i, j}(t)$$ is current position of particle [TeX:] $$i, \overline{y}_{i, j}(t)$$ is the result of reverse search of particle , [TeX:] $$y_{\min , j}(t) \text { and } y_{\max j}(t)$$ are lower and upper bounds for the dimension , respectively.

COL algorithm consists of two main parts. In the first part, we use Tent-Sine system to generate an initial population. In the second part, we use OL to generate an opposite population. Then these two populations are amalgamated and a new initial population is generated.

Suppose all particles fly in D-dimensional space. The steps of COL are described by Algorithm 1.

Algorithm 1

COL algorithm

3.2.2 Enhancement of local search ability

CA is a space and time-discrete dynamical system that evolves according to a set of local rules and neighbor cells. It is composed of five basic components, which are cell, cell space, cell state, neighborhood, and transition rule. As is shown in Fig. 4, Moore model is one of the well-known neighborhoods in 2- dimensional CA. The black cell is the center one, and those gray ones are its neighbors. At a discrete time step, the update of a cell state obtains by taking into account the states of cells in its local neighborhood. All cells communicate and change states synchronously.

Fig. 4.

Moore model.

Current studies have shown that CA is a powerful method to enhance the local search ability of the PSO algorithm [24]. In EODPSO algorithm, CA is integrated in particle update to avoid local minima, and each particle is defined as a single cell and is distributed in 2-dimensional space. The CA model for EODPSO algorithm is described as follows:

(1) Cell: each particle is defined as a single cell;

(2) Cell space: the set of all cells;

(3) Cell state: cell state is defined as the particle’s position. The state of cell i is defined as [TeX:] $$s_{i}(t)=y_{i}(t);$$

(4) Neighborhood: the neighborhood of particle is defined as [TeX:] $$H(i)=\{i+1, i+2, \cdots, i+h\},$$ where h is the number of neighbors;

(5) Transition rule: [TeX:] $$s_{i}(t)=f\left(s_{i}(t), s_{i+1}(t), s_{i+2}(t), \cdots, s_{i+h}(t)\right).$$

3.2.3 Population search strategy

In order to accurately judge when population search strategy is implemented, we use the entropy of IFSs as population diversity index and utilize the proposed entropy measure to achieve the state of particle swarm in the EODPSO algorithm. According to Theorem 1, the proposed intuitionistic fuzzy entropy measure in PSO algorithm is defined as follows:

DEFINITION 7: Let [TeX:] $$A^{\prime}$$ is a particle swarm in a finite set [TeX:] $$X=\left\{x_{i} | i=1,2, \cdots, n\right\}.$$ At iteration , the intuitionistic fuzzy entropy measure in PSO algorithm is defined as:

(28)

[TeX:] $$E^{t}\left(A^{\prime}\right)=\frac{1}{n} \sum_{i=1}^{n} \frac{2-2\left|\mu_{A_{k,i}^{\prime}}^{t}\left(x_{i}\right)-\gamma_{A_{k,i}^{\prime}}^{t}\left(x_{i}\right)\right|+\pi_{A_{k,i}^{\prime}, i}^{t}\left(x_{i}\right)}{3+\mu_{A_{k,i}^{\prime}}^{t}\left(x_{i}\right)+\gamma_{A_{k,i}^{\prime}}^{t}\left(x_{i}\right)+\left|\mu_{A_{k,i}^{\prime}}^{t}\left(x_{i}\right)-\gamma_{A_{k, i}^{\prime}}^{t}\left(x_{i}\right)\right|}$$

where [TeX:] $$E^{t}\left(A^{\prime}\right)$$ is in range [TeX:] $$[0,1].$$

It is known from Definition 7 that when the entropy value equals 0, all particles converge to one point. When entropy value equals 1, all particles reach the maximum dispersion. In addition, is used as a threshold to decide whether the population search strategy is implemented. When the entropy value is smaller than , it means that EODPSO algorithm falls into local optima. In this case, we use the proposed population search strategy to enhance global search ability.

DE algorithm is an evolutionary algorithm based on real number coding and the evolution of population. It guides the population towards the vicinity of the global optimum by repeating cycles of mutation, crossover and selection. DE algorithm starts with a population of NP D-dimensional search variable vectors. During the iteration t+1, for each individual [TeX:] $$y_{i}(t)=\left\{y_{i_{1}}(t), y_{i_{2}}(t), \cdots y_{i_{n}}(t)\right\}(i \in\{1,2, \cdots, N P\}),$$ a mutation vector [TeX:] $$\overline{v}_{i}(t)$$ is computed by using the following mutation operator:

(29)

[TeX:] $$\overline{v}_{i}(t)=y_{\alpha_{1}}(t)+F\left(y_{\alpha_{2}}(t)-y_{\alpha_{3}}(t)\right)$$

where [TeX:] $$\alpha_{1}, \alpha_{2} \text { and } \alpha_{3}$$ are random unequal integers within the set [TeX:] $$\{1, \cdots, i-1, i+1, \cdots, N\}.$$ F is the mutation control parameter.

The trial vector [TeX:] $$y_{i, j}^{\prime}(t+1)$$ is generated by using the following crossover operation:

(30)

[TeX:] $$y_{i, j}^{\prime}(t+1)=\left\{\begin{array}{lc}{\overline{v}_{i, j}(t)} & {\text { if }\left(r a n d_{j} \leq C R\right) \text { or }\left(j=j_{\text {rand}}\right)} \\ {y_{i, j}(t)} & {\text { otherwise }}\end{array}\right.$$

where [TeX:] $$j=1,2, \cdots, N, y_{i, j}^{\prime}(t+1), \overline{v}_{i, j}(t) \text { and } y_{i, j}(t) \text { are } j^{\text {th }}$$ parameter for the [TeX:] $$t^{\mathrm{th}}$$ trial, mutant and vectors. [TeX:] $$r a n d_{j}$$ is a uniformly distributed random number within [TeX:] $$[0,1].$$ CR is a cross control parameter and is usually selected from within [TeX:] $$(0,1].$$ It controls the diversity of population and helps the DE algorithm to escape from local optima. [TeX:] $$j_{\text {rand }}$$ is a randomly chosen index within [TeX:] $$[0, N].$$ It ensures that [TeX:] $$y_{i, j}^{\prime}(t+1)$$ gets at least one parameter from [TeX:] $$\overline{v}_{i}(t).$$

Moreover, as discussed in Section 3.2.1, OL is also an effective algorithm for searching population. The main idea of OL is to consider an estimate and its corresponding opposite estimate simultaneously. Our new population search strategy makes use of the merits of DE and OL algorithm. It not only helps the PSO algorithm to escape from local optima but also accelerates the convergence speed of the PSO algorithm. In the new population search strategy, OL and DE are applied to obtain a new population. Specifically, we firstly use OL to generate an opposite population, and then we use mutation and crossover to generate a mutation population. Finally, we select S fittest particles from the current population, opposite population and mutation population as a new population.

3.2.4 EODPSO algorithm

In EODPSO algorithm, each particle is defined as cell and updates its current status with neighbors’ states. Based on the evolution mechanism of CA, COL is used to generate a reasonable initial population. Then, at each iteration, each particle’s velocity and position are updated by using the evolution mechanism of CA. When the entropy value of the population is smaller than threshold β, then OL and DE are applied to generate an opposite population and a mutation population, respectively. A new population is obtained by selecting S fittest particles from the current population, opposite population and mutation population. At the next iteration, the new population is considered as an initial population. To accelerate convergence speed and improve optimization quality effectively, inertia weight W is calculated by using a linear decreasing way. W is calculated as follows:

(31)

[TeX:] $$w(t)=w_{\max }-\frac{t\left(w_{\max }-w_{\min }\right)}{T_{\max }}$$

where [TeX:] $$w_{\max } \quad \text { and } \quad w_{\min }$$ represent the maximum and minimum of W respectively, t is the current iterative number and [TeX:] $$T_{\max }$$ is the maximum number of iterations.

EODPSO algorithm is shown as follows:

Algorithm 2

EODPSO algorithm

3.3 Mobile User Interface Pattern Clustering

In this section, we propose a SSKFCM clustering method to cluster the MUIP data. In the framework of MUIP clustering, EODPSO algorithm is used to optimize clustering parameters.

First of all, EODPSO algorithm is exploited to optimize the kernel parameter σ of Gaussian kernel function. During the process of optimizing parameter , fitness function is defined to assess the generated solutions. One of the clustering evaluation metric, Davies-Bouldin index (DBI) [41], is used in this fitness function to measure the influence of on clustering quality. The lower value of DBI signifies better clustering result. The fitness function is defined as follows:

(32)

[TeX:] $$f_{1}(x)=\frac{1}{1+\overline{R}}$$

where [TeX:] $$\overline{R}$$ is DBI.

The DBI is defined as follows:

(33)

[TeX:] $$\overline{R} \equiv \frac{1}{e^{\prime}} \sum_{i=1}^{e^{\prime}} R_{i}$$

where [TeX:] $$e^{\prime}$$ is the number of clusters. [TeX:] $$R_{i} \equiv \text { maximum of } R_{i, j}.$$

(34)

[TeX:] $$R_{i, j} \equiv \frac{d_{i}+d_{j}}{M_{i, j}}$$

where [TeX:] $$d_{i} \text { and } d_{j}$$ are the scatter with the [TeX:] $$i^{\text {th }} \text { center and } j^{\text {th }}$$ center, respectively. [TeX:] $$M_{i, j}$$ is the distance between the [TeX:] $$i^{t h}$$ cluster center and the [TeX:] $$j^{\text {th }}$$ cluster center.

We can see that the smaller the value of [TeX:] $$\overline{R},$$ the greater the fitness value [TeX:] $$f_{1}(x)$$ and the better value of parameter.

Then, considering that the cluster centers and the number of clusters also influence the clustering effect directly, the EODPSO algorithm is reused to initialize the cluster centers and DBI is used as a judgment index to determine the optimal number of clusters. In [42], they discussed the optimal number of clusters. The results of their study show that the optimal number of clusters is in range [TeX:] $$[0, \sqrt{n}],$$ where is the size of the data set [TeX:] $$Z=\left\{z_{1}, z_{2}, \cdots, z_{n}\right\}.$$

A fitness function is defined to assess the generated cluster centers during the initialization phase. The definition of the fitness function is expressed below:

(35)

[TeX:] $$f_{2}(x)=\frac{1}{1+J(U, B)}$$

where [TeX:] $$J(U, B)$$ is the objective function of KFCM, as shown in Eq. (5).

It has been expressed clearly that the smaller the value of [TeX:] $$J(U, B),$$ the better the effect of the clustering and the greater the fitness value of [TeX:] $$f_{2}(x).$$

Fig. 5 illustrates the framework of MUIP clustering method. The MUIP clustering includes two stages. In the first stage, initial cluster centers are generated by using EODPSO algorithm. In the second stage, the kernel parameter is optimized by utilizing EODPSO algorithm. Then, the fuzzy partition matrix and cluster centers are updated. If the value of DBI is greater than maximum number of clusters [TeX:] $$e_{\max },$$ clustering method is stopped and the fuzzy partition matrix and cluster centers are output. Otherwise, clustering method continues to optimize the kernel parameter and update the fuzzy partition matrix and cluster centers.

Fig. 5.

Framework of MUIP clustering.

Algorithm 3 shows the specific steps of MUIP clustering. MUIP clustering consists of two loops. The inner loop is used to optimize the parameter and update [TeX:] $$u_{i, j} \text { and } b_{j}.$$ The outer loop is used to determine the fuzzy partition matrix and cluster centers. At each iteration of the inner loop, the clustering method first optimizes the parameter σ by using EODPSO algorithm and fitness function [TeX:] $$f_{1}(x).$$ Then, the fuzzy partition matrix [TeX:] $$u_{i, j} \text { and cluster centers } b_{j}$$ are updated. At each iteration of the outer loop, the value of DBI is calculated and used to determine the number of clusters.

Algorithm 3

MUIP clustering

4. Experiments and Analysis

4.1 Data Collection

In general, developers select MUIPs according to functional goals. Therefore, we exploit the MUIP clustering method to cluster MUIPs based on the functional goals. Our data set is constructed by collecting 5,397 MUIPs from [43,44] and seven MUIPs websites (http://www.mobile-patterns.com/, http://inspired-ui.com/, http://ui-patterns.com/, http://pttrns.com/, https://www.cocoacontrols.com/, http://ios-patterns.com/, http://mobiledesign.com/). To facilitate testing, we define 51 types of functional goals based on our previous study on MUIP data collection [11].

4.2 Evaluation Metrics

In our experiments, DBI, accuracy [45], adjusted rand index (ARI) [46] and normalized mutual information (NMI) [47] are employed to evaluate the performance of the experimented methods. DBI is used to measure the within-cluster scatter and the inter-cluster separation. Since ARI is not sensitive to the number of clusters, it is used to compare two partitions with different cluster numbers. Accuracy and NMI are used to measure the quality and efficiency of the cluster method, respectively. Note that the greater values of accuracy, ARI and NMI signify better clustering result. DBI is defined above and the other three metrics are defined as follows:

(36)

[TeX:] $$\text {Accuracy}=\frac{\sum_{i=1}^{n} \delta\left(a_{i}, \operatorname{map}\left(\overline{a}_{i}\right)\right)}{n}$$

where n is the total number of data, [TeX:] $$a_{i} \text { and } \overline{a}_{i}$$ are the true class label and the obtained cluster label, [TeX:] $$\delta(a, \overline{a})$$ is a function that equals 1 if [TeX:] $$a=\overline{a}$$ and equals 0 otherwise. Function [TeX:] $$\operatorname{map}(\cdot)$$ is a permutation function that maps each cluster label to a class label. The optimal matching can be found with Hungarian algorithm [48].

(37)

[TeX:] $$A R I=\frac{\sum_{i=1}^{\mathfrak{q}} \sum_{j=1}^{e_{2}}\left(\begin{array}{c}{n_{i, j}} \\ {2}\end{array}\right)-\left(\begin{array}{c}{n} \\ {2}\end{array}\right)^{-1} \sum_{i=1}^{\mathfrak{q}}\left(\begin{array}{c}{n_{i}} \\ {2}\end{array}\right) \sum_{j=1}^{e_{2}}\left(\begin{array}{c}{n_{j}} \\ {2}\end{array}\right)}{\frac{1}{2}\left[\sum_{i=1}^{e_{1}}\left(\begin{array}{c}{n_{i}} \\ {2}\end{array}\right)+\sum_{j=1}^{e_{2}}\left(\begin{array}{c}{n_{j}} \\ {2}\end{array}\right)\right]-\left(\begin{array}{c}{n} \\ {2}\end{array}\right)^{-1} \sum_{i=1}^{n}\left(\begin{array}{c}{n_{i}} \\ {2}\end{array}\right) \frac{e_{i}}{j=1}\left(\begin{array}{c}{n_{j}} \\ {2}\end{array}\right)}$$

where n is the total number of data, [TeX:] $$\eta_{i, j}$$ is the number of data that belong to cluster and cluster , [TeX:] $$\boldsymbol{n}_{i}$$ denotes the number of data that belong to cluster [TeX:] $$i, n_{j}$$ denotes the number of data that belong to cluster [TeX:] $$j \cdot e_{1}$$ is the number of clusters that are obtained after running a clustering method. [TeX:] $$e_{2}$$ is the number of clusters that come from priori partition.

(38)

[TeX:] $$N M I=\frac{\sum_{i=1}^{e^{\prime}} \sum_{j=1}^{e^{\prime}} n_{i, j} \log \frac{n n_{i, j}}{n_{i} n_{j}}}{\sqrt{\sum_{i=1}^{e^{\prime}} n_{i} \log \frac{n_{i}}{n} \sum_{j=1}^{e^{\prime}} n_{j} \log \frac{n_{j}}{n}}}$$

where [TeX:] $$e^{\prime}$$ is the number of clusters, is the total number of data, [TeX:] $$n_{i, j}$$ denotes the number of data in cluster i as well as in cluster [TeX:] $$j, n_{i} \text { and } n_{j}$$ denote the number of data in cluster and cluster , respectively.

4.3 Parameters Setting

Before applying our method to MUIP data, several parameters need to be set. The parameters of our method are set as: [TeX:] $$S=50, \quad K_{\max }=300, \quad w_{\max }=0.9, \quad w_{\min }=0.1, \quad c_{1}=c_{2}=2, \quad T_{\max }=500, \quad h=8, \eta=0.0001, m=2, q_{\max }=800, F=0.5, C R=1.$$ Considering that the change of entropy value affects the performance of the EODPSO algorithm and further affects the clustering result of the proposed clustering method, we evaluate the critical by using the above evaluation metrics. We increase gradually from 0.1 to 0.9 and the proposed clustering method is run 20 times for each value of . We randomly select 10% data as labeled data and the rest as unlabeled data each time. Fig. 6 reports the mean values and standard deviations of the four evaluation metrics for different values of . It should be noted that“Mean”indicates the mean value and “Std” indicates the standard deviation in this paper. It is clear that these four evaluation metrics change significantly when changing the value of from 0.1 to 0.7, but they change very slightly when changing the value of from 0.7 to 0.9. At the same time, we can see from the results that values of Accuracy, ARI and NMI are close to their maximums and values of DBI is close to its minimum after [TeX:] $$\beta=0.7.$$ This means when [TeX:] $$\beta<0.7,$$ EODPSO algorithm falls into local optima. In this case, it requires a population search strategy to jump out of local optima. On the contrary, when [TeX:] $$\beta \geq 0.7,$$ EODPSO algorithm does not fall into local optima. In this case, the population diversity is not much affected by the increasing value of . All these results indicate that the critical value of is 0.7. Therefore, we set [TeX:] $$\beta=0.7$$ in our clustering method.

Fig. 6.

The effect of parameter on the proposed clustering method.

4.4 Results and Comparisons

In this paper, the experimental studies focus on the clustering effect and the convergence of the proposed MUIP clustering method. In order to make it evident to show the performance of EODPSO clustering method, other relevant clustering methods are used as comparisons. Specifically, DBSCAN, GDBSCAN, BIRCH and CLARANS are used in semi-supervised clustering method to cluster MUIP data. Additionally, GA, SA, PSO, and four variant PSO algorithms mentioned in this paper, namely DEPSO, CPSO, CAPSO, and EPSO, are used to replace EODPSO algorithm in the framework of MUIP clustering. Then the seven alternative SSKFCM clustering methods are applied to cluster MUIP data.

The first experiment is to compare the clustering effect of those clustering methods in different ratios of labeled data. All methods are run 20 times independently on MUIP data. Fig. 7 shows the mean values of four evaluation metrics of the clustering method accompanied with standard deviations. The mean value indicates the clustering effect of the methods and the standard deviation indicates the stability of the methods. Mean values of four evaluation metrics are clearly evident that PSO-based SSKFCM clustering methods usually outperform other semi-supervised clustering methods. The reasons are as follows. Firstly, density-based spatial clustering methods cannot discover overlapping clusters. Secondly, BIRCH and CLARANS clustering methods only work well for convex or spherical clusters. Thirdly, GA and CA easily fall into local optima and have slow convergence speed. Moreover, it can be seen that EODPSO clustering method surpasses all other PSO-based clustering methods. Compared to other PSO algorithms, EODPSO algorithm has better global search ability because it utilizes a hybrid strategy to avoid local minima and accelerate convergence speed. Thus EODPSO clustering method achieves better results than other PSO-based clustering methods.

Fig. 7.

Comparison of all clustering methods under different ratio of labeled data: (a) DBI, (b) accuracy, (c) ARI, and (d) NMI.

It is worth noting that the performances of all methods are less dependent on labeled data as the ratio of labeled data increases. The reason is that when the number of labeled data reaches a certain ratio, the whole MUIP data space can be represented better and the influence of labeled data on clustering effect decreases gradually.

Tables 2–7 and Fig. 8 show more details on the performance of the clustering methods. The best results are highlighted in bold in tables. As can be seen from Tables 2–5, the overall standard deviations of four evaluation metrics of EODPSO clustering method are lower than that of other methods. It means that EODPSO clustering method has not only better clustering effect but also better stability than all other clustering methods.

Table 2.

Mean and standard deviation results of DBI for all clustering methods

Clustering method		Ratio of labeled data (%)
Clustering method		10	20	30	40	50	60	70	80	90	100
EODPSO	Mean	0.3985	0.3546	0.3373	0.3305	0.3215	0.3182	0.3143	0.3117	0.3094	0.3076
EODPSO	Std	0.0025	0.0041	0.0063	0.0124	0.0039	0.0105	0.0085	0.0068	0.0042	0.0038
PSO	Mean	0.6681	0.6174	0.5911	0.5855	0.5676	0.5456	0.5335	0.5309	0.5248	0.5185
PSO	Std	0.0125	0.0091	0.0108	0.0174	0.0098	0.0116	0.0145	0.0078	0.0137	0.0074
DEPSO	Mean	0.6052	0.5477	0.5234	0.5086	0.5028	0.4847	0.4813	0.4739	0.4672	0.4655
DEPSO	Std	0.0115	0.0085	0.00145	0.0095	0.0135	0.0074	0.0095	0.0068	0.0116	0.0061
CPSO	Mean	0.4968	0.4419	0.4182	0.4047	0.4021	0.3917	0.3878	0.3806	0.3772	0.3737
CPSO	Std	0.0059	0.0133	0.0093	0.0085	0.0105	0.0093	0.0089	0.0145	0.0062	0.0041
CAPSO	Mean	0.4765	0.4363	0.4238	0.4114	0.3933	0.3771	0.3727	0.3683	0.3607	0.3579
CAPSO	Std	0.0146	0.0139	0.0082	0.0058	0.0092	0.0047	0.0106	0.0095	0.0087	0.0069
EPSO	Mean	0.4557	0.4033	0.3801	0.3754	0.3609	0.3577	0.3517	0.3452	0.3414	0.3375
EPSO	Std	0.0052	0.0066	0.0085	0.0077	0.0117	0.0106	0.0099	0.0078	0.0119	0.0101
DBSCAN	Mean	0.6423	0.6152	0.6067	0.5823	0.5758	0.5518	0.5425	0.5204	0.5101	0.5025
DBSCAN	Std	0.0132	0.0105	0.0082	0.0097	0.0068	0.0081	0.0087	0.0105	0.0058	0.0059
GDBSCAN	Mean	0.6258	0.5905	0.5703	0.5571	0.5409	0.5225	0.5108	0.4905	0.4792	0.4757
GDBSCAN	Std	0.0115	0.0079	0.0087	0.0051	0.0059	0.0068	0.0091	0.0062	0.0084	0.0044
BIRCH	Mean	0.7177	0.6905	0.6852	0.6751	0.6658	0.6419	0.6312	0.6218	0.6102	0.6057
BIRCH	Std	0.0125	0.0105	0.0112	0.0095	0.0071	0.0078	0.0083	0.0092	0.0085	0.0068
CLARANS	Mean	0.7081	0.6849	0.6705	0.6618	0.6603	0.6517	0.6389	0.6154	0.6058	0.5962
CLARANS	Std	0.0097	0.0095	0.0118	0.0068	0.0074	0.0066	0.0087	0.0059	0.0093	0.0082
GA	Mean	0.6788	0.6525	0.6409	0.6385	0.6197	0.5986	0.5884	0.5728	0.5621	0.5584
GA	Std	0.0095	0.0115	0.0093	0.0085	0.0068	0.0087	0.0094	0.0121	0.0075	0.0103
SA	Mean	0.6518	0.6347	0.6214	0.6156	0.6033	0.5909	0.5817	0.5604	0.5536	0.5477
SA	Std	0.0067	0.0095	0.0094	0.0072	0.0117	0.0102	0.0047	0.0061	0.0093	0.0088