## Yongli Liu*, Hengda Wang*, Tianyi Duan*, Jingli Chen* and Hao Chao*

| Notation | Description |
|----------|-------------|
| [TeX:] $$C, N$$ | Numbers of clusters and objects |
| [TeX:] $$u_{c i}$$ | Fuzzy membership of the i^{th} object in the c^{th} cluster |
| [TeX:] $$v_{c}$$ | Centroid/medoid of the c^{th} cluster |
| [TeX:] $$x_{i}$$ | The i^{th} object |
| [TeX:] $$m$$ | User-defined weighting exponent of FCM |
| [TeX:] $$w$$ | Weights of objects |

To describe within-cluster compactness and between-cluster separation at the same time, Wu et al. [5] proposed the fuzzy total scatter matrix [TeX:] $$S_{FT}$$, the fuzzy within-cluster scatter matrix [TeX:] $$S_{FW}$$, and the fuzzy between-cluster scatter matrix [TeX:] $$S_{FB}$$. These three matrices are defined as:
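The equation images did not survive extraction. A reconstruction consistent with the surrounding notation and with Wu et al. [5] (a sketch, to be checked against the original) is:

```latex
S_{FT}=\sum_{c=1}^{C}\sum_{i=1}^{N}u_{ci}^{m}\,(x_{i}-\overline{x})(x_{i}-\overline{x})^{T},\quad
S_{FW}=\sum_{c=1}^{C}\sum_{i=1}^{N}u_{ci}^{m}\,(x_{i}-v_{c})(x_{i}-v_{c})^{T},\quad
S_{FB}=\sum_{c=1}^{C}\sum_{i=1}^{N}u_{ci}^{m}\,(v_{c}-\overline{x})(v_{c}-\overline{x})^{T}
```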

where m is the weighting exponent (m>1), [TeX:] $$\overline{x}=\sum_{i=1}^{N} x_{i} / N$$, and we restrict [TeX:] $$\sum_{c=1}^{C} u_{c i}=1, u_{c i} \in[0,1]$$. Furthermore, these matrices satisfy [TeX:] $$S_{F T}=S_{F W}+S_{F B}$$. To minimize the within-cluster compactness measure and simultaneously maximize the between-cluster separation measure, the objective function of FCS is defined as:

where [TeX:] $$\left\|x_{i}-v_{c}\right\|^{2}$$ is the squared Euclidean distance between the i^{th} data point and the centroid of the c^{th} cluster. To guarantee that no two cluster kernels overlap, η_c is chosen as in Eq. (5), so that this parameter controls the influence of between-cluster separation.

where 0≤β≤1.0, k=1,...,C.
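Eqs. (4) and (5) are also missing from the extraction. Plausible reconstructions, following the FCS formulation of Wu et al. [5] and unverified against the original figures, are:

```latex
J_{FCS}=\sum_{c=1}^{C}\sum_{i=1}^{N}u_{ci}^{m}\left\|x_{i}-v_{c}\right\|^{2}
-\sum_{c=1}^{C}\eta_{c}\sum_{i=1}^{N}u_{ci}^{m}\left\|v_{c}-\overline{x}\right\|^{2},
\qquad
\eta_{c}=\frac{\beta}{4}\cdot\frac{\min_{k\neq c}\left\|v_{k}-v_{c}\right\|^{2}}{\max_{k}\left\|v_{k}-\overline{x}\right\|^{2}}
```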

To minimize the objective function in Eq. (4), we impose some restrictions on it.

For a given data point x_{i}, if [TeX:] $$\left\|x_{i}-v_{c}\right\|^{2} \leq \eta_{c}\left\|v_{c}-\overline{x}\right\|^{2}$$, then [TeX:] $$u_{c i}=1 \text { and } u_{c^{\prime} i}=0$$ for [TeX:] $$c^{\prime} \neq c$$. That is, each cluster in FCS has a crisp boundary: all data points inside this boundary take crisp membership values [TeX:] $$u_{c i} \in\{0,1\}$$, while data points outside it take fuzzy membership values [TeX:] $$u_{c i} \in[0,1]$$, as shown in Fig. 1.

2.2 Incremental Fuzzy Clustering Algorithm Based on FCM

In this section, we introduce two incremental fuzzy clustering algorithms based on FCM, called SPFCM and OFCM. These two algorithms combine the single-pass and online clustering methods, respectively, with the underlying weighted fuzzy C-means (WFCM) algorithm.

2.2.1 WFCM algorithm

The WFCM algorithm is a modification of FCM that weights the centroids obtained in each iteration of the FCM algorithm, so that centroids with higher weights are more representative than those with lower weights. For a given dataset [TeX:] $$X=\left[x_{1}, \ldots, x_{n}\right]$$ to be clustered into C groups, the goal of WFCM is to minimize the objective function J_{WFCM},

where w_{i} is the weight of the i-th data point.
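Eq. (8) is not reproduced above; a reconstruction consistent with the FCM objective plus per-object weights w_{i} is:

```latex
J_{WFCM}=\sum_{c=1}^{C}\sum_{i=1}^{n}w_{i}\,u_{ci}^{m}\left\|x_{i}-v_{c}\right\|^{2}
```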

The following restrictions are needed so that Eq. (8) can achieve its minimum value.

Traditional clustering algorithms process the entire dataset directly, but they become infeasible when a single machine's memory cannot hold the whole dataset. To solve this problem, Hore et al. [18,19] proposed two incremental algorithms, SPFCM and OFCM.

In SPFCM, the WFCM algorithm is used for clustering. At the start, the weight of each point is set to 1, [TeX:] $$W_{d a t a}=[1,1, \dots, 1]^{T}$$. The dataset is then divided into several chunks, [TeX:] $$X=\left[X_{1}, \ldots, X_{h}\right]$$. When the first chunk X_{1} arrives (t=1), its centroids [TeX:] $$\Delta=\left[v_{1}, \ldots, v_{c}\right]$$ are obtained, and SPFCM calculates their weights as [TeX:] $$w_{c}=\sum_{i=1}^{n} u_{c i} w_{d a t a}$$, where n is the number of data points in the first chunk. After the first chunk is processed (t>1), a new chunk is generated by merging the previous centroids into the next chunk, [TeX:] $$X^{t^{\prime}}=\left[\Delta^{t-1}, X^{t}\right]$$, and the centroid weights are updated as [TeX:] $$w_{c}^{t}=\sum_{i=1}^{n} u_{c i}\left[w_{c}^{t-1}, w_{d a t a}\right]$$.
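The single-pass flow described above can be sketched in code. This is a minimal illustration under our own simplifications (random initialization, fixed iteration count; `wfcm` and `spfcm` are hypothetical names), not the authors' implementation:

```python
import numpy as np

def wfcm(X, w, C, m=2.0, n_iter=60, seed=0):
    """Weighted fuzzy C-means: returns centroids V and memberships U (C x n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((C, len(X)))
    U /= U.sum(axis=0)                       # memberships of each point sum to 1
    for _ in range(n_iter):
        Um = (U ** m) * w                    # per-object weights scale each term
        V = Um @ X / Um.sum(axis=1, keepdims=True)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + 1e-12
        U = d2 ** (-1.0 / (m - 1))
        U /= U.sum(axis=0)
    return V, U

def spfcm(chunks, C, m=2.0):
    """Single-pass scheme: carry C weighted centroids from chunk to chunk."""
    V = wV = None
    for Xt in chunks:
        if V is None:                        # first chunk: all weights are 1
            X, w = Xt, np.ones(len(Xt))
        else:                                # merge previous centroids into chunk
            X = np.vstack([V, Xt])
            w = np.concatenate([wV, np.ones(len(Xt))])
        V, U = wfcm(X, w, C, m)
        wV = (U * w).sum(axis=1)             # w_c = sum_i u_ci * w_i
    return V
```

With two well-separated blobs split across two chunks, the carried centroids end up near the blob means, since the previous centroids enter the next chunk with the accumulated weights of the points they summarize.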

Different from SPFCM, the OFCM algorithm clusters each chunk of data individually using FCM, and the centroids of all chunks are then collected and grouped by performing clustering again. The weights of the chunk centroids are collected as [TeX:] $$w=\left[w_{1}^{1}, \ldots, w_{c}^{1}, w_{1}^{2}, \ldots, w_{c}^{2}, \ldots, w_{1}^{h}, \ldots, w_{c}^{h}\right]$$.

In this section, we introduce two incremental fuzzy clustering algorithms based on a fuzzy scatter matrix, called SPFCS and OFCS, respectively.

To apply the single-pass and online clustering methods to FCS, the data points in FCS should be weighted. First, the weighted within-cluster and between-cluster scatter matrices are defined as
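A sketch of these weighted matrices (our reconstruction, which simply adds the object weights w_i to the definitions of S_FW and S_FB) is:

```latex
S_{FW}^{w}=\sum_{c=1}^{C}\sum_{i=1}^{N}w_{i}\,u_{ci}^{m}\,(x_{i}-v_{c})(x_{i}-v_{c})^{T},\qquad
S_{FB}^{w}=\sum_{c=1}^{C}\sum_{i=1}^{N}w_{i}\,u_{ci}^{m}\,(v_{c}-\overline{x})(v_{c}-\overline{x})^{T}
```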

Based on the above analysis, the objective function of the incremental FCS (IFCS) algorithm is designed as:

restricted by the following constraints:

where

The constrained optimization of IFCS in Eq. (13), subject to the constraints in Eq. (14), can be solved by applying the Lagrange multiplier method, yielding the new objective function:

Taking the partial derivative of J(U,V) in Eq. (16) with respect to U and setting the gradient to zero, we have

Solving Eq. (17) subject to the constraints in Eq. (14) yields the formula for u_{ci} as

Likewise, taking the partial derivative of J(U,V) in Eq. (16) with respect to V and setting the gradient to zero, we have

Solving the above equation, we get v_{c} as
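Since Eqs. (18) and (20) are not shown, the closed-form updates can be sketched by carrying out the two derivatives above (our derivation; it reduces to the unweighted FCS updates of [5] when all w_i = 1):

```latex
u_{ci}=\left[\sum_{k=1}^{C}\left(\frac{\left\|x_{i}-v_{c}\right\|^{2}-\eta_{c}\left\|v_{c}-\overline{x}\right\|^{2}}
{\left\|x_{i}-v_{k}\right\|^{2}-\eta_{k}\left\|v_{k}-\overline{x}\right\|^{2}}\right)^{\frac{1}{m-1}}\right]^{-1},
\qquad
v_{c}=\frac{\sum_{i=1}^{N}w_{i}\,u_{ci}^{m}\left(x_{i}-\eta_{c}\overline{x}\right)}
{\left(1-\eta_{c}\right)\sum_{i=1}^{N}w_{i}\,u_{ci}^{m}}
```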

The incremental schemes of SPFCS and OFCS are the same as those of SPFCM and OFCM, except that IFCS is used as the underlying clustering method.
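As a sanity check of the derivation, one IFCS iteration can be sketched as follows. The function name `ifcs_step` and the exact form of η_c are our assumptions; the membership and centroid updates implement the weighted FCS first-order conditions:

```python
import numpy as np

def ifcs_step(X, w, V, m=2.0, beta=0.1):
    """One IFCS iteration: update memberships U and centroids V for
    weighted FCS (sketch; eta uses an assumed beta*min/4max rule)."""
    xbar = np.average(X, axis=0, weights=w)              # weighted grand mean
    dvx = ((V - xbar) ** 2).sum(axis=1)                  # ||v_c - xbar||^2
    dcc = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dcc, np.inf)
    eta = (beta / 4.0) * dcc.min(axis=1) / dvx.max()     # separation control
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    g = np.maximum(d2 - (eta * dvx)[:, None], 1e-12)     # FCS "distance"
    U = g ** (-1.0 / (m - 1))
    U /= U.sum(axis=0)                                   # memberships sum to 1
    Um = (U ** m) * w                                    # weighted fuzzy terms
    num = Um @ X - (eta * Um.sum(axis=1))[:, None] * xbar
    den = ((1.0 - eta) * Um.sum(axis=1))[:, None]
    return U, num / den, eta
```

Note how the centroid update pushes each v_c slightly away from the grand mean, which is the between-cluster separation term at work.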

We conduct experiments on six datasets, comprising two artificial datasets and four real datasets, summarized in Table 2. The two artificial datasets are constructed following the work of Wu et al. [5], and the four real datasets are from the UCI repository. The four UCI datasets contain 150, 2310, 3498, and 20,000 sample objects, respectively, which helps show the clustering performance of our algorithms at different data scales.

In the Unequal Sample Size dataset, there are 100 data points with two attributes. The samples can be divided into two groups, one of which has 97 samples and the other has 3 samples.

The Noise dataset is a two-dimensional dataset with the sample size n=400. The data points can be divided into two clusters, one of which has a noisy point.

The Iris dataset is a four-dimensional dataset with 150 data points, consisting of 50 points from each of three classes. One class is linearly separable from the other two.

The Statlog Segmentation dataset was drawn randomly from a database of seven outdoor images. The images were hand-segmented to create a classification for every pixel. It is a 19-dimensional dataset with 2310 data points, divided into 7 clusters of 330 points each.

The Pen-Based Recognition of Handwritten Digits dataset was created by collecting 250 samples from each of 44 writers. We randomly select 3498 samples for the experiment. These 16-dimensional samples can be divided into 10 clusters.

The Letter Recognition dataset is used to identify 26 letters. It is a 16-dimensional dataset with 20,000 samples, which can be divided into 26 clusters.

Table 2. Summary of the six datasets used in the experiments

| Datasets | Sample size | Attributes | Clusters |
|----------|-------------|------------|----------|
| Unequal Sample Size (USS) | 100 | 2 | 2 |
| Noise | 400 | 2 | 2 |
| Iris | 150 | 4 | 3 |
| Statlog Segmentation (SS) | 2310 | 19 | 7 |
| Pen Digits (PD) | 3498 | 16 | 10 |
| Letter Recognition (LR) | 20000 | 16 | 26 |

The clustering results of the algorithms are evaluated by F-measure [20] and entropy.

4.2.1 F-measure

F-measure, also known as F-score, is the weighted harmonic mean of precision and recall. Its value lies in [0,1]; the greater the value, the better the clustering performance. The term is defined as

where P denotes the precision, which is the probability that a (randomly selected) retrieved document is relevant, and R denotes the recall, which is the probability that a (randomly selected) relevant document is retrieved in a search.
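As a concrete illustration (the helper name is ours), the definition amounts to:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; returns a value in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `f_measure(0.5, 1.0)` gives 2/3: the harmonic mean penalizes the lower of the two scores.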

4.2.2 Entropy

The concept of entropy was originally proposed by the German physicist Clausius [21] in 1865. It is used to represent the degree of internal disorder of a system. The entropy is defined as

where p_{i} denotes the occurrence probability of the i^{th} sample. The smaller the entropy value, the better the clustering result.
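A minimal sketch of this computation (natural logarithm assumed; zero-probability terms are conventionally dropped):

```python
import math

def entropy(p):
    """Shannon entropy -sum(p_i * ln p_i) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)
```

A uniform split over two clusters gives ln 2 ≈ 0.693, while a single pure cluster gives 0, matching the rule that smaller entropy indicates a better clustering.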

For the USS dataset, we run the SPFCM, SPFCS, OFCM, and OFCS algorithms separately with m ∈ {2, 6} and β = 0.1 [10]. The initial sample size of each chunk is set to 75 points. This group of experimental results is illustrated in Fig. 2, where the blue circles denote objects of one cluster, the red crosses denote objects of the other cluster, and the dark squares are the centroids of the corresponding clusters. The two attributes of this dataset correspond to the abscissa, ranging from 20 to 100, and the ordinate, ranging from 20 to 70. Fig. 2(a) and (b) display the clustering results of SPFCM with m=2 and m=6, respectively. Intuitively, there should be two clusters; however, some objects that belong to one cluster are incorrectly assigned to the other. In Fig. 2(a) and (b), objects in the right cluster are sparse and behave more like noisy data in clustering; from this perspective, SPFCM is easily affected by noisy data. Furthermore, comparing Fig. 2(a) and (b), SPFCM with different values of m generates markedly different results, because the centroid of the second cluster shows an apparent displacement. Fig. 2(c) and (d) illustrate the clustering results of SPFCS with m=2 and m=6, respectively. The results are clearly better than those of SPFCM: objects are grouped into two clusters, in accordance with human cognition. In other words, the so-called noisy data have very little influence on SPFCS. At the same time, Fig. 2(c) is very similar to Fig. 2(d), which shows that different values of m also have very little influence on SPFCS.

Fig. 2(e) and (f) show the clustering results of OFCM with m=2 and m=6, respectively. We observe that the clustering performance of OFCM is very similar to that of SPFCM, and therefore a similar conclusion can be drawn: OFCM is sensitive to noisy data and to the value of m. In Fig. 2(g) and (h), the clustering results of OFCS with m=2 and m=6 are presented, respectively. In terms of clustering performance in this group of experiments, OFCS is very similar to SPFCS. Unlike SPFCM and OFCM, SPFCS and OFCS are more robust to noise and insensitive to the fuzzy index m.

In the Noise dataset, objects have two attributes corresponding to the abscissa, ranging from -3 to 7, and the ordinate, ranging from -2 to 2. These objects are divided into two clusters, separated at 2 on the horizontal axis, as shown in Fig. 3. To this dataset we add a noisy point with coordinates (100, 0). Fig. 3(a), (c), (e), and (g) illustrate the experimental results of SPFCM, SPFCS, OFCM, and OFCS with m=2, respectively, and Fig. 3(b), (d), (f), and (h) display the clustering results of SPFCM, SPFCS, OFCM, and OFCS with m=6, respectively. The experimental results show that when m=2, the SPFCM and OFCM algorithms obtain accurate clustering results, while when m=6, one data point is assigned to the wrong cluster because of the noisy point. However, SPFCS and OFCS both obtain an accurate clustering separation when m is equal to either 2 or 6. This shows that the SPFCS and OFCS algorithms are more robust to noise and less sensitive to the value of m than SPFCM and OFCM.

On the following four real datasets, we implemented six algorithms: SPFCM, single-pass hyperspherical fuzzy C-means (SPHFCM) [10], SPFCS, OFCM, online hyperspherical fuzzy C-means (OHFCM) [10], and OFCS. On the Iris dataset, m takes the values 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, and 6 in all six algorithms, the sample size of each chunk is set to 30 points, and we choose F-measure as the criterion to evaluate the performance of the clustering algorithms. As shown in Fig. 4, as the value of m becomes larger, the performance of SPFCS and OFCS remains steady, while the F-measure of the other algorithms drops. The ranges of the F-measure values of SPFCS, SPFCM, and SPHFCM are 0.047, 0.086, and 0.148, respectively, and their standard deviations are 0.018, 0.035, and 0.047, respectively. In the OFCS, OFCM, and OHFCM group, the ranges are 0.011, 0.173, and 0.068, respectively, and the standard deviations are 0.004, 0.066, and 0.023, respectively. Clearly, in these two experimental groups, the fluctuation of SPFCS and OFCS over different values of m is the lowest. Hence, SPFCS and OFCS are more robust to the value of m than the others.

Tables 3 and 4 show the performance of SPFCM, SPHFCM, SPFCS, OFCM, OHFCM, and OFCS on the four real datasets, in terms of F-measure and entropy, respectively. It can be seen from these two tables that SPFCS and OFCS perform better than the other four algorithms in most cases: the F-measure values of SPFCS and OFCS are mostly larger than those of SPFCM and SPHFCM, and of OFCM and OHFCM, respectively, and their entropy values are relatively lower. Especially on the PD dataset, the average F-measure value of SPFCS is larger than those of SPFCM and SPHFCM by 70.21% and 118.70%, respectively, and the average entropy value of SPFCS is smaller than those of SPFCM and SPHFCM by 81.38% and 61.14%, respectively. In the OFCS, OFCM, and OHFCM group, the corresponding average F-measure gains are 47.04% and 54.99%, and the average entropy reductions are 48.08% and 42.48%. We therefore conclude that our SPFCS and OFCS algorithms achieve higher clustering accuracy than the other four algorithms on the four real datasets.

Table 3. F-measure values of the six algorithms on the four real datasets

| Dataset | Sample size | SPFCM | SPHFCM | SPFCS | OFCM | OHFCM | OFCS |
|---------|-------------|-------|--------|-------|------|-------|------|
| Iris | 10% | 0.81 | 0.76 | 0.91 | 0.89 | 0.80 | 0.89 |
| | 20% | 0.84 | 0.88 | 0.91 | 0.90 | 0.82 | 0.90 |
| | 50% | 0.90 | 0.68 | 0.89 | 0.89 | 0.90 | 0.88 |
| SS | 10% | 0.51 | 0.42 | 0.53 | 0.52 | 0.42 | 0.60 |
| | 20% | 0.50 | 0.34 | 0.52 | 0.46 | 0.39 | 0.60 |
| | 50% | 0.44 | 0.34 | 0.56 | 0.49 | 0.42 | 0.61 |
| PD | 10% | 0.42 | 0.43 | 0.65 | 0.37 | 0.43 | 0.69 |
| | 20% | 0.44 | 0.28 | 0.66 | 0.43 | 0.44 | 0.64 |
| | 50% | 0.34 | 0.26 | 0.70 | 0.34 | 0.40 | 0.71 |
| LR | 10% | 0.14 | 0.15 | 0.19 | 0.15 | 0.17 | 0.21 |
| | 20% | 0.14 | 0.14 | 0.18 | 0.15 | 0.17 | 0.20 |
| | 50% | 0.10 | 0.10 | 0.11 | 0.11 | 0.15 | 0.18 |

Table 4. Entropy values of the six algorithms on the four real datasets

| Dataset | Sample size | SPFCM | SPHFCM | SPFCS | OFCM | OHFCM | OFCS |
|---------|-------------|-------|--------|-------|------|-------|------|
| Iris | 10% | 0.16 | 0.17 | 0.10 | 0.12 | 0.15 | 0.12 |
| | 20% | 0.14 | 0.11 | 0.11 | 0.10 | 0.16 | 0.10 |
| | 50% | 0.12 | 0.27 | 0.12 | 0.12 | 0.10 | 0.13 |
| SS | 10% | 0.53 | 0.62 | 0.48 | 0.53 | 0.62 | 0.42 |
| | 20% | 0.54 | 0.74 | 0.49 | 0.56 | 0.65 | 0.41 |
| | 50% | 0.61 | 0.76 | 0.45 | 0.56 | 0.64 | 0.40 |
| PD | 10% | 0.62 | 0.64 | 0.35 | 0.70 | 0.60 | 0.34 |
| | 20% | 0.61 | 0.84 | 0.36 | 0.62 | 0.60 | 0.40 |
| | 50% | 0.76 | 0.88 | 0.33 | 0.75 | 0.65 | 0.32 |
| LR | 10% | 1.25 | 1.23 | 1.13 | 1.25 | 1.17 | 1.08 |
| | 20% | 1.28 | 1.26 | 1.16 | 1.24 | 1.20 | 1.09 |
| | 50% | 1.37 | 1.38 | 1.32 | 1.35 | 1.26 | 1.15 |

SPFCM, SPHFCM, OFCM, and OHFCM are FCM-type clustering algorithms. Although they are all incremental clustering algorithms able to process large-scale data, they only try to minimize the trace of the within-cluster scatter matrix, like FCM. In contrast, SPFCS and OFCS try to minimize the trace of the within-cluster scatter matrix and simultaneously maximize the trace of the between-cluster scatter matrix, thereby accounting for both within-cluster compactness and between-cluster separation. Therefore, these two algorithms tend to achieve higher clustering accuracy.

In the last part of our experiments, taking the SS dataset as an example, we run SPFCS and OFCS to investigate the influence of noise. By adding 5%, 10%, 20%, and 40% noisy points to the dataset separately, we obtain the results shown in Fig. 5. Note that as the proportion of noise added to the dataset increases, the clustering performance of SPFCS, OFCS, SPHFCM, and OHFCM remains steady, while that of SPFCM and OFCM declines. By calculating the standard deviation of the F-measure values, we can quantify the fluctuation of these algorithms under different proportions of noise. In Fig. 5(a), the standard deviations of SPFCS, SPFCM, and SPHFCM are 0.018, 0.067, and 0.015, respectively. In Fig. 5(b), the values of OFCS, OFCM, and OHFCM are 0.015, 0.045, and 0.026, respectively. The experimental results show that, compared with SPFCM and OFCM, SPFCS and OFCS are more robust to noise.

Clustering techniques have gained wide attention for revealing latent knowledge hidden in large-scale data. In the clustering process, the FCS algorithm considers within-cluster compactness and between-cluster separation simultaneously, and therefore tends to produce more accurate clustering results. However, this algorithm has difficulty processing large-scale data.

Based on a fuzzy scatter matrix, we extend FCS and propose two incremental fuzzy clustering algorithms, SPFCS and OFCS. First, we weight the centroids obtained from each iteration of the FCS algorithm so that the weighted algorithm can be combined with ‘single-pass’ and ‘online’ algorithms. Then, we implement experiments with several artificial datasets and real datasets. Experimental results show that, compared with SPFCM, OFCM, SPHFCM and OHFCM, the SPFCS and OFCS algorithms have better clustering performance. Also, SPFCS and OFCS are more robust to the fuzzy index m and noise than the SPFCM and OFCM algorithms.

In the clustering processes of SPFCS and OFCS, the number of clusters needs to be specified in advance. This encourages us to design better K-value prediction algorithms in future studies.

The authors would like to thank the support of Foundation for University Key Teacher by Henan Province (No. 2015GGJS-068), Fundamental Research Funds for the Universities of Henan Province (No. NSFRF1616), and Foundation for scientific and technological project of Henan Province (No. 172102210279).

1. Y. Liu, X. Wan, "Information bottleneck based incremental fuzzy clustering for large biomedical data," *Journal of Biomedical Informatics*, vol. 62, pp. 48-58, 2016. doi:10.1016/j.jbi.2016.05.009
2. A. K. Jain, "Data clustering: 50 years beyond K-means," *Pattern Recognition Letters*, vol. 31, no. 8, pp. 651-666, 2009. doi:10.1007/978-3-540-87479-9_3
3. J. C. Bezdek, R. Ehrlich, W. Full, "FCM: the fuzzy c-means clustering algorithm," *Computers & Geosciences*, vol. 10, no. 2-3, pp. 191-203, 1984. doi:10.1016/0098-3004(84)90020-7
4. Y. Zhou, H. F. Zuo, J. Feng, "A clustering algorithm based on feature weighting fuzzy compactness and separation," *Algorithms*, vol. 8, no. 2, pp. 128-143, 2015. doi:10.3390/a8020128
5. K. L. Wu, J. Yu, M. S. Yang, "A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests," *Pattern Recognition Letters*, vol. 26, no. 5, pp. 639-652, 2005. doi:10.1016/j.patrec.2004.09.016
6. C. Y. Chen, S. C. Hwang, Y. J. Oyang, "An incremental hierarchical data clustering algorithm based on gravity theory," in *Advances in Knowledge Discovery and Data Mining*. Heidelberg: Springer, pp. 237-250, 2002. doi:10.1007/3-540-47887-6_23
7. S. Young, I. Arel, T. P. Karnowski, D. Rose, "A fast and stable incremental clustering algorithm," in *Proceedings of the 2010 7th International Conference on Information Technology: New Generations (ITNG)*, Las Vegas, NV, 2010, pp. 204-209. doi:10.1109/ITNG.2010.148
8. S. Chakraborty, N. K. Nagwani, "Analysis and study of incremental k-means clustering algorithm," in *High Performance Architecture and Grid Computing*. Heidelberg: Springer, pp. 338-341, 2011. doi:10.1007/978-3-642-22577-2_46
9. L. E. Aik, T. W. Choon, "An incremental clustering algorithm based on Mahalanobis distance," in *AIP Conference Proceedings*, vol. 1635, pp. 788-793, 2014. doi:10.1063/1.4903672
10. J. P. Mei, Y. Wang, L. Chen, C. Miao, "Incremental fuzzy clustering for document categorization," in *Proceedings of the 2014 IEEE International Conference on Fuzzy Systems*, Beijing, China, 2014, pp. 1518-1525. doi:10.1109/FUZZ-IEEE.2014.6891554
11. Y. Wang, L. Chen, J. P. Mei, "Incremental fuzzy clustering with multiple medoids for large data," *IEEE Transactions on Fuzzy Systems*, vol. 22, no. 6, pp. 1557-1568, 2014. doi:10.1109/TFUZZ.2014.2298244
12. M. F. K. Minhas, R. A. Abbasi, N. R. Aljohani, A. A. Albeshri, M. Mushtaq, "INTWEEMS: a framework for incremental clustering of tweet streams," in *Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services*, Brussels, Belgium, 2015. doi:10.1145/2837185.2843853
13. L. Pradeep, A. M. Sowjanya, "Multi-density based incremental clustering," *International Journal of Computer Applications*, vol. 116, no. 17, pp. 6-9, 2015. doi:10.5120/20426-2742
14. O. Shmueli, L. Shnaiderman, "Incremental clustering of indexed XML data," U.S. Patent 8930407, Jan. 6, 2015.
15. F. Cambi, P. Crescenzi, L. Pagli, "Analyzing and comparing on-line news sources via (two-layer) incremental clustering," in *Proceedings of the 8th International Conference on Fun with Algorithms*, La Maddalena, Italy, 2016. doi:10.4230/LIPIcs.FUN.2016.9
16. L. Chen, M. Liu, C. Wu, A. Xu, "A novel clustering algorithm and its incremental version for large-scale text collection," *Information Technology and Control*, vol. 45, no. 2, pp. 136-147, 2016. doi:10.5755/j01.itc.45.2.8666
17. D. Wang, A. H. Tan, "Self-regulated incremental clustering with focused preferences," in *Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN)*, Vancouver, Canada, 2016, pp. 1297-1304. doi:10.1109/IJCNN.2016.7727347
18. P. Hore, L. O. Hall, D. B. Goldgof, "Single pass fuzzy c means," in *Proceedings of the 2007 IEEE International Fuzzy Systems Conference*, London, UK, 2007, pp. 1-7. doi:10.1109/FUZZY.2007.4295372
19. P. Hore, L. O. Hall, D. B. Goldgof, W. Cheng, "Online fuzzy c means," in *Proceedings of the 2008 Annual Meeting of the North American Fuzzy Information Processing Society*, New York, NY, 2008, pp. 1-5. doi:10.1109/NAFIPS.2008.4531233
20. A. Maratea, A. Petrosino, M. Manzo, "Adjusted F-measure and kernel scaling for imbalanced data learning," *Information Sciences*, vol. 257, pp. 331-341, 2014. doi:10.1016/j.ins.2013.04.016
21. R. Clausius, "Ueber verschiedene für die Anwendung bequeme Formen der Hauptgleichungen der mechanischen Wärmetheorie," *Annalen der Physik*, vol. 201, no. 7, pp. 353-400, 1865. doi:10.1002/andp.18652010702