
This property is equivalent to the statement that for some data sample, in the E-step of each iteration, a non-null membership assignment is more likely to be assigned a higher confidence. To formally address this statement, the following lemma is introduced:

Now we turn to the proof of Property 3. Without loss of generality, we prove it for the first component.

which completes the proof.

Property 4. BPCM can assign proper membership values for overlapping samples and outlier samples.

Proof. Without loss of generality, the total number of clusters is set to 2. By applying Property 3 iteratively with = 1 for both clusters we observe that an overlapping sample with distances to both centroids lower than a threshold

Remark. This is a notable distinction between BPCM and the other models listed in Table 3. An outlier is an observation that is distant from other observations and is more likely to be generated by noise [20]. An overlapping sample is an observation that is not an outlier but has similar distances to multiple cluster centroids. In data clustering, outlier samples should receive very small membership values for all clusters, whereas overlapping samples should receive large membership values for the clusters that are close enough and small membership values for the rest. In FCM and GMM, since the distances between an outlier sample and all the cluster centroids are similar (all large), the outlier is likely to be assigned a membership vector of $(1/N_C, \ldots, 1/N_C)^T$ in the E-step, and therefore influences the estimation of all the cluster centroids equally. Moreover, outlier samples and overlapping ones are indistinguishable in FCM.
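The FCM behavior described in this remark can be checked numerically. The following is a minimal sketch, assuming the standard FCM membership update with fuzzifier m = 2 and Euclidean distance. The function name `fcm_memberships` and all coordinates are hypothetical choices for illustration and are not taken from the butterfly dataset.

```python
import numpy as np

def fcm_memberships(x, centroids, m=2.0):
    """Standard FCM membership of one sample x to each centroid (fuzzifier m > 1)."""
    d = np.linalg.norm(centroids - x, axis=1)   # Euclidean distance to each centroid
    d = np.maximum(d, 1e-12)                    # guard against division by zero
    ratio = d[:, None] / d[None, :]             # ratio[i, j] = d_i / d_j
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
    return 1.0 / np.sum(ratio ** (2.0 / (m - 1.0)), axis=1)

# Hypothetical two-cluster layout (assumed for illustration only).
centroids = np.array([[-2.0, 0.0], [2.0, 0.0]])
overlapping = np.array([0.0, 0.0])    # midway between the two centroids
outlier = np.array([0.0, 50.0])       # far from both centroids, but equidistant

u_overlap = fcm_memberships(overlapping, centroids)
u_outlier = fcm_memberships(outlier, centroids)
print("overlapping sample:", u_overlap)
print("outlier sample:    ", u_outlier)
```

Because FCM memberships depend only on distance ratios and are constrained to sum to one, both samples receive the same membership vector under these assumptions, even though one lies between the clusters and the other lies far from both.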

To illustrate the membership assignment by BPCM for these two kinds of data points, we assume $N_C = 2$ for simplicity and analyze the synthetic butterfly dataset shown in Fig. 1. The membership distributions that the proposed BPCM produces for several data points in the E-step are shown as heat maps in Fig. 2, where the preference for each possible weight is illustrated by hue. The vertical and horizontal axes denote the memberships to the clusters centered at the 4th and the 11th observations, respectively.

It is interesting to note that for data points 7 and 15, their distances to both centroids are the same, and thus FCM, PCM, and GMM would treat them equally in the E-step. However, the proposed BPCM distinguishes them by assigning the highest confidence to $u_7 = (1, 1)^T$ and $u_{15} = (0, 0)^T$, since the grid cells (1, 1) and (0, 0) attract the highest confidence in Fig. 2(c) and (d), respectively. Hence, BPCM successfully distinguishes overlapping samples from outliers and assigns appropriate membership values to them.

Fig. 1. The indexed butterfly data set, where the 4th and 11th data points are the cluster centroids.

Fig. 2. Heat maps that reflect the confidence assigned to different areas in the membership vector space. Panels (a)–(d) show the assignments for the points indexed 1, 4, 7, and 15, respectively. The vertical and horizontal axes denote the memberships to the clusters centered at the 4th and the 11th observations, respectively.

Fig. 3. The flowchart of missing attributes estimation.

3.3. Missing attribute estimation by BPCM

The missing attribute estimation procedure is illustrated in Fig. 3, where represents the data clustering algorithm adopted. It includes the following steps:

1. Perform data clustering on the complete data subset and obtain the representative patterns (in the form of centroids);

2. For any data sample in the incomplete data subset, find its closest centroid based on the components that are present in the sample vector, and fill the missing values with the corresponding components of that centroid;
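The nearest-centroid filling in step 2 can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes missing values are encoded as NaN, that closeness is measured by squared Euclidean distance over the observed attributes only, and that the centroids from step 1 are already available. The function name `impute_from_centroids` and the example arrays are hypothetical.

```python
import numpy as np

def impute_from_centroids(X, centroids):
    """Fill NaN entries of each row of X from its nearest centroid.

    The distance to each centroid is computed only over the attributes
    that are observed (non-NaN) in the row. This is an assumed choice;
    the source text does not fix the distance measure.
    """
    X_filled = X.copy()
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue                      # complete sample, nothing to fill
        observed = ~missing
        # Squared distance to every centroid, restricted to observed attributes.
        d2 = np.sum((centroids[:, observed] - row[observed]) ** 2, axis=1)
        nearest = np.argmin(d2)
        # Copy the missing components from the closest centroid.
        X_filled[i, missing] = centroids[nearest, missing]
    return X_filled

# Hypothetical centroids standing in for the output of step 1.
centroids = np.array([[0.0, 0.0, 0.0],
                      [10.0, 10.0, 10.0]])
incomplete = np.array([[0.5, np.nan, -0.2],
                       [np.nan, 9.0, np.nan]])
filled = impute_from_centroids(incomplete, centroids)
print(filled)
```

In this sketch a row with no observed attributes falls back to the first centroid, since all restricted distances are zero; a practical pipeline would likely handle such rows separately.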

Fig. 4. The bagging module.

Fig. 5. The metadata generating module and the ensemble module. (For interpretation of the references to color in the text, the reader is referred to the web version of this article.)