Cardinal characteristic of the continuum: Difference between revisions

Revision as of 18:18, 26 October 2013

The Dunn index (DI), introduced by J. C. Dunn in 1974, is a metric for evaluating clustering algorithms.^[1] Like the Davies–Bouldin index, it belongs to a group of validity indices that are internal evaluation schemes, where the result is based on the clustered data itself. As with all such indices, the aim is to identify sets of clusters that are compact, with a small variance between members of the cluster, and well separated, where the means of different clusters are sufficiently far apart compared to the within-cluster variance. For a given assignment of clusters, a higher Dunn index indicates better clustering. One drawback is the computational cost, which grows as the number of clusters and the dimensionality of the data increase.

Preliminaries

There are many ways to define the size or diameter of a cluster. It could be the distance between the farthest two points inside a cluster, the mean of all the pairwise distances between data points inside the cluster, or the mean distance of each data point from the cluster centroid. Each of these formulations is shown mathematically below:

Let C_i be a cluster of vectors. Let x and y be any two n-dimensional feature vectors assigned to the same cluster C_i.

\Delta_{i} = \max_{x,y \in C_{i}} d(x,y)

, which calculates the maximum distance.

\Delta_{i} = \dfrac{1}{|C_{i}|(|C_{i}|-1)} \sum_{x,y \in C_{i},\, x \neq y} d(x,y)

, which calculates the mean distance between all pairs.

\Delta_{i} = \dfrac{\sum_{x \in C_{i}} d(x,\mu)}{|C_{i}|}, \quad \mu = \dfrac{\sum_{x \in C_{i}} x}{|C_{i}|}

, which calculates the mean distance of all the points from the centroid \mu.
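The three diameter definitions above can be sketched in Python as follows. This is only an illustration under stated assumptions: d(x,y) is taken to be the Euclidean distance, clusters are lists of coordinate tuples, and the function names (`max_diameter`, `mean_pairwise_diameter`, `centroid_diameter`) are hypothetical, not taken from any library.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def max_diameter(cluster):
    """Largest distance between any two points of the cluster."""
    return max((dist(x, y) for x, y in combinations(cluster, 2)), default=0.0)


def mean_pairwise_diameter(cluster):
    """Mean distance over all ordered pairs (x, y) with x != y."""
    n = len(cluster)
    if n < 2:
        return 0.0
    # Each unordered pair appears twice among the ordered pairs.
    total = 2.0 * sum(dist(x, y) for x, y in combinations(cluster, 2))
    return total / (n * (n - 1))


def centroid(cluster):
    """Coordinate-wise mean of the points (the vector mu)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def centroid_diameter(cluster):
    """Mean distance of the points from the centroid."""
    mu = centroid(cluster)
    return sum(dist(x, mu) for x in cluster) / len(cluster)


# Hypothetical example data: a 3-4-5 right triangle.
triangle = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
print(max_diameter(triangle))
print(mean_pairwise_diameter(triangle))
print(centroid_diameter(triangle))
```

The three definitions generally give different values for the same cluster, so the choice should be fixed before comparing clusterings.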

The same applies to the intercluster distance, which can be defined using the closest two data points, one in each cluster, the farthest two, the distance between the centroids, and so on. The definition of the index accommodates any such formulation, and the resulting family of indices is called Dunn-like indices. Let

\delta(C_{i},C_{j})

be this intercluster distance metric between clusters C_i and C_j.
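As a rough sketch, the three intercluster formulations mentioned above might look like this in Python. Euclidean distance and the function names are assumptions made for illustration only.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def closest_points_distance(a, b):
    """Distance between the closest two points, one in each cluster."""
    return min(dist(x, y) for x, y in product(a, b))


def farthest_points_distance(a, b):
    """Distance between the farthest two points, one in each cluster."""
    return max(dist(x, y) for x, y in product(a, b))


def centroid_distance(a, b):
    """Distance between the centroids of the two clusters."""
    mu_a = tuple(sum(c) / len(a) for c in zip(*a))
    mu_b = tuple(sum(c) / len(b) for c in zip(*b))
    return dist(mu_a, mu_b)


# Hypothetical example data: two clusters on a line.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (5.0, 0.0)]
print(closest_points_distance(left, right))
print(farthest_points_distance(left, right))
print(centroid_distance(left, right))
```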

Definition

With the above notation, if there are m clusters, then the Dunn index for the set is defined as:

DI_{m} = \min_{1 \leqslant i \leqslant m} \left\{ \min_{1 \leqslant j \leqslant m,\, j \neq i} \left\{ \frac{\delta(C_{i},C_{j})}{\max_{1 \leqslant k \leqslant m} \Delta_{k}} \right\} \right\}
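A minimal Python sketch of this definition follows. It assumes Euclidean distance, the closest-points intercluster distance and the maximum-distance diameter as defaults, and a symmetric \delta; all function names are hypothetical. Because the denominator does not depend on i or j, the sketch computes the smallest intercluster distance once and divides it by the largest diameter.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, Python 3.8+


def max_diameter(cluster):
    """Delta_k: largest distance between two points of one cluster."""
    return max((dist(x, y) for x, y in combinations(cluster, 2)), default=0.0)


def closest_points_distance(a, b):
    """delta(C_i, C_j): distance between the closest points of two clusters."""
    return min(dist(x, y) for x, y in product(a, b))


def dunn_index(clusters, delta=closest_points_distance, diameter=max_diameter):
    """Smallest intercluster distance divided by the largest diameter."""
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    largest = max(diameter(c) for c in clusters)
    if largest == 0:
        raise ValueError("all clusters have zero diameter")
    # delta is assumed symmetric, so unordered pairs are sufficient.
    smallest = min(delta(a, b) for a, b in combinations(clusters, 2))
    return smallest / largest


# Hypothetical example data: three small, well separated clusters.
clusters = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(4.0, 0.0), (5.0, 0.0)],
    [(0.0, 10.0), (1.0, 10.0)],
]
print(dunn_index(clusters))
```

Passing other `delta` or `diameter` functions gives other members of the Dunn-like family.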

Explanation

Defined in this way, the DI depends on m, the number of clusters in the set. If the number of clusters is not known a priori, the m for which the DI is highest can be chosen as the number of clusters. There is also some flexibility in the definition of d(x,y): any well-known metric can be used, such as Manhattan distance or Euclidean distance, depending on the geometry of the clustering problem. This formulation has a peculiar problem: because the denominator contains a 'max' term instead of an average term, a single badly behaved cluster makes the Dunn index for the whole set uncharacteristically low, even when the other clusters are tightly packed. The index is thus a kind of worst-case indicator and should be interpreted with that in mind. Ready-made implementations of the Dunn index are available for environments such as MATLAB, R (programming language) and Apache Mahout.^[2] ^[3] ^[4]
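The worst-case behaviour can be illustrated with a small Python sketch. It assumes Euclidean distance, closest-points separation and maximum-distance diameters. The `mean_denominator_index` function is a hypothetical variant written only for comparison and is not part of the Dunn index definition.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, Python 3.8+


def diameter(cluster):
    """Largest distance between two points of one cluster."""
    return max(dist(x, y) for x, y in combinations(cluster, 2))


def separation(clusters):
    """Smallest closest-points distance over all pairs of clusters."""
    return min(
        min(dist(x, y) for x, y in product(a, b))
        for a, b in combinations(clusters, 2)
    )


def dunn_index(clusters):
    """Dunn index: the denominator is the largest diameter."""
    return separation(clusters) / max(diameter(c) for c in clusters)


def mean_denominator_index(clusters):
    """Hypothetical variant: the denominator is the mean diameter."""
    mean_diameter = sum(diameter(c) for c in clusters) / len(clusters)
    return separation(clusters) / mean_diameter


# Hypothetical example data: three tight clusters, then the same set
# with the third cluster stretched out.
tight = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(10.0, 0.0), (11.0, 0.0)],
    [(0.0, 10.0), (1.0, 10.0)],
]
one_loose = tight[:2] + [[(0.0, 10.0), (6.0, 10.0)]]

print(dunn_index(tight), dunn_index(one_loose))
print(mean_denominator_index(one_loose))
```

In this toy data the smallest separation is unchanged, yet stretching a single cluster lowers the Dunn index by the full ratio of the diameters, while the averaged variant drops less.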

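Notes and references

[1] doi:10.1080/01969727308546046

[2] "MATLAB implementation of the Dunn Index". http://www.mathworks.com/matlabcentral/fileexchange/27859-dunns-index. Retrieved 5 December 2011.

[3] Lukasz, Nieweglowski. "Package ‘clv’". R project. CRAN. http://cran.r-project.org/web/packages/clv/clv.pdf. Retrieved 2 April 2013.

[4] "Apache Mahout". Apache Software Foundation. http://mahout.apache.org/. Retrieved 9 May 2013.

External links

* http://www.sciencedirect.com/science/article/pii/S0031320303002838
* http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=499469
* http://machaon.karanagai.com/validation_algorithms.html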

@@ Line 1: / Line 1: @@
+The '''Dunn index (DI)''', introduced by J. C. Dunn in 1974, is a metric for evaluating [[clustering algorithm]]s.<ref>{{cite doi|10.1080/01969727308546046}}</ref> Like the [[Davies–Bouldin index]], it belongs to a group of validity indices that are internal evaluation schemes, where the result is based on the clustered data itself. As with all such indices, the aim is to identify sets of clusters that are compact, with a small variance between members of the cluster, and well separated, where the means of different clusters are sufficiently far apart compared to the within-cluster variance. For a given assignment of clusters, a higher Dunn index indicates better clustering. One drawback is the computational cost, which grows as the number of clusters and the dimensionality of the data increase.
+==Preliminaries==
+There are many ways to define the size or diameter of a cluster. It could be the distance between the farthest two points inside a cluster, the mean of all the pairwise distances between data points inside the cluster, or the mean distance of each data point from the cluster centroid. Each of these formulations is shown mathematically below:
+Let ''C''<sub>''i''</sub> be a cluster of vectors. Let ''x'' and ''y'' be any two ''n''-dimensional feature vectors assigned to the same cluster ''C''<sub>''i''</sub>.
+: <math> \Delta_i =   \underset{x , y \in C_i}{\text{max}} d(x,y) </math>  , which calculates the maximum distance.
+: <math> \Delta_i =   \dfrac{1}{|C_i| (|C_i| - 1)} \underset{x , y \in C_i, x \neq y}{\sum} d(x,y) </math>  , which calculates the mean distance between all pairs.
+: <math> \Delta_i =   \dfrac{\underset{x \in C_i}{\sum} d(x,\mu)}{|C_i|} , \mu =   \dfrac{\underset{x \in C_i}{\sum} x}{|C_i|}  </math>  , which calculates the mean distance of all the points from the centroid.
+The same applies to the intercluster distance, which can be defined using the closest two data points, one in each cluster, the farthest two, the distance between the centroids, and so on. The definition of the index accommodates any such formulation, and the resulting family of indices is called Dunn-like indices. Let
+: <math> \delta(C_i,C_j) </math> be this intercluster distance metric, between clusters ''C''<sub>''i''</sub> and ''C''<sub>''j''</sub>.
+==Definition==
+With the above notation, if there are ''m'' clusters, then the Dunn Index for the set is defined as:
+: <math> DI_m = \underset{ 1 \leqslant i \leqslant m}{\text{min}} \left\{ \underset{ 1 \leqslant j \leqslant m, j \neq i}{\text{min}} \left\{ \frac{\delta(C_i,C_j)}{ \underset{ 1 \leqslant k \leqslant m}{\text{max}} \Delta_k} \right\} \right\} </math>.
+==Explanation==
+Defined in this way, the ''DI'' depends on ''m'', the number of clusters in the set. If the number of clusters is not known a priori, the ''m'' for which the ''DI'' is highest can be chosen as the number of clusters. There is also some flexibility in the definition of ''d(x,y)'': any well-known metric can be used, such as [[Manhattan distance]] or [[Euclidean distance]], depending on the geometry of the clustering problem. This formulation has a peculiar problem: because the denominator contains a 'max' term instead of an average term, a single badly behaved cluster makes the Dunn index for the whole set uncharacteristically low, even when the other clusters are tightly packed. The index is thus a kind of worst-case indicator and should be interpreted with that in mind. Ready-made implementations of the Dunn index are available for environments such as [[MATLAB]], [[R (programming language)]] and [[Apache Mahout]].<ref>{{cite web|url=http://www.mathworks.com/matlabcentral/fileexchange/27859-dunns-index |title=MATLAB implementation of the Dunn Index |accessdate=5 December 2011}}</ref> <ref>{{cite web|last=Lukasz|first=Nieweglowski|title=Package ‘clv’|url=http://cran.r-project.org/web/packages/clv/clv.pdf|work=R project|publisher=CRAN|accessdate=2 April 2013}}</ref> <ref>{{cite web|title=Apache Mahout|url=http://mahout.apache.org/|publisher=Apache Software Foundation|accessdate=9 May 2013}}</ref>
+== Notes and references ==
+<references/>
+==External links==
+* http://www.sciencedirect.com/science/article/pii/S0031320303002838
+* http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=499469
+* http://machaon.karanagai.com/validation_algorithms.html
+[[Category:Clustering criteria]]
