Magnetization dynamics: Difference between revisions

Latest revision as of 17:19, 11 December 2012

SUBCLU is an algorithm for clustering high-dimensional data, proposed by Karin Kailing, Hans-Peter Kriegel and Peer Kröger.[1] It is a subspace clustering algorithm that builds on the density-based clustering algorithm DBSCAN. SUBCLU can find clusters in axis-parallel subspaces, and uses a bottom-up, greedy strategy to remain efficient.

Approach

SUBCLU uses a monotonicity criterion: if a cluster is found in a subspace $S$, then each subspace $T \subseteq S$ also contains a cluster. However, a cluster $C \subseteq DB$ in subspace $S$ is not necessarily a cluster in $T \subseteq S$, since clusters are required to be maximal, and the cluster in $T$ that contains $C$ may contain additional objects. A density-connected set in a subspace $S$, on the other hand, is also a density-connected set in every $T \subseteq S$.

This downward-closure property is used by SUBCLU in a way similar to the Apriori algorithm: first, all 1-dimensional subspaces are clustered. All clusters in a higher-dimensional subspace will be subsets of the clusters detected in this first clustering. SUBCLU then iteratively produces $(k+1)$-dimensional candidate subspaces by combining pairs of $k$-dimensional subspaces that contain clusters and share $k-1$ attributes. After pruning irrelevant candidates, DBSCAN is applied to each candidate subspace to find out whether it still contains clusters. If it does, the candidate subspace is used for the next combination of subspaces. To improve the runtime of DBSCAN, only the points known to belong to clusters in one $k$-dimensional subspace (chosen so that its clusters contain as few points as possible) are considered. Due to the downward-closure property, no other point can be part of a $(k+1)$-dimensional cluster anyway.
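The reasoning behind the downward-closure property can be checked numerically. The sketch below assumes Euclidean distance and uses randomly generated points; the helper names (`dist`, `neighborhood`, `is_core`) and the parameter values are illustrative choices, not part of SUBCLU. It confirms that removing attributes never enlarges a distance, so every neighborhood of radius `eps` in $T \subseteq S$ contains the corresponding neighborhood in $S$, and every core object in $S$ remains a core object in $T$.

```python
import math
import random

def dist(p, q, dims):
    """Euclidean distance between p and q, using only the attributes in dims."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))

def neighborhood(points, i, dims, eps):
    """Indices of all points within eps of points[i] in subspace dims (i included)."""
    return {j for j, q in enumerate(points) if dist(points[i], q, dims) <= eps}

def is_core(points, i, dims, eps, min_pts):
    """DBSCAN core-object condition, evaluated in subspace dims."""
    return len(neighborhood(points, i, dims, eps)) >= min_pts

random.seed(0)
points = [tuple(random.random() for _ in range(4)) for _ in range(200)]
S = (0, 1, 3)        # a 3-dimensional subspace (hypothetical choice)
T = (0, 3)           # T is a subspace of S
eps, min_pts = 0.2, 5

for i in range(len(points)):
    # Dropping attributes never increases a distance, so neighborhoods can only grow ...
    assert neighborhood(points, i, S, eps) <= neighborhood(points, i, T, eps)
    # ... and every core object in S is therefore also a core object in T.
    if is_core(points, i, S, eps, min_pts):
        assert is_core(points, i, T, eps, min_pts)
print("downward closure holds for all", len(points), "points")
```

Since density-connectivity is built from chains of core objects and their neighborhoods, these two facts are what carry a density-connected set from $S$ down to $T$.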

Pseudocode

SUBCLU takes two parameters, $\epsilon$ and $MinPts$, which serve the same role as in DBSCAN. In a first step, DBSCAN is used to find 1-dimensional clusters in each subspace spanned by a single attribute:

$SUBCLU(DB, eps, MinPts)$
  $S_1 := \emptyset$
  $C_1 := \emptyset$
  for each $a \in Attributes$
    $C^{\{a\}} := DBSCAN(DB, \{a\}, eps, MinPts)$
    if $C^{\{a\}} \neq \emptyset$
      $S_1 := S_1 \cup \{a\}$
      $C_1 := C_1 \cup C^{\{a\}}$
    end if
  end for
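A minimal Python sketch of this first step, assuming Euclidean distance, points stored as tuples of numbers, and attributes identified by their index. The `dbscan` function is a naive $O(n^2)$ stand-in for DBSCAN written for this example (not a library API); it only looks at the point ids it is given and the attributes in `dims`, which is what the second step relies on later. Clusters are kept per subspace in a dictionary rather than in one flat set $C_1$, so that the clusters $C^{\{a\}}$ of a single subspace can be looked up again; this is a representation choice for the sketch.

```python
from collections import deque

def region_query(points, ids, i, dims, eps):
    """Ids (from `ids`) whose Euclidean distance to point i, measured only
    on the attributes in `dims`, is at most eps (i itself is included)."""
    return [j for j in ids
            if sum((points[i][d] - points[j][d]) ** 2 for d in dims) <= eps * eps]

def dbscan(points, ids, dims, eps, min_pts):
    """Naive DBSCAN on the points listed in `ids`, using only the attributes
    in `dims`. Returns a list of clusters (frozensets of ids); noise is dropped."""
    labels = {}                     # id -> cluster index, or -1 for noise
    clusters = []
    for i in ids:
        if i in labels:
            continue
        if len(region_query(points, ids, i, dims, eps)) < min_pts:
            labels[i] = -1          # noise for now; may become a border point later
            continue
        cid = len(clusters)
        clusters.append(set())
        queue = deque([i])
        while queue:
            j = queue.popleft()
            if labels.get(j, -1) != -1:
                continue            # already assigned to a cluster
            labels[j] = cid
            clusters[cid].add(j)
            neighbors = region_query(points, ids, j, dims, eps)
            if len(neighbors) >= min_pts:   # only core objects expand the cluster
                queue.extend(neighbors)
    return [frozenset(c) for c in clusters]

def subclu_first_step(points, eps, min_pts):
    """Cluster every 1-dimensional subspace and keep those that contain clusters."""
    ids = list(range(len(points)))
    S1 = set()      # 1-dimensional subspaces with clusters, as 1-tuples
    C1 = {}         # subspace -> list of clusters found in it
    for a in range(len(points[0])):
        clusters = dbscan(points, ids, (a,), eps, min_pts)
        if clusters:
            S1.add((a,))
            C1[(a,)] = clusters
    return S1, C1

# Attribute 0 has a dense group (points 0-3); attribute 1 is spread out.
points = [(0.0, 0.0), (0.1, 5.0), (0.2, 9.0), (0.15, 2.0), (5.0, 7.0)]
S1, C1 = subclu_first_step(points, eps=0.3, min_pts=3)
print(S1)  # {(0,)}
print(C1)  # {(0,): [frozenset({0, 1, 2, 3})]}
```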

In a second step, $(k+1)$-dimensional clusters are built from $k$-dimensional ones:

  $k := 1$
  while $C_k \neq \emptyset$
    $CandS_{k+1} := GenerateCandidateSubspaces(S_k)$
    $S_{k+1} := \emptyset$
    $C_{k+1} := \emptyset$
    for each $cand \in CandS_{k+1}$
      $bestSubspace := \operatorname{arg\,min}_{s \in S_k \land s \subset cand} \sum_{C_i \in C^s} |C_i|$
      $C^{cand} := \emptyset$
      for each cluster $cl \in C^{bestSubspace}$
        $C^{cand} := C^{cand} \cup DBSCAN(cl, cand, eps, MinPts)$
      end for
      if $C^{cand} \neq \emptyset$
        $S_{k+1} := S_{k+1} \cup \{cand\}$
        $C_{k+1} := C_{k+1} \cup C^{cand}$
      end if
    end for
    $k := k + 1$
  end while
$end$
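Both steps can be combined into one runnable sketch. As before, this assumes Euclidean distance and a naive DBSCAN written for the example; subspaces are represented as sorted tuples of attribute indices, and all function names are hypothetical. The result maps every subspace that contains clusters to the clusters found there, which corresponds to the union of all $S_k$ together with their $C^{s}$.

```python
from collections import deque
from itertools import combinations

def region_query(points, ids, i, dims, eps):
    # Ids within Euclidean distance eps of point i, using only the attributes in dims
    return [j for j in ids
            if sum((points[i][d] - points[j][d]) ** 2 for d in dims) <= eps * eps]

def dbscan(points, ids, dims, eps, min_pts):
    # Naive DBSCAN restricted to the point ids in `ids` and the attributes in `dims`;
    # returns a list of clusters (frozensets of ids) and drops noise
    labels, clusters = {}, []
    for i in ids:
        if i in labels:
            continue
        if len(region_query(points, ids, i, dims, eps)) < min_pts:
            labels[i] = -1                      # noise unless later reached as a border point
            continue
        cid = len(clusters)
        clusters.append(set())
        queue = deque([i])
        while queue:
            j = queue.popleft()
            if labels.get(j, -1) != -1:
                continue                        # already in a cluster
            labels[j] = cid
            clusters[cid].add(j)
            neighbors = region_query(points, ids, j, dims, eps)
            if len(neighbors) >= min_pts:       # only core objects expand the cluster
                queue.extend(neighbors)
    return [frozenset(c) for c in clusters]

def generate_candidates(S_k):
    # Join: two k-dim subspaces that differ in exactly one attribute give a (k+1)-dim candidate
    cands = {tuple(sorted(set(s1) | set(s2)))
             for s1 in S_k for s2 in S_k if len(set(s1) | set(s2)) == len(s1) + 1}
    # Prune: every k-dim subspace of a candidate must itself contain clusters
    return {c for c in cands if all(s in S_k for s in combinations(c, len(c) - 1))}

def subclu(points, eps, min_pts):
    """Return {subspace: clusters} for every subspace that contains at least one cluster."""
    ids = list(range(len(points)))
    clusters_by_subspace = {}
    # Step 1: run DBSCAN in every 1-dimensional subspace
    S_k = set()
    for a in range(len(points[0])):
        found = dbscan(points, ids, (a,), eps, min_pts)
        if found:
            S_k.add((a,))
            clusters_by_subspace[(a,)] = found
    # Step 2: build (k+1)-dimensional clusters from k-dimensional ones
    while S_k:
        S_next = set()
        for cand in generate_candidates(S_k):
            # bestSubspace: the k-dim subspace of cand whose clusters hold the fewest points
            best = min(combinations(cand, len(cand) - 1),
                       key=lambda s: sum(len(c) for c in clusters_by_subspace[s]))
            found = []
            for cl in clusters_by_subspace[best]:
                # only the points of one cluster in bestSubspace are re-clustered in cand
                found.extend(dbscan(points, sorted(cl), cand, eps, min_pts))
            if found:
                S_next.add(cand)
                clusters_by_subspace[cand] = found
        S_k = S_next
    return clusters_by_subspace

# Points 0-3 form a cluster in attributes 0 and 1; attribute 2 contains no cluster.
points = [(0.0, 0.0, 0), (0.1, 0.1, 10), (0.2, 0.0, 20),
          (0.1, 0.2, 30), (5.0, 5.0, 40), (0.1, 5.0, 50)]
result = subclu(points, eps=0.3, min_pts=3)
for subspace in sorted(result):
    print(subspace, [sorted(c) for c in result[subspace]])
# (0,) [[0, 1, 2, 3, 5]]
# (0, 1) [[0, 1, 2, 3]]
# (1,) [[0, 1, 2, 3]]
```

In this example point 5 belongs to the cluster in attribute 0 but not to the 2-dimensional cluster, and because attribute 1 has fewer clustered points, DBSCAN in subspace $(0, 1)$ only revisits points 0 to 3.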

The set $S_k$ contains all the $k$-dimensional subspaces that are known to contain clusters. The set $C_k$ contains the sets of clusters found in these subspaces. $bestSubspace$ is the $k$-dimensional subspace of the candidate whose clusters contain the fewest points in total; it is chosen to keep the DBSCAN runs that search for clusters in the candidate subspace cheap, since only the points in its clusters need to be considered.
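The choice of $bestSubspace$ is an arg-min over the $k$-dimensional subspaces of the candidate, not the minimum value itself. In the sketch below, the cluster contents are made-up example data and `best_subspace` is a hypothetical helper; the candidate is assumed to be a sorted tuple of attribute indices so that its sub-tuples match the dictionary keys.

```python
from itertools import combinations

def best_subspace(cand, clusters_by_subspace):
    """Return the k-dimensional subspace of the (k+1)-dimensional candidate `cand`
    whose clusters contain the fewest points in total (an arg-min)."""
    k = len(cand) - 1
    return min(combinations(cand, k),
               key=lambda s: sum(len(c) for c in clusters_by_subspace[s]))

# Hypothetical clusters (sets of point ids) found in the 2-dimensional subspaces
clusters_by_subspace = {
    (0, 1): [frozenset({1, 2, 3, 4}), frozenset({7, 8, 9})],   # 7 clustered points
    (0, 2): [frozenset({1, 2, 3})],                            # 3 clustered points
    (1, 2): [frozenset({1, 2, 3, 4, 5, 6})],                   # 6 clustered points
}
best = best_subspace((0, 1, 2), clusters_by_subspace)
# DBSCAN on the candidate only needs to revisit the points in best's clusters
points_to_scan = sorted(set().union(*clusters_by_subspace[best]))
print(best, points_to_scan)  # (0, 2) [1, 2, 3]
```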

Candidate subspaces are generated much like the frequent itemset candidates in the Apriori algorithm: pairs of $k$-dimensional subspaces are compared, and if they differ in exactly one attribute, their union forms a $(k+1)$-dimensional candidate. However, this also produces irrelevant candidates, namely those that contain a $k$-dimensional subspace without clusters. These candidates are therefore removed in a second step:

$GenerateCandidateSubspaces(S_k)$
  $CandS_{k+1} := \emptyset$
  for each $s_1 \in S_k$
    for each $s_2 \in S_k$
      if $s_1$ and $s_2$ differ in exactly one attribute
        $CandS_{k+1} := CandS_{k+1} \cup \{s_1 \cup s_2\}$
      end if
    end for
  end for
  // Pruning of irrelevant candidate subspaces
  for each $cand \in CandS_{k+1}$
    for each $k$-element $s \subset cand$
      if $s \notin S_k$
        $CandS_{k+1} := CandS_{k+1} \setminus \{cand\}$
      end if
    end for
  end for
  return $CandS_{k+1}$
$end$
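The same join-and-prune procedure in Python, assuming subspaces are stored as sorted tuples of attribute indices (a representation chosen for this sketch; the function name mirrors the pseudocode but is otherwise hypothetical). Two $k$-dimensional subspaces differ in exactly one attribute precisely when their union has $k+1$ attributes, which is the test used below.

```python
from itertools import combinations

def generate_candidate_subspaces(S_k):
    """Apriori-style candidate generation for SUBCLU.
    S_k: set of k-dimensional subspaces (sorted tuples of attribute indices)
    known to contain clusters. Returns the (k+1)-dimensional candidates."""
    if not S_k:
        return set()
    k = len(next(iter(S_k)))
    cand_s = set()
    for s1 in S_k:
        for s2 in S_k:
            union = set(s1) | set(s2)
            if len(union) == k + 1:     # s1 and s2 differ in exactly one attribute
                cand_s.add(tuple(sorted(union)))
    # Pruning of irrelevant candidate subspaces: every k-element subset of a
    # candidate must itself be a subspace that contains clusters
    return {cand for cand in cand_s
            if all(s in S_k for s in combinations(cand, k))}

S_2 = {(0, 1), (0, 2), (1, 2), (1, 3)}
print(generate_candidate_subspaces(S_2))  # {(0, 1, 2)}
```

Here the join step produces $(0, 1, 2)$, $(0, 1, 3)$ and $(1, 2, 3)$; the last two are pruned because $(0, 3)$ and $(2, 3)$ are not in $S_2$.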

Availability

An example implementation of SUBCLU is available in the ELKI framework.

References

[1] Karin Kailing, Hans-Peter Kriegel and Peer Kröger. Density-Connected Subspace Clustering for High-Dimensional Data. In: Proc. SIAM Int. Conf. on Data Mining (SDM'04), pp. 246-257, 2004.

