An '''alternating decision tree''' (ADTree) is a [[machine learning]] method for classification. It generalizes [[Decision tree learning|decision trees]] and has connections to [[boosting (machine learning)|boosting]].
 
==History==
ADTrees were introduced by [[Yoav Freund]] and Llew Mason.<ref name="Freund99">Yoav Freund and Llew Mason.  The
Alternating Decision Tree Algorithm.  Proceedings of the 16th International Conference on Machine Learning, pages 124-133 (1999)
</ref>  However, the algorithm as presented had several typographical errors.  Clarifications and optimizations were later presented by Bernhard Pfahringer, Geoffrey Holmes and Richard Kirkby.<ref name="Pfahringer">Bernhard Pfahringer, Geoffrey Holmes and Richard Kirkby.  Optimizing the Induction of Alternating Decision Trees.  Proceedings of the Fifth Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining.  2001, pp. 477-487</ref>  Implementations are available in [[Weka (machine learning)|Weka]] and JBoost.
 
==Motivation==
Original [[Boosting (meta-algorithm)|boosting]] algorithms typically used either [[decision stump]]s
or decision trees as weak hypotheses.  As an example, boosting [[decision stump]]s creates
a set of <math>T</math> weighted decision stumps (where <math>T</math>
is the number of boosting iterations), which then vote on the final classification according to their weights.  Individual decision stumps are weighted according to their ability to classify the data. 
 
Boosting a simple learner results in an unstructured set of <math>T</math> hypotheses, making it difficult to infer [[correlation]]s between attributes.  Alternating decision trees introduce structure to the set of hypotheses by requiring that they build off a hypothesis that was produced in an earlier iteration.  The resulting set of hypotheses can be visualized in a tree based on the relationship between a hypothesis and its "parent."
 
Another important feature of boosted algorithms is that the data is given a different [[Frequency distribution|distribution]] at each iteration.  Instances that are misclassified are given a larger weight while accurately classified instances are given reduced weight.
 
==Alternating decision tree structure==
An alternating decision tree consists of decision nodes and prediction nodes.  '''Decision nodes''' specify a predicate condition.  '''Prediction nodes''' contain a single number.  ADTrees always have prediction nodes as both root and leaves.  An instance is classified by an ADTree by following all paths for which all decision nodes are true and summing any prediction nodes that are traversed.  This is different from binary classification trees such as CART ([[Classification and regression tree]]) or [[C4.5]] in which an instance follows only one path through the tree.
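To make the classification procedure concrete, the following Python sketch shows one possible in-memory representation of an ADTree and the "follow all true paths and sum the predictions" rule. The class and function names are illustrative only and are not taken from Weka, JBoost, or the original paper.

<syntaxhighlight lang="python">
# Minimal sketch of ADTree classification (illustrative names, not a
# specific library's API). A prediction node holds a score and the decision
# nodes hanging below it; a decision node holds a predicate and two child
# prediction nodes.

class PredictionNode:
    def __init__(self, score, deciders=None):
        self.score = score              # contribution added when this node is reached
        self.deciders = deciders or []  # DecisionNode children

class DecisionNode:
    def __init__(self, predicate, if_true, if_false):
        self.predicate = predicate      # function: instance -> bool
        self.if_true = if_true          # PredictionNode followed when predicate holds
        self.if_false = if_false        # PredictionNode followed otherwise

def classify(root, instance):
    """Sum the scores of every prediction node reached from the root.

    Unlike CART or C4.5, an instance may traverse several branches: every
    decision node under a reached prediction node is evaluated.
    """
    total = root.score
    for d in root.deciders:
        child = d.if_true if d.predicate(instance) else d.if_false
        total += classify(child, instance)
    return total

# The predicted class is the sign of classify(root, x), e.g. +1 spam, -1 ham.
</syntaxhighlight>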
 
===Example===
The following tree was constructed using JBoost on the spambase dataset<ref>{{cite web|url=http://www.ics.uci.edu/~mlearn/databases/spambase/ |title=UCI Machine Learning Repository |publisher=Ics.uci.edu |date= |accessdate=2012-03-16}}</ref> (available from the UCI Machine Learning Repository).<ref>{{cite web|url=http://www.ics.uci.edu/~mlearn/MLRepository.html |title=UCI Machine Learning Repository |publisher=Ics.uci.edu |date= |accessdate=2012-03-16}}</ref>  In this example, spam is coded as <math>1</math> and regular email is coded as <math>-1</math>. 
 
[[Image:spambase adtree.png|center|800px|An ADTree for 6 iterations on the Spambase dataset.]]
 
The following table contains part of the information for a single instance.
 
{| class="wikitable"
|+ An instance to be classified
|-
! Feature
! Value
|-
| char_freq_bang
| 0.08
|-
| word_freq_hp
| 0.4
|-
| capital_run_length_longest
| 4
|-
| char_freq_dollar
| 0
|-
| word_freq_remove
| 0.9
|-
| word_freq_george
| 0
|-
| Other features
| ...
|}
 
The instance is scored by summing all of the prediction nodes through which it passes.  In the case of the instance above, the score is calculated as
{| class="wikitable"
|+ Score for the above instance
|-
! Iteration
| 0
| 1
| 2
| 3
| 4
| 5
| 6
|-
! Instance values
| N/A
| .08 < .052 = f
| .4 < .195 = f
| 0 < .01 = t
| 0 < 0.005 = t
| N/A
| .9 < .225 = f
|-
! Prediction
| -0.093
| 0.74
| -1.446
| -0.38
| 0.176
| 0
| 1.66
|}
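Explicitly, the prediction values in the table sum to
:<math>-0.093 + 0.74 - 1.446 - 0.38 + 0.176 + 0 + 1.66 = 0.657.</math>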
The final score of <math>0.657</math> is positive, so the instance is classified as spam.  The magnitude of the value is a measure of confidence in the prediction. The original authors list three potential levels of interpretation for the set of attributes identified by an ADTree:
* Individual nodes can be evaluated for their own predictive ability.
* Sets of nodes on the same path may be interpreted as having a joint effect.
* The tree can be interpreted as a whole.
Care must be taken when interpreting individual nodes, as the scores reflect a re-weighting of the data at each iteration.
 
==Description of the algorithm==
The inputs to the alternating decision tree algorithm are:
* A set of inputs <math>(x_1,y_1),\ldots,(x_m,y_m)</math> where <math>x_i</math> is a vector of attributes and <math>y_i</math> is either -1 or 1.  Inputs are also called instances.
* A set of weights <math>w_i</math> corresponding to each instance.
 
The fundamental element of the ADTree algorithm is the rule.  A single
rule consists of a precondition, a condition, and two scores.  A
condition is a predicate of the form "attribute <comparison> value."
A precondition is simply a [[logical conjunction]] of conditions.
Evaluation of a rule involves a pair of nested if statements:
 
 '''if''' (precondition)
     '''if''' (condition)
         '''return''' score_one
     '''else'''
         '''return''' score_two
     '''end if'''
 '''else'''
     '''return''' 0
 '''end if'''
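The same evaluation can be written as a small Python class. This is a sketch with illustrative names (not the paper's or JBoost's code); a precondition and condition are assumed to be arbitrary callables over an instance.

<syntaxhighlight lang="python">
# Illustrative rule evaluation: returns score_one or score_two when the
# precondition holds, and 0 otherwise.

class Rule:
    def __init__(self, precondition, condition, score_one, score_two):
        self.precondition = precondition  # function: instance -> bool
        self.condition = condition        # function: instance -> bool
        self.score_one = score_one
        self.score_two = score_two

    def __call__(self, instance):
        if self.precondition(instance):
            return self.score_one if self.condition(instance) else self.score_two
        return 0.0
</syntaxhighlight>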
 
Several auxiliary functions are also required by the algorithm:
* <math>W_+(c)</math> returns the sum of the weights of all positively labeled examples that satisfy predicate <math>c</math>
* <math>W_-(c)</math> returns the sum of the weights of all negatively labeled examples that satisfy predicate <math>c</math>
* <math>W(c) = W_+(c) + W_-(c)</math> returns the sum of the weights of all  examples that satisfy predicate <math>c</math>
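A direct Python rendering of these weight sums might look as follows; the argument layout (instances <code>X</code>, labels <code>y</code> in {-1, +1}, weights <code>w</code> as NumPy arrays) is an assumption made for the sketch.

<syntaxhighlight lang="python">
import numpy as np

def W_plus(c, X, y, w):
    """Sum of weights of positively labeled examples satisfying predicate c."""
    mask = np.array([c(x) for x in X])
    return w[mask & (y == 1)].sum()

def W_minus(c, X, y, w):
    """Sum of weights of negatively labeled examples satisfying predicate c."""
    mask = np.array([c(x) for x in X])
    return w[mask & (y == -1)].sum()

def W(c, X, y, w):
    """Sum of weights of all examples satisfying predicate c."""
    return W_plus(c, X, y, w) + W_minus(c, X, y, w)
</syntaxhighlight>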
 
The algorithm is as follows:
 '''function''' ad_tree
 '''input''' Set of <math>m</math> training instances
 
 <math>w_i = 1/m</math> for all <math>i</math>
 <math>a = \frac{1}{2}\ln\frac{W_+(true)}{W_-(true)}</math>
 <math>R_0 =</math> a rule with scores <math>a</math> and <math>0</math>, precondition "true" and condition "true."
 <math>\mathcal{P} = \{true\}</math>
 <math>\mathcal{C} = </math> the set of all possible conditions
 '''for''' <math>j = 1 \dots T</math>
     choose <math>p \in \mathcal{P}, c \in \mathcal{C}</math> that minimize <math> z = 2 \left(  \sqrt{W_+(p \wedge c) W_-(p \wedge c)} + \sqrt{W_+(p \wedge \neg c) W_-(p \wedge \neg c)} \right) + W(\neg p) </math>
     <math>\mathcal{P} = \mathcal{P} \cup \{p \wedge c,\, p \wedge \neg c\}</math>
     <math>a_1=\frac{1}{2}\ln\frac{W_+(p\wedge c)+1}{W_-(p \wedge c)+1}</math>
     <math>a_2=\frac{1}{2}\ln\frac{W_+(p\wedge \neg c)+1}{W_-(p \wedge \neg c)+1}</math>
     <math>R_j = </math> new rule with precondition <math>p</math>, condition <math>c</math>, and scores <math>a_1</math> and <math>a_2</math>
     <math>w_i = w_i e^{ -y_i R_j(x_i)  }</math>
 '''end for'''
 '''return''' set of <math>R_j</math>
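A compact Python sketch of this training loop is given below. It reuses the <code>Rule</code>, <code>W_plus</code>, <code>W_minus</code>, and <code>W</code> helpers sketched above, and assumes the candidate conditions are supplied as a list of predicates (for example, single-feature threshold tests); all names are illustrative rather than taken from any implementation.

<syntaxhighlight lang="python">
import math
import numpy as np

def ad_tree(X, y, conditions, T):
    """Sketch of ADTree training.

    X: list of instances, y: NumPy array of labels in {-1, +1},
    conditions: list of candidate predicates, T: number of boosting rounds.
    """
    m = len(X)
    w = np.full(m, 1.0 / m)                    # uniform initial weights
    true = lambda x: True

    a = 0.5 * math.log(W_plus(true, X, y, w) / W_minus(true, X, y, w))
    rules = [Rule(true, true, a, 0.0)]         # R_0: constant root prediction
    preconditions = [true]

    for _ in range(T):
        # Pick the precondition/condition pair minimizing z.
        best = None
        for p in preconditions:
            for c in conditions:
                p_and_c = lambda x, p=p, c=c: p(x) and c(x)
                p_not_c = lambda x, p=p, c=c: p(x) and not c(x)
                not_p = lambda x, p=p: not p(x)
                z = (2 * (math.sqrt(W_plus(p_and_c, X, y, w) * W_minus(p_and_c, X, y, w))
                          + math.sqrt(W_plus(p_not_c, X, y, w) * W_minus(p_not_c, X, y, w)))
                     + W(not_p, X, y, w))
                if best is None or z < best[0]:
                    best = (z, p, c, p_and_c, p_not_c)
        _, p, c, p_and_c, p_not_c = best

        # The two new preconditions become available to later iterations.
        preconditions += [p_and_c, p_not_c]
        a1 = 0.5 * math.log((W_plus(p_and_c, X, y, w) + 1) / (W_minus(p_and_c, X, y, w) + 1))
        a2 = 0.5 * math.log((W_plus(p_not_c, X, y, w) + 1) / (W_minus(p_not_c, X, y, w) + 1))
        rule = Rule(p, c, a1, a2)
        rules.append(rule)

        # Reweight: misclassified instances gain weight, correct ones lose it.
        w = w * np.exp([-y[i] * rule(X[i]) for i in range(m)])

    return rules

# The learned classifier is sign(sum(r(x) for r in rules)).
</syntaxhighlight>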
 
The set <math>\mathcal{P}</math> grows by two preconditions in each iteration, and it is possible to derive the tree structure of a set of rules by making note of the precondition that is used in each successive rule.
 
==Empirical results==
Figure 6 in the original paper<ref name="Freund99" /> demonstrates that ADTrees are typically as robust as boosted decision trees and boosted [[decision stump]]s.  Typically, equivalent accuracy can be achieved with a much simpler tree structure than recursive partitioning algorithms.
 
==References==
{{reflist}}
 
==External links==
* [http://jboost.sourceforge.net/presentations/BoostingLightIntro.pdf An introduction to Boosting and ADTrees]  (Has many graphical examples of alternating decision trees in practice).
* [http://jboost.sourceforge.net/ JBoost] software implementing ADTrees.
 
{{DEFAULTSORT:Alternating Decision Tree}}
[[Category:Decision trees]]
[[Category:Classification algorithms]]
