Posted by: kaising | March 15, 2010

notes on decision tree learning algorithms

Goal of decision tree learning:

Take a set of data entries & the results of a judgement/classification on them
–> be able to classify new data entries

Concept Learning System (CLS)

     Let C be the set of all data entries/example “instances”.
     Termination
         If they all have evaluation “Yes”, make a Yes node and it is done.
         If they all have evaluation “No”, make a No node and it is done.
         (etc.)
     Otherwise
         Create a decision node.
         Pick a property X (column of the data entries).
         Partition C into C1, C2, … Cn based on this discretized value/property X.
         recurse on each subset of C.

Quinlan’s ID3 Algorithm

     CLS plus a particular way of (greedily) selecting which feature to branch on.

     It is run on a fixed set of training data, and is not incremental.
     It is expected to be inductive, i.e. forming rules based on a small sample is hoped
     to lead to correctly classifying the entire space of possibilities.
         It may misclassify data.

     Pick the feature that allows for a classification with the highest information gain.
     Specifically, pick the feature that has the lowest entropy, i.e. lowest
     disorganization of the feedback data.

     A measure of entropy is obtained as follows, assuming there are two evaluation
     judgement options [C.R. Dyer]:
         log_2 |S| = expected work to guess an element in a set S, with size |S|.
         if the evaluation options are Y and N, then:

Quinlan’s C4.5 Algorithm

Sources:

http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm
CLS and ID3

http://www.cis.temple.edu/~ingargio/cis587/readings/id3-c45.html
ID3 and CS4.5

http://www.dcs.napier.ac.uk/~peter/vldb/dm/node11.html
ID3 plus example

http://en.wikipedia.org/wiki/Ross_Quinlan#ID3
ID3

http://en.wikipedia.org/wiki/C4.5_algorithm
C4.5

http://pages.cs.wisc.edu/~dyer/cs540/notes/learning.html
Course notes on machine learning, and decision trees. Also has an algorithm for decision trees.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Categories

%d bloggers like this: