Goal of decision tree learning:
Take a set of data entries & the results of a judgement/classification on them
–> be able to classify new data entries
Concept Learning System (CLS)
Let C be the set of all data entries/example “instances”.
Termination
If they all have evaluation “Yes”, make a Yes node and it is done.
If they all have evaluation “No”, make a No node and it is done.
(etc.)
Otherwise
Create a decision node.
Pick a property X (column of the data entries).
Partition C into C1, C2, … Cn based on this discretized value/property X.
recurse on each subset of C.
Quinlan’s ID3 Algorithm
CLS plus a particular way of (greedily) selecting which feature to branch on.
It is run on a fixed set of training data, and is not incremental.
It is expected to be inductive, i.e. forming rules based on a small sample is hoped
to lead to correctly classifying the entire space of possibilities.
It may misclassify data.
Pick the feature that allows for a classification with the highest information gain.
Specifically, pick the feature that has the lowest entropy, i.e. lowest
disorganization of the feedback data.
A measure of entropy is obtained as follows, assuming there are two evaluation
judgement options [C.R. Dyer]:
log_2 |S| = expected work to guess an element in a set S, with size |S|.
if the evaluation options are Y and N, then:
Quinlan’s C4.5 Algorithm
Sources:
http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm
CLS and ID3
http://www.cis.temple.edu/~ingargio/cis587/readings/id3-c45.html
ID3 and CS4.5
http://www.dcs.napier.ac.uk/~peter/vldb/dm/node11.html
ID3 plus example
http://en.wikipedia.org/wiki/Ross_Quinlan#ID3
ID3
http://en.wikipedia.org/wiki/C4.5_algorithm
C4.5
http://pages.cs.wisc.edu/~dyer/cs540/notes/learning.html
Course notes on machine learning, and decision trees. Also has an algorithm for decision trees.