Goal of decision tree learning:
Take a set of data entries & the results of a judgement/classification on them
-> be able to classify new data entries
Concept Learning System (CLS)
Let C be the set of all data entries/example “instances”.
If they all have evaluation “Yes”, make a Yes node and it is done.
If they all have evaluation “No”, make a No node and it is done.
Create a decision node.
Pick a property X (column of the data entries).
Partition C into C1, C2, …, Cn based on the (discretized) values of property X.
Recurse on each subset Ci (see the sketch below).
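A minimal Python sketch of the CLS recursion, assuming each example is a dictionary of discrete feature values with a "label" key of "Yes"/"No"; the names build_tree and choose_feature are illustrative, not from the original notes.

def choose_feature(examples, features):
    # Placeholder policy: just take the first remaining feature.
    # ID3 (below) replaces this with the information-gain criterion.
    return features[0]

def build_tree(examples, features):
    labels = {e["label"] for e in examples}
    if labels == {"Yes"}:
        return "Yes"                              # all entries judged Yes: a Yes node, done
    if labels == {"No"}:
        return "No"                               # all entries judged No: a No node, done
    if not features:                              # mixed labels but nothing left to split on
        return max(labels, key=lambda l: sum(e["label"] == l for e in examples))
    x = choose_feature(examples, features)        # pick a property X to branch on
    node = {"feature": x, "children": {}}
    remaining = [f for f in features if f != x]
    for value in {e[x] for e in examples}:        # partition C into C1..Cn by the value of X
        subset = [e for e in examples if e[x] == value]
        node["children"][value] = build_tree(subset, remaining)   # recurse on each subset
    return node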
Quinlan’s ID3 Algorithm
CLS plus a particular way of (greedily) selecting which feature to branch on.
It is run on a fixed set of training data, and is not incremental.
It is expected to be inductive, i.e. the hope is that rules formed from a small sample will correctly classify the entire space of possibilities.
It may misclassify data.
Pick the feature that yields the highest information gain.
Equivalently, pick the feature whose partition has the lowest size-weighted entropy, i.e. the least disorganization of the classification judgements.
A measure of entropy is obtained as follows, assuming there are two evaluation judgement options [C.R. Dyer]:
log_2 |S| = expected work (in yes/no questions) to guess an element of a set S of size |S|.
If the evaluation options are Y and N, and p_Y and p_N are the fractions of entries in S judged Y and N, then:
Entropy(S) = -p_Y log_2 p_Y - p_N log_2 p_N
(0 when S is pure, 1 when the two judgements are equally likely).
The information gain of branching on a feature X that partitions S into S1, …, Sn is then:
Gain(S, X) = Entropy(S) - sum_i (|Si| / |S|) * Entropy(Si)
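A minimal Python sketch of these two formulas, using the same example representation as in the CLS sketch above; the four-row weather-style data set and the names entropy, information_gain, outlook, and windy are invented purely for illustration.

from math import log2

def entropy(examples):
    # Entropy of the Yes/No judgements in a set of examples (0 if the set is pure).
    n = len(examples)
    p_yes = sum(1 for e in examples if e["label"] == "Yes") / n
    return sum(-p * log2(p) for p in (p_yes, 1 - p_yes) if p > 0)

def information_gain(examples, feature):
    # Entropy(S) minus the size-weighted entropy of the partition induced by `feature`.
    n = len(examples)
    remainder = 0.0
    for value in {e[feature] for e in examples}:
        subset = [e for e in examples if e[feature] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(examples) - remainder

data = [
    {"outlook": "sunny", "windy": "no",  "label": "No"},
    {"outlook": "sunny", "windy": "yes", "label": "No"},
    {"outlook": "rain",  "windy": "no",  "label": "Yes"},
    {"outlook": "rain",  "windy": "yes", "label": "Yes"},
]
# ID3 branches on the feature with the highest gain: here "outlook" (gain 1.0)
# rather than "windy" (gain 0.0).
best = max(["outlook", "windy"], key=lambda f: information_gain(data, f))
print(best, information_gain(data, best))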
Quinlan’s C4.5 Algorithm
ID3, plus a worked example.
Course notes on machine learning and decision trees; also includes a decision-tree algorithm.