Decision Trees

Decision trees are a class of predictive data mining tools that predict either a categorical or a continuous response variable. They get their name from the structure of the models they build: a series of decisions is made to segment the data into homogeneous subgroups, a process also called recursive partitioning. When drawn graphically, the model resembles a tree with branches.

A decision tree consists of nodes and splits of the data. The tree starts with all of the training data in the first (root) node. An initial split is made on a predictor variable, segmenting the data into two or more child nodes. Further splits can then be made from the child nodes. A terminal node is one from which no more splits are made. Predictions are based on the makeup of the terminal nodes, for example the most common class in the node for a categorical response or the mean of the node for a continuous one.
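
The node-and-split structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any particular tool: it handles a categorical response, allows only binary splits on numeric predictors, and chooses splits by Gini impurity. The names Node, gini, best_split, build_tree and predict are hypothetical and exist only for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    labels: List[str]                  # training labels residing in this node
    feature: Optional[int] = None      # predictor index used to split; None if terminal
    threshold: Optional[float] = None  # rows with value <= threshold go to the left child
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_terminal(self) -> bool:
        return self.feature is None

    def prediction(self) -> str:
        # A terminal node predicts the most common class among its training rows.
        return Counter(self.labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity: 0.0 for a perfectly homogeneous node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted impurity, feature, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(rows[0])):
        # Skip the largest value so that both child nodes are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, max_depth=3):
    """Recursively partition the data; max_depth is an assumed stopping rule."""
    node = Node(labels=list(labels))
    if max_depth == 0 or gini(labels) == 0.0:
        return node  # terminal: depth limit reached or node already homogeneous
    split = best_split(rows, labels)
    if split is None or split[0] >= gini(labels):
        return node  # terminal: no split improves homogeneity
    _, f, t = split
    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]
    node.feature, node.threshold = f, t
    node.left = build_tree([rows[i] for i in left_idx],
                           [labels[i] for i in left_idx], max_depth - 1)
    node.right = build_tree([rows[i] for i in right_idx],
                            [labels[i] for i in right_idx], max_depth - 1)
    return node


def predict(node, row):
    """Follow the splits from the first node down to a terminal node."""
    while not node.is_terminal:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.prediction()


# Made-up toy data: one numeric predictor, two classes.
rows = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
labels = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(rows, labels)
```

Calling predict(tree, (2.5,)) walks from the first node to a terminal node and returns that node's majority class. Real tools differ in how they choose splits and when they stop, and some allow more than two child nodes per split.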

Several tools fall into the category of decision trees, including Classification and Regression Trees (C&RT), Chi-Square Automatic Interaction Detector (CHAID), Random Forests, and Boosted Trees. Each of these tools has unique qualities while sharing the principles of decision trees. C&RT and CHAID each build a single tree, while Random Forests and Boosted Trees build multiple trees.
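
The difference between building one tree and building many can be illustrated with a rough Python sketch. It is a simplification under stated assumptions, not an implementation of any of the tools named above: each "tree" is a single binary split on one numeric predictor, and the multiple-tree model is plain bootstrap aggregation (each tree is fit to a random resample of the training data and the trees vote). Random Forests additionally sample the predictors considered at each split, and Boosted Trees build their trees sequentially, each one concentrating on the errors of those before it; neither refinement is shown. The function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-split tree; returns (threshold, left_label, right_label)."""
    majority = Counter(ys).most_common(1)[0][0]
    # Fallback when no split is possible: predict the majority class everywhere.
    best = (len(ys) + 1, None, majority, majority)
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_label, right_label = stump
    if threshold is None:
        return left_label
    return left_label if x <= threshold else right_label


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Build several trees, each on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def ensemble_predict(forest, x):
    """Combine the trees by majority vote."""
    votes = Counter(stump_predict(s, x) for s in forest)
    return votes.most_common(1)[0][0]


# Made-up toy data.
xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
single_tree = fit_stump(xs, ys)        # one tree, in the spirit of C&RT or CHAID
many_trees = fit_bagged_stumps(xs, ys)  # multiple trees combined by voting
```

The single tree is one object that can be inspected directly, whereas the multiple-tree model is a collection whose prediction comes from combining its members.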