Decision tree pruning: Difference between revisions
Statistic-based pruning |
ForgotMyPW (talk | contribs) Undid revision 941733575 by 194.182.74.227 (talk) Please seek consensus on talk page before re-posting this content, Dmitry (see section on Decision Stream editing campaign). |
||
Line 19: | Line 19: | ||
# The subtree that minimizes <math>\frac{\operatorname{err}(\operatorname{prune}(T,t),S)-\operatorname{err}(T,S)}{\left\vert\operatorname{leaves}(T)\right\vert-\left\vert\operatorname{leaves}(\operatorname{prune}(T,t))\right\vert}</math> is chosen for removal. |
# The subtree that minimizes <math>\frac{\operatorname{err}(\operatorname{prune}(T,t),S)-\operatorname{err}(T,S)}{\left\vert\operatorname{leaves}(T)\right\vert-\left\vert\operatorname{leaves}(\operatorname{prune}(T,t))\right\vert}</math> is chosen for removal. |
||
The function {{tmath|\operatorname{prune}(T,t)}} defines the tree obtained by pruning the subtrees {{tmath|t}} from the tree {{tmath|T}}. Once the series of trees has been created, the best tree is chosen by generalized accuracy as measured by a training set or cross-validation. |
The function {{tmath|\operatorname{prune}(T,t)}} defines the tree obtained by pruning the subtrees {{tmath|t}} from the tree {{tmath|T}}. Once the series of trees has been created, the best tree is chosen by generalized accuracy as measured by a training set or cross-validation. |
||
===Statistic-based pruning=== |
|||
The statistic-based pruning of leaves from the same and/or different levels of predictive model was proposed as a part of [[Decision Stream]] learning technique. Every group of leaves containing data samples similar according to the test statistics is merged into a new leaf, increasing the number of samples in nodes of trained model and reducing the tree width. The predictive model is growing till no improvements are achievable, considering different data recombinations, and resulting in deep directed acyclic graph architecture and statistically-significant data partition.<ref name="Decision stream">{{cite journal|author1=Ignatov, D.Yu.|author2=Ignatov, A.D.|title=Decision Stream: Cultivating Deep Decision Trees|journal=IEEE ICTAI|pages=905-912|doi=10.1109/ICTAI.2017.00140|date=2017|arxiv=1704.07657|url=https://www.researchgate.net/publication/316471270_Decision_Stream_Cultivating_Deep_Decision_Trees}}</ref> |
|||
==See also== |
==See also== |
Revision as of 20:34, 26 February 2020
This article includes a list of general references, but it lacks sufficient corresponding inline citations. (May 2008) |
Pruning is a technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that provide little power to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting.
Introduction
One of the questions that arises in a decision tree algorithm is the optimal size of the final tree. A tree that is too large risks overfitting the training data and poorly generalizing to new samples. A small tree might not capture important structural information about the sample space. However, it is hard to tell when a tree algorithm should stop because it is impossible to tell if the addition of a single extra node will dramatically decrease error. This problem is known as the horizon effect. A common strategy is to grow the tree until each node contains a small number of instances then use pruning to remove nodes that do not provide additional information.[1]
Pruning should reduce the size of a learning tree without reducing predictive accuracy as measured by a cross-validation set. There are many techniques for tree pruning that differ in the measurement that is used to optimize performance.
Techniques
Pruning can occur in a top down or bottom up fashion. A top down pruning will traverse nodes and trim subtrees starting at the root, while a bottom up pruning will start at the leaf nodes. Below are two popular pruning algorithms.
Reduced error pruning
One of the simplest forms of pruning is reduced error pruning. Starting at the leaves, each node is replaced with its most popular class. If the prediction accuracy is not affected then the change is kept. While somewhat naive, reduced error pruning has the advantage of simplicity and speed.
Cost complexity pruning
Cost complexity pruning generates a series of trees where is the initial tree and is the root alone. At step , the tree is created by removing a subtree from tree and replacing it with a leaf node with value chosen as in the tree building algorithm. The subtree that is removed is chosen as follows:
- Define the error rate of tree over data set as .
- The subtree that minimizes is chosen for removal.
The function defines the tree obtained by pruning the subtrees from the tree . Once the series of trees has been created, the best tree is chosen by generalized accuracy as measured by a training set or cross-validation.
See also
References
- Judea Pearl, Heuristics, Addison-Wesley, 1984
- Pessimistic Decision tree pruning based on Tree size[2]
- ^ Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer: 2001, pp. 269-272
- ^ Mansour, Y. (1997), "Pessimistic decision tree pruning based on tree size", Proc. 14th International Conference on Machine Learning: 195–201