MNIST database: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 01:47, 25 October 2015

The MNIST database (Mixed National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems.[1][2] The database is also widely used for training and testing in the field of machine learning.[3][4] It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while its testing dataset was taken from American high school students, the complete NIST dataset was too hard.[5] Furthermore, the black-and-white images from NIST were normalized to fit into a 20×20 pixel bounding box and anti-aliased, which introduced grayscale levels.[5]
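The effect of this normalization can be illustrated in miniature: cropping a binary digit image to its bounding box and rescaling it by averaging source samples produces exactly the kind of intermediate grayscale values described above. The sketch below is illustrative only, not the actual NIST/MNIST pipeline; the function name and parameters are invented for the example.

```python
def normalize_digit(img, out=20, ss=2):
    """Crop a binary image (list of lists of 0/1) to its bounding box,
    then rescale it to out x out by averaging ss x ss samples per output
    pixel. The averaging is what introduces intermediate grayscale levels,
    analogous to the anti-aliasing in the NIST normalization."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    top, left = rows[0], cols[0]
    h, w = rows[-1] - top + 1, cols[-1] - left + 1
    result = []
    for i in range(out):
        line = []
        for j in range(out):
            acc = 0
            for di in range(ss):
                for dj in range(ss):
                    # sample the source region covered by output pixel (i, j)
                    y = top + int((i + (di + 0.5) / ss) * h / out)
                    x = left + int((j + (dj + 0.5) / ss) * w / out)
                    acc += img[y][x]
            line.append(acc / (ss * ss))  # grayscale value in [0, 1]
        result.append(line)
    return result
```

A digit whose bounding box does not divide evenly into the output grid yields fractional pixel values along its edges, which is where the grayscale levels come from.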

The database contains 60,000 training images and 10,000 testing images.[6] Half of the training set and half of the test set were taken from NIST's training dataset, while the other halves were taken from NIST's testing dataset.[7] A number of scientific papers have attempted to achieve the lowest possible error rate; one paper, using a hierarchical system of convolutional neural networks, achieves an error rate on the MNIST database of 0.23 percent.[8] The original creators of the database keep a list of some of the methods tested on it.[5] In their original paper, they use a support vector machine to achieve an error rate of 0.8 percent.[9]

Dataset

The set of images in the MNIST database is a combination of two of NIST's databases: Special Database 1 and Special Database 3. Special Database 1 and Special Database 3 consist of digits written by high school students and employees of the United States Census Bureau, respectively.[5]

Performance

Some researchers have achieved "near-human performance" on the MNIST database, using a committee of neural networks; in the same paper, the authors achieve performance double that of humans on other recognition tasks.[8] The highest error rate listed[5] on the original website of the database is 12 percent, achieved with no preprocessing using an SVM with a 1-layer neural network.[9]

In 2004, a best-case error rate of 0.42 percent was achieved on the database by researchers using a new classifier called the LIRA, which is a neural classifier with three neuron layers based on Rosenblatt's perceptron principles.[10]

Some researchers have tested artificial intelligence systems on versions of the database put under random distortions. The systems in these cases are usually neural networks, and the distortions used tend to be either affine distortions or elastic distortions.[5] These systems can be very successful; one such system achieved an error rate on the database of 0.39 percent.[11]
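An affine distortion of the kind mentioned above can be sketched in a few lines: compose a small random rotation, shear, and scaling into a 2×2 matrix and resample the image through its inverse with bilinear interpolation. This is an illustrative sketch; the function names and parameter ranges are invented for the example and are not those used in the cited papers.

```python
import math
import random

def bilinear(img, y, x):
    """Sample img (list of lists of floats) at fractional (y, x); zero outside."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    fy, fx = y - y0, x - x0
    def at(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0
    return ((1 - fy) * (1 - fx) * at(y0, x0) + (1 - fy) * fx * at(y0, x0 + 1) +
            fy * (1 - fx) * at(y0 + 1, x0) + fy * fx * at(y0 + 1, x0 + 1))

def affine_distort(img, max_rot=0.15, max_shear=0.1, max_scale=0.1, seed=0):
    """Apply a small random affine distortion (rotation, shear, scale)
    about the image centre, via inverse mapping and bilinear sampling."""
    rng = random.Random(seed)
    theta = rng.uniform(-max_rot, max_rot)
    shear = rng.uniform(-max_shear, max_shear)
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    c, s = math.cos(theta), math.sin(theta)
    # forward matrix M = scale * rotation * shear, acting on (x, y) offsets
    a, b = scale * c, scale * (c * shear - s)
    cc, d = scale * s, scale * (s * shear + c)
    det = a * d - b * cc
    ia, ib, ic, id_ = d / det, -b / det, -cc / det, a / det  # inverse of M
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = x - cx, y - cy
            sx = ia * dx + ib * dy   # pull each output pixel back to its
            sy = ic * dx + id_ * dy  # source location in the input image
            row.append(bilinear(img, cy + sy, cx + sx))
        out.append(row)
    return out
```

Elastic distortions work the same way, except that each pixel is displaced by a smoothed random field rather than a single global matrix.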

In 2011, an error rate of 0.27 percent, improving on the best previous result, was reported by researchers using a similar system of neural networks.[12] In 2013, an approach based on regularization of neural networks using DropConnect was claimed to achieve a 0.21 percent error rate.[13]

Classifiers

This is a table of some of the machine learning methods used on the database and their error rates, by type of classifier:

Type                         | Classifier                                     | Distortion          | Preprocessing        | Error rate (%)
Linear classifier            | Pairwise linear classifier                     | None                | Deskewing            | 7.6[9]
K-Nearest Neighbors          | K-NN with non-linear deformation (P2DHMDM)     | None                | Shiftable edges      | 0.52[14]
Boosted Stumps               | Product of stumps on Haar features             | None                | Haar features        | 0.87[15]
Non-linear classifier        | 40 PCA + quadratic classifier                  | None                | None                 | 3.3[9]
Support vector machine       | Virtual SVM, deg-9 poly, 2-pixel jittered      | None                | Deskewing            | 0.56[16]
Neural network               | 2-layer 784-800-10                             | None                | None                 | 1.6[17]
Neural network               | 2-layer 784-800-10                             | Elastic distortions | None                 | 0.7[17]
Deep neural network          | 6-layer 784-2500-2000-1500-1000-500-10         | Elastic distortions | None                 | 0.35[18]
Convolutional neural network | Committee of 35 conv. nets, 1-20-P-40-P-150-10 | Elastic distortions | Width normalizations | 0.23[8]
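The error rates in the table are simply the percentage of the 10,000 test digits a classifier labels incorrectly; as a minimal sketch (the function name is our own):

```python
def error_rate(y_true, y_pred):
    """Percentage of predicted labels that disagree with the true labels."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return 100.0 * wrong / len(y_true)
```

For example, 23 misclassified digits out of the 10,000-image test set gives the 0.23 percent figure reported for the 35-network committee.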

See also

References

  1. ^ "Support vector machines speed pattern recognition - Vision Systems Design". Vision Systems Design. Retrieved 17 August 2013.
  2. ^ Gangaputra, Sachin. "Handwritten digit database". Retrieved 17 August 2013.
  3. ^ Qiao, Yu (2007). "THE MNIST DATABASE of handwritten digits". Retrieved 18 August 2013.
  4. ^ Platt, John C. (1999). "Using analytic QP and sparseness to speed training of support vector machines" (PDF). Advances in neural information processing systems: 557–563. Retrieved 18 August 2013.
  5. ^ a b c d e f LeCun, Yann; Corinna Cortes; Christopher J.C. Burges. "MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges". Retrieved 17 August 2013.
  6. ^ Kussul, Ernst; Tatiana Baidyk (2004). "Improved method of handwritten digit recognition tested on MNIST database". Image and Vision Computing. 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008.
  7. ^ Zhang, Bin; Sargur N. Srihari (2004). "Fast k -Nearest Neighbor Classification Using Cluster-Based Trees" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 26 (4): 525–528. doi:10.1109/TPAMI.2004.1265868. PMID 15382657. Retrieved 18 August 2013.
  8. ^ a b c Cireşan, Dan; Ueli Meier; Jürgen Schmidhuber (2012). "Multi-column deep neural networks for image classification" (PDF). 2012 IEEE Conference on Computer Vision and Pattern Recognition: 3642–3649. arXiv:1202.2745. doi:10.1109/CVPR.2012.6248110. ISBN 978-1-4673-1228-8.
  9. ^ a b c d LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). "Gradient-Based Learning Applied to Document Recognition" (PDF). Proceedings of the IEEE 86. 86 (11): 2278–2324. doi:10.1109/5.726791. Retrieved 18 August 2013.
  10. ^ Kussul, Ernst; Tatiana Baidyk (2004). "Improved method of handwritten digit recognition tested on MNIST database" (PDF). Image and Vision Computing. 22: 971–981. doi:10.1016/j.imavis.2004.03.008. Retrieved 20 September 2013.
  11. ^ Ranzato, Marc’Aurelio; Christopher Poultney; Sumit Chopra; Yann LeCun (2006). "Efficient Learning of Sparse Representations with an Energy-Based Model" (PDF). Advances in Neural Information Processing Systems. 19: 1137–1144. Retrieved 20 September 2013.
  12. ^ Ciresan, Dan Claudiu; Ueli Meier; Luca Maria Gambardella; Jürgen Schmidhuber (2011). "Convolutional neural network committees for handwritten character classification" (PDF). 2011 International Conference on Document Analysis and Recognition (ICDAR): 1135–1139. doi:10.1109/ICDAR.2011.229. Retrieved 20 September 2013.
  13. ^ Wan, Li; Matthew Zeiler; Sixin Zhang; Yann LeCun; Rob Fergus (2013). Regularization of Neural Network using DropConnect. International Conference on Machine Learning(ICML).
  14. ^ Keysers, Daniel; Thomas Deselaers; Christian Gollan; Hermann Ney (August 2007). "Deformation models for image recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 29 (8): 1422–1435. Retrieved 27 August 2013.
  15. ^ Kégl, Balázs; Róbert Busa-Fekete (2009). "Boosting products of base classifiers" (PDF). Proceedings of the 26th Annual International Conference on Machine Learning: 497–504. Retrieved 27 August 2013.
  16. ^ DeCoste and Scholkopf, MLJ 2002
  17. ^ a b Patrice Y. Simard; Dave Steinkraus; John C. Platt (2003). "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis". Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on. IEEE. doi:10.1109/ICDAR.2003.1227801.
  18. ^ Ciresan, Claudiu Dan; Ueli Meier; Luca Maria Gambardella; Jürgen Schmidhuber (December 2010). "Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition". Neural Computation. 22 (12). doi:10.1162/NECO_a_00052. Retrieved 27 August 2013.

Further reading