Tanagra (machine learning)

From Wikipedia, the free encyclopedia

Tanagra
Developer(s): Lumière University Lyon 2
Stable release: 1.4.50 / 2013/12/18
Operating system: Windows
Type: Machine learning, data mining, multivariate analysis, data analysis
License: Proprietary
Website: http://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html

Tanagra is a free suite of machine learning software for research and academic purposes, developed by Ricco Rakotomalala at Lumière University Lyon 2, France.[1] Tanagra supports several standard data mining tasks, such as visualization, descriptive statistics, instance selection, feature selection, feature construction, regression, factor analysis, clustering, classification and association rule learning.

Tanagra is an academic project. It is widely used in French-speaking universities.[2] Tanagra is frequently used in applied studies[3][4] and in software comparison papers.[5][6]

History

Development of Tanagra started in June 2003, and the first version was distributed in December 2003. Tanagra is the successor of Sipina, another free data mining tool intended only for supervised learning tasks (classification), in particular the interactive and visual construction of decision trees. Sipina is still available online and is maintained. Tanagra is an "open source project" in the sense that any researcher can access the source code and add their own algorithms, provided they agree to and comply with the software distribution license.

The main purpose of the Tanagra project is to give researchers and students user-friendly data mining software that conforms to current norms of software development in this domain (especially in the design of its GUI and the way it is used) and that allows the analysis of either real or synthetic data.

From 2006, Ricco Rakotomalala made a substantial documentation effort. A large number of tutorials are published on a dedicated website. They describe statistical and machine learning methods and their implementation with Tanagra on real case studies. The use of other free data mining tools on the same problems is also widely described. Comparing the tools helps readers understand possible differences in how results are presented.

Description

A screenshot of Tanagra software

Tanagra works like other current data mining tools. The user visually designs a data mining process as a diagram. Each node is a statistical or machine learning technique, and the connection between two nodes represents a data transfer. Unlike most tools based on the workflow paradigm, however, Tanagra is deliberately simplified: the treatments are represented in a tree diagram. The results are displayed in HTML format, so it is easy to export the outputs and view them in a browser. It is also possible to copy the result tables to a spreadsheet.
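The tree-diagram idea can be illustrated with a minimal Python sketch. It is purely illustrative and is not Tanagra's source code or API: the `Node` class, the `execute` function and the three example treatments are hypothetical names invented here. The sketch assumes only what the paragraph above states, namely that each node applies a treatment to the data received from its parent, passes data on to its children, and reports its result as HTML.

```python
from dataclasses import dataclass, field
from html import escape
from statistics import mean
from typing import Callable, List, Tuple

Rows = List[dict]


@dataclass
class Node:
    """One treatment in the tree (hypothetical structure, not Tanagra's)."""
    title: str
    run: Callable[[Rows], Tuple[Rows, str]]  # returns (rows passed on, report text)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def execute(node: Node, rows: Rows) -> str:
    """Run a node on its parent's rows, then recurse; return an HTML list item."""
    out_rows, report = node.run(rows)
    parts = [f"<li><b>{escape(node.title)}</b>: {escape(report)}"]
    if node.children:
        parts.append("<ul>")
        parts.extend(execute(child, out_rows) for child in node.children)
        parts.append("</ul>")
    parts.append("</li>")
    return "".join(parts)


# Three toy treatments: a data source, an instance selection, a descriptive statistic.
def dataset(rows: Rows) -> Tuple[Rows, str]:
    return rows, f"{len(rows)} rows"


def select_adults(rows: Rows) -> Tuple[Rows, str]:
    kept = [r for r in rows if r["age"] >= 18]
    return kept, f"{len(kept)} rows kept"


def mean_age(rows: Rows) -> Tuple[Rows, str]:
    return rows, f"mean age = {mean(r['age'] for r in rows):.1f}"


# Build the tree: the root holds the data, and each branch is a chain of treatments.
root = Node("Dataset", dataset)
selection = root.add(Node("Select adults", select_adults))
selection.add(Node("Mean age", mean_age))
root.add(Node("Mean age (all)", mean_age))

data = [{"age": 12}, {"age": 30}, {"age": 40}]
html_report = "<ul>" + execute(root, data) + "</ul>"
print(html_report)
```

Because every node has exactly one parent, the data flow is unambiguous: a branch sees only what its ancestors passed down. That is the simplification a tree offers over a general workflow graph, at the cost of not being able to merge two branches.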

Tanagra strikes a balance between statistical approaches (e.g. parametric and nonparametric statistical tests), multivariate analysis methods (e.g. factor analysis, correspondence analysis, cluster analysis, regression) and machine learning techniques (e.g. neural networks, support vector machines, decision trees, random forests).

References

  1. R. Rakotomalala, “TANAGRA: a free software for research and academic purposes”, 2005.
  2. G. Gregoire, F.X. Jollois, J.F. Petiot, A. Qannari, S. Sabourin, P. Swertwaegher, J.C. Turlot, V. Vandewalle, S. Viguier-Pla, “Software and statistics teaching in STID department of IUT”, in Statistique et Enseignement, 2(2), 5-24, 2011.
  3. S.G. Jacob and R.G. Ramani, “Evolving Efficient Clustering and Classification Patterns in Lymphography Data through Data Mining Techniques”, in International Journal on Soft Computing (IJSC), 3(3), 119-132, 2012.
  4. E. Kirkos, C. Spathis, A. Nanopoulos, Y. Manolopoulos, “Identifying Qualified Auditor's Opinions: A Data Mining Approach”, in Journal of Emerging Technologies in Accounting, 4(1), 183-197, 2007.
  5. R.M. Rahman and F. Afroz, “Comparison of Various Classification Techniques Using Different Data Mining Tools for Diabetes Diagnosis”, in Journal of Software Engineering and Applications, 6, 85-97, 2013.
  6. H. Solanki, “Comparative Study of Data Mining Tools and Analysis with Unified Data Mining Theory”, in International Journal of Computer Applications, 75(16), 23-28, 2013.