Data farming

Data farming is the process of using designed computational experiments to “grow” data, which can then be analyzed using statistical and visualization techniques to obtain insight into complex systems. These methods can be applied to any computational model.
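
As a rough illustration of the idea, the following sketch (a minimal Python example, not taken from any of the sources cited here) grows data from a toy stochastic model: a full-factorial design is laid over two input factors, the model is run repeatedly at each design point, and the responses are harvested into a flat table ready for statistical or graphical analysis. The simulate() function, the factor names, and their ranges are all illustrative assumptions.

```python
import itertools
import random
import statistics

def simulate(arrival_rate, service_rate, seed):
    """Toy stochastic model standing in for any computational model:
    returns an average queueing delay for the given input factors."""
    rng = random.Random(seed)
    # Crude M/M/1-style delay plus noise; purely illustrative.
    utilization = min(arrival_rate / service_rate, 0.99)
    return utilization / (1.0 - utilization) + rng.gauss(0.0, 0.1)

# Designed experiment: a full-factorial grid over the input factors.
arrival_rates = [0.2, 0.4, 0.6, 0.8]
service_rates = [0.9, 1.0, 1.1]
replications = 30  # multiple runs per design point to average out noise

# "Grow" the data: run the model at every design point, harvest responses.
harvest = []
for arrival, service in itertools.product(arrival_rates, service_rates):
    delays = [simulate(arrival, service, seed) for seed in range(replications)]
    harvest.append({
        "arrival_rate": arrival,
        "service_rate": service,
        "mean_delay": statistics.mean(delays),
        "stdev_delay": statistics.stdev(delays),
    })

# The harvested table can now be explored with statistical and
# visualization tools to relate input factors to responses.
for row in harvest:
    print(row)
```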

Data farming differs from data mining, as the following metaphors indicate:

Miners seek valuable nuggets of ore buried in the earth, but have no control over what is out there or how hard it is to extract the nuggets from their surroundings. ... Similarly, data miners seek to uncover valuable nuggets of information buried within massive amounts of data. Data-mining techniques use statistical and graphical measures to try to identify interesting correlations or clusters in the data set.

Farmers cultivate the land to maximize their yield. They manipulate the environment to their advantage using irrigation, pest control, crop rotation, fertilizer, and more. Small-scale designed experiments let them determine whether these treatments are effective. Similarly, data farmers manipulate simulation models to their advantage, using large-scale designed experimentation to grow data from their models in a manner that easily lets them extract useful information. ...the results can reveal root cause-and-effect relationships between the model input factors and the model responses, in addition to rich graphical and statistical views of these relationships.[1]

A NATO modeling and simulation task group documented the data farming process in the Final Report of MSG-088. There, data farming is described as a collaborative process that combines rapid scenario prototyping, simulation modeling, design of experiments, high performance computing, and analysis and visualization in an iterative "loop of loops".
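
One highly simplified reading of that iteration can be sketched as a sequential design that zooms in on the most promising region of the factor space after each pass. The toy model, the single factor, and the halving refinement rule below are illustrative assumptions, not the MSG-088 process itself.

```python
import random

def model(x):
    """Hypothetical stand-in for a simulation model; lower is better."""
    return (x - 0.7) ** 2 + random.gauss(0.0, 0.01)

def analyze(results):
    """Pick the design point with the best (lowest) mean response."""
    return min(results, key=lambda pair: pair[1])[0]

def refine(center, width):
    """Narrow the factor range around the most promising region."""
    lo, hi = max(0.0, center - width), min(1.0, center + width)
    return [lo + i * (hi - lo) / 9 for i in range(10)]

# Iterative "loop of loops": design, run, analyze, refine, repeat.
design, width = [i / 9 for i in range(10)], 0.5
for _ in range(4):
    results = [(x, model(x)) for x in design]
    best = analyze(results)
    width /= 2                      # zoom in on the interesting region
    design = refine(best, width)

print(f"estimated optimum near x = {best:.3f}")
```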

Origins of the term

The term "data farming" comes from the idea of planting data in the simulation and parameter/value space, and then harvesting the data that results from the simulation runs.

Usage

Data farming was originally used in the Marine Corps' Project Albert. Small agent-based "distillation" models (simulations) were created to capture specific military challenges. These models were run thousands or millions of times at the Maui High Performance Computing Center and other facilities. Project Albert analysts worked with military subject matter experts to refine the models and interpret the results. The Naval Postgraduate School also worked closely with Project Albert on model generation, output analysis, and the creation of new experimental designs to better leverage the computing capabilities at Maui and other facilities.

Since the end of Project Albert in 2006, data farming has been applied to many real-world questions, particularly defense-related applications. For example, the NATO Final Report of MSG-088 contains case studies on humanitarian assistance and on force protection. NATO has also begun a follow-on task group that uses data farming to examine questions in cyber security and operational force planning.

Workshops

International Data Farming Workshops are held twice a year, in spring and fall. Workshop information, including proceedings from prior workshops and registration information for future ones, can be found at the Naval Postgraduate School's SEED Center for Data Farming and the Data Farming Community page on Workshops. The 28th workshop was scheduled to be held in the Washington, DC area in October 2014.

  1. ^ Lucas, T. W.; Kelton, W. D.; Sanchez, P. J.; Sanchez, S. M.; Anderson, B. L. (2015). "Changing the Paradigm: Simulation, Now a Method of First Resort". Naval Research Logistics. 62 (4): 293–305. doi:10.1002/nav.21628.