Data farming

From Wikipedia, the free encyclopedia

Latest revision as of 09:15, 13 December 2024

Data farming is the process of using designed computational experiments to “grow” data, which can then be analyzed using statistical and visualization techniques to obtain insight into complex systems. These methods can be applied to any computational model.

Data farming differs from data mining, as the following metaphors indicate:

Miners seek valuable nuggets of ore buried in the earth, but have no control over what is out there or how hard it is to extract the nuggets from their surroundings. ... Similarly, data miners seek to uncover valuable nuggets of information buried within massive amounts of data. Data-mining techniques use statistical and graphical measures to try to identify interesting correlations or clusters in the data set.

Farmers cultivate the land to maximize their yield. They manipulate the environment to their advantage using irrigation, pest control, crop rotation, fertilizer, and more. Small-scale designed experiments let them determine whether these treatments are effective. Similarly, data farmers manipulate simulation models to their advantage, using large-scale designed experimentation to grow data from their models in a manner that easily lets them extract useful information. ...the results can reveal root cause-and-effect relationships between the model input factors and the model responses, in addition to rich graphical and statistical views of these relationships.[1]

A NATO modeling and simulation task group has documented the data farming process in the Final Report of MSG-088.[2] Here, data farming uses collaborative processes to combine rapid scenario prototyping, simulation modeling, design of experiments, high-performance computing, and analysis and visualization in an iterative loop-of-loops.[3]
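The idea of "growing" data from a model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy single-server queue model, the factor names `arrival_rate` and `service_rate`, and the helper `farm` are hypothetical and are not taken from the MSG-088 report or from any data farming tool. The sketch runs a stochastic model at every point of a small designed experiment, replicates each point, and summarizes the responses.

```python
import itertools
import random
import statistics


def toy_model(arrival_rate, service_rate, rng, n_customers=200):
    """Hypothetical stochastic model: mean wait in a single-server queue."""
    clock = 0.0           # arrival time of the current customer
    server_free_at = 0.0  # time at which the server next becomes idle
    total_wait = 0.0
    for _ in range(n_customers):
        clock += rng.expovariate(arrival_rate)
        start = max(clock, server_free_at)
        total_wait += start - clock
        server_free_at = start + rng.expovariate(service_rate)
    return total_wait / n_customers


def farm(design, replications, seed=0):
    """Run every design point `replications` times and summarize the responses."""
    rng = random.Random(seed)
    rows = []
    for arrival_rate, service_rate in design:
        responses = [toy_model(arrival_rate, service_rate, rng)
                     for _ in range(replications)]
        rows.append({
            "arrival_rate": arrival_rate,
            "service_rate": service_rate,
            "mean_wait": statistics.mean(responses),
            "stdev_wait": statistics.stdev(responses),
        })
    return rows


# A small gridded design: 3 arrival rates x 2 service rates = 6 design points.
design = list(itertools.product([0.5, 0.7, 0.9], [1.0, 1.5]))
results = farm(design, replications=30)
for row in results:
    print(row)
```

The resulting table of inputs and summarized responses is the kind of "grown" data that would then be passed to statistical and graphical analysis; a real study would use a far richer model and design.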

History

The science of design of experiments (DOE) has been around for over a century, pioneered by R. A. Fisher for agricultural studies. Many of the classic experimental designs can be used in simulation studies. However, computational experiments face far fewer restrictions than real-world experiments in terms of cost, number of factors, time required, and the ability to replicate and automate. Consequently, a framework specifically oriented toward large-scale simulation experiments is warranted.

People have been conducting computational experiments for as long as computers have been around. The term “data farming” is more recent, coined in 1998[4] in conjunction with the Marine Corps' Project Albert,[5] in which small agent-based distillation models (a type of stochastic simulation) were created to capture specific military challenges. These models were run thousands or millions of times at the Maui High Performance Computing Center[6] and other facilities. Project Albert analysts worked with military subject-matter experts to refine the models and interpret the results.

Initially, the use of brute-force full factorial (gridded) designs meant that the simulations needed to run very quickly and the studies required high-performance computing. Even so, only a small number of factors (at a limited number of levels) could be investigated, due to the curse of dimensionality.
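The growth behind this limitation is easy to see in a short sketch. A full factorial design with k factors at m levels each has m^k design points; the helper names `full_factorial` and `design_size` below are hypothetical, chosen only for illustration.

```python
import itertools


def full_factorial(levels_per_factor):
    """Return every combination of factor levels (a gridded design)."""
    return list(itertools.product(*levels_per_factor))


def design_size(n_factors, n_levels):
    """Number of design points in a full factorial: n_levels ** n_factors."""
    return n_levels ** n_factors


# Three factors with 3, 2, and 2 levels give 3 * 2 * 2 = 12 design points.
small = full_factorial([[0, 1, 2], [10, 20], ["a", "b"]])
print(len(small))

# With 5 levels per factor, the grid grows exponentially in the number of factors.
for k in (2, 5, 10, 20):
    print(k, design_size(k, n_levels=5))
```

At 5 levels per factor, 10 factors already require 9,765,625 design points before any replication, which is why gridded designs restrict a study to a handful of factors.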

The SEED Center for Data Farming[7] at the Naval Postgraduate School[8] also worked closely with Project Albert in model generation, output analysis, and the creation of new experimental designs to better leverage the computing capabilities at Maui and other facilities. Designs developed specifically for data farming are described by Kleijnen et al. (2005)[9] and Sanchez et al. (2021),[10] among others.
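The text does not name specific designs. As one well-known example of a space-filling alternative to a grid, the sketch below builds a basic random Latin hypercube; it is an illustrative assumption, not necessarily one of the designs developed at the SEED Center, and the function name `latin_hypercube` is hypothetical.

```python
import random


def latin_hypercube(n_points, n_factors, seed=0):
    """Return n_points rows, each with n_factors values in [0, 1).

    Each factor's range is cut into n_points equal bins, and every bin
    is used exactly once per factor.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        bins = list(range(n_points))
        rng.shuffle(bins)  # random pairing of bins across factors
        columns.append([(b + rng.random()) / n_points for b in bins])
    return [list(row) for row in zip(*columns)]


# 17 design points spread over 10 factors.
design = latin_hypercube(n_points=17, n_factors=10)
print(len(design), len(design[0]))
```

Here the number of design points is chosen by the analyst rather than dictated by the number of factors, so many factors can be varied at once with a modest number of runs. Values in [0, 1) would be rescaled to each factor's actual range before running the model.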

Workshops

A series of international data farming workshops has been held since 1998 by the SEED Center for Data Farming.[11] International Data Farming Workshop 1 occurred in 1991, and since then 16 more workshops have taken place. The workshops have drawn participants from a diverse range of countries, including Canada, Singapore, Mexico, Turkey, and the United States.[12]

The International Data Farming Workshops operate through collaboration between teams of experts. The most recent workshop held in 2008 saw over 100 teams participating. Each team of data farmers is assigned a specific area of study, such as robotics, homeland security, or disaster relief. Each group experiments with and applies different models, such as the Pythagoras ABM, the Logistics Battle Command model, and the agent-based sensor effector model (ABSEM).[12]

References

  1. Lucas, T. W.; Kelton, W. D.; Sanchez, P. J.; Sanchez, S. M.; Anderson, B. L. (2015). "Changing the Paradigm: Simulation, Now a Method of First Resort". Naval Research Logistics. 62 (4): 293–305. doi:10.1002/nav.21628. hdl:10945/57859. S2CID 60846350.
  2. https://www.cso.nato.int/Pubs/rdp.asp?RDP=STO-TR-MSG-088
  3. "Data Farming". http://www.datafarming.org/Data_Farming/Welcome.html. Archived from the original on 2015-08-29. Retrieved 2014-04-22.
  4. Brandstein, A.; Horne, G. (1998). "Data Farming: A Meta-Technique for Research in the 21st Century". Maneuver Warfare Science. Quantico, VA: Marine Corps Combat Development Command.
  5. http://projectalbert.org
  6. https://www.mhpcc.hpc.mil/
  7. http://harvest.nps.edu
  8. http://www.nps.edu/
  9. Kleijnen, J. P. C.; Sanchez, S. M.; Lucas, T. W.; Cioppa, T. M. (2005). "A User's Guide to the Brave New World of Designing Simulation Experiments". INFORMS Journal on Computing. 17 (3): 263–289. doi:10.1287/ijoc.1050.0136.
  10. Sanchez, S. M.; Sanchez, P.; Wan, H. (2021). "Work Smarter, Not Harder: A Tutorial on Designing and Conducting Simulation Experiments" (PDF). 2021 Winter Simulation Conference (WSC). Piscataway, NJ: Institute of Electrical and Electronics Engineers, Inc. pp. 1–15. doi:10.1109/WSC52266.2021.9715422. hdl:10945/44883. ISBN 9780903440660. S2CID 247059747.
  11. http://harvest.nps.edu
  12. Horne, G.; Schwierz, K. (2008). "Data Farming Around the World Overview". 2008 Winter Simulation Conference. pp. 1442–1447. doi:10.1109/WSC.2008.4736222.