User:WillWare/Automation of science: Difference between revisions
Revision as of 06:51, 17 January 2010
Notes from a slide show
Computers should
- look for patterns in data (data mining)
- propose falsifiable hypotheses
- design experiments to test hypotheses
- perform experiments & collect data
- confirm/deny hypotheses
- mine new data for new patterns, repeat
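The cycle in the list above can be sketched as a small loop. This is only an illustration under stated assumptions: run_experiment is a hypothetical stand-in for lab automation (it simulates a noisy measurement of y = 3x), the "pattern" is a least-squares slope, and the 0.5 tolerance for confirming a prediction is arbitrary.

```python
import random


def run_experiment(x, rng):
    """Hypothetical stand-in for lab automation: noisy measurement of y = 3x."""
    return 3.0 * x + rng.gauss(0.0, 0.1)


def mine_pattern(data):
    """Data mining step: least-squares slope of a line through the origin."""
    num = sum(x * y for x, y in data)
    den = sum(x * x for x, _ in data)
    return num / den


def science_loop(rounds=3, seed=0):
    rng = random.Random(seed)
    # Start with two measurements so there is something to mine.
    data = [(x, run_experiment(x, rng)) for x in (1.0, 2.0)]
    history = []
    for _ in range(rounds):
        slope = mine_pattern(data)                  # look for a pattern
        x_new = max(x for x, _ in data) + 1.0       # design an experiment: untested input
        prediction = slope * x_new                  # falsifiable prediction of "y = slope * x"
        y_new = run_experiment(x_new, rng)          # perform experiment, collect data
        confirmed = abs(prediction - y_new) < 0.5   # confirm/deny (arbitrary tolerance)
        history.append((slope, confirmed))
        data.append((x_new, y_new))                 # mine new data, repeat
    return history


if __name__ == "__main__":
    for slope, confirmed in science_loop():
        print(f"hypothesis y = {slope:.3f} * x, confirmed: {confirmed}")
```

Each pass extrapolates one step beyond the data already collected, so a wrong hypothesis has a chance to fail.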
Precedents
Adam the "Robot Scientist"
Reported in April 2009 by Ross King at Aberystwyth University. It uses lab automation to perform experiments, and data mining to find patterns in the resulting data. Adam developed novel genomics hypotheses about S. cerevisiae yeast and tested them. Adam's conclusions were manually confirmed by human experimenters, and found to be correct.
Eureqa
- Website: http://ccsl.mae.cornell.edu/eureqa
Eureqa is a software tool for detecting equations and hidden mathematical relationships in your data. Its primary goal is to identify the simplest mathematical formulas which could describe the underlying mechanisms that produced the data. Eureqa is free to download and use, but AFAICT it is not open source. So we need an open source equivalent. Luckily the ideas behind Eureqa are laid out pretty plainly.
Eureqa takes a data set and generates a curve-fitting function for that data. Genetic programming (http://www.alesdar.org/oldSite/IS/chap6-2.html) appears to be the preferred way of doing this. Hod Lipson discussed Eureqa in his talk; his comment was that it is easy to generate a model but hard to generate an interpretation, a story that explains why that model is the right one.
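To make the idea concrete, here is a minimal sketch of evolving a formula to fit data. It is not Eureqa's algorithm: it is a mutation-only evolutionary search over expression trees, a much simplified relative of genetic programming (no crossover). The operator set, the parsimony weight, and all function names are assumptions chosen for illustration.

```python
import math
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(rng, depth=3):
    """Build a random expression tree over x, small constants, and + - *."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else float(rng.randint(-3, 3))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def size(tree):
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1


def cost(tree, data, parsimony=0.01):
    """Mean squared error plus a size penalty, so simpler formulas are preferred."""
    total = 0.0
    for x, y in data:
        diff = evaluate(tree, x) - y
        total += diff * diff
    value = total / len(data) + parsimony * size(tree)
    return value if math.isfinite(value) else float("inf")


def mutate(tree, rng):
    """Replace a randomly chosen subtree with a fresh random one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth=2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def evolve(data, generations=40, pop_size=60, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    best_costs = []
    for _ in range(generations):
        population.sort(key=lambda t: cost(t, data))
        best_costs.append(cost(population[0], data))
        survivors = population[: pop_size // 4]  # truncation selection keeps the elite
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda t: cost(t, data)), best_costs


if __name__ == "__main__":
    # Synthetic data from y = x*x + x; the search is not told this formula.
    data = [(float(x), float(x * x + x)) for x in range(-3, 4)]
    best, costs = evolve(data)
    print(best, costs[-1])
```

The search returns a formula and a fit score, which is the "model" half of Lipson's remark; nothing in it supplies the interpretation.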
What next?
Adam is designed to work alone, with no connection to the broader scientific literature, and it is confined to one very narrow problem domain. To broaden the effort, we need:
- International standard: an ontology for machine-parseable sharing of scientific reasoning processes
  - data sets
  - hypotheses
  - predictions
  - deduction, induction, statistical inference
  - design of experiments
- versions of Adam designed for other problem domains
- some amount of shared vocabulary, otherwise each works in isolation
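As a rough sketch of what a machine-parseable record of a reasoning process might look like, the items in the list above can be modeled as plain data classes serialized to JSON. Every class name, field name, and example value here is hypothetical; no existing standard is being quoted.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DataSet:
    name: str
    variables: List[str]
    rows: List[List[float]]


@dataclass
class Hypothesis:
    statement: str
    predictions: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    design: str
    tests: str       # statement of the hypothesis under test
    inference: str   # "deduction", "induction", or "statistical inference"


@dataclass
class ReasoningRecord:
    data_sets: List[DataSet]
    hypotheses: List[Hypothesis]
    experiments: List[Experiment]

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists.
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder content only; not taken from any real experiment.
record = ReasoningRecord(
    data_sets=[DataSet("growth_curves", ["hours", "density"], [[0.0, 0.1], [1.0, 0.2]])],
    hypotheses=[Hypothesis("gene G encodes enzyme E",
                           ["deleting G slows growth without metabolite M"])],
    experiments=[Experiment("compare deletion strain with wild type",
                            "gene G encodes enzyme E", "statistical inference")],
)

if __name__ == "__main__":
    print(record.to_json())
```

A shared format like this is what would let separate robot scientists read each other's hypotheses instead of working in isolation.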
Reasoning scenarios
- Pure symbolic logic (no probabilities or confidence levels)
  - Semantic web, inference engines, first order logic
- Hypotheses with blanket probabilities
  - each hypothesis describes a world, each world has logic propositions, but no probabilities
  - use empirical evidence to update blanket probabilities
- Assign probabilities to individual propositions
  - Statistical inference replaces logical deduction
  - Get smart about the role of uncertainty
- Work with noisy analog data
  - Get smart about signal processing, probability distributions
  - Study the noise to look for deeper structures
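For the "blanket probabilities" scenario, updating on empirical evidence can be sketched with Bayes' rule: each hypothesis carries a single probability, and an observation rescales it by how likely that observation is under the hypothesis. The hypothesis names and numbers below are made up for illustration.

```python
def update(priors, likelihoods):
    """Bayes' rule: posterior is proportional to prior * P(evidence | hypothesis)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


# Two competing hypotheses, initially equally credible (illustrative values).
priors = {"H1": 0.5, "H2": 0.5}
# Probability of the observed result under each hypothesis (illustrative values).
likelihoods = {"H1": 0.8, "H2": 0.2}

posterior = update(priors, likelihoods)

if __name__ == "__main__":
    print(posterior)
```

Feeding the posterior back in as the next prior handles a stream of observations, assuming they are independent given each hypothesis.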
Semantic markup for existing scientific and medical literature
Immediately useful for constructing a semantic search engine for medicine and research
Motivates development of science ontology
Machines should eventually publish journal articles
Maybe it will tell us something interesting about how humans do science
Fund long-term work by monetizing near-term work
IANAVC, but maybe one of these would work...
- Semantic search engine for doctors and researchers
- Build an oracle, win bets - politics, finance, climate
- Dual-license it and charge for commercial use
- Offer consulting services