
Surrogate model: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
No edit summary
GreenC bot (talk | contribs)
 
(112 intermediate revisions by 66 users not shown)
Line 1: Line 1:
{{Short description|Engineering model}}
Most engineering design problems require experiments and/or simulations to evaluate design objective and constraint functions as function of design variables. For example, in order to find the optimal airfoil shape for an aircraft wing, an engineer simulates the air flow around the wing for different shape variables (length, curvature, material, ..). For many real world problems, however, a single simulation can take many minutes, hours, or even days to complete. As a result, routine tasks such as design optimization, design space exploration, sensitivity analysis and ''what-if'' analysis become impossible since they require thousands or even millions of simulation evaluations.
A '''surrogate model''' is an engineering method used when an outcome of interest cannot be easily measured or computed, so an approximate [[mathematical model]] of the outcome is used instead. Most engineering design problems require experiments and/or simulations to evaluate design objective and constraint functions as functions of the design variables. For example, to find the optimal [[airfoil]] shape for an aircraft wing, an engineer simulates the airflow around the wing for different shape variables (e.g., length, curvature, material). For many real-world problems, however, a single simulation can take many minutes, hours, or even days to complete. As a result, routine tasks such as [[design optimization]], [[design space exploration]], [[sensitivity analysis]] and "what-if" analysis become impractical since they require thousands or even millions of simulation evaluations.


One way of alleviating this burden is by constructing approximation models, known as '''surrogate models''', [[Response surface methodology|response surface models]], metamodels or emulators, that mimic the behavior of the simulation model as closely as possible while being computationally cheap(er) to evaluate. Surrogate models are constructed using a data-driven, bottom-up approach. The exact, inner working of the simulation code is not assumed to be known (or even understood), solely the input-output behavior is important. A model is constructed based on modeling the response of the simulator to a limited number of intelligently chosen data points. This approach is also known as behavioral modeling or black-box modeling, though the terminology is not always consistent. When only a single design variable is involved, the process is known as [[curve fitting]] as illustrated in the Figure.
One way of alleviating this burden is by constructing approximation models, known as '''surrogate models''', ''metamodels'' or ''emulators'', that mimic the behavior of the simulation model as closely as possible while being computationally cheaper to evaluate. Surrogate models are constructed using a data-driven, bottom-up approach. The exact inner workings of the simulation code are not assumed to be known (or even understood); only the input-output behavior matters. A model is constructed by modeling the response of the simulator to a limited number of intelligently chosen data points. This approach is also known as behavioral modeling or [[Black box|black-box]] modeling, though the terminology is not always consistent. When only a single design variable is involved, the process is known as [[curve fitting]].


Though using surrogate models in lieu of experiments and simulations in engineering design is more common, surrogate modeling may be used in many other areas of science where there are expensive experiments and/or function evaluations.
[[Image:Multiple surrogate models.jpg|frame]]


==Goals==
While this article is written around the subject of using surrogate models in lieu of experiments and simulations in engineering design, surrogate modelling may be used in many other areas of science where there are expensive experiments and/or function evaluations.


The scientific challenge of surrogate modeling is the generation of a surrogate that is as accurate as possible, using as few simulation evaluations as possible. The process comprises three major steps which may be interleaved iteratively:
An important distinction can be made between two different applications of surrogate models: design optimization and design space approximation (also known as emulation).


* Sample selection (also known as sequential design, [[optimal experimental design]] (OED) or [[Active learning (machine learning)|active learning]])
In surrogate model based optimization an initial surrogate is constructed using some of the available budget of expensive experiments and/or simulations. The remaining experiments/simulations are run for designs which the surrogate model predicts may have promising performance. The process usually takes the form of the following search/update procedure.
* Construction of the surrogate model and optimization of the model parameters (i.e., the [[bias-variance tradeoff]])
* Appraisal of the accuracy of the surrogate.


The accuracy of the surrogate depends on the number and location of samples (expensive experiments or simulations) in the design space. Various [[design of experiments]] (DOE) techniques cater to different sources of errors, in particular, errors due to noise in the data or errors due to an improper surrogate model.
*1. Initial sample selection (the experiments and/or simulations to be run)
*2. Construct surrogate model
*3. Search surrogate model (the model can be searched extensively, e.g. using a [[genetic algorithm]], as it is cheap to evaluate)
*4. Run update experiment/simulation at new location(s) found by search and add to sample
*5. Iterate steps 2 to 4 until out of time or design 'good enough'


==Types of surrogate models==
Depending on the type of surrogate used and the complexity of the problem, the process may converge on a local or global optimum, or perhaps none at all.<ref>Jones, D.R (2001), "A taxonomy of global optimization methods based on
response surfaces," Journal of Global Optimization, 21:345-383.</ref>


Popular surrogate modeling approaches are: polynomial [[response surface]]s; [[kriging]]; more generalized [[Bayesian]] approaches;<ref>{{Cite journal|last1=Ranftl|first1=Sascha|last2=von der Linden|first2=Wolfgang|date=2021-11-13|title=Bayesian Surrogate Analysis and Uncertainty Propagation|journal=Physical Sciences Forum|volume=3|issue=1|pages=6|doi=10.3390/psf2021003006|issn=2673-9984|doi-access=free |arxiv=2101.04038}}</ref> [[gradient-enhanced kriging]] (GEK); [[radial basis function]]; [[support vector machine]]s; [[space mapping]];<ref name="space mapping">[[John Bandler|J.W. Bandler]], Q. Cheng, S.A. Dakroury, A.S. Mohamed, M.H. Bakr, K. Madsen and J. Søndergaard, "[https://ieeexplore.ieee.org/document/1262727 Space mapping: the state of the art]," IEEE Trans. Microwave Theory Tech., vol. 52, no. 1, pp. 337-361, Jan. 2004.</ref> [[artificial neural networks]] and [[Bayesian networks]].<ref>{{cite journal|last1= Cardenas |first1=IC|title= On the use of Bayesian networks as a meta-modeling approach to analyse uncertainties in slope stability analysis|journal =Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards|date=2019|volume=13|issue=1|pages=53–65|doi=10.1080/17499518.2018.1498524|bibcode=2019GAMRE..13...53C |s2cid=216590427 }}</ref> Other methods recently explored include [[Fourier transform|Fourier]] surrogate modeling <ref>Manzoni, L.; Papetti, D. M.; Cazzaniga, P.; Spolaor, S.; Mauri, G.; Besozzi, D.; Nobile, M. S. Surfing on Fitness Landscapes: A Boost on Optimization by Fourier Surrogate Modeling. Entropy 2020, 22, 285.</ref><ref>Bliek, L.; Verstraete, H. R.; Verhaegen, M.; Wahls, S. Online optimization with costly and noisy measurements using random Fourier expansions. IEEE transactions on neural networks and learning systems 2016, 29(1), 167-182.</ref> and [[random forest]]s.<ref>{{cite conference
In design space approximation, one is not interested in finding the optimal parameter vector but rather in the global behavior of the system. Here the surrogate is tuned to mimic the underlying model as closely as needed over the complete design space. Such surrogates are a useful, cheap way to gain insight into the global behavior of the system. Optimization can still occur as a post-processing step, although with no update procedure (see above) the optimum found cannot be validated.
| first = S.K.
| last = Dasari |author2=P. Andersson |author3=A. Cheddad
| title = Random Forest Surrogate Models to Support Design Space Exploration in Aerospace Use-Case
| book-title = Artificial Intelligence Applications and Innovations (AIAI 2019)
| pages = 532–544
| publisher = Springer
| date = 2019
| url = https://www.springerprofessional.de/en/random-forest-surrogate-models-to-support-design-space-explorati/16724106
| access-date = 2019-06-02
}}</ref>


For some problems, the nature of the true function is not known ''a priori'', and therefore it is not clear which surrogate model will be the most accurate one. In addition, there is no consensus on how to obtain the most reliable estimates of the accuracy of a given surrogate. Many other problems have known physics properties. In these cases, physics-based surrogates such as [[space-mapping]] based models are commonly used.<ref name="space mapping" /><ref>J.E. Rayas-Sanchez,[https://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7423860&action=search&sortType=&rowsPerPage=&searchField=Search_All&matchBoolean=true&queryText=(%22Document%20Title%22:simplicity%20in%20asm) "Power in simplicity with ASM: tracing the aggressive space mapping algorithm over two decades of development and engineering applications"], IEEE Microwave Magazine, vol. 17, no. 4, pp. 64-76, April 2016.</ref>
The scientific challenge of surrogate modeling is the generation of a surrogate that is as accurate as possible, using as few simulation evaluations as possible. The process comprises three major steps which may be interleaved iteratively:


==Invariance properties==
* Sample selection (also known as sequential design, optimal experimental design (OED) or active learning)
Recently proposed comparison-based surrogate models (e.g., ranking [[support vector machine|support vector machines]]) for [[evolutionary algorithms]], such as [[CMA-ES]], allow preservation of some invariance properties of surrogate-assisted optimizers:<ref>{{cite conference
* Construction of the surrogate model and optimizing the model parameters (Bias-Variance trade-off)
| first = I.
* Appraisal of the accuracy of the surrogate.
| last = Loshchilov |author2=M. Schoenauer |author3=M. Sebag
| title = Comparison-Based Optimizers Need Comparison-Based Surrogates
| book-title = Parallel Problem Solving from Nature (PPSN XI)
| pages = 364–373
| publisher = Springer
| date = 2010
| url = https://hal.inria.fr/file/index/docid/493921/filename/ACM-ES.pdf
}}</ref>
#Invariance with respect to [[Monotonic function|monotonic transformations]] of the function (scaling)
#Invariance with respect to [[orthogonal transform]]ations of the search space (rotation)


==Applications==
The accuracy of the surrogate depends on the number and location of samples (expensive experiments or simulations) in the design space. Various [[design of experiments]] (DOE) techniques cater to different sources of errors, in particular errors due to noise in the data or errors due to an improper surrogate model.


An important distinction can be made between two different applications of surrogate models: design optimization and design space approximation (also known as emulation).
The most popular surrogate models are polynomial [[response surface]]s, [[Kriging]], [[support vector machine]]s and [[artificial neural networks]]. For most problems, the nature of the true function is not known a priori, so it is not clear which surrogate model will be most accurate. In addition, there is no consensus on how to obtain the most reliable estimates of the accuracy of a given surrogate.

In surrogate model-based optimization, an initial surrogate is constructed using part of the available budget of expensive experiments and/or simulations. The remaining experiments/simulations are run for designs that the surrogate model predicts may have promising performance. The process usually takes the form of the following search/update procedure:

#Initial sample selection (the experiments and/or simulations to be run)
#Construct surrogate model
#Search surrogate model (the model can be searched extensively, e.g., using a [[genetic algorithm]], as it is cheap to evaluate)
#Run the experiment/simulation at the new location(s) found by the search and add the results to the sample
#Iterate steps 2 to 4 until out of time or design is "good enough"

Depending on the type of surrogate used and the complexity of the problem, the process may converge on a [[Local optimum|local]] or [[global optimum]], or perhaps none at all.<ref>Jones, D.R (2001), "[http://www.ressources-actuarielles.net/EXT/ISFA/1226.nsf/9c8e3fd4d8874d60c1257052003eced6/e7dc33e4da12c5a9c12576d8002e442b/$FILE/Jones01.pdf A taxonomy of global optimization methods based on response surfaces]," Journal of Global Optimization, 21:345–383.</ref>

In design space approximation, one is not interested in finding the optimal parameter vector, but rather in the global behavior of the system. Here the surrogate is tuned to mimic the underlying model as closely as needed over the complete design space. Such surrogates are a useful, cheap way to gain insight into the global behavior of the system. Optimization can still occur as a post-processing step, although with no update procedure (see above), the optimum found cannot be validated.

== Surrogate modeling software ==
* Surrogate Modeling Toolbox (SMT: https://github.com/SMTorg/smt) is a [[Python (programming language)|Python]] package that contains a collection of surrogate modeling methods, sampling techniques, and benchmarking functions. This package provides a library of surrogate models that is simple to use and facilitates the implementation of additional methods. SMT is different from existing surrogate modeling libraries because of its emphasis on [[Derivative|derivatives]], including training derivatives used for [[gradient]]-enhanced modeling, prediction derivatives, and derivatives with respect to the training data. It also includes new surrogate models that are not available elsewhere: kriging by partial-least squares reduction and energy-minimizing [[spline interpolation]].<ref name = bouhlel2019>{{cite journal | last1 = Bouhlel | first1 = M.A. | last2 = Hwang | first2 = J.H. | last3 = Bartoli | first3 = Nathalie | last4 = Lafage |first4 = R. | last5 = Morlier | first5 = J. | last6 = Martins | first6 = J.R.R.A. | year = 2019 | title = A Python surrogate modeling framework with derivatives | journal = Advances in Engineering Software | volume = 135 | pages = 102662 | doi =10.1016/j.advengsoft.2019.03.005 | s2cid = 128324330 | url = http://mdolab.engin.umich.edu/content/python-surrogate-modeling-framework-derivatives | doi-access = free }}</ref>
* [https://surrogates.sciml.ai/latest/ Surrogates.jl] is a [[Julia (programming language)|Julia]] package that offers tools such as random forests, radial basis methods and kriging.


==See also==
*[[Approximation of functions]]
*[[Linear approximation]]
*[[Response surface methodology]]
*[[Kriging]]
*[[Radial basis function]]s
*[[Gradient-enhanced kriging]] (GEK)
*[[OptiY]]
*[[Space mapping]]
*[[Surrogate endpoint]]
*[[Surrogate data]]
*[[Fitness approximation]]
*[[Computer experiment]]
*[[Conceptual model]]
*[[Bayesian regression]]
*[[Bayesian model selection]]


==References==
<references/>


==Further reading==
* Queipo, N.V., Haftka, R.T., [[Wei Shyy|Shyy, W.]], Goel, T., Vaidyanathan, R., Tucker, P.K. (2005), “Surrogate-based analysis and optimization,” Progress in Aerospace Sciences, 41, 1-28.
{{further cleanup|date=March 2023}}
* Queipo, N.V., Haftka, R.T., [[Wei Shyy|Shyy, W.]], Goel, T., Vaidyanathan, R., Tucker, P.K. (2005), “[https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20050186653.pdf Surrogate-based analysis and optimization],” Progress in Aerospace Sciences, 41, 1–28.
* D. Gorissen, I. Couckuyt, P. Demeester, T. Dhaene, K. Crombecq, (2010), “[http://jmlr.csail.mit.edu/papers/volume11/gorissen10a/gorissen10a.pdf A Surrogate Modeling and Adaptive Sampling Toolbox for Computer Based Design]," Journal of Machine Learning Research, Vol. 11, pp.&nbsp;2051−2055, July 2010.
* T-Q. Pham, A. Kamusella, H. Neubert, “[http://www.ep.liu.se/ecp/063/074/ecp11063074.pdf Auto-Extraction of Modelica Code from Finite Element Analysis or Measurement Data]," 8th International Modelica Conference, 20–22 March 2011 in Dresden.
* Forrester, Alexander, Andras Sobester, and Andy Keane, ''[https://books.google.com/books?id=ulMHmeMnRCcC&dq=%22Engineering+design+via+surrogate+modelling%3A+a+practical+guide%22&pg=PR5 Engineering design via surrogate modelling: a practical guide]'', John Wiley & Sons, 2008.
* Bouhlel, M. A. and Bartoli, N. and Otsmane, A. and Morlier, J. (2016) "[https://hal.archives-ouvertes.fr/hal-01232938/file/KPLS_paper2015.pdf Improving kriging surrogates of high-dimensional design models by Partial Least Squares dimension reduction]", Structural and Multidisciplinary Optimization 53 (5), 935-952
* Bouhlel, M. A. and Bartoli, N. and Otsmane, A. and Morlier, J. (2016) "[http://downloads.hindawi.com/journals/mpe/2016/6723410.pdf An improved approach for estimating the hyperparameters of the kriging model for high-dimensional problems through the partial least squares method]", Mathematical Problems in Engineering


==External links==

* [http://www.wiley.com//legacy/wileychi/forrester/terms.html Matlab code for surrogate modelling]
* [http://sumowiki.intec.ugent.be Matlab '''SU'''rrogate '''MO'''deling Toolbox - SUMO Toolbox]
* [http://sumowiki.intec.ugent.be Matlab '''SU'''rrogate '''MO'''deling Toolbox – Matlab SUMO Toolbox]
* [https://github.com/SMTorg/SMT Surrogate Modeling Toolbox -- Python]


[[Category:Experimental design]]
[[Category:Design of experiments]]
[[Category:Numerical analysis]]
[[Category:Surrogate models]]
[[Category:Scientific models]]
[[Category:Mathematical modeling]]
[[Category:Machine learning]]

Latest revision as of 20:53, 30 July 2024

A surrogate model is an engineering method used when an outcome of interest cannot be easily measured or computed, so an approximate mathematical model of the outcome is used instead. Most engineering design problems require experiments and/or simulations to evaluate design objective and constraint functions as functions of the design variables. For example, to find the optimal airfoil shape for an aircraft wing, an engineer simulates the airflow around the wing for different shape variables (e.g., length, curvature, material). For many real-world problems, however, a single simulation can take many minutes, hours, or even days to complete. As a result, routine tasks such as design optimization, design space exploration, sensitivity analysis and "what-if" analysis become impractical since they require thousands or even millions of simulation evaluations.

One way of alleviating this burden is by constructing approximation models, known as surrogate models, metamodels or emulators, that mimic the behavior of the simulation model as closely as possible while being computationally cheaper to evaluate. Surrogate models are constructed using a data-driven, bottom-up approach. The exact inner workings of the simulation code are not assumed to be known (or even understood); only the input-output behavior matters. A model is constructed by modeling the response of the simulator to a limited number of intelligently chosen data points. This approach is also known as behavioral modeling or black-box modeling, though the terminology is not always consistent. When only a single design variable is involved, the process is known as curve fitting.
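
As a minimal illustration of the single-variable case, the following Python sketch fits a curve through a handful of samples of a stand-in function and then queries the cheap curve instead of the original. The function `slow_simulation`, the sample locations, and the choice of Lagrange polynomial interpolation are assumptions made for this example only; the article does not prescribe any of them.

```python
def slow_simulation(x):
    """Hypothetical stand-in for an expensive simulation (here just a cubic)."""
    return x ** 3 - 2.0 * x + 1.0


def lagrange_surrogate(xs, ys):
    """Return a cheap function that interpolates the samples (xs, ys)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            # Basis polynomial: equals 1 at xi and 0 at every other sample.
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


# Four "expensive" runs at hand-picked design points (assumed values).
sample_x = [-1.0, 0.0, 1.0, 2.0]
sample_y = [slow_simulation(x) for x in sample_x]

# The surrogate can now be queried as often as needed at negligible cost.
surrogate = lagrange_surrogate(sample_x, sample_y)
print(surrogate(0.5), slow_simulation(0.5))
```

Because the stand-in is itself a cubic, four samples reproduce it up to rounding. A real simulation would generally leave a residual error, and high-degree polynomial interpolation can oscillate between samples, which is one reason other surrogate types are often preferred.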

Though using surrogate models in lieu of experiments and simulations in engineering design is more common, surrogate modeling may be used in many other areas of science where there are expensive experiments and/or function evaluations.

Goals


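The scientific challenge of surrogate modeling is the generation of a surrogate that is as accurate as possible, using as few simulation evaluations as possible. The process comprises three major steps which may be interleaved iteratively:

  • Sample selection (also known as sequential design, optimal experimental design (OED) or active learning)
  • Construction of the surrogate model and optimization of the model parameters (i.e., the bias-variance tradeoff)
  • Appraisal of the accuracy of the surrogate.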

The accuracy of the surrogate depends on the number and location of samples (expensive experiments or simulations) in the design space. Various design of experiments (DOE) techniques cater to different sources of errors, in particular, errors due to noise in the data or errors due to an improper surrogate model.
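
The appraisal step can be sketched with leave-one-out cross-validation, one common (but not universally agreed) way to estimate surrogate accuracy without extra simulation runs. The example below is an illustration under stated assumptions: the surrogate is plain inverse-distance weighting, chosen only because it needs no fitting, and the names `idw_predict` and `leave_one_out_rmse` are hypothetical.

```python
import math


def idw_predict(xs, ys, x, power=2.0):
    """Inverse-distance-weighted prediction at x from samples (xs, ys)."""
    numerator = 0.0
    denominator = 0.0
    for xi, yi in zip(xs, ys):
        distance = abs(x - xi)
        if distance == 0.0:
            return yi  # exactly on a sample: return its value
        weight = 1.0 / distance ** power
        numerator += weight * yi
        denominator += weight
    return numerator / denominator


def leave_one_out_rmse(xs, ys, predict=idw_predict):
    """Hold out each sample in turn, predict it from the rest, and pool the errors."""
    squared_error = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        error = predict(train_x, train_y, xs[i]) - ys[i]
        squared_error += error * error
    return math.sqrt(squared_error / len(xs))


# Compare a coarse and a dense sampling of the same (assumed) response.
for n in (5, 21):
    xs = [math.pi * i / (n - 1) for i in range(n)]
    ys = [math.sin(x) for x in xs]
    print(n, leave_one_out_rmse(xs, ys))
```

The comparison of two sample sizes reflects the point made above that accuracy depends on the number and location of samples; the same routine could be used to compare candidate surrogate types on a fixed sample.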

Types of surrogate models


Popular surrogate modeling approaches are: polynomial response surfaces; kriging; more generalized Bayesian approaches;[1] gradient-enhanced kriging (GEK); radial basis functions; support vector machines; space mapping;[2] artificial neural networks; and Bayesian networks.[3] Other methods recently explored include Fourier surrogate modeling[4][5] and random forests.[6]

For some problems, the nature of the true function is not known a priori, so it is not clear which surrogate model will be the most accurate. In addition, there is no consensus on how to obtain the most reliable estimates of the accuracy of a given surrogate. For many other problems, the underlying physics is known. In these cases, physics-based surrogates such as space-mapping-based models are commonly used.[2][7]

Invariance properties


Recently proposed comparison-based surrogate models (e.g., ranking support vector machines) for evolutionary algorithms, such as CMA-ES, allow preservation of some invariance properties of surrogate-assisted optimizers:[8]

  1. Invariance with respect to monotonic transformations of the function (scaling)
  2. Invariance with respect to orthogonal transformations of the search space (rotation)
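
The first property can be illustrated with a short Python sketch. A comparison-based surrogate learns only from the ordering of candidate solutions, and that ordering does not change when the objective is passed through a strictly increasing function. The objective, the candidate points, and the transformation below are assumptions chosen for the illustration; no actual ranking SVM is trained here.

```python
import math


def ranking(values):
    """Indices of the candidates ordered from best (smallest) to worst."""
    return sorted(range(len(values)), key=lambda i: values[i])


def objective(x):
    """Hypothetical objective to be minimized."""
    return (x - 1.0) ** 2


candidates = [-1.0, 0.0, 0.5, 2.5, 3.5]
raw_values = [objective(x) for x in candidates]

# A strictly increasing transformation of the objective values.
transformed_values = [math.exp(3.0 * v) + 5.0 for v in raw_values]

# Both rankings are identical, so a model trained on comparisons alone
# receives exactly the same training information in either case.
print(ranking(raw_values))
print(ranking(transformed_values))
```

A surrogate that regresses on the objective values themselves would, by contrast, see very different training targets before and after the transformation.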

Applications


An important distinction can be made between two different applications of surrogate models: design optimization and design space approximation (also known as emulation).

In surrogate model-based optimization, an initial surrogate is constructed using part of the available budget of expensive experiments and/or simulations. The remaining experiments/simulations are run for designs that the surrogate model predicts may have promising performance. The process usually takes the form of the following search/update procedure:

  1. Initial sample selection (the experiments and/or simulations to be run)
  2. Construct surrogate model
  3. Search surrogate model (the model can be searched extensively, e.g., using a genetic algorithm, as it is cheap to evaluate)
  4. Run the experiment/simulation at the new location(s) found by the search and add the results to the sample
  5. Iterate steps 2 to 4 until out of time or design is "good enough"

Depending on the type of surrogate used and the complexity of the problem, the process may converge on a local or global optimum, or perhaps none at all.[9]
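
The search/update procedure above can be sketched in a few lines of Python. This is a minimal one-dimensional illustration under stated assumptions, not a reference implementation: the surrogate is a least-squares quadratic (a simple polynomial response surface), the search is an exhaustive scan of a fixed grid rather than a genetic algorithm, and all function names and default values are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ c0 + c1*x + c2*x**2 via the normal equations."""
    powers = [[1.0, x, x * x] for x in xs]
    ata = [[sum(p[i] * p[j] for p in powers) for j in range(3)] for i in range(3)]
    aty = [sum(p[i] * y for p, y in zip(powers, ys)) for i in range(3)]
    c0, c1, c2 = solve_linear(ata, aty)
    return lambda x: c0 + c1 * x + c2 * x * x


def surrogate_optimize(expensive, lower, upper, n_initial=3,
                       max_iterations=10, grid_size=201):
    """Minimize `expensive` on [lower, upper]; n_initial must be at least 3."""
    # Step 1: initial sample selection (evenly spaced design points).
    xs = [lower + (upper - lower) * i / (n_initial - 1) for i in range(n_initial)]
    ys = [expensive(x) for x in xs]
    grid = [lower + (upper - lower) * i / (grid_size - 1) for i in range(grid_size)]
    for _ in range(max_iterations):               # Step 5: iterate
        surrogate = fit_quadratic(xs, ys)         # Step 2: construct surrogate
        candidate = min(grid, key=surrogate)      # Step 3: search the cheap model
        if any(abs(candidate - x) < 1e-12 for x in xs):
            break                                 # candidate already evaluated
        xs.append(candidate)                      # Step 4: run and add to sample
        ys.append(expensive(candidate))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], len(xs)


if __name__ == "__main__":
    # Hypothetical "expensive" objective with a small oscillation added.
    print(surrogate_optimize(lambda x: (x - 0.7) ** 2 + 0.1 * math.sin(8.0 * x), 0.0, 2.0))
```

The returned tuple holds the best design found, its objective value, and the number of expensive evaluations used. Because a single global quadratic cannot follow the oscillation in the demo objective, the loop may stop at a point that is not the true optimum, which is consistent with the remark above that convergence depends on the surrogate type and the problem.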

In design space approximation, one is not interested in finding the optimal parameter vector, but rather in the global behavior of the system. Here the surrogate is tuned to mimic the underlying model as closely as needed over the complete design space. Such surrogates are a useful, cheap way to gain insight into the global behavior of the system. Optimization can still occur as a post-processing step, although with no update procedure (see above), the optimum found cannot be validated.

Surrogate modeling software

  • Surrogate Modeling Toolbox (SMT: https://github.com/SMTorg/smt) is a Python package that contains a collection of surrogate modeling methods, sampling techniques, and benchmarking functions. This package provides a library of surrogate models that is simple to use and facilitates the implementation of additional methods. SMT is different from existing surrogate modeling libraries because of its emphasis on derivatives, including training derivatives used for gradient-enhanced modeling, prediction derivatives, and derivatives with respect to the training data. It also includes new surrogate models that are not available elsewhere: kriging by partial-least squares reduction and energy-minimizing spline interpolation.[10]
  • Surrogates.jl is a Julia package that offers tools such as random forests, radial basis methods and kriging.

See also


References

  1. ^ Ranftl, Sascha; von der Linden, Wolfgang (2021-11-13). "Bayesian Surrogate Analysis and Uncertainty Propagation". Physical Sciences Forum. 3 (1): 6. arXiv:2101.04038. doi:10.3390/psf2021003006. ISSN 2673-9984.
  2. ^ a b J.W. Bandler, Q. Cheng, S.A. Dakroury, A.S. Mohamed, M.H. Bakr, K. Madsen and J. Søndergaard, "Space mapping: the state of the art," IEEE Trans. Microwave Theory Tech., vol. 52, no. 1, pp. 337-361, Jan. 2004.
  3. ^ Cardenas, IC (2019). "On the use of Bayesian networks as a meta-modeling approach to analyse uncertainties in slope stability analysis". Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards. 13 (1): 53–65. Bibcode:2019GAMRE..13...53C. doi:10.1080/17499518.2018.1498524. S2CID 216590427.
  4. ^ Manzoni, L.; Papetti, D. M.; Cazzaniga, P.; Spolaor, S.; Mauri, G.; Besozzi, D.; Nobile, M. S. Surfing on Fitness Landscapes: A Boost on Optimization by Fourier Surrogate Modeling. Entropy 2020, 22, 285.
  5. ^ Bliek, L.; Verstraete, H. R.; Verhaegen, M.; Wahls, S. Online optimization with costly and noisy measurements using random Fourier expansions. IEEE transactions on neural networks and learning systems 2016, 29(1), 167-182.
  6. ^ Dasari, S.K.; P. Andersson; A. Cheddad (2019). "Random Forest Surrogate Models to Support Design Space Exploration in Aerospace Use-Case". Artificial Intelligence Applications and Innovations (AIAI 2019). Springer. pp. 532–544. Retrieved 2019-06-02.
  7. ^ J.E. Rayas-Sanchez, "Power in simplicity with ASM: tracing the aggressive space mapping algorithm over two decades of development and engineering applications", IEEE Microwave Magazine, vol. 17, no. 4, pp. 64-76, April 2016.
  8. ^ Loshchilov, I.; M. Schoenauer; M. Sebag (2010). "Comparison-Based Optimizers Need Comparison-Based Surrogates" (PDF). Parallel Problem Solving from Nature (PPSN XI). Springer. pp. 364–373.
  9. ^ Jones, D.R. (2001), "A taxonomy of global optimization methods based on response surfaces," Journal of Global Optimization, 21:345–383.
  10. ^ Bouhlel, M.A.; Hwang, J.H.; Bartoli, Nathalie; Lafage, R.; Morlier, J.; Martins, J.R.R.A. (2019). "A Python surrogate modeling framework with derivatives". Advances in Engineering Software. 135: 102662. doi:10.1016/j.advengsoft.2019.03.005. S2CID 128324330.

Further reading
