Dakota Reference Manual
Version 6.10
Explore and Predict with Confidence

An empirical model that is created from data or the results of a submodel
Alias: none
Argument(s): none
Child Keywords:
Required/Optional  Description of Group  Dakota Keyword  Dakota Keyword Description  

Optional  id_surrogates  Identifies the subset of the response functions by number that are to be approximated (the default is all functions).  
Required (Choose One)  Surrogate Category (Group 1)  global  Select a surrogate model with global support  
Required (Choose One)  Surrogate Category (Group 1)  multipoint  Construct a surrogate from multiple existing training points  
Required (Choose One)  Surrogate Category (Group 1)  local  Build a locally accurate surrogate from data at a single point  
Required (Choose One)  Surrogate Category (Group 1)  hierarchical  Hierarchical approximations use corrected results from a low fidelity model as an approximation to the results of a high fidelity "truth" model.  
Surrogate models are inexpensive approximate models that are intended to capture the salient features of an expensive high-fidelity model. They can be used to explore the variations in response quantities over regions of the parameter space, or they can serve as inexpensive stand-ins for optimization or uncertainty quantification studies (see, for example, the surrogate-based optimization methods, surrogate_based_global and surrogate_based_local).
Surrogate models supported in Dakota are categorized as Data Fitting or Hierarchical, as shown below. Each of these surrogate types provides an approximate representation of a "truth" model, which is the model that performs the parameter-to-response mappings. The approximation is built and updated using results from the truth model, called the "training data".
Data fits:
Known Issue: When using discrete variables, significant differences in surrogate behavior have sometimes been observed across computing platforms. The cause has not yet been fully diagnosed and is under investigation. In addition, guidance on appropriate construction and use of surrogates with discrete variables is under development. In the meantime, users should be aware that there is a risk of inaccurate results when using surrogates with discrete variables.
Data fitting methods involve construction of an approximation or surrogate model using data (response values, gradients, and Hessians) generated from the original truth model. Data fit methods can be further categorized as local, multipoint, and global approximation techniques, based on the number of points used in generating the data fit.
Taylor series expansion: taylor_series
Training data consists of a single point, plus gradient and Hessian information.
TANA3: tana
Training data comes from a few previously evaluated points.
Global full space response surface methods:
Training data is generated using either a design of experiments method applied to the truth model (specified by dace_method_pointer), or from saved data (specified by reuse_points) in a restart database or an import file.
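The local case listed above (a Taylor series built from a single point plus gradient and Hessian information) can be illustrated with a minimal Python sketch. This is not Dakota's implementation. The function name taylor_surrogate and the quadratic truth model are hypothetical, chosen only so that the second-order expansion reproduces the truth model exactly.

```python
def taylor_surrogate(x0, f0, grad, hess=None):
    """Build a Taylor-series model about the single training point x0.

    Hypothetical helper for illustration only (not a Dakota API).
    With hess=None the model is first order; otherwise second order.
    """
    n = len(x0)

    def model(x):
        dx = [x[i] - x0[i] for i in range(n)]
        # Zeroth- and first-order terms: f(x0) + g^T dx
        value = f0 + sum(grad[i] * dx[i] for i in range(n))
        if hess is not None:
            # Second-order term: 0.5 * dx^T H dx
            value += 0.5 * sum(
                dx[i] * hess[i][j] * dx[j] for i in range(n) for j in range(n)
            )
        return value

    return model


# Hypothetical "truth" model, assumed only for this example.
def truth(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2


x0 = [1.0, 2.0]
f0 = truth(x0)
grad = [2.0 * x0[0] + 3.0 * x0[1], 3.0 * x0[0] + 4.0 * x0[1]]
hess = [[2.0, 3.0], [3.0, 4.0]]

second_order = taylor_surrogate(x0, f0, grad, hess)
first_order = taylor_surrogate(x0, f0, grad)
```

Because the assumed truth model is quadratic, second_order matches it everywhere, while first_order is accurate only near x0. For a general truth model, both are accurate only locally.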
Multifidelity/hierarchical:
Multifidelity modeling involves the use of a low-fidelity physics-based model as a surrogate for the original high-fidelity model. The low-fidelity model typically involves a coarser mesh, looser convergence tolerances, reduced element order, or omitted physics.
See hierarchical.
The global and hierarchical surrogates have a correction feature that improves the local accuracy of the surrogate models. The correction factors force the surrogate models to match the true function values, and possibly the true function derivatives, at the center point of each trust region. Details can be found under global correction or hierarchical correction.
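The idea of forcing a surrogate to match the truth model at a trust region center can be sketched as follows. This is a minimal illustration of an additive correction under stated assumptions, not Dakota's code. The helper name additive_correction and the two one-dimensional models are hypothetical.

```python
def additive_correction(lo_model, center, hi_value, lo_value,
                        hi_grad=None, lo_grad=None):
    """Return a corrected low-fidelity model (hypothetical helper).

    Zeroth order: shift lo_model so it matches the truth value at center.
    First order (when both gradients are given): also match the gradient.
    """
    offset = hi_value - lo_value

    def corrected(x):
        value = lo_model(x) + offset
        if hi_grad is not None and lo_grad is not None:
            # Linear term that closes the gradient mismatch at the center.
            value += sum(
                (gh - gl) * (xi - ci)
                for gh, gl, xi, ci in zip(hi_grad, lo_grad, x, center)
            )
        return value

    return corrected


# Hypothetical truth and low-fidelity models, assumed for illustration.
def hi_model(x):
    return x[0] ** 2 + 1.0


def lo_model(x):
    return 0.8 * x[0] ** 2


center = [1.0]
corrected = additive_correction(
    lo_model,
    center,
    hi_value=hi_model(center),
    lo_value=lo_model(center),
    hi_grad=[2.0 * center[0]],   # analytic gradient of hi_model at center
    lo_grad=[1.6 * center[0]],   # analytic gradient of lo_model at center
)
```

At the center, corrected agrees with hi_model in both value and slope. Away from the center the agreement degrades, which is why such corrections are paired with a trust region.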
Surrogate models are used extensively in the surrogate-based optimization and least squares methods, in which the goals are to reduce expense by minimizing the number of truth function evaluations and to smooth out noisy data with a global data fit. However, the use of surrogate models is not restricted to optimization techniques; uncertainty quantification and optimization under uncertainty methods are other primary users.
Data Fit Surrogate Models
A surrogate of the data fit type is a non-physics-based approximation typically involving interpolation or regression of a set of data generated from the original model. Data fit surrogates can be further characterized by the number of data points used in the fit, where a local approximation (e.g., first- or second-order Taylor series) uses data from a single point, a multipoint approximation (e.g., two-point exponential approximations (TPEA) or two-point adaptive nonlinearity approximations (TANA)) uses a small number of data points often drawn from the previous iterates of a particular algorithm, and a global approximation (e.g., polynomial response surfaces, kriging/gaussian_process, neural networks, radial basis functions, splines) uses a set of data points distributed over the domain of interest, often generated using a design of computer experiments.
Dakota contains several types of surface fitting methods that can be used with optimization and uncertainty quantification methods and strategies such as surrogate-based optimization and optimization under uncertainty. These are: polynomial models (linear, quadratic, and cubic), first-order Taylor series expansion, kriging spatial interpolation, artificial neural networks, multivariate adaptive regression splines, radial basis functions, and moving least squares. With the exception of Taylor series methods, all of these methods are accessed in Dakota through the Surfpack library. All of these surface fitting methods can be applied to problems having an arbitrary number of design parameters. However, surface fitting methods usually are practical only for problems with a small number of parameters (e.g., a maximum of somewhere in the range of 30-50 design parameters). The mathematical models created by surface fitting methods have a variety of names in the engineering community. These include surrogate models, metamodels, approximation models, and response surfaces. For this manual, the terms surface fit model and surrogate model are used.
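As a rough illustration of a global polynomial surface fit, the sketch below fits a one-dimensional quadratic to training data by least squares using only the Python standard library. It does not use or reproduce the Surfpack library. The function names fit_quadratic and evaluate, and the training data, are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of c0 + c1*x + c2*x^2 (hypothetical helper).

    Solves the 3x3 normal equations (A^T A) c = A^T y by Gaussian
    elimination with partial pivoting.
    """
    n = 3
    basis = [[1.0, x, x * x] for x in xs]
    ata = [[sum(r[i] * r[j] for r in basis) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(basis, ys)) for i in range(n)]
    aug = [ata[i] + [aty[i]] for i in range(n)]  # augmented matrix

    # Forward elimination.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted quadratic surrogate at x."""
    return coeffs[0] + coeffs[1] * x + coeffs[2] * x * x


# Hypothetical training data sampled from 1 + 2x + 3x^2 over the domain.
train_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
train_y = [1.0 + 2.0 * x + 3.0 * x * x for x in train_x]
coeffs = fit_quadratic(train_x, train_y)
```

In contrast to a local Taylor model, the training points here are spread over the whole domain of interest, so the fit trades exactness at any one point for reasonable accuracy everywhere. The normal equations are used only for brevity; they can be poorly conditioned for higher-order fits.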
The data fitting methods in Dakota include software developed by Sandia researchers and by various researchers in the academic community.
Multifidelity Surrogate Models
A second type of surrogate is the model hierarchy type (also called multifidelity, variable fidelity, variable complexity, etc.). In this case, a model that is still physics-based but is of lower fidelity (e.g., coarser discretization, reduced element order, looser convergence tolerances, omitted physics) is used as the surrogate in place of the high-fidelity model. For example, an inviscid, incompressible Euler CFD model on a coarse discretization could be used as a low-fidelity surrogate for a high-fidelity Navier-Stokes model on a fine discretization.
Surrogate Model Selection
This section offers some guidance on choosing from among the available surrogate model types.
For Surrogate Based Local Optimization, using the surrogate_based_local method with a trust region:
using the keywords:
will probably work best.
If for some reason you wish or need to use a global surrogate (not recommended) then the best of these options is likely to be either:
For Efficient Global Optimization (EGO), the efficient_global method:
the default surrogate is gaussian_process surfpack, which is likely to find a better optimum and/or require fewer true function evaluations than the alternative, gaussian_process dakota. However, the surfpack version will likely take more time to build than the dakota version. Note that currently the use_derivatives keyword is not recommended for use with EGO-based methods.
For EGO-based global interval estimation, the global_interval_est ego method:
the default gaussian_process surfpack will likely work better than the alternative gaussian_process dakota.
For Efficient Global Reliability Analysis (EGRA), the global_reliability method:
the surfpack and dakota versions of the gaussian process tend to give similar answers, with the dakota version tending to use fewer true function evaluations. Since this is based on EGO, it is likely that the default surfpack is more accurate, although this has not been rigorously demonstrated.
The use_derivatives keyword will only be useful if accurate and inexpensive derivatives are available. Finite difference derivatives are disqualified on both counts. However, derivatives generated by analytical, automatic differentiation, or continuous adjoint techniques can be appropriate. Currently, first-order derivatives, i.e., gradients, are the highest-order derivatives that can be used to construct the gaussian_process surfpack model; Hessians will not be used even if they are available. These keywords may also be of interest: