Responses Commands

Responses Description

Responses specify the data set produced by an interface after the completion of a "function evaluation." Here, the term function evaluation is used loosely to denote a data request from an iterator that is mapped through an interface in a single pass. Strictly speaking, this data request may involve multiple response functions and their derivatives, but the term function evaluation is widely used for this purpose. The data set may comprise a set of functions, their first derivative vectors (gradients), and their second derivative matrices (Hessians).

This abstraction provides a generic data container (the Response class) whose contents are interpreted differently depending upon the type of iteration being performed. In the case of optimization, the set of functions consists of one or more objective functions, nonlinear inequality constraints, and nonlinear equality constraints. (Linear constraints are not part of a response set since their coefficients can be communicated to an optimizer at start up and then computed internally for all function evaluations; see Method Independent Controls.) In the case of least squares iterators, the functions consist of individual residual terms or model responses together with an observed data file for comparison (as opposed to a sum of the squares objective function) as well as nonlinear inequality and equality constraints. In the case of nondeterministic iterators, the function set is made up of generic response functions for which the effect of parameter uncertainty is to be quantified. Parameter study and design of experiments iterators may be used with any of the response data set types. Thus the interpretation of the response data varies from iterator to iterator.

Gradient specification types include none, numerical, analytic, and mixed. The no_gradients selection indicates that gradient information is not needed in the study. The numerical_gradients selection means that gradient information is needed and will be computed with finite differences by DAKOTA or the optimization algorithm in use. The analytic_gradients selection means that gradient information is available directly from the simulation (finite differencing is not required). Finally, the mixed_gradients selection means that some gradient information is available directly from the simulation whereas the rest will have to be estimated with finite differences.

Hessian availability is characterized as none, analytic, numerical, quasi, or mixed. Similar to gradients, the no_hessians selection indicates that Hessian information is not needed/available in the study, and the analytic_hessians selection indicates that Hessian information is available directly from the simulation. The numerical_hessians selection indicates that Hessian information will be estimated with finite differences. The quasi_hessians specification means that Hessian information will be accumulated over time using secant updates based on the existing gradient evaluations. Finally, the mixed_hessians selection allows for a mixture of analytic, numerical, and quasi Hessian response data.

Responses specify the total data set that is available for use by the method over the course of iteration. This is distinguished from the data subset described by an active set vector (see DAKOTA File Data Formats in the Users Manual [Adams et al., 2010]) indicating the particular subset of the response data needed for a particular function evaluation. Thus, the responses specification is a broad description of the data to be used during a study whereas the active set vector indicates the subset currently needed.

Several examples follow. The first example shows an optimization data set containing an objective function and two nonlinear inequality constraints. These three functions have analytic gradient availability and no Hessian availability.

responses,
	objective_functions = 1
	nonlinear_inequality_constraints = 2
	analytic_gradients
	no_hessians

The next example shows a typical specification for a calibration data set. The six residual functions will have numerical gradients computed using the dakota finite differencing routine with central differences of 0.1% (plus/minus delta value = .001*value).

responses,
	calibration_terms = 6
	numerical_gradients
	  method_source dakota
	  interval_type central
	  fd_gradient_step_size = .001
	no_hessians

The last example shows a specification that could be used with a nondeterministic sampling iterator. The three response functions have no gradient or Hessian availability; therefore, only function values will be used by the iterator.

responses,
	response_functions = 3
	no_gradients
	no_hessians

Parameter study and design of experiments iterators are not restricted in terms of the response data sets which may be catalogued; they may be used with any of the function specification examples shown above.

Responses Specification

The responses specification has the following structure (see dakota.input.summary):

responses,
	<set identifier>
	<response descriptors>
	<function specification>
	<gradient specification>
	<Hessian specification>

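The set identifier and response descriptors are optional. However, the function, gradient, and Hessian specifications are all required, their type selected from the options discussed above. For example, the function specification must be one of three types: a group containing objective and constraint functions, a group containing calibration (least squares) terms and constraint functions, or a generic response functions specification.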

The following sections describe each of these specification components and their options in additional detail.

Responses Set Identifier

The optional set identifier specification uses the keyword id_responses to input a string for use in identifying a particular responses specification. A model can then identify the use of this response set by specifying the same string in its responses_pointer specification (see Model Independent Controls). For example, a model whose specification contains responses_pointer = 'R1' will use a responses set with id_responses = 'R1'.

If the id_responses specification is omitted, a particular responses specification will be used by a model only if that model omits specifying a responses_pointer and if the responses set was the last set parsed (or is the only set parsed). In common practice, if only one responses set exists, then id_responses can be safely omitted from the responses specification and responses_pointer can be omitted from the model specification(s), since there is no potential for ambiguity in this case. Table 9.1 summarizes the set identifier input.

Table 9.1 Specification detail for set identifier

| Description | Keyword | Associated Data | Status | Default |
| --- | --- | --- | --- | --- |
| Responses set identifier | id_responses | string | Optional | use of last responses parsed |

Response Labels

The optional response labels specification response_descriptors is a list of strings which will be printed in DAKOTA output to identify the values for particular response functions. The default descriptor strings use a root string plus a numeric identifier. This root string is "obj_fn" for objective functions, "least_sq_term" for least squares terms, "response_fn" for generic response functions, "nln_ineq_con" for nonlinear inequality constraints, and "nln_eq_con" for nonlinear equality constraints. Table 9.2 summarizes the response descriptors input.

Table 9.2 Specification detail for response labels

| Description | Keyword | Associated Data | Status | Default |
| --- | --- | --- | --- | --- |
| Response labels | descriptors | list of strings | Optional | root strings plus numeric identifiers |

Function Specification

The function specification must be one of three types: 1) a group containing objective and constraint functions, 2) a group containing calibration (least squares) terms and constraint functions, or 3) a generic response functions specification. These function sets correspond to optimization, least squares, and uncertainty quantification iterators, respectively. Parameter study and design of experiments iterators may be used with any of the three function specifications.

Objective and constraint functions (optimization data set)

An optimization data set is specified using objective_functions and optionally objective_function_scale_types, objective_function_scales, multi_objective_weights, nonlinear_inequality_constraints, nonlinear_inequality_lower_bounds, nonlinear_inequality_upper_bounds, nonlinear_inequality_scale_types, nonlinear_inequality_scales, nonlinear_equality_constraints, nonlinear_equality_targets, nonlinear_equality_scale_types, and nonlinear_equality_scales. The objective_functions, nonlinear_inequality_constraints, and nonlinear_equality_constraints inputs specify the number of objective functions, nonlinear inequality constraints, and nonlinear equality constraints, respectively. The number of objective functions must be 1 or greater, and the number of inequality and equality constraints must be 0 or greater. The objective_function_scale_types specification includes strings specifying the scaling type for each objective function value in methods that support scaling, when scaling is enabled (see Method Independent Controls for details). Each entry in objective_function_scale_types may be selected from 'none', 'value', or 'log', to select no, characteristic value, or logarithmic scaling, respectively. Automatic scaling is not available for objective functions. If a single string is specified it will apply to each objective function. Each entry in objective_function_scales may be a user-specified nonzero characteristic value to be used in scaling each objective function. These values are ignored for scaling type 'none', required for 'value', and optional for 'log'. If a single real value is specified it will apply to each function. If the number of objective functions is greater than 1, then a multi_objective_weights specification provides a simple weighted-sum approach to combining multiple objectives:

\[f = \sum_{i=1}^{n} w_{i}f_{i}\]

If this is not specified, then each objective function is given equal weighting:

\[f = \sum_{i=1}^{n} \frac{f_i}{n}\]

If scaling is specified, it is applied before multi-objective weighted sums are formed.
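The two weighted-sum formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not DAKOTA's implementation; the function name `combine_objectives` is hypothetical, and the objective values are assumed to have already been scaled if scaling is in use.

```python
from typing import Optional, Sequence


def combine_objectives(f: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Weighted-sum combination of n objective values (illustrative only)."""
    n = len(f)
    if n == 0:
        raise ValueError("at least one objective function is required")
    if weights is None:
        # No multi_objective_weights given: equal weighting, w_i = 1/n.
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per objective function")
    return sum(w * fi for w, fi in zip(weights, f))


# Two objectives, first with the default equal weights, then user weights.
print(combine_objectives([2.0, 4.0]))
print(combine_objectives([2.0, 4.0], [0.25, 0.75]))
```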

The nonlinear_inequality_lower_bounds and nonlinear_inequality_upper_bounds specifications provide the lower and upper bounds for 2-sided nonlinear inequalities of the form

\[g_l \leq g(x) \leq g_u\]

The defaults for the inequality constraint bounds are selected so that one-sided inequalities of the form

\[g(x) \leq 0.0\]

result when there are no user constraint bounds specifications (this provides backwards compatibility with previous DAKOTA versions). In a user bounds specification, any upper bound values greater than +bigRealBoundSize (1.e+30, as defined in Minimizer) are treated as +infinity and any lower bound values less than -bigRealBoundSize are treated as -infinity. This feature is commonly used to drop one of the bounds in order to specify a 1-sided constraint (just as the default lower bounds drop out since -DBL_MAX < -bigRealBoundSize). The same approach is used for nonexistent linear inequality bounds as described in Method Independent Controls and for nonexistent design variable bounds as described in Design Variables.
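As a rough sketch of the bound handling just described, the Python below maps any bound beyond plus or minus bigRealBoundSize to an infinity and then tests a constraint value against the resulting interval. The helper names `effective_bounds` and `is_feasible` are hypothetical, and the defaults (lower bound of -DBL_MAX, upper bound of 0.) are taken from the text; the code is not drawn from the DAKOTA source.

```python
import math
import sys

BIG_REAL_BOUND_SIZE = 1.0e30  # value quoted in the text above


def effective_bounds(lower=-sys.float_info.max, upper=0.0):
    """Treat bounds beyond +/-bigRealBoundSize as infinite (illustrative)."""
    lo = -math.inf if lower < -BIG_REAL_BOUND_SIZE else lower
    hi = math.inf if upper > BIG_REAL_BOUND_SIZE else upper
    return lo, hi


def is_feasible(g, lower=-sys.float_info.max, upper=0.0):
    """Check g_l <= g(x) <= g_u after dropping 'infinite' bounds."""
    lo, hi = effective_bounds(lower, upper)
    return lo <= g <= hi


# With no user bounds, the default reduces to the one-sided form g(x) <= 0.
print(effective_bounds())
print(is_feasible(-5.0), is_feasible(0.1))
# A very large upper bound drops out, leaving the one-sided form g(x) >= 1.
print(effective_bounds(lower=1.0, upper=1.0e31))
```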

The nonlinear_equality_targets specification provides the targets for nonlinear equalities of the form

\[g(x) = g_t\]

and the defaults for the equality targets enforce a value of 0. for each constraint

\[g(x) = 0.0\]

The nonlinear_inequality_scale_types and nonlinear_equality_scale_types specifications include strings specifying the scaling type for each nonlinear inequality or equality constraint, respectively, in methods that support scaling, when scaling is enabled (see Method Independent Controls for details). Each entry in nonlinear_inequality_scale_types or nonlinear_equality_scale_types may be selected from 'none', 'value', 'auto', or 'log', to select no, characteristic value, automatic, or logarithmic scaling, respectively. If a single string is specified it will apply to all components of the relevant nonlinear constraint vector. Each entry in nonlinear_inequality_scales and nonlinear_equality_scales may be a user-specified nonzero characteristic value to be used in scaling each constraint component. These values are ignored for scaling type 'none', required for 'value', and optional for 'auto' and 'log'. If a single real value is specified it will apply to each constraint.

Any linear constraints present in an application need only be input to an optimizer at start up and do not need to be part of the data returned on every function evaluation (see the linear constraints description in Method Independent Controls). Table 9.3 summarizes the optimization data set specification.

Table 9.3 Specification detail for optimization data sets

| Description | Keyword | Associated Data | Status | Default |
| --- | --- | --- | --- | --- |
| Number of objective functions | objective_functions | integer | Required group | N/A |
| Objective function scaling types | objective_function_scale_types | list of strings | Optional | vector values = 'none' |
| Objective function scales | objective_function_scales | list of reals | Optional | vector values = 1. (no scaling) |
| Multiobjective weightings | multi_objective_weights | list of reals | Optional | equal weightings |
| Number of nonlinear inequality constraints | nonlinear_inequality_constraints | integer | Optional | 0 |
| Nonlinear inequality constraint lower bounds | nonlinear_inequality_lower_bounds | list of reals | Optional | vector values = -DBL_MAX |
| Nonlinear inequality constraint upper bounds | nonlinear_inequality_upper_bounds | list of reals | Optional | vector values = 0. |
| Nonlinear inequality constraint scaling types | nonlinear_inequality_scale_types | list of strings | Optional | vector values = 'none' |
| Nonlinear inequality constraint scales | nonlinear_inequality_scales | list of reals | Optional | vector values = 1. (no scaling) |
| Number of nonlinear equality constraints | nonlinear_equality_constraints | integer | Optional | 0 |
| Nonlinear equality constraint targets | nonlinear_equality_targets | list of reals | Optional | vector values = 0. |
| Nonlinear equality constraint scaling types | nonlinear_equality_scale_types | list of strings | Optional | vector values = 'none' |
| Nonlinear equality constraint scales | nonlinear_equality_scales | list of reals | Optional | vector values = 1. (no scaling) |

Calibration terms and constraint functions (least squares data set)

A calibration data set is specified using calibration_terms and optionally the specifications summarized in Table 9.4 and Table 9.5, including weighting/scaling, data, and constraints. Each of the calibration terms is a residual function to be driven toward zero, and the nonlinear inequality and equality constraint specifications have identical meanings to those described in Objective and constraint functions (optimization data set). These types of problems are commonly encountered in parameter estimation, system identification, and model calibration. Least squares calibration problems are most efficiently solved using special-purpose least squares solvers such as Gauss-Newton or Levenberg-Marquardt; however, they may also be solved using general-purpose optimization algorithms.

While DAKOTA can solve these problems with either least squares or optimization algorithms, the response data sets to be returned from the simulator are different. Least squares calibration involves a set of residual functions whereas optimization involves a single objective function (sum of the squares of the residuals), i.e.,

\[f = \sum_{i=1}^{n} R_i^2\]

where f is the objective function and the set of $R_i$ are the residual functions. Therefore, function values and derivative data in the least squares case involve the values and derivatives of the residual functions, whereas the optimization case involves values and derivatives of the sum of squares objective function. This means that in the least squares calibration case, the user must return each of n residuals separately as a separate calibration term. Switching between the two approaches sometimes requires different simulation interfaces capable of returning the different granularity of response data required, although DAKOTA supports automatic recasting of residuals into a sum of squares for presentation to an optimization method. Typically, the user must compute the difference between the model results and the observations when computing the residuals. However, the user has the option of specifying the observational data (e.g., from physical experiments or other sources) in a file. The specification calibration_data_file may be used to specify a text file containing calibration_terms observed data values (in a supported DAKOTA tabular format; default formats change in DAKOTA 5.2 -- see the Users Manual) to be used in computing the residuals

\[R_i = y^M_i - y^O_i \]

where M denotes model and O, observation from the file. In this case the simulator should return the actual model response, as DAKOTA will compute the residual internally using the supplied data.

The calibration_term_scale_types specification includes strings specifying the scaling type for each residual term in methods that support scaling, when scaling is enabled (see Method Independent Controls for details). Each entry in calibration_term_scale_types may be selected from 'none', 'value', or 'log', to select no, characteristic value, or logarithmic scaling, respectively. Automatic scaling is not available for calibration terms. If a single string is specified it will apply to each least squares term. Each entry in calibration_term_scales may be a user-specified nonzero characteristic value to be used in scaling each term. These values are ignored for scaling type 'none', required for 'value', and optional for 'log'. If a single real value is specified it will apply to each term. The calibration_weights specification provides a means to specify a relative emphasis among the vector of squared residuals through multiplication of these squared residuals by a vector of weights:

\[f = \sum_{i=1}^{n} w_i R_i^2 = \sum_{i=1}^{n} w_i (y^M_i - y^O_i)^2\]

If characteristic value scaling is additionally specified, then it is applied to each residual prior to squaring:

\[f = \sum_{i=1}^{n} w_i (\frac{y^M_i - y^O_i}{s_i})^2\]

When experimental data uncertainties are supplied, the weights are automatically defined to be the inverse of the experimental variance:

\[f = \sum_{i=1}^{n} \frac{1}{\sigma^2_i} (\frac{y^M_i - y^O_i}{s_i})^2\]
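The three sum-of-squares expressions above differ only in which weights and scales are applied to each residual. The following Python sketch evaluates them with one function. It is an illustration under stated assumptions, not DAKOTA code: the name `weighted_sum_of_squares` and its argument names are hypothetical, unit weights and unit scales are assumed when none are given, and supplied standard deviations are assumed to override any user weights.

```python
from typing import Optional, Sequence


def weighted_sum_of_squares(model: Sequence[float],
                            observed: Sequence[float],
                            weights: Optional[Sequence[float]] = None,
                            scales: Optional[Sequence[float]] = None,
                            std_devs: Optional[Sequence[float]] = None) -> float:
    """Evaluate f = sum_i w_i * ((yM_i - yO_i) / s_i)**2 (illustrative)."""
    n = len(model)
    if len(observed) != n:
        raise ValueError("model and observed data must have the same length")
    if scales is None:
        scales = [1.0] * n                      # no characteristic-value scaling
    if std_devs is not None:
        # Assumption: experimental uncertainties replace user weights,
        # giving w_i = 1 / sigma_i**2.
        weights = [1.0 / (sd * sd) for sd in std_devs]
    elif weights is None:
        weights = [1.0] * n                     # equal weighting
    total = 0.0
    for y_m, y_o, w, s in zip(model, observed, weights, scales):
        residual = (y_m - y_o) / s              # scale before squaring
        total += w * residual * residual
    return total


model, observed = [1.0, 3.0], [0.0, 1.0]
print(weighted_sum_of_squares(model, observed))
print(weighted_sum_of_squares(model, observed, weights=[2.0, 0.5]))
print(weighted_sum_of_squares(model, observed, std_devs=[1.0, 2.0]))
```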

Table 9.4 Specification detail for nonlinear least squares data sets (calibration terms)

| Description | Keyword | Associated Data | Status | Default |
| --- | --- | --- | --- | --- |
| Number of calibration terms | calibration_terms | integer | Required | N/A |
| Calibration data file name | calibration_data_file | string | Optional | none |
| Experiments (rows) in file | num_experiments | integer | Optional | 1 |
| Data file in annotated format | annotated | boolean | Optional | annotated |
| Data file in freeform format | freeform | boolean | Optional | annotated |
| Configuration variable columns in file | num_config_variables | integer | Optional | 0 |
| Standard deviation columns in file | num_std_deviations | integer | Optional | 0 |
| Calibration scaling types | calibration_term_scale_types | list of strings | Optional | vector values = 'none' |
| Calibration scales | calibration_term_scales | list of reals | Optional | no scaling (vector values = 1.) |
| Calibration term weights | calibration_weights | list of reals | Optional | equal weighting |

Table 9.5 Specification detail for nonlinear least squares data sets (constraints)

| Description | Keyword | Associated Data | Status | Default |
| --- | --- | --- | --- | --- |
| Number of nonlinear inequality constraints | nonlinear_inequality_constraints | integer | Optional | 0 |
| Nonlinear inequality lower bounds | nonlinear_inequality_lower_bounds | list of reals | Optional | vector values = -DBL_MAX |
| Nonlinear inequality upper bounds | nonlinear_inequality_upper_bounds | list of reals | Optional | vector values = 0. |
| Nonlinear inequality scaling types | nonlinear_inequality_scale_types | list of strings | Optional | vector values = 'none' |
| Nonlinear inequality scales | nonlinear_inequality_scales | list of reals | Optional | no scaling (vector values = 1.) |
| Number of nonlinear equality constraints | nonlinear_equality_constraints | integer | Optional | 0 |
| Nonlinear equality targets | nonlinear_equality_targets | list of reals | Optional | vector values = 0. |
| Nonlinear equality scaling types | nonlinear_equality_scale_types | list of strings | Optional | vector values = 'none' |
| Nonlinear equality scales | nonlinear_equality_scales | list of reals | Optional | no scaling (vector values = 1.) |

Response functions (generic data set)

A generic response data set is specified using response_functions. Each of these functions is simply a response quantity of interest with no special interpretation taken by the method in use. This type of data set is used by uncertainty quantification methods, in which the effect of parameter uncertainty on response functions is quantified, and can also be used in parameter study and design of experiments methods (although these methods are not restricted to this data set), in which the effect of parameter variations on response functions is evaluated. Whereas objective, constraint, and residual functions have special meanings for optimization and least squares algorithms, the generic response function data set need not have a specific interpretation and the user is free to define whatever functional form is convenient. Table 9.6 summarizes the generic response function data set specification.

Table 9.6 Specification detail for generic response function data sets

| Description | Keyword | Associated Data | Status | Default |
| --- | --- | --- | --- | --- |
| Number of response functions | response_functions | integer | Required | N/A |

Gradient Specification

The gradient specification must be one of four types: 1) no gradients, 2) numerical gradients, 3) analytic gradients, or 4) mixed gradients.

No gradients

The no_gradients specification means that gradient information is not needed in the study. Therefore, it will neither be retrieved from the simulation nor computed with finite differences. The no_gradients keyword is a complete specification for this case.

Numerical gradients

The numerical_gradients specification means that gradient information is needed and will be computed with finite differences using either the native or one of the vendor finite differencing routines.

The method_source setting specifies the source of the finite differencing routine that will be used to compute the numerical gradients: dakota denotes DAKOTA's internal finite differencing algorithm and vendor denotes the finite differencing algorithm supplied by the iterator package in use (DOT, CONMIN, NPSOL, NL2SOL, NLSSOL, and OPT++ each have their own internal finite differencing routines). The dakota routine is the default since it can execute in parallel and exploit the concurrency in finite difference evaluations (see Exploiting Parallelism in the Users Manual [Adams et al., 2010]). However, the vendor setting can be desirable in some cases since certain libraries will modify their algorithm when the finite differencing is performed internally. Since the selection of the dakota routine hides the use of finite differencing from the optimizers (the optimizers are configured to accept user-supplied gradients, which some algorithms assume to be of analytic accuracy), the potential exists for the vendor setting to trigger the use of an algorithm more optimized for the higher expense and/or lower accuracy of finite-differencing. For example, NPSOL uses gradients in its line search when in user-supplied gradient mode (since it assumes they are inexpensive), but uses a value-based line search procedure when internally finite differencing. The use of a value-based line search will often reduce total expense in serial operations. However, in parallel operations, the use of gradients in the NPSOL line search (user-supplied gradient mode) provides excellent load balancing without need to resort to speculative optimization approaches. In summary, then, the dakota routine is preferred for parallel optimization, and the vendor routine may be preferred for serial optimization in special cases.

The interval_type setting is used to select between forward and central differences in the numerical gradient calculations. The dakota, DOT vendor, and OPT++ vendor routines have both forward and central differences available, the CONMIN and NL2SOL vendor routines support forward differences only, and the NPSOL and NLSSOL vendor routines start with forward differences and automatically switch to central differences as the iteration progresses (the user has no control over this). The following forward difference expression

\[ \nabla f ({\bf x}) \cong \frac{f ({\bf x} + h {\bf e}_i) - f ({\bf x})}{h} \]

and the following central difference expression

\[ \nabla f ({\bf x}) \cong \frac{f ({\bf x} + h {\bf e}_i) - f ({\bf x} - h {\bf e}_i)}{2h} \]

are used to estimate the $i^{th}$ component of the gradient vector.
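The forward and central difference expressions can be illustrated with a short Python sketch. This is a generic textbook implementation, not DAKOTA's routine: the function name `fd_gradient` is hypothetical, and `h` here is an absolute step applied to every component rather than a relative step size.

```python
from typing import Callable, List, Sequence


def fd_gradient(f: Callable[[List[float]], float],
                x: Sequence[float],
                h: float = 1.0e-6,
                interval_type: str = "forward") -> List[float]:
    """Estimate each gradient component by forward or central differences."""
    if interval_type not in ("forward", "central"):
        raise ValueError("interval_type must be 'forward' or 'central'")
    x = list(x)
    grad = []
    # The base point is only needed for forward differences.
    f0 = f(x) if interval_type == "forward" else None
    for i in range(len(x)):
        x_plus = x.copy()
        x_plus[i] += h                      # x + h e_i
        if interval_type == "forward":
            grad.append((f(x_plus) - f0) / h)
        else:
            x_minus = x.copy()
            x_minus[i] -= h                 # x - h e_i
            grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return grad


def example(x):
    # Simple test function with exact gradient (2*x0, 3).
    return x[0] ** 2 + 3.0 * x[1]


print(fd_gradient(example, [1.0, 2.0], interval_type="forward"))
print(fd_gradient(example, [1.0, 2.0], interval_type="central"))
```

Forward differences cost one extra function evaluation per variable, while central differences cost two but reduce the truncation error from first order to second order in h.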

Lastly, fd_gradient_step_size specifies the relative finite difference step size to be used in the computations. Either a single value may be entered for use with all parameters, or a list of step sizes may be entered, one for each parameter. The latter option of a list of step sizes is only valid for use with the DAKOTA finite differencing routine. For DAKOTA, DOT, CONMIN, and OPT++, the differencing intervals are computed by multiplying the fd_gradient_step_size with the current parameter value. In this case, a minimum absolute differencing interval is needed when the current parameter value is close to zero. This prevents finite difference intervals for the parameter which are too small to distinguish differences in the response quantities being computed. DAKOTA, DOT, CONMIN, and OPT++ all use .01*fd_gradient_step_size as their minimum absolute differencing interval. With a fd_gradient_step_size = .001, for example, DAKOTA, DOT, CONMIN, and OPT++ will use intervals of .001*current value with a minimum interval of 1.e-5. NPSOL and NLSSOL use a different formula for their finite difference intervals: fd_gradient_step_size*(1+|current parameter value|). This definition has the advantage of eliminating the need for a minimum absolute differencing interval since the interval no longer goes to zero as the current parameter value goes to zero.
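The two interval rules described above can be compared with a small Python sketch. The function names are hypothetical and the code only restates the formulas in the text; it assumes the interval magnitude is taken from the absolute value of the parameter, which the text does not state explicitly.

```python
def relative_interval(x_i: float, fd_gradient_step_size: float = 1.0e-3) -> float:
    """Interval rule described for DAKOTA, DOT, CONMIN, and OPT++.

    Step size times the parameter value, with a floor of
    0.01 * fd_gradient_step_size near zero.
    """
    minimum = 0.01 * fd_gradient_step_size
    return max(abs(x_i) * fd_gradient_step_size, minimum)


def npsol_interval(x_i: float, fd_gradient_step_size: float = 1.0e-3) -> float:
    """Interval rule described for NPSOL and NLSSOL; never reaches zero."""
    return fd_gradient_step_size * (1.0 + abs(x_i))


for value in (10.0, 1.0, 0.0):
    print(value, relative_interval(value), npsol_interval(value))
```

With a step size of .001, the first rule falls back to the 1.e-5 floor at a parameter value of zero, whereas the second rule gives .001 there without needing a floor.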

When DAKOTA computes gradients or Hessians by finite differences and the variables in question have bounds, it by default chooses finite-differencing steps that keep the variables within their specified bounds. Older versions of DAKOTA generally ignored bounds when computing finite differences. To restore the older behavior, one can add keyword ignore_bounds to the response specification when method_source dakota (or just dakota) is also specified. In forward difference or backward difference computations, honoring bounds is straightforward. To honor bounds when approximating $\partial f / \partial x_i$, i.e., component $i$ of the gradient of $f$, by central differences, DAKOTA chooses two steps $h_1$ and $h_2$ with $h_1 \ne h_2$, such that $x + h_1 e_i$ and $x + h_2 e_i$ both satisfy the bounds, and then computes

\[ \frac{\partial f}{\partial x_i} \cong \frac{h_2^2(f_1 - f_0) - h_1^2(f_2 - f_0)}{h_1 h_2 (h_2 - h_1)} , \]

with $f_0 = f(x)$, $f_1 = f(x + h_1 e_i)$, and $f_2 = f(x + h_2 e_i)$.
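The two-step formula above can be checked numerically with the Python sketch below. The function name `two_step_derivative` is hypothetical, and the steps $h_1$ and $h_2$ are supplied by the caller; how DAKOTA itself selects them to satisfy the bounds is not described here and is not modeled.

```python
from typing import Callable, List, Sequence


def two_step_derivative(f: Callable[[List[float]], float],
                        x: Sequence[float],
                        i: int,
                        h1: float,
                        h2: float) -> float:
    """Approximate df/dx_i from f(x), f(x + h1 e_i), and f(x + h2 e_i)."""
    if h1 == 0.0 or h2 == 0.0 or h1 == h2:
        raise ValueError("h1 and h2 must be nonzero and distinct")
    x = list(x)

    def shifted(h: float) -> float:
        y = x.copy()
        y[i] += h
        return f(y)

    f0, f1, f2 = f(x), shifted(h1), shifted(h2)
    return (h2 * h2 * (f1 - f0) - h1 * h1 * (f2 - f0)) / (h1 * h2 * (h2 - h1))


def example(x):
    # Quadratic test function; df/dx0 = 2*x0 + 3.
    return x[0] ** 2 + 3.0 * x[0]


# Suppose x0 = 1 has an upper bound of 1.05, so a symmetric step of 0.1
# would violate it; use unequal steps that stay inside the bound instead.
print(two_step_derivative(example, [1.0], 0, h1=-0.1, h2=0.05))
```

Because the formula cancels the second-order Taylor term for any pair of distinct steps, it is exact (up to roundoff) for quadratic functions, and it reduces to the usual central difference when $h_1 = -h_2$.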

Table 9.7 summarizes the numerical gradient specification.

Table 9.7 Specification detail for numerical gradients

| Description | Keyword | Associated Data | Status | Default |
| --- | --- | --- | --- | --- |
| Numerical gradients | numerical_gradients | none | Required group | N/A |
| Method source | method_source | dakota \| vendor | Optional group | dakota |
| Interval type | interval_type | forward \| central | Optional group | forward |
| Finite difference step size | fd_gradient_step_size | list of reals | Optional | 0.001 |
| Ignore variable bounds | ignore_bounds | none | Optional | bounds respected |

Analytic gradients

The analytic_gradients specification means that gradient information is available directly from the simulation (finite differencing is not required). The simulation must return the gradient data in the DAKOTA format (enclosed in single brackets; see DAKOTA File Data Formats in the Users Manual [Adams et al., 2010]) for the case of file transfer of data. The analytic_gradients keyword is a complete specification for this case.

Mixed gradients

The mixed_gradients specification means that some gradient information is available directly from the simulation (analytic) whereas the rest will have to be finite differenced (numerical). This specification allows the user to make use of as much analytic gradient information as is available and then finite difference for the rest. For example, the objective function may be a simple analytic function of the design variables (e.g., weight) whereas the constraints are nonlinear implicit functions of complex analyses (e.g., maximum stress). The id_analytic_gradients list specifies by number the functions which have analytic gradients, and the id_numerical_gradients list specifies by number the functions which must use numerical gradients. Each function identifier, from 1 through the total number of functions, must appear once and only once within the union of the id_analytic_gradients and id_numerical_gradients lists. The method_source, interval_type, and fd_gradient_step_size specifications are as described previously in Numerical gradients and pertain to those functions listed by the id_numerical_gradients list. Table 9.8 summarizes the mixed gradient specification.
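The requirement that every function identifier appear exactly once across the two lists amounts to a partition check. The Python sketch below illustrates that rule; the function name `check_mixed_gradient_ids` is hypothetical, and this is not how DAKOTA's input parser is implemented.

```python
from typing import Sequence


def check_mixed_gradient_ids(num_functions: int,
                             id_analytic_gradients: Sequence[int],
                             id_numerical_gradients: Sequence[int]) -> None:
    """Raise ValueError unless the two lists partition 1..num_functions."""
    combined = list(id_analytic_gradients) + list(id_numerical_gradients)
    expected = set(range(1, num_functions + 1))
    duplicates = sorted({i for i in combined if combined.count(i) > 1})
    missing = sorted(expected - set(combined))
    out_of_range = sorted(set(combined) - expected)
    if duplicates or missing or out_of_range:
        raise ValueError(
            f"invalid id lists: duplicates={duplicates}, "
            f"missing={missing}, out_of_range={out_of_range}"
        )


# One objective with an analytic gradient, two constraints differenced.
check_mixed_gradient_ids(3, [1], [2, 3])
print("id lists are consistent")
```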

Table 9.8 Specification detail for mixed gradients

| Description | Keyword | Associated Data | Status | Default |
| --- | --- | --- | --- | --- |
| Mixed gradients | mixed_gradients | none | Required group | N/A |
| Analytic derivatives function list | id_analytic_gradients | list of integers | Required | N/A |
| Numerical derivatives function list | id_numerical_gradients | list of integers | Required | N/A |
| Method source | method_source | dakota \| vendor | Optional group | dakota |
| Interval type | interval_type | forward \| central | Optional group | forward |
| Finite difference step size | fd_step_size | list of reals | Optional | 0.001 |
| Ignore variable bounds | ignore_bounds | none | Optional | bounds respected |

Hessian Specification

Hessian availability must be specified with either no_hessians, numerical_hessians, quasi_hessians, analytic_hessians, or mixed_hessians.

No Hessians

The no_hessians specification means that the method does not require DAKOTA to manage the computation of any Hessian information. Therefore, it will neither be retrieved from the simulation nor computed by DAKOTA. The no_hessians keyword is a complete specification for this case. Note that, in some cases, Hessian information may still be approximated internally by an algorithm (e.g., within a quasi-Newton optimizer such as optpp_q_newton); however, DAKOTA has no direct involvement in this process and the responses specification need not include it.

Numerical Hessians

The numerical_hessians specification means that Hessian information is needed and will be computed with finite differences using either first-order gradient differencing (for the cases of analytic_gradients or for the functions identified by id_analytic_gradients in the case of mixed_gradients) or first- or second-order function value differencing (all other gradient specifications). In the former case, the following expression

\[ \nabla^2 f ({\bf x})_i \cong \frac{\nabla f ({\bf x} + h {\bf e}_i) - \nabla f ({\bf x})}{h} \]

estimates the $i^{th}$ Hessian column, and in the latter case, the following expressions

\[ \nabla^2 f ({\bf x})_{i,j} \cong \frac{f({\bf x} + h_i {\bf e}_i + h_j {\bf e}_j) - f({\bf x} + h_i {\bf e}_i) - f({\bf x} + h_j {\bf e}_j) + f({\bf x})}{h_i h_j} \]

and

\[ \nabla^2 f ({\bf x})_{i,j} \cong \frac{f({\bf x} + h {\bf e}_i + h {\bf e}_j) - f({\bf x} + h {\bf e}_i - h {\bf e}_j) - f({\bf x} - h {\bf e}_i + h {\bf e}_j) + f({\bf x} - h {\bf e}_i - h {\bf e}_j)}{4h^2} \]

provide first- and second-order estimates of the $ij^{th}$ Hessian term. Prior to DAKOTA 5.0, DAKOTA always used second-order estimates. In DAKOTA 5.0 and newer, the default is to use first-order estimates (which honor bounds on the variables and require only about a quarter as many function evaluations as do the second-order estimates), but specifying central after numerical_hessians causes DAKOTA to use the old second-order estimates, which do not honor bounds. In optimization algorithms that use Hessians, there is little reason to use second-order differences in computing Hessian approximations.
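The two function-value differencing schemes can be sketched in a few lines of Python. This is an illustrative sketch only, not DAKOTA's implementation: the names `hessian_first_order`, `hessian_second_order`, and `_shifted` are hypothetical, the first-order version uses the standard forward stencil in which both perturbations are positive, and no bound handling or evaluation caching is attempted.

```python
def _shifted(x, moves):
    """Return a copy of x with each (index, delta) pair in moves applied."""
    y = list(x)
    for i, d in moves:
        y[i] += d
    return y


def hessian_first_order(f, x, h):
    """Forward-difference Hessian estimate; h holds one interval h_i per variable."""
    n = len(x)
    f0 = f(x)
    # f(x + h_i e_i) is shared by every entry in row and column i.
    f_i = [f(_shifted(x, [(i, h[i])])) for i in range(n)]
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            f_ij = f(_shifted(x, [(i, h[i]), (j, h[j])]))
            H[i][j] = H[j][i] = (f_ij - f_i[i] - f_i[j] + f0) / (h[i] * h[j])
    return H


def hessian_second_order(f, x, h):
    """Central-difference Hessian estimate with a single scalar interval h."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            pp = f(_shifted(x, [(i, h), (j, h)]))
            pm = f(_shifted(x, [(i, h), (j, -h)]))
            mp = f(_shifted(x, [(i, -h), (j, h)]))
            mm = f(_shifted(x, [(i, -h), (j, -h)]))
            H[i][j] = H[j][i] = (pp - pm - mp + mm) / (4.0 * h * h)
    return H
```

For a test function such as f(x) = x_0^2 x_1 + 3 x_1^2, whose exact Hessian at (1, 2) is [[4, 2], [2, 6]], the second-order estimate is exact up to rounding, while the first-order estimate carries an error proportional to the interval.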

The fd_hessian_step_size specification sets the relative finite difference step size used in these differences. Either a single value may be entered for use with all parameters, or a list of step sizes may be entered, one for each parameter. The differencing intervals are computed by multiplying fd_hessian_step_size by the current parameter value. A minimum absolute differencing interval of 0.01*fd_hessian_step_size is used when the current parameter value is close to zero. Table 9.9 summarizes the numerical Hessian specification.

Table 9.9 Specification detail for numerical Hessians

| Description | Keyword | Associated Data | Status | Default |
| --- | --- | --- | --- | --- |
| Numerical Hessians | numerical_hessians | none | Required group | N/A |
| Finite difference step size | fd_hessian_step_size | list of reals | Optional | 0.001 (1st-order), 0.002 (2nd-order) |
| Difference order | forward \| central | none | Optional | forward |
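The rule for turning a relative fd_hessian_step_size into absolute differencing intervals can be sketched as follows. The function name `differencing_intervals` is hypothetical, and two details are assumptions rather than statements from this manual: "close to zero" is taken to mean a parameter magnitude below 0.01, and the interval is computed from the magnitude of the parameter so that it is always positive.

```python
def differencing_intervals(x, fd_hessian_step_size):
    """Return one absolute differencing interval per parameter.

    fd_hessian_step_size may be a single relative step (applied to all
    parameters) or a list with one relative step per parameter.
    """
    if isinstance(fd_hessian_step_size, (int, float)):
        steps = [float(fd_hessian_step_size)] * len(x)
    else:
        steps = [float(s) for s in fd_hessian_step_size]
        if len(steps) != len(x):
            raise ValueError("need one step size per parameter")
    # Assumption: the relative interval step * |x_i| is floored at the
    # stated minimum of 0.01 * step, i.e. whenever |x_i| < 0.01.
    return [s * max(abs(xi), 0.01) for xi, s in zip(x, steps)]
```

With the default first-order step of 0.001, a parameter value of 10 gives an interval of 0.01, while a parameter value of 0 falls back to the minimum interval of 0.00001.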

Quasi Hessians

The quasi_hessians specification means that Hessian information is needed and will be approximated using secant updates (sometimes called "quasi-Newton updates", though any algorithm that approximates Newton's method is a quasi-Newton method). Compared to finite difference numerical Hessians, secant approximations do not expend additional function evaluations in estimating all of the second-order information for every point of interest. Rather, they accumulate approximate curvature information over time using the existing gradient evaluations. The supported secant approximations include the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update (specified with the keyword bfgs)

\[ B_{k+1} = B_{k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k} \]

and the Symmetric Rank 1 (SR1) update (specified with the keyword sr1)

\[ B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k} \]

where $B_k$ is the $k^{th}$ approximation to the Hessian, $s_k = x_{k+1} - x_k$ is the step and $y_k = \nabla f_{k+1} - \nabla f_k$ is the corresponding yield in the gradients. In both cases, an initial scaling of $\frac{y_k^T y_k}{y_k^T s_k} I$ is used for $B_0$ prior to the first update. In addition, both cases employ basic numerical safeguarding to protect against numerically small denominators within the updates. This safeguarding skips the update if $|y_k^T s_k| < 10^{-6} s_k^T B_k s_k$ in the BFGS case or if $|(y_k - B_k s_k)^T s_k| < 10^{-6} ||s_k||_2 ||y_k - B_k s_k||_2$ in the SR1 case. In the BFGS case, additional safeguarding can be added using the damped option, which utilizes an alternative damped BFGS update when the curvature condition $y_k^T s_k > 0$ is nearly violated. Table 9.10 summarizes the quasi Hessian specification.

Table 9.10 Specification detail for quasi Hessians

| Description | Keyword | Associated Data | Status | Default |
| --- | --- | --- | --- | --- |
| Quasi Hessians | quasi_hessians | bfgs \| sr1 | Required group | N/A |
| Numerical safeguarding of BFGS update | damped | none | Optional | undamped BFGS |

Analytic Hessians

The analytic_hessians specification means that Hessian information is available directly from the simulation. For the case of file transfer of data, the simulation must return the Hessian data in the DAKOTA format (enclosed in double brackets; see DAKOTA File Data Formats in the Users Manual [Adams et al., 2010]). The analytic_hessians keyword is a complete specification for this case.
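As a rough illustration of the double-bracket convention, a simulation driver written in Python might format one Hessian as shown below. Only the enclosing double brackets come from the text above; the function name `format_hessian`, the row-major ordering, the single-line layout, and the numeric precision are assumptions that should be checked against the Users Manual before use.

```python
def format_hessian(hessian):
    """Format one Hessian matrix as text enclosed in double brackets.

    Assumed layout: all entries on one line, in row-major order.
    """
    values = " ".join(f"{v:.10e}" for row in hessian for v in row)
    return f"[[ {values} ]]"
```

For a 2x2 Hessian this yields a single line that starts with `[[` and ends with `]]`, with four values in between.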

Mixed Hessians

The mixed_hessians specification means that some Hessian information is available directly from the simulation (analytic) whereas the rest will have to be estimated by finite differences (numerical) or approximated by secant updating. As for mixed gradients, this specification allows the user to make use of as much analytic information as is available and then estimate/approximate the rest. The id_analytic_hessians list specifies by number the functions which have analytic Hessians, and the id_numerical_hessians and id_quasi_hessians lists specify by number the functions which must use numerical Hessians and secant Hessian updates, respectively. Each function identifier, from 1 through the total number of functions, must appear once and only once within the union of the id_analytic_hessians, id_numerical_hessians, and id_quasi_hessians lists. The fd_hessian_step_size and bfgs, damped bfgs, or sr1 secant update selections are as described previously in Numerical Hessians and Quasi Hessians and pertain to those functions listed by the id_numerical_hessians and id_quasi_hessians lists. Table 9.11 summarizes the mixed Hessian specification.

Table 9.11 Specification detail for mixed Hessians

| Description | Keyword | Associated Data | Status | Default |
| --- | --- | --- | --- | --- |
| Mixed Hessians | mixed_hessians | none | Required group | N/A |
| Analytic Hessians function list | id_analytic_hessians | list of integers | Required | N/A |
| Numerical Hessians function list | id_numerical_hessians | list of integers | Required | N/A |
| Finite difference step size | fd_hessian_step_size | list of reals | Optional | 0.001 (1st-order), 0.002 (2nd-order) |
| Quasi Hessians function list | id_quasi_hessians | list of integers | Required | N/A |
| Quasi-Hessian update | bfgs \| sr1 | none | Required | N/A |
| Numerical safeguarding of BFGS update | damped | none | Optional | undamped BFGS |
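The requirement that each function identifier appear once and only once across the three mixed Hessian lists can be checked before a study is launched. The helper below is a hypothetical pre-processing sketch, not part of DAKOTA; its name and its return convention (a list of problem descriptions, empty when the lists are valid) are assumptions made for illustration.

```python
def check_mixed_hessian_ids(num_functions, id_analytic_hessians,
                            id_numerical_hessians, id_quasi_hessians):
    """Return a list of problems; an empty list means the id lists are valid."""
    combined = (list(id_analytic_hessians) + list(id_numerical_hessians)
                + list(id_quasi_hessians))
    problems = []
    # Every identifier from 1 through num_functions must appear exactly once.
    for fn_id in range(1, num_functions + 1):
        count = combined.count(fn_id)
        if count == 0:
            problems.append(f"function {fn_id} is not listed")
        elif count > 1:
            problems.append(f"function {fn_id} is listed {count} times")
    # Identifiers outside 1..num_functions are reported separately.
    for fn_id in sorted(set(combined)):
        if not 1 <= fn_id <= num_functions:
            problems.append(f"function {fn_id} is out of range")
    return problems
```

For example, with four response functions, the lists [1], [2, 3], and [4] pass the check, whereas listing function 2 in two different lists is reported as a duplicate.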



Generated on 9 Feb 2012 for DAKOTA by  doxygen 1.6.1