Dakota Reference Manual
Version 6.10
Explore and Predict with Confidence
Beginning with release 6.9, Dakota gained the ability to write many method results, such as the correlation matrices computed by sampling studies and the best parameters discovered by optimization methods, to disk in HDF5 format. In Dakota 6.10 and above, evaluation data (variables and responses for each model or interface evaluation) may also be written. Many users may find this newly supported format more convenient than scraping or copying and pasting from Dakota's console output.
To enable HDF5 output, the results_output keyword with the hdf5 option must be added to the Dakota input file. In addition, Dakota must have been built with HDF5 support. Beginning with Dakota 6.10, HDF5 is enabled in our publicly available downloads. HDF5 support is considered a somewhat experimental feature. The results of some Dakota methods are not yet written to HDF5, and in a few, limited situations, enabling HDF5 will cause Dakota to crash.
HDF5 is a format that is widely used in scientific software for efficiently storing and organizing data. The HDF5 standard and libraries are maintained by the HDF Group.
In HDF5, data are stored in multidimensional arrays called datasets. Datasets are organized hierarchically in groups, which also can contain other groups. Datasets and groups are conceptually similar to files and directories in a filesystem. In fact, every HDF5 file contains at least one group, the root group, denoted "/", and groups and datasets are referred to using slash-delimited absolute or relative paths, which are more accurately called link names.
HDF5 has as one goal that data be "self-documenting" through the use of metadata. Dakota output files include two kinds of metadata: dimension scales, which label the dimensions of datasets and are described in the tables below, and attributes. Attributes are key/value pairs attached to a group or dataset; the key is a character string, such as dakota_version, and (in Dakota output) the value can be a string-, integer-, or real-valued scalar. For example, Dakota stores the number of samples that were requested in a sampling study in the attribute 'samples'.

Many popular programming languages have support, either natively or from a third-party library, for reading and writing HDF5 files. The HDF Group itself supports C/C++ and Java libraries. The Dakota Project suggests the h5py module for Python. Examples that demonstrate using h5py to access and use Dakota HDF5 output may be found in the Dakota installation at share/dakota/examples/hdf5.
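As a minimal sketch of what reading Dakota HDF5 output with h5py looks like (the file name, method Id, and response descriptor below are invented for illustration; a real study writes its own results file):

```python
import h5py
import numpy as np

# Build a tiny stand-in file that mimics Dakota's layout. The path and
# values are hypothetical; genuine Dakota output has the same structure.
with h5py.File("dakota_results.h5", "w") as f:
    f.attrs["dakota_version"] = "6.10"
    f["/methods/sampling/results/execution:1/moments/f"] = np.array(
        [0.5, 0.1, 0.0, 3.0]
    )

# Reading works the same way on genuine Dakota output.
with h5py.File("dakota_results.h5", "r") as f:
    version = f.attrs["dakota_version"]          # top-level attribute
    moments = f["/methods/sampling/results/execution:1/moments/f"][()]
    print(version, moments)
```

The `[()]` indexing reads an entire dataset into a NumPy array, which is usually the most convenient form for postprocessing.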
Currently, complete or nearly complete coverage of results from sampling, optimization and calibration methods, parameter studies, and stochastic expansions exists. Coverage will continue to expand in future releases to include not only the results of all methods, but other potentially useful information such as interface evaluations and model transformations.
Methods in Dakota have a character string Id and are executed by Dakota one or more times. (Methods are executed more than once in studies that include a nested model, for example.) The Id may be provided by the user in the input file using the id_method keyword, or it may be automatically generated by Dakota. Dakota uses the label NO_METHOD_ID for methods that are specified in the input file without an id_method, and NOSPEC_METHOD_ID_<N> for methods that it generates for its own internal use. The <N> in the latter case is an incrementing integer that begins at 1.

The results for the <N>th execution of a method that has the label <method Id> are stored in the group

/methods/<method Id>/results/execution:<N>/
The /methods group is always present in Dakota HDF5 files, provided at least one method added results to the output. (In a future Dakota release, the top-level groups /interfaces and /models will be added.) The group execution:1 is also always present, even if there is only a single execution.
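Assuming a results file laid out according to this scheme, the available methods and executions can be discovered by iterating the groups with h5py (the file name and method Id 'sampling' here are stand-ins):

```python
import h5py

# Stand-in file with two executions of a method whose Id is 'sampling'.
with h5py.File("multi_exec.h5", "w") as f:
    f.create_group("/methods/sampling/results/execution:1")
    f.create_group("/methods/sampling/results/execution:2")

# List every method Id and its executions; iterating an h5py Group
# yields the names of its child links.
with h5py.File("multi_exec.h5", "r") as f:
    for method_id in f["/methods"]:
        executions = sorted(f["/methods"][method_id]["results"])
        print(method_id, executions)
```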
The groups and datasets for each type of result that Dakota is currently capable of storing are described in the following sections. Every dataset is documented in its own table. Each table gives the location of the dataset relative to /methods/<method Id>/results/execution:<N>, along with its shape, type, and dimension scales. The location may include both literal text that is always present and replacement text. Replacement text is <enclosed in angle brackets and italicized>. Two examples of replacement text are <response descriptor> and <variable descriptor>, which indicate that the name of a Dakota response or variable makes up a portion of the path.

The Expected Output section of each method's keyword documentation indicates the kinds of output, if any, that the method currently can write to HDF5. These are typically in the form of bulleted lists with clarifying notes that refer back to the sections that follow.
Several pieces of information about the Dakota study are stored as attributes of the top-level HDF5 root group ("/"). These include:
**Study Attributes**

| Label | Type | Description |
|---|---|---|
| dakota_version | String | Version of Dakota used to run the study |
| dakota_revision | String | Dakota version control information |
| output_version | String | Version of the output file |
| input | String | Dakota input file |
| total_cpu_time | Real | Combined parent and child CPU time in seconds |
| parent_cpu_time | Real | Parent CPU time in seconds (when Dakota is built with UTILIB) |
| child_cpu_time | Real | Child CPU time in seconds (when Dakota is built with UTILIB) |
| total_wallclock_time | Real | Total wallclock time in seconds (when Dakota is built with UTILIB) |
| mpi_init_wallclock_time | Real | Wallclock time to MPI_Init in seconds (when Dakota is built with UTILIB and run in parallel) |
| run_wallclock_time | Real | Wallclock time since MPI_Init in seconds (when Dakota is built with UTILIB and run in parallel) |
| mpi_wallclock_time | Real | Wallclock time since MPI_Init in seconds (when Dakota is not built with UTILIB and run in parallel) |
Variables in most Dakota output (e.g. tabular data files) and input (e.g. imported data to construct surrogates) are listed in "input spec" order. (The variables keyword section is arranged by input spec order.) In this ordering, they are sorted first by function: design, aleatory uncertain, epistemic uncertain, and state. Within each of these categories, they are sorted by domain: continuous, discrete integer, discrete string, and discrete real.

A shortcoming of HDF5 is that datasets are homogeneous; for example, string- and real-valued data cannot readily be stored in the same dataset. As a result, Dakota flips "input spec" order for HDF5 and sorts variable information first by domain, then by function. When applicable, there may be as many as four datasets to store variable information: one to store continuous variables, another to store discrete integer variables, and so on. Within each of these, variables are ordered by function.
sampling produces moments (e.g. mean, standard deviation or variance) of all responses, as well as 95% lower and upper confidence intervals for the 1st and 2nd moments. These are stored as described below. When sampling is used in incremental mode by specifying refinement_samples, all results, including the moments group, are placed within groups named increment:<N>, where <N> indicates the increment number, beginning with 1.
**Moments**

| Description | 1st through 4th moments for each response |
|---|---|
| Location | [increment:<N>]/moments/<response descriptor> |
| Notes | The [increment:<N>] group is present only for sampling with refinement |
| Shape | 1-dimensional: length of 4 |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | moments | mean, std_deviation, skewness, kurtosis | Only for standard moments |
| 0 | String | moments | mean, variance, third_central, fourth_central | Only for central moments |
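The moment labels are attached as an HDF5 dimension scale, which h5py exposes through the dims property of a dataset. The following sketch builds a mock moments dataset to show the pattern; the file name, dataset paths, and values are invented for illustration:

```python
import h5py
import numpy as np

with h5py.File("moments_demo.h5", "w") as f:
    # Invented stand-in for .../results/execution:1/moments/<descriptor>.
    dset = f.create_dataset("moments", data=np.array([0.5, 0.1, 0.0, 3.0]))
    labels = f.create_dataset(
        "_scales/moments",
        data=["mean", "std_deviation", "skewness", "kurtosis"],
    )
    labels.make_scale("moments")        # register as a dimension scale
    dset.dims[0].attach_scale(labels)   # attach it to dimension 0

with h5py.File("moments_demo.h5", "r") as f:
    dset = f["moments"]
    scale = dset.dims[0][0]  # first scale attached to dimension 0
    for label, value in zip(scale[()], dset[()]):
        # h5py may return variable-length strings as bytes.
        name = label.decode() if isinstance(label, bytes) else label
        print(name, value)
```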
**Moment Confidence Intervals**

| Description | Lower and upper 95% confidence intervals on the 1st and 2nd moments |
|---|---|
| Location | moment_confidence_intervals/<response descriptor> |
| Shape | 2-dimensional: 2x2 |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | bounds | lower, upper | |
| 1 | String | moments | mean, std_deviation | Only for standard moments |
| 1 | String | moments | mean, variance | Only for central moments |
A few different methods produce information about the correlations between pairs of variables and responses (collectively: factors). The four tables in this section describe how correlation information is stored. One important note is that HDF5 has no special, native type for symmetric matrices, and so the simple correlations and simple rank correlations are stored in dense 2D datasets.
**Simple Correlations**

| Description | Simple correlation matrix |
|---|---|
| Location | [increment:<N>]/simple_correlations |
| Notes | The [increment:<N>] group is present only for sampling with refinement |
| Shape | 2-dimensional: number of factors by number of factors |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0, 1 | String | factors | Variable and response descriptors | The scales for both dimensions are identical |
**Simple Rank Correlations**

| Description | Simple rank correlation matrix |
|---|---|
| Location | [increment:<N>]/simple_rank_correlations |
| Notes | The [increment:<N>] group is present only for sampling with refinement |
| Shape | 2-dimensional: number of factors by number of factors |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0, 1 | String | factors | Variable and response descriptors | The scales for both dimensions are identical |
**Partial Correlations**

| Description | Partial correlations |
|---|---|
| Location | [increment:<N>]/partial_correlations/<response descriptor> |
| Notes | The [increment:<N>] group is present only for sampling with refinement |
| Shape | 1-dimensional: number of variables |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | variables | Variable descriptors | |
**Partial Rank Correlations**

| Description | Partial rank correlations |
|---|---|
| Location | [increment:<N>]/partial_rank_correlations/<response descriptor> |
| Notes | The [increment:<N>] group is present only for sampling with refinement |
| Shape | 1-dimensional: number of variables |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | variables | Variable descriptors | |
Some aleatory UQ methods estimate the probability density of responses.
**Probability Density**

| Description | Probability density of a response |
|---|---|
| Location | [increment:<N>]/probability_density/<response descriptor> |
| Notes | The [increment:<N>] group is present only for sampling with refinement |
| Shape | 1-dimensional: number of bins in probability density |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | Real | lower_bounds | Lower bin edges | |
| 0 | Real | upper_bounds | Upper bin edges | |
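Because both the lower_bounds and upper_bounds scales are attached to dimension 0, a reader can recover the bin edges and, for example, check that the density integrates to one. This sketch builds a mock three-bin density (file name and all values invented for illustration):

```python
import h5py
import numpy as np

# Invented three-bin density over [0, 3); stands in for
# probability_density/<response descriptor>.
lower = np.array([0.0, 1.0, 2.0])
upper = np.array([1.0, 2.0, 3.0])
density = np.array([0.2, 0.5, 0.3])

with h5py.File("density_demo.h5", "w") as f:
    dset = f.create_dataset("probability_density/f", data=density)
    lo = f.create_dataset("_scales/lower_bounds", data=lower)
    hi = f.create_dataset("_scales/upper_bounds", data=upper)
    lo.make_scale("lower_bounds")
    hi.make_scale("upper_bounds")
    dset.dims[0].attach_scale(lo)  # both scales label dimension 0
    dset.dims[0].attach_scale(hi)

with h5py.File("density_demo.h5", "r") as f:
    dset = f["probability_density/f"]
    lo = dset.dims[0][0][()]  # scales come back in attachment order
    hi = dset.dims[0][1][()]
    mass = float(np.sum(dset[()] * (hi - lo)))
    print(mass)  # close to 1.0 for a proper density
```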
Aleatory UQ methods can calculate level mappings (from user-specified probability, reliability, or generalized reliability to response, or vice versa).
**Probability Levels**

| Description | Response levels corresponding to user-specified probability levels |
|---|---|
| Location | [increment:<N>]/probability_levels/<response descriptor> |
| Notes | The [increment:<N>] group is present only for sampling with refinement |
| Shape | 1-dimensional: number of requested levels for the response |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | Real | probability_levels | User-specified probability levels | |
**Reliability Levels**

| Description | Response levels corresponding to user-specified reliability levels |
|---|---|
| Location | [increment:<N>]/reliability_levels/<response descriptor> |
| Notes | The [increment:<N>] group is present only for sampling with refinement |
| Shape | 1-dimensional: number of requested levels for the response |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | Real | reliability_levels | User-specified reliability levels | |
**Generalized Reliability Levels**

| Description | Response levels corresponding to user-specified generalized reliability levels |
|---|---|
| Location | [increment:<N>]/gen_reliability_levels/<response descriptor> |
| Notes | The [increment:<N>] group is present only for sampling with refinement |
| Shape | 1-dimensional: number of requested levels for the response |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | Real | gen_reliability_levels | User-specified generalized reliability levels | |
**Response Levels**

| Description | Probability, reliability, or generalized reliability levels corresponding to user-specified response levels |
|---|---|
| Location | [increment:<N>]/response_levels/<response descriptor> |
| Notes | The [increment:<N>] group is present only for sampling with refinement |
| Shape | 1-dimensional: number of requested levels for the response |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | Real | response_levels | User-specified response levels | |
Dakota's sampling method can produce main and total effects; stochastic expansions (polynomial_chaos, stoch_collocation) additionally can produce interaction effects.
**Main Effects**

| Description | First-order Sobol' indices |
|---|---|
| Location | main_effects/<response descriptor> |
| Shape | 1-dimensional: number of variables |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | variables | Variable descriptors | |
**Total Effects**

| Description | Total-effect Sobol' indices |
|---|---|
| Location | total_effects/<response descriptor> |
| Shape | 1-dimensional: number of variables |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | variables | Variable descriptors | |
**Interaction Effects**

| Description | Sobol' indices for interactions |
|---|---|
| Location | order_<N>_interactions/<response descriptor> |
| Shape | 1-dimensional: number of Nth order interactions |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | variables | Descriptors of the variables in the interaction | Scales for interaction effects are 2D datasets with the dimensions (number of interactions, N) |
Stochastic expansion methods can obtain moments in two ways.
**Integration Moments**

| Description | Moments obtained via integration |
|---|---|
| Location | integration_moments/<response descriptor> |
| Shape | 1-dimensional: length of 4 |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | moments | mean, std_deviation, skewness, kurtosis | Only for standard moments |
| 0 | String | moments | mean, variance, third_central, fourth_central | Only for central moments |
**Expansion Moments**

| Description | Moments obtained via expansion |
|---|---|
| Location | expansion_moments/<response descriptor> |
| Shape | 1-dimensional: length of 4 |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | moments | mean, std_deviation, skewness, kurtosis | Only for standard moments |
| 0 | String | moments | mean, variance, third_central, fourth_central | Only for central moments |
sampling with epistemic variables produces extreme values (minimum and maximum) for each response.
**Extreme Responses**

| Description | The sample minimum and maximum of each response |
|---|---|
| Location | [increment:<N>]/extreme_responses/<response descriptor> |
| Notes | The [increment:<N>] group is present only for sampling with refinement |
| Shape | 1-dimensional: length of 2 |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | extremes | minimum, maximum | |
All parameter studies (vector_parameter_study, list_parameter_study, multidim_parameter_study, centered_parameter_study) record tables of evaluations (parameter-response pairs), similar to Dakota's tabular output file. Centered parameter studies additionally store evaluations in an order that is more natural to interpret, which is described below.
In the tabular-like listing, variables are stored according to the scheme described in a previous section.
**Parameter Sets**

| Description | Parameter study evaluations in a tabular-like listing |
|---|---|
| Location | parameter_sets/{continuous_variables, discrete_integer_variables, discrete_string_variables, discrete_real_variables, responses} |
| Shape | 2-dimensional: number of evaluations by number of variables or responses |
| Type | Real, String, or Integer, as applicable |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 1 | String | variables or responses | Variable or response descriptors | |
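The descriptor scale on dimension 1 makes it straightforward to pull out named columns with h5py. This sketch uses an invented two-variable, two-evaluation dataset (file name, descriptors, and values are all stand-ins):

```python
import h5py
import numpy as np

# Invented stand-in for parameter_sets/continuous_variables:
# 2 evaluations (rows) of 2 variables (columns).
values = np.array([[0.1, 0.9], [0.4, 0.6]])

with h5py.File("psets_demo.h5", "w") as f:
    dset = f.create_dataset("parameter_sets/continuous_variables", data=values)
    desc = f.create_dataset("_scales/variables", data=["x1", "x2"])
    desc.make_scale("variables")
    dset.dims[1].attach_scale(desc)  # descriptors label the columns

with h5py.File("psets_demo.h5", "r") as f:
    dset = f["parameter_sets/continuous_variables"]
    raw = dset.dims[1][0][()]
    names = [n.decode() if isinstance(n, bytes) else n for n in raw]
    # Map each descriptor to its column of values.
    columns = {name: dset[:, i] for i, name in enumerate(names)}
    print(names, columns["x1"])
```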
Centered parameter studies store "slices" of the tabular data that make evaluating the effects of each variable on each response more convenient. The steps for each individual variable, including the initial or center point, and the corresponding responses are stored in separate groups.
**Variable Slices**

| Description | Steps, including center/initial point, for a single variable |
|---|---|
| Location | variable_slices/<variable descriptor>/steps |
| Shape | 1-dimensional: number of user-specified steps for this variable |
| Type | Real, String, or Integer, as applicable |
**Variable Slices - Responses**

| Description | Responses for variable slices |
|---|---|
| Location | variable_slices/<variable descriptor>/responses |
| Shape | 2-dimensional: number of evaluations by number of responses |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 1 | String | responses | Response descriptors | |
Dakota's optimization and calibration methods report the parameters at the best point (or points, for multiple final solutions) discovered. These are stored using the scheme described in the variables section. When more than one solution is reported, the best parameters are nested in groups named set:<N>, where <N> is an integer numbering the set, beginning with 1.

State (and other inactive variables) are reported when using objective functions and for some calibration studies. However, when using configuration variables in a calibration, state variables are suppressed.
**Best Parameters**

| Description | Best parameters discovered by optimization or calibration |
|---|---|
| Location | [set:<N>]/best_parameters/{continuous, discrete_integer, discrete_string, discrete_real} |
| Notes | The [set:<N>] group is present only when multiple final solutions are reported. |
| Shape | 1-dimensional: number of variables |
| Type | Real, String, or Integer, as applicable |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | variables | Variable descriptors | |
Dakota's optimization methods report the objective functions at the best point (or points, for multiple final solutions) discovered. When more than one solution is reported, the best objective functions are nested in groups named set:<N>, where <N> is an integer numbering the set, beginning with 1.
**Best Objective Functions**

| Description | Best objective functions discovered by optimization |
|---|---|
| Location | [set:<N>]/best_objective_functions |
| Notes | The [set:<N>] group is present only when multiple final solutions are reported. |
| Shape | 1-dimensional: number of objective functions |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | responses | Response descriptors | |
Dakota's optimization and calibration methods report the nonlinear constraints at the best point (or points, for multiple final solutions) discovered. When more than one solution is reported, the best constraints are nested in groups named set:<N>, where <N> is an integer numbering the set, beginning with 1.
**Best Nonlinear Constraints**

| Description | Best nonlinear constraints discovered by optimization or calibration |
|---|---|
| Location | [set:<N>]/best_constraints |
| Notes | The [set:<N>] group is present only when multiple final solutions are reported. |
| Shape | 1-dimensional: number of nonlinear constraints |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | responses | Response descriptors | |
When using calibration terms with an optimization method, or when using a nonlinear least squares method such as nl2sol, Dakota reports residuals and residual norms for the best point (or points, for multiple final solutions) discovered.
**Best Residuals**

| Description | Best residuals discovered |
|---|---|
| Location | best_residuals |
| Shape | 1-dimensional: number of residuals |
| Type | Real |
**Best Residual Norm**

| Description | Norm of best residuals discovered |
|---|---|
| Location | best_norm |
| Shape | Scalar |
| Type | Real |
Least squares methods (nl2sol, nlssol_sqp, optpp_g_newton) compute confidence intervals on the calibration parameters.
**Parameter Confidence Intervals**

| Description | Lower and upper confidence intervals on calibrated parameters |
|---|---|
| Location | confidence_intervals |
| Notes | The confidence intervals are not stored when there is more than one experiment. |
| Shape | 2-dimensional: number of variables by 2 |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | variables | Variable descriptors | |
| 1 | String | bounds | lower, upper | |
When performing calibration with experimental data (but no configuration variables), Dakota records, in addition to the best residuals, the best original model responses.
**Best Model Responses**

| Description | Original model responses for the best residuals discovered |
|---|---|
| Location | best_model_responses |
| Shape | 1-dimensional: number of model responses |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | responses | Response descriptors | |
When performing calibration with experimental data that includes configuration variables, Dakota reports the best model responses for each experiment. These results include the configuration variables, stored in the scheme described in the variables section, and the model responses.
**Best Configuration Variables for Experiment**

| Description | Configuration variables associated with experiment <N> |
|---|---|
| Location | best_model_responses/experiment:<N>/{continuous_config_variables, discrete_integer_config_variables, discrete_string_config_variables, discrete_real_config_variables} |
| Shape | 1-dimensional: number of variables |
| Type | Real, String, or Integer, as applicable |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | variables | Variable descriptors | |
**Best Model Responses for Experiment**

| Description | Original model responses for the best residuals discovered |
|---|---|
| Location | best_model_responses/experiment:<N>/responses |
| Shape | 1-dimensional: number of model responses |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | String | responses | Response descriptors | |
An evaluation is a mapping from variables to responses performed by a Dakota model or interface. Beginning with release 6.10, Dakota can report evaluation history in HDF5 format. The HDF5 format offers many advantages over existing console output and tabular output. Requiring no "scraping", it is more convenient for most users than the former, and because it is not restricted to a two-dimensional, tabular arrangement of information, it is far richer than the latter.
This section begins by describing the Dakota components that can generate evaluation data. It then documents the high-level organization of the data from those components. Detailed documentation of the individual datasets (the "low-level" organization) where data are stored follows. Finally, information is provided concerning input keywords that control which components report evaluations.
Evaluation data are produced by only two kinds of components in Dakota: models and interfaces. The purpose of this subsection is to provide a basic description of models and interfaces for the purpose of equipping users to manage and understand HDF5-format evaluation data.
Because interfaces and models must be specified in even simple Dakota studies, most novice users of Dakota will have some familiarity with these concepts. However, the exact nature of the relationship between methods, models, and interfaces may be unclear. Moreover, the models and interfaces present in a Dakota study are not always limited to those specified by the user. Some input keywords or combinations of components cause Dakota to create new models or interfaces "behind the scenes" and without the user's direct knowledge. Not only user-specified models and interfaces but also these auto-generated components can write evaluation data to HDF5. Accordingly, it may be helpful for consumers of Dakota's evaluation data to have a basic understanding of how Dakota creates and employs models and interfaces.
Consider first the input file shown here.
```
environment
  tabular_data
  results_output hdf5

method
  id_method 'sampling'
  sampling
    samples 20
  model_pointer 'sim'

model
  id_model 'sim'
  single
  interface_pointer 'tb'

variables
  uniform_uncertain 2
    descriptors 'x1' 'x2'
    lower_bounds 0.0 0.0
    upper_bounds 1.0 1.0

responses
  response_functions 1
  descriptors 'f'
  no_gradients
  no_hessians

interface
  id_interface 'tb'
  fork
  analysis_drivers 'text_book'
```
This simple input file specifies a single method of type sampling, which also has the Id 'sampling'. The 'sampling' method possesses a model of type single (alias simulation) named 'sim', which it uses to perform evaluations. (Dakota would have automatically generated a single model had one not been specified.) That is to say, for each variables-to-response mapping required by the method, it provides variables to the model and receives back responses from it.
Single/simulation models like 'sim' perform evaluations by means of an interface, typically an interface to an external simulation. In this case, the interface is 'tb'. The model passes the variables to 'tb', which executes the text_book
driver, and receives back responses.
It is clear that two components produce evaluation data in this study. The first is the single model 'sim', which receives and fulfills evaluation requests from the method 'sampling', and the second is the interface 'tb', which similarly receives requests from 'sim' and fulfills them by running the text_book
driver.
Because tabular data was requested in the environment block, a record of the model's evaluations will be reported to a tabular file. The interface's evaluations could be dumped from the restart file using dakota_restart_util.
If we compared these evaluation histories from 'sim' and 'tb', we would see that they are identical to one another. The model 'sim' is a mere "middle man" whose only responsibility is passing variables from the method down to the interface, executing the interface, and passing responses back up to the method. However, this is not always the case.
For example, if this study were converted to a gradient-based optimization using optpp_q_newton, and the user specified numerical_gradients:

```
# model and interface same as above. Replace the method,
# variables, and responses with:
method
  id_method 'opt'
  optpp_q_newton

variables
  continuous_design 2
    descriptors 'x1' 'x2'
    lower_bounds 0.0 0.0
    upper_bounds 1.0 1.0

responses
  objective_functions 1
  descriptors 'f'
  numerical_gradients
  no_hessians
```
Then the model would have the responsibility of performing finite differencing to estimate gradients of the response 'f' requested by the method. Multiple function evaluations of 'tb' would map to a single gradient evaluation at the model level, and the evaluation histories of 'sim' and 'tb' would contain different information.
Note that because it is unwieldy to report gradients (or Hessians) in a tabular format, they are not written to the tabular file and historically were available only in the console output. The HDF5 format provides convenient access both to the "raw" evaluations performed by the interface and to the higher-level model evaluations that include estimated gradients.
This pair of examples hopefully provides a basic understanding of the flow of evaluation data between a method, model, and interface, and explains why models and interfaces are producers of evaluation data.
Next consider a somewhat more complex study that includes a Dakota model of type surrogate. A surrogate model performs evaluations requested by a method by executing a special kind of interface called an approximation interface, which Dakota implicitly creates without the direct knowledge of the user. Approximation interfaces are a generic container for the various kinds of surrogates Dakota can use, such as Gaussian processes.
A Dakota model of type global surrogate may use a user-specified dace method to construct the actual underlying model(s) that it evaluates via its approximation interface. The dace method will have its own model (typically of type single/simulation), which will have a user-specified interface.
In this more complicated case there are at least four components that produce evaluation data: (1) the surrogate model, (2) its approximation interface, (3) the dace method's model, and (4) its interface. Although only components (1), (3), and (4) are user-specified, evaluation data produced by (2) may be written to HDF5 as well. (As explained below, only evaluations performed by the surrogate model and the dace interface will be recorded by default. This can be overridden using hdf5 sub-keywords.) This is an example where "extra" and potentially confusing data appear in Dakota's output due to an auto-generated component.
An important family of implicitly-created models is the recast models, which have the responsibility of transforming variables and responses. One type of recast, called a data transform model, is responsible for computing residuals when a user provides experimental data in a calibration study. Scaling recast models are employed when the user requests scaling of variables and/or responses.
Recast models work on the principle of function composition, and "wrap" a submodel, which may itself also be a recast model. The innermost model in the recursion often will be the simulation or surrogate model specified by the user in the input file. Dakota is capable of recording evaluation data at each level of recast.
This subsection describes how evaluation data produced by models and interfaces are organized at high level. A detailed description of the datasets and subgroups that contain evaluation data for a specific model or interface is given in the next subsection.
Two top-level groups contain evaluation data: /interfaces and /models.
Because interfaces can be executed by more than one model, interface evaluations are more precisely thought of as evaluations of an interface/model combination. Consequently, interface evaluations are grouped not only by interface Id ('tb' in the example above), but also by the Id of the model that requested them ('sim'):
/interfaces/<interface Id>/<model Id>/
If the user does not provide an Id for an interface they specify, Dakota assigns it the Id `NO_ID`. Approximation interfaces receive the Id `APPROX_INTERFACE_<N>`, where N is an incrementing integer beginning at 1. Other kinds of automatically generated interfaces are named `NOSPEC_INTERFACE_ID_<N>`.
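As a sketch of how these paths can be navigated with the h5py library, a miniature file with the same layout can be built and walked. The file name and the Ids "tb" and "sim" below are stand-ins, not the output of a real study:

```python
import h5py

# Build a miniature stand-in for a Dakota results file. The Ids "tb"
# (interface) and "sim" (model) are hypothetical examples.
with h5py.File("demo_interfaces.h5", "w") as f:
    f.create_group("/interfaces/tb/sim")

# Interface evaluations are keyed first by interface Id, then by the Id
# of the model that requested them: /interfaces/<interface Id>/<model Id>/
with h5py.File("demo_interfaces.h5", "r") as f:
    pairs = [(iface, model)
             for iface, grp in f["/interfaces"].items()
             for model in grp]
print(pairs)  # [('tb', 'sim')]
```

The same two-level iteration works on a real Dakota results file, whatever interface and model Ids it contains.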
The top-level group for model evaluations is `/models`. Within this group, model evaluations are grouped by type: `simulation`, `surrogate`, `nested`, or `recast`, and then by model Id. That is:
/models/<type>/<model Id>/
Similar to interfaces, user-specified models that lack an Id are given one by Dakota. Such a model is named `NO_MODEL_ID`. Some automatically generated models receive the name `NOSPEC_MODEL_ID`.
Recast models are a special case and receive the name `RECAST_<WRAPPED-MODEL>_<TYPE>_<N>`. In this string:
- `WRAPPED-MODEL` is the Id of the innermost wrapped model, typically a user-specified model
- `TYPE` is the specific kind of recast. The three most common recasts are:
  - `RECAST`: several generic responsibilities, including summing objective functions to present to a single-objective optimizer
  - `DATA_TRANSFORM`: compute residuals in a calibration
  - `SCALING`: scale variables and responses
- `N` is an incrementing integer that begins at 1. It is employed to distinguish recasts of the same type that wrap the same underlying model.

A model's evaluations may be the result of combining information from multiple sources. A simulation (single) model receives all the information it requires from its interface, but more complicated model types may use information not only from interfaces, but also from other models and the results of method executions. Nested models, for instance, receive information from a submethod (the mean of a response from a sampling study, for example) and potentially also from an optional interface.
The sources of a model's evaluations may be roughly identified by examining the contents of that model's `sources` group. The `sources` group contains softlinks (an HDF5 feature analogous to soft or symbolic links on many file systems) to the groups for the interfaces, models, or methods that the model used to produce its evaluation data. (At this time, Dakota does not report which specific interface or model evaluations or method executions were used to produce a particular model evaluation, but this is a planned feature.)
Method results likewise have a `sources` group that identifies the models or methods employed by that method. By following the softlinks contained in a method's or model's `sources` group, it is possible to "drill down" from a method to its ultimate sources of information. In the sampling example above, interface evaluations performed via the 'sim' model at the request of the 'sampling' method could be obtained at the HDF5 path `/methods/sampling/sources/sim/sources/tb/`
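This drill-down can be sketched with h5py softlinks on a miniature stand-in file that mirrors the hypothetical Ids from the example above ("sampling", "sim", and "tb"):

```python
import h5py

# Miniature stand-in file; the Ids mirror the example in the text.
with h5py.File("demo_drilldown.h5", "w") as f:
    f.create_group("/interfaces/tb/sim")
    f.create_group("/models/simulation/sim/sources")
    f.create_group("/methods/sampling/sources")
    # sources groups contain softlinks back to the producing components
    f["/models/simulation/sim/sources/tb"] = h5py.SoftLink("/interfaces/tb/sim")
    f["/methods/sampling/sources/sim"] = h5py.SoftLink("/models/simulation/sim")

with h5py.File("demo_drilldown.h5", "r") as f:
    # Following the chain of softlinks resolves to the interface group
    found = isinstance(f["/methods/sampling/sources/sim/sources/tb"], h5py.Group)
print(found)  # True
```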
The evaluation data for each interface and model are stored using the same schema in a collection of groups and datasets that reside within that interface or model's high-level location in the HDF5 file. This section describes that "low-level" schema.
Data are divided first of all into `variables`, `responses`, and `metadata` groups.
The `variables` group contains datasets that store the variables information for each evaluation. Four datasets may be present, one for each "domain": `continuous`, `discrete_integer`, `discrete_string`, and `discrete_real`. These datasets are two-dimensional, with a row (0th dimension) for each evaluation and a column (1st dimension) for each variable. The 0th dimension has one dimension scale for the integer-valued evaluation Ids. The 1st dimension has two scales: the 0th contains the descriptors of the variables, and the 1st contains their variable Ids. In this context, the Ids are a 1-to-N ranking of the variables in Dakota "input spec" order.
| Variables | |
|---|---|
| Description | Variables for each evaluation |
| Location | `variables/{continuous, discrete_integer, discrete_string, discrete_real}` |
| Shape | 2-dimensional: number of evaluations by number of variables |
| Type | Real, String, or Integer, as applicable |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | Integer | evaluation_ids | Evaluation Ids | |
| 1 | String | variables | Variable descriptors | |
| 1 | Integer | variables | Variable Ids | 1-to-N rank of the variable in Dakota input spec order |
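In h5py, the attached scales are reachable through a dataset's `dims` accessor. A sketch on a small stand-in dataset (the file name and the descriptors "x1"/"x2" are invented for illustration, not Dakota output):

```python
import numpy as np
import h5py

# Stand-in for variables/continuous: 3 evaluations of 2 variables.
with h5py.File("demo_variables.h5", "w") as f:
    data = f.create_dataset("variables/continuous",
                            data=np.arange(6.0).reshape(3, 2))
    ids = f.create_dataset("scales/evaluation_ids", data=np.arange(1, 4))
    ids.make_scale("evaluation_ids")
    descr = f.create_dataset("scales/variables", data=[b"x1", b"x2"])
    descr.make_scale("variables")
    data.dims[0].attach_scale(ids)    # evaluation Ids on the 0th dimension
    data.dims[1].attach_scale(descr)  # descriptors on the 1st dimension

with h5py.File("demo_variables.h5", "r") as f:
    ds = f["variables/continuous"]
    # dims[1][0] is the first scale attached to the 1st (column) dimension
    labels = [s.decode() for s in ds.dims[1][0][()]]
print(labels)  # ['x1', 'x2']
```

The same `dims[axis][index]` lookup retrieves the evaluation Ids, descriptors, and variable Ids from a real Dakota results file.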
The responses group contains datasets for functions and, when available, gradients and Hessians.
Functions: The `functions` dataset is two-dimensional and contains function values for all responses. Like the variables datasets, evaluations are stored along the 0th dimension, and responses are stored along the 1st. The evaluation Ids and response descriptors are attached as scales to these axes, respectively.
| Functions | |
|---|---|
| Description | Values of functions in evaluations |
| Location | `responses/functions` |
| Shape | 2-dimensional: number of evaluations by number of responses |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | Integer | evaluation_ids | Evaluation Ids | |
| 1 | String | responses | Response descriptors | |
Gradients: The `gradients` dataset is three-dimensional, with shape (number of evaluations) × (number of responses for which gradients are available) × (number of continuous variables). Dakota supports a `mixed_gradients` specification, and the `gradients` dataset is sized and organized such that only those responses for which gradients are available are stored. When `mixed_gradients` are employed, a response will not necessarily have the same index in the `functions` and `gradients` datasets.
Because gradients may be computed with respect to any of the continuous variables, active or inactive, that belong to the associated model, the `gradients` dataset is sized to accommodate gradients taken with respect to all continuous variables. Components that were not included in a particular evaluation are set to NaN (not a number); the `derivative_variables_vector` (in the `metadata` group) for that evaluation can be examined to determine which components were requested.
| Gradients | |
|---|---|
| Description | Values of gradients in evaluations |
| Location | `responses/gradients` |
| Shape | 3-dimensional: number of evaluations by number of responses by number of continuous variables |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | Integer | evaluation_ids | Evaluation Ids | |
| 1 | String | responses | Response descriptors | |
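Because unrequested gradient components are stored as NaN, a reader can mask them out before post-processing. A minimal sketch with an invented gradient row (the values and variable ordering are illustrative only):

```python
import numpy as np

# Illustrative gradient row for one evaluation: the second component was
# not requested for this evaluation, so Dakota would store NaN there.
grad = np.array([0.5, np.nan, -1.25])

# Mask of components that were actually computed
requested = ~np.isnan(grad)
print(requested.tolist())  # [True, False, True]

# Work only with the computed components
computed = grad[requested]
print(computed.tolist())  # [0.5, -1.25]
```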
Hessians: Hessians are stored in a four-dimensional dataset with shape (number of evaluations) × (number of responses for which Hessians are available) × (number of continuous variables) × (number of continuous variables). The `hessians` dataset shares many characteristics with the `gradients` dataset: in the `mixed_hessians` case, it will be smaller in the response dimension than the `functions` dataset, and unrequested components are set to NaN.
| Hessians | |
|---|---|
| Description | Values of Hessians in evaluations |
| Location | `responses/hessians` |
| Shape | 4-dimensional: number of evaluations by number of responses by number of continuous variables by number of continuous variables |
| Type | Real |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | Integer | evaluation_ids | Evaluation Ids | |
| 1 | String | responses | Response descriptors | |
The `metadata` group contains up to three datasets.
Active Set Vector: The first is the `active_set_vector` dataset. It is two-dimensional, with rows corresponding to evaluations and columns corresponding to responses. Each element is an integer in the range 0-7 that encodes the request (function, gradient, Hessian) for the corresponding response in that evaluation. The 0th dimension has the evaluation Ids scale, and the 1st dimension has two scales: the response descriptors, and the "default" or "maximal" ASV, an integer 0-7 for each response indicating the information (function, gradient, Hessian) that could possibly have been requested during the study.
| Active Set Vector | |
|---|---|
| Description | Values of the active set vector in evaluations |
| Location | `metadata/active_set_vector` |
| Shape | 2-dimensional: number of evaluations by number of responses |
| Type | Integer |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | Integer | evaluation_ids | Evaluation Ids | |
| 1 | String | responses | Response descriptors | |
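Each 0-7 value packs three request bits: bit 0 (value 1) for the function, bit 1 (value 2) for the gradient, and bit 2 (value 4) for the Hessian. A small helper can decode them:

```python
# Decode one active set vector element (0-7) into its three request bits.
def decode_asv(code):
    return {
        "function": bool(code & 1),  # bit 0: function value requested
        "gradient": bool(code & 2),  # bit 1: gradient requested
        "hessian": bool(code & 4),   # bit 2: Hessian requested
    }

print(decode_asv(3))  # {'function': True, 'gradient': True, 'hessian': False}
print(decode_asv(7))  # {'function': True, 'gradient': True, 'hessian': True}
```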
Derivative Variables Vector: The second dataset in the `metadata` group is the `derivative_variables_vector` dataset. It is included only when gradients or Hessians are available. Like the ASV, it is two-dimensional. Each column of the DVV dataset corresponds to a continuous variable and contains a 0 or 1, indicating whether gradients and Hessians were computed with respect to that variable for the evaluation. The 0th dimension has the evaluation Ids as a scale, and the 1st dimension has two scales: the 0th contains the descriptors of the continuous variables, and the 1st contains their variable Ids.
| Derivative Variables Vector | |
|---|---|
| Description | Values of the derivative variables vector in evaluations |
| Location | `metadata/derivative_variables_vector` |
| Shape | 2-dimensional: number of evaluations by number of continuous variables |
| Type | Integer |

Scales:

| Dimension | Type | Label | Contents | Notes |
|---|---|---|---|---|
| 0 | Integer | evaluation_ids | Evaluation Ids | |
| 1 | String | variables | Variable descriptors | |
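Pairing a DVV row with the variable descriptors scale identifies which continuous variables the derivatives were taken with respect to. A sketch with invented descriptors and a hypothetical DVV row:

```python
# Hypothetical continuous-variable descriptors and one DVV row; in a real
# file these come from the dataset and its attached scales.
descriptors = ["x1", "x2", "x3"]
dvv_row = [1, 0, 1]

# Variables with a 1 in the DVV had gradients/Hessians computed
active = [name for name, flag in zip(descriptors, dvv_row) if flag]
print(active)  # ['x1', 'x3']
```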
Analysis Components: The final dataset in the `metadata` group is the `analysis_components` dataset. It is one-dimensional, is present only when the user specified analysis components, and contains those components as strings.
| Analysis Components | |
|---|---|
| Description | Values of the analysis components in evaluations |
| Location | `metadata/analysis_components` |
| Shape | 1-dimensional: number of analysis components |
| Type | String |
When HDF5 output is enabled (by including the hdf5 keyword), then by default evaluation data for the following components will be stored:
(If the top-level method is a method-hybrid, no model evaluation data will be stored.)
, no model evaluation data will be stored.)The user can override these defaults using the keywords model_selection and interface_selection.
The choices for `model_selection` are:
The choices for `interface_selection` are:
If a model or interface is excluded from storage by these selections, then it cannot appear in the `sources` group for methods or models.