Dakota Reference Manual
Version 6.12
Explore and Predict with Confidence

Beginning with release 6.9, Dakota gained the ability to write many method results such as the correlation matrices computed by sampling studies and the best parameters discovered by optimization methods to disk in HDF5. In Dakota 6.10 and above, evaluation data (variables and responses for each model or interface evaluation) may also be written. Many users may find this newly supported format more convenient than scraping or copying and pasting from Dakota's console output.
To enable HDF5 output, the results_output keyword with the hdf5 option must be added to the Dakota input file. In addition, Dakota must have been built with HDF5 support. Beginning with Dakota 6.10, HDF5 is enabled in our publicly available downloads. HDF5 support is considered a somewhat experimental feature. The results of some Dakota methods are not yet written to HDF5, and in a few, limited situations, enabling HDF5 will cause Dakota to crash.
HDF5 is a format that is widely used in scientific software for efficiently storing and organizing data. The HDF5 standard and libraries are maintained by the HDF Group.
In HDF5, data are stored in multidimensional arrays called datasets. Datasets are organized hierarchically in groups, which also can contain other groups. Datasets and groups are conceptually similar to files and directories in a filesystem. In fact, every HDF5 file contains at least one group, the root group, denoted "/", and groups and datasets are referred to using slash-delimited absolute or relative paths, which are more accurately called link names.
HDF5 has as one goal that data be "self-documenting" through the use of metadata. Dakota output files include two kinds of metadata: dimension scales and attributes. Dimension scales are secondary datasets that label the dimensions of a primary dataset; the tables in the sections below describe the scales attached to each result. An attribute is a small key/value pair attached to a group or dataset. The key is a character string, such as dakota_version, and (in Dakota output) the value can be a string, integer, or real-valued scalar. For example, Dakota stores the number of samples that were requested in a sampling study in the attribute 'samples'.

Many popular programming languages have support, either natively or from a third-party library, for reading and writing HDF5 files. The HDF Group itself supports C/C++ and Java libraries. The Dakota Project suggests the h5py module for Python. Examples that demonstrate using h5py to access and use Dakota HDF5 output may be found in the Dakota installation at share/dakota/examples/hdf5.
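As a quick illustration, the following sketch uses h5py to open and read a Dakota-style HDF5 file. The file name, method Id ('sampling'), and moment values are assumptions invented for this example; a tiny stand-in file is created first so the sketch is self-contained, whereas a real study writes the file itself.

```python
# Minimal h5py sketch. The file name, method Id, and moment values below
# are hypothetical; a real Dakota run produces this file.
import h5py

# Create a tiny stand-in file with the layout described in this manual.
with h5py.File("dakota_results.h5", "w") as f:
    f.create_dataset("/methods/sampling/results/execution:1/moments/f",
                     data=[0.5, 0.1, 0.0, 3.0])

# Read it back: groups behave like dictionaries, datasets like NumPy arrays.
with h5py.File("dakota_results.h5", "r") as f:
    execution = f["/methods/sampling/results/execution:1"]
    children = list(execution.keys())
    moments = execution["moments/f"][()]

print(children)
print(list(moments))
```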
Currently, complete or nearly complete coverage of results from sampling, optimization and calibration methods, parameter studies, and stochastic expansions exists. Coverage will continue to expand in future releases to include not only the results of all methods, but other potentially useful information such as interface evaluations and model transformations.
Methods in Dakota have a character string Id and are executed by Dakota one or more times. (Methods are executed more than once in studies that include a nested model, for example.) The Id may be provided by the user in the input file using the id_method keyword, or it may be automatically generated by Dakota. Dakota uses the label NO_METHOD_ID for methods that are specified in the input file without an id_method, and NOSPEC_METHOD_ID_<N> for methods that it generates for its own internal use. The <N> in the latter case is an incrementing integer that begins at 1.
The results for the <N>th execution of a method that has the label <method Id> are stored in the group

/methods/<method Id>/results/execution:<N>/
The /methods group is always present in Dakota HDF5 files, provided at least one method added results to the output. (In a future Dakota release, the top-level groups /interfaces and /models will be added.) The group execution:1 is also always present, even if there is only a single execution.
The groups and datasets for each type of result that Dakota is currently capable of storing are described in the following sections. Every dataset is documented in its own table. These tables include the location of the dataset relative to /methods/<method Id>/results/execution:<N>. This path may include both literal text that is always present and replacement text. Replacement text is <enclosed in angle brackets and italicized>. Two examples of replacement text are <response descriptor> and <variable descriptor>, which indicate that the name of a Dakota response or variable makes up a portion of the path.

The Expected Output section of each method's keyword documentation indicates the kinds of output, if any, that the method currently can write to HDF5. These are typically in the form of bulleted lists with clarifying notes that refer back to the sections that follow.
Several pieces of information about the Dakota study are stored as attributes of the top-level HDF5 root group ("/"). These include:
Study Attributes  

Label  Type  Description  
dakota_version  String  Version of Dakota used to run the study  
dakota_revision  String  Dakota version control information  
output_version  String  Version of the output file  
input  String  Dakota input file  
top_method  String  Id of the toplevel method  
total_cpu_time  Real  Combined parent and child CPU time in seconds  
parent_cpu_time  Real  Parent CPU time in seconds (when Dakota is built with UTILIB)  
child_cpu_time  Real  Child CPU time in seconds (when Dakota is built with UTILIB)  
total_wallclock_time  Real  Total wallclock time in seconds (when Dakota is built with UTILIB)  
mpi_init_wallclock_time  Real  Wallclock time to MPI_Init in seconds (when Dakota is built with UTILIB and run in parallel)  
run_wallclock_time  Real  Wallclock time since MPI_Init in seconds (when Dakota is built with UTILIB and run in parallel)  
mpi_wallclock_time  Real  Wallclock time since MPI_Init in seconds (when Dakota is not built with UTILIB and run in parallel) 
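These attributes can be read with h5py's attrs interface. In the sketch below, the file name and attribute values are synthetic stand-ins created for the example; only the attribute labels come from the table above.

```python
# Reading study-level attributes from the root group with h5py.
# The attribute values below are invented for the example.
import h5py

with h5py.File("study.h5", "w") as f:
    f.attrs["dakota_version"] = "6.12"
    f.attrs["total_cpu_time"] = 12.5

with h5py.File("study.h5", "r") as f:
    version = f["/"].attrs["dakota_version"]
    if isinstance(version, bytes):   # some h5py versions return bytes
        version = version.decode()
    cpu_time = float(f.attrs["total_cpu_time"])

print(version, cpu_time)
```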
Variables in most Dakota output (e.g. tabular data files) and input (e.g. imported data to construct surrogates) are listed in "input spec" order. (The variables keyword section is arranged by input spec order.) In this ordering, they are sorted first by function:

1. Design
2. Aleatory uncertain
3. Epistemic uncertain
4. State

And within each of these categories, they are sorted by domain:

1. Continuous
2. Discrete integer
3. Discrete string
4. Discrete real
A shortcoming of HDF5 is that datasets are homogeneous; for example, string and real-valued data cannot readily be stored in the same dataset. As a result, Dakota has chosen to flip "input spec" order for HDF5 and sort first by domain, then by function when storing variable information. When applicable, there may be as many as four datasets to store variable information: one to store continuous variables, another to store discrete integer variables, and so on. Within each of these, variables will be ordered by function.
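The ordering flip can be sketched as a pair of sort keys. The variable names below are illustrative assumptions; the category and domain orderings follow Dakota's usual input-spec conventions (design, aleatory uncertain, epistemic uncertain, state; continuous, discrete integer, discrete string, discrete real).

```python
# Sketch of "input spec" order vs. HDF5 storage order for variables.
# Variable names here are hypothetical.
FUNCTIONS = ["design", "aleatory_uncertain", "epistemic_uncertain", "state"]
DOMAINS = ["continuous", "discrete_integer", "discrete_string", "discrete_real"]

variables = [
    ("x1", "design", "continuous"),
    ("n1", "design", "discrete_integer"),
    ("u1", "aleatory_uncertain", "continuous"),
    ("s1", "state", "discrete_string"),
]

def input_spec_order(vs):
    # Tabular output: sort by function first, then domain.
    return sorted(vs, key=lambda v: (FUNCTIONS.index(v[1]), DOMAINS.index(v[2])))

def hdf5_order(vs):
    # HDF5 storage: sort by domain first, then function.
    return sorted(vs, key=lambda v: (DOMAINS.index(v[2]), FUNCTIONS.index(v[1])))

print([v[0] for v in input_spec_order(variables)])
print([v[0] for v in hdf5_order(variables)])
```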
sampling produces moments (e.g. mean, standard deviation or variance) of all responses, as well as 95% lower and upper confidence intervals for the 1st and 2nd moments. These are stored as described below. When sampling is used in incremental mode by specifying refinement_samples, all results, including the moments group, are placed within groups named increment:<N>, where <N> indicates the increment number beginning with 1.
Moments  

Description  1st through 4th moments for each response  
Location  [increment:<N>]/moments/<response descriptor>  
Notes  The [increment:<N>] group is present only for sampling with refinement  
Shape  1-dimensional: length of 4  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  moments  mean, std_deviation, skewness, kurtosis  Only for standard moments  
0  String  moments  mean, variance, third_central, fourth_central  Only for central moments 
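In h5py, the moment labels are available through the dataset's dimension scales (the dims API). A self-contained sketch with synthetic file contents and values; only the dataset layout and labels follow the table above:

```python
# Reading a moments dataset and the dimension scale that labels its entries.
# File name, scale path, and values are hypothetical stand-ins.
import h5py

with h5py.File("moments.h5", "w") as f:
    dset = f.create_dataset("moments/f", data=[0.5, 0.1, 0.0, 3.0])
    scale = f.create_dataset(
        "_scales/moments",
        data=["mean", "std_deviation", "skewness", "kurtosis"])
    scale.make_scale("moments")
    dset.dims[0].attach_scale(scale)

with h5py.File("moments.h5", "r") as f:
    dset = f["moments/f"]
    raw = dset.dims[0][0][()]   # the first scale attached to dimension 0
    labels = [s.decode() if isinstance(s, bytes) else s for s in raw]
    stats = dict(zip(labels, dset[()]))

print(stats)
```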
Moment Confidence Intervals  

Description  Lower and upper 95% confidence intervals on the 1st and 2nd moments  
Location  moment_confidence_intervals/<response descriptor>  
Shape  2-dimensional: 2x2  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  bounds  lower, upper  
1  String  moments  mean, std_deviation  Only for standard moments  
1  String  moments  mean, variance  Only for central moments 
A few different methods produce information about the correlations between pairs of variables and responses (collectively: factors). The four tables in this section describe how correlation information is stored. One important note is that HDF5 has no special, native type for symmetric matrices, and so the simple correlations and simple rank correlations are stored in dense 2D datasets.
Simple Correlations  

Description  Simple correlation matrix  
Location  [increment:<N>]/simple_correlations  
Notes  The [increment:<N>] group is present only for sampling with refinement  
Shape  2-dimensional: number of factors by number of factors  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0, 1  String  factors  Variable and response descriptors  The scales for both dimensions are identical 
Simple Rank Correlations  

Description  Simple rank correlation matrix  
Location  [increment:<N>]/simple_rank_correlations  
Notes  The [increment:<N>] group is present only for sampling with refinement  
Shape  2-dimensional: number of factors by number of factors  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0, 1  String  factors  Variable and response descriptors  The scales for both dimensions are identical 
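Reading a correlation matrix and its shared factors scale with h5py can be sketched as follows. The contents are synthetic; the symmetry check reflects the note above that symmetric matrices are stored as dense 2D datasets.

```python
# Reading a simple correlation matrix whose two dimensions share one
# "factors" scale. File contents here are hypothetical.
import h5py
import numpy as np

corr = np.array([[1.0, 0.2, -0.5],
                 [0.2, 1.0, 0.1],
                 [-0.5, 0.1, 1.0]])
with h5py.File("corr.h5", "w") as f:
    dset = f.create_dataset("simple_correlations", data=corr)
    scale = f.create_dataset("_scales/factors", data=["x1", "x2", "f"])
    scale.make_scale("factors")
    dset.dims[0].attach_scale(scale)
    dset.dims[1].attach_scale(scale)

with h5py.File("corr.h5", "r") as f:
    dset = f["simple_correlations"]
    raw = dset.dims[0][0][()]
    factors = [s.decode() if isinstance(s, bytes) else s for s in raw]
    matrix = dset[()]
    is_symmetric = np.allclose(matrix, matrix.T)
    i, j = factors.index("x1"), factors.index("f")
    val = matrix[i, j]

print(factors, val)
```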
Partial Correlations  

Description  Partial correlations  
Location  [increment:<N>]/partial_correlations/<response descriptor>  
Notes  The [increment:<N>] group is present only for sampling with refinement  
Shape  1-dimensional: number of variables  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  variables  Variable descriptors 
Partial Rank Correlations  

Description  Partial rank correlations  
Location  [increment:<N>]/partial_rank_correlations/<response descriptor>  
Notes  The [increment:<N>] group is present only for sampling with refinement  
Shape  1-dimensional: number of variables  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  variables  Variable descriptors 
Some aleatory UQ methods estimate the probability density of responses.
Probability Density  

Description  Probability density of a response  
Location  [increment:<N>]/probability_density/<response descriptor>  
Notes  The [increment:<N>] group is present only for sampling with refinement  
Shape  1-dimensional: number of bins in probability density  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  Real  lower_bounds  Lower bin edges  
0  Real  upper_bounds  Upper bin edges 
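Because the lower and upper bin edges are attached as two real-valued scales on dimension 0, a consumer can reconstruct the histogram. In this sketch (synthetic values; a valid density should integrate to approximately 1), the density is summed against the bin widths:

```python
# A response's probability density stored per bin, with lower and upper
# bin edges as scales. File name and values are hypothetical.
import h5py

lower = [0.0, 0.5, 1.0]
upper = [0.5, 1.0, 2.0]
density = [0.8, 0.8, 0.2]   # 0.8*0.5 + 0.8*0.5 + 0.2*1.0 = 1.0
with h5py.File("density.h5", "w") as f:
    dset = f.create_dataset("probability_density/f", data=density)
    lo = f.create_dataset("_scales/lower_bounds", data=lower)
    hi = f.create_dataset("_scales/upper_bounds", data=upper)
    lo.make_scale("lower_bounds")
    hi.make_scale("upper_bounds")
    dset.dims[0].attach_scale(lo)   # attached first -> dims[0][0]
    dset.dims[0].attach_scale(hi)   # attached second -> dims[0][1]

with h5py.File("density.h5", "r") as f:
    dset = f["probability_density/f"]
    lo = dset.dims[0][0][()]
    hi = dset.dims[0][1][()]
    mass = sum(d * (u - l) for d, l, u in zip(dset[()], lo, hi))

print(mass)
```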
Aleatory UQ methods can calculate level mappings (from user-specified probability, reliability, or generalized reliability to response, or vice versa).
Probability Levels  

Description  Response levels corresponding to user-specified probability levels  
Location  [increment:<N>]/probability_levels/<response descriptor>  
Notes  The [increment:<N>] group is present only for sampling with refinement  
Shape  1-dimensional: number of requested levels for the response  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  Real  probability_levels  User-specified probability levels 
Reliability Levels  

Description  Response levels corresponding to user-specified reliability levels  
Location  [increment:<N>]/reliability_levels/<response descriptor>  
Notes  The [increment:<N>] group is present only for sampling with refinement  
Shape  1-dimensional: number of requested levels for the response  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  Real  reliability_levels  User-specified reliability levels 
Generalized Reliability Levels  

Description  Response levels corresponding to user-specified generalized reliability levels  
Location  [increment:<N>]/gen_reliability_levels/<response descriptor>  
Notes  The [increment:<N>] group is present only for sampling with refinement  
Shape  1-dimensional: number of requested levels for the response  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  Real  gen_reliability_levels  User-specified generalized reliability levels 
Response Levels  

Description  Probability, reliability, or generalized reliability levels corresponding to user-specified response levels  
Location  [increment:<N>]/response_levels/<response descriptor>  
Notes  The [increment:<N>] group is present only for sampling with refinement  
Shape  1-dimensional: number of requested levels for the response  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  Real  response_levels  User-specified response levels 
Dakota's sampling method can produce main and total effects; stochastic expansions (polynomial_chaos, stoch_collocation) additionally can produce interaction effects.
Main Effects  

Description  First-order Sobol' indices  
Location  main_effects/<response descriptor>  
Shape  1-dimensional: number of variables  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  variables  Variable descriptors 
Total Effects  

Description  Total-effect Sobol' indices  
Location  total_effects/<response descriptor>  
Shape  1-dimensional: number of variables  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  variables  Variable descriptors 
Interaction Effects  

Description  Sobol' indices for interactions  
Location  order_<N>_interactions/<response descriptor>  
Shape  1-dimensional: number of <N>th-order interactions  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  variables  Descriptors of the variables in the interaction  Scales for interaction effects are 2D datasets with the dimensions (number of interactions, N) 
Stochastic expansion methods can obtain moments in two ways.
Integration Moments  

Description  Moments obtained via integration  
Location  integration_moments/<response descriptor>  
Shape  1-dimensional: length of 4  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  moments  mean, std_deviation, skewness, kurtosis  Only for standard moments  
0  String  moments  mean, variance, third_central, fourth_central  Only for central moments 
Expansion Moments  

Description  Moments obtained via expansion  
Location  expansion_moments/<response descriptor>  
Shape  1-dimensional: length of 4  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  moments  mean, std_deviation, skewness, kurtosis  Only for standard moments  
0  String  moments  mean, variance, third_central, fourth_central  Only for central moments 
sampling with epistemic variables produces extreme values (minimum and maximum) for each response.
Extreme Responses  

Description  The sample minimum and maximum of each response  
Location  [increment:<N>]/extreme_responses/<response descriptor>  
Notes  The [increment:<N>] group is present only for sampling with refinement  
Shape  1-dimensional: length of 2  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  extremes  minimum, maximum 
All parameter studies (vector_parameter_study, list_parameter_study, multidim_parameter_study, centered_parameter_study) record tables of evaluations (parameter-response pairs), similar to Dakota's tabular output file. Centered parameter studies additionally store evaluations in an order that is more natural to interpret, which is described below.
In the tabular-like listing, variables are stored according to the scheme described in a previous section.
Parameter Sets  

Description  Parameter study evaluations in a tabular-like listing  
Location  parameter_sets/{continuous_variables, discrete_integer_variables, discrete_string_variables, discrete_real_variables, responses}  
Shape  2-dimensional: number of evaluations by number of variables or responses  
Type  Real, String, or Integer, as applicable  
Scales  Dimension  Type  Label  Contents  Notes  
1  String  variables or responses  Variable or response descriptors 
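Since variables are split across per-domain datasets, a tabular view can be reassembled by zipping the datasets row by row. A sketch with synthetic contents (here only continuous variables and responses are present; the group and dataset names follow the table above):

```python
# Stitching parameter_sets datasets back into one table, one row per
# evaluation. File name and values are hypothetical.
import h5py

with h5py.File("pstudy.h5", "w") as f:
    f.create_dataset("parameter_sets/continuous_variables",
                     data=[[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
    f.create_dataset("parameter_sets/responses",
                     data=[[1.0], [2.0], [3.0]])

with h5py.File("pstudy.h5", "r") as f:
    cvars = f["parameter_sets/continuous_variables"][()]
    resp = f["parameter_sets/responses"][()]
    # One row per evaluation: variables followed by responses.
    rows = [list(v) + list(r) for v, r in zip(cvars, resp)]

for row in rows:
    print(row)
```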
Centered parameter studies store "slices" of the tabular data that make evaluating the effects of each variable on each response more convenient. The steps for each individual variable, including the initial or center point, and the corresponding responses are stored in separate groups.
Variable Slices  

Description  Steps, including center/initial point, for a single variable  
Location  variable_slices/<variable descriptor>/steps  
Shape  1-dimensional: number of user-specified steps for this variable  
Type  Real, String, or Integer, as applicable 
Variable Slices  Responses  

Description  Responses for variable slices  
Location  variable_slices/<variable descriptor>/responses  
Shape  2-dimensional: number of evaluations by number of responses  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
1  String  responses  Response descriptors 
Dakota's optimization and calibration methods report the parameters at the best point (or points, for multiple final solutions) discovered. These are stored using the scheme described in the variables section. When more than one solution is reported, the best parameters are nested in groups named set:<N>, where <N> is an integer numbering the set, beginning with 1.

State variables (and other inactive variables) are reported when using objective functions and for some calibration studies. However, when configuration variables are used in a calibration, state variables are suppressed.
Best Parameters  

Description  Best parameters discovered by optimization or calibration  
Location  [set:<N>]/best_parameters/{continuous, discrete_integer, discrete_string, discrete_real}  
Notes  The [set:<N>] group is present only when multiple final solutions are reported.  
Shape  1-dimensional: number of variables  
Type  Real, String, or Integer, as applicable  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  variables  Variable descriptors 
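When multiple final solutions are reported, a consumer iterates over the numbered set:<N> groups. A sketch (group names follow the convention above; file contents are synthetic, with only continuous best parameters present):

```python
# Reading best parameters for multiple final solutions, each stored under
# a set:<N> group. Values here are hypothetical.
import h5py

with h5py.File("best.h5", "w") as f:
    f.create_dataset("set:1/best_parameters/continuous", data=[0.1, 0.9])
    f.create_dataset("set:2/best_parameters/continuous", data=[0.4, 0.6])

with h5py.File("best.h5", "r") as f:
    # set:<N> groups are numbered from 1; sort them numerically.
    sets = sorted(f.keys(), key=lambda name: int(name.split(":")[1]))
    best = {name: f[name]["best_parameters/continuous"][()] for name in sets}

for name in sets:
    print(name, list(best[name]))
```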
Dakota's optimization methods report the objective functions at the best point (or points, for multiple final solutions) discovered. When more than one solution is reported, the best objective functions are nested in groups named set:<N>, where <N> is an integer numbering the set, beginning with 1.
Best Objective Functions  

Description  Best objective functions discovered by optimization  
Location  [set:<N>]/best_objective_functions  
Notes  The [set:<N>] group is present only when multiple final solutions are reported.  
Shape  1-dimensional: number of objective functions  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  responses  Response descriptors 
Dakota's optimization and calibration methods report the nonlinear constraints at the best point (or points, for multiple final solutions) discovered. When more than one solution is reported, the best constraints are nested in groups named set:<N>, where <N> is an integer numbering the set, beginning with 1.
Best Nonlinear Constraints  

Description  Best nonlinear constraints discovered by optimization or calibration  
Location  [set:<N>]/best_constraints  
Notes  The [set:<N>] group is present only when multiple final solutions are reported.  
Shape  1-dimensional: number of nonlinear constraints  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  responses  Response descriptors 
When using calibration terms with an optimization method, or when using a nonlinear least squares method such as nl2sol, Dakota reports residuals and residual norms for the best point (or points, for multiple final solutions) discovered.
Best Residuals  

Description  Best residuals discovered  
Location  best_residuals  
Shape  1-dimensional: number of residuals  
Type  Real 
Best Residual Norm  

Description  Norm of best residuals discovered  
Location  best_norm  
Shape  Scalar  
Type  Real 
Least squares methods (nl2sol, nlssol_sqp, optpp_g_newton) compute confidence intervals on the calibration parameters.
Parameter Confidence Intervals  

Description  Lower and upper confidence intervals on calibrated parameters  
Location  confidence_intervals  
Notes  The confidence intervals are not stored when there is more than one experiment.  
Shape  2-dimensional: number of variables by 2  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  variables  Variable descriptors  
1  String  bounds  lower, upper 
When performing calibration with experimental data (but no configuration variables), Dakota records, in addition to the best residuals, the best original model responses.
Best Model Responses  

Description  Original model responses for the best residuals discovered  
Location  best_model_responses  
Shape  1-dimensional: number of model responses  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  responses  Response descriptors 
When performing calibration with experimental data that includes configuration variables, Dakota reports the best model responses for each experiment. These results include the configuration variables, stored in the scheme described in the variables section, and the model responses.
Best Configuration Variables for Experiment  

Description  Configuration variables associated with experiment N  
Location  best_model_responses/experiment:<N>/{continuous_config_variables, discrete_integer_config_variables, discrete_string_config_variables, discrete_real_config_variables}  
Shape  1-dimensional: number of variables  
Type  Real, String, or Integer, as applicable  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  variables  Variable descriptors 
Best Model Responses for Experiment  

Description  Original model responses for the best residuals discovered  
Location  best_model_responses/experiment:<N>/responses  
Shape  1-dimensional: number of model responses  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  String  responses  Response descriptors 
The multi_start and pareto_set methods are meta-iterators that control multiple optimization sub-iterators. For both methods, Dakota stores the results of the sub-iterators (best parameters and best results). For multi_start, Dakota additionally stores the initial points, and for pareto_set, it stores the objective function weights.
Starting Points (multi_start)  

Description  Starting points for multi_start  
Location  starting_points/continuous  
Notes  Currently only continuous starting points are supported by multi_start  
Shape  2-dimensional: number of sets by number of variables  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  Integer  set_id  set Ids  
1  String  variables  Variable descriptors 
Weights (pareto_set)  

Description  Response Weights for pareto_set  
Location  weights  
Shape  2-dimensional: number of sets by number of responses  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  Integer  set_id  set Ids  
1  String  weights  w1, w2, ... wN 
Best Parameters (multi_start or pareto_set)  

Description  Best parameters discovered by multi_start or pareto_set  
Location  best_parameters/{continuous, discrete_integer, discrete_string, discrete_real}  
Shape  2-dimensional: number of sets by number of variables  
Type  Real, String, or Integer, as applicable  
Scales  Dimension  Type  Label  Contents  Notes  
0  Integer  set_id  set Ids  
1  String  variables  Variable descriptors 
Best Responses (multi_start or pareto_set)  

Description  Best responses for multi_start and pareto_set  
Location  best_responses  
Shape  2-dimensional: number of sets by number of responses  
Type  Real  
Scales  Dimension  Type  Label  Contents  Notes  
0  Integer  set_id  set Ids  
1  String  responses  Response descriptors 
An evaluation is a mapping from variables to responses performed by a Dakota model or interface. Beginning with release 6.10, Dakota has the ability to report evaluation history in HDF5 format. The HDF5 format offers many advantages over existing console output and tabular output. Requiring no "scraping", it is more convenient for most users than the former, and because it is not restricted to a two-dimensional, tabular arrangement of information, it is far richer than the latter.
This section begins by describing the Dakota components that can generate evaluation data. It then documents the high-level organization of the data from those components. Detailed documentation of the individual datasets (the "low-level" organization) where data are stored follows. Finally, information is provided concerning input keywords that control which components report evaluations.
Evaluation data are produced by only two kinds of components in Dakota: models and interfaces. The purpose of this subsection is to provide a basic description of models and interfaces for the purpose of equipping users to manage and understand HDF5-format evaluation data.
Because interfaces and models must be specified in even simple Dakota studies, most novice users of Dakota will have some familiarity with these concepts. However, the exact nature of the relationship between methods, models, and interfaces may be unclear. Moreover, the models and interfaces present in a Dakota study are not always limited to those specified by the user. Some input keywords or combinations of components cause Dakota to create new models or interfaces "behind the scenes" without the user's direct knowledge. Not only user-specified models and interfaces, but also these auto-generated components, can write evaluation data to HDF5. Accordingly, it may be helpful for consumers of Dakota's evaluation data to have a basic understanding of how Dakota creates and employs models and interfaces.
Consider first the input file shown here.
environment
  tabular_data
  results_output hdf5

method
  id_method 'sampling'
  sampling
    samples 20
    model_pointer 'sim'

model
  id_model 'sim'
  single
    interface_pointer 'tb'

variables
  uniform_uncertain 2
    descriptors 'x1' 'x2'
    lower_bounds 0.0 0.0
    upper_bounds 1.0 1.0

responses
  response_functions 1
    descriptors 'f'
  no_gradients
  no_hessians

interface
  id_interface 'tb'
  fork
    analysis_drivers 'text_book'
This simple input file specifies a single method of type sampling, which also has the Id 'sampling'. The 'sampling' method possesses a model of type single (alias simulation) named 'sim', which it uses to perform evaluations. (Dakota would have automatically generated a single model had one not been specified.) That is to say, for each variables-to-responses mapping required by the method, it provides variables to the model and receives back responses from it.
Single/simulation models like 'sim' perform evaluations by means of an interface, typically an interface to an external simulation. In this case, the interface is 'tb'. The model passes the variables to 'tb', which executes the text_book
driver, and receives back responses.
It is clear that two components produce evaluation data in this study. The first is the single model 'sim', which receives and fulfills evaluation requests from the method 'sampling', and the second is the interface 'tb', which similarly receives requests from 'sim' and fulfills them by running the text_book
driver.
Because tabular data was requested in the environment block, a record of the model's evaluations will be reported to a tabular file. The interface's evaluations could be dumped from the restart file using dakota_restart_util.
If we compared these evaluation histories from 'sim' and 'tb', we would see that they are identical to one another. The model 'sim' is a mere "middle man" whose only responsibility is passing variables from the method down to the interface, executing the interface, and passing responses back up to the method. However, this is not always the case.
For example, if this study were converted to a gradient-based optimization using optpp_q_newton, and the user specified numerical_gradients:
# model and interface same as above. Replace the method, variables, and responses with:
method
  id_method 'opt'
  optpp_q_newton

variables
  continuous_design 2
    descriptors 'x1' 'x2'
    lower_bounds 0.0 0.0
    upper_bounds 1.0 1.0

responses
  objective_functions 1
    descriptors 'f'
  numerical_gradients
  no_hessians
Then the model would have the responsibility of performing finite differencing to estimate gradients of the response 'f' requested by the method. Multiple function evaluations of 'tb' would map to a single gradient evaluation at the model level, and the evaluation histories of 'sim' and 'tb' would contain different information.
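The fan-out from one model-level evaluation into several interface-level runs can be sketched with forward differences, which need 1 + n function evaluations for n variables (central differences would need 1 + 2n; the exact count depends on the differencing scheme in use). The objective function below is a hypothetical stand-in for the text_book driver.

```python
# Why 'sim' and 'tb' histories diverge under numerical gradients: one
# model-level gradient evaluation fans out into several driver runs.
def f(x):
    # Hypothetical stand-in objective, minimized at (1, 1).
    return (x[0] - 1.0) ** 4 + (x[1] - 1.0) ** 4

def forward_diff_gradient(func, x, h=1e-6):
    runs = [list(x)]                # center point: one run
    base = func(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        runs.append(xp)             # one extra run per variable
        grad.append((func(xp) - base) / h)
    return grad, len(runs)

grad, n_runs = forward_diff_gradient(f, [0.0, 0.0])
print(n_runs)   # 3 interface-level runs for 1 model-level gradient
```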
Note that because it is unwieldy to report gradients (or Hessians) in a tabular format, they are not written to the tabular file, and historically were available only in the console output. The HDF5 format provides convenient access both to the "raw" evaluations performed by the interface and to higher-level model evaluations that include estimated gradients.
This pair of examples hopefully provides a basic understanding of the flow of evaluation data between a method, model, and interface, and explains why models and interfaces are producers of evaluation data.
Next consider a somewhat more complex study that includes a Dakota model of type surrogate. A surrogate model performs evaluations requested by a method by executing a special kind of interface called an approximation interface, which Dakota implicitly creates without the direct knowledge of the user. Approximation interfaces are a generic container for the various kinds of surrogates Dakota can use, such as Gaussian processes.
A Dakota model of type global surrogate may use a userspecified dace method to construct the actual underlying model(s) that it evaluates via its approximation interface. The dace method will have its own model (typically of type single/simulation), which will have a userspecified interface.
In this more complicated case there are at least four components that produce evaluation data: (1) the surrogate model and (2) its approximation interface, and (3) the dace method's model and (4) its interface. Although only components (1), (3), and (4) are user-specified, evaluation data produced by (2) may be written to HDF5 as well. (As explained below, only evaluations performed by the surrogate model and the dace interface are recorded by default. This can be overridden using hdf5 subkeywords.) This is an example where "extra" and potentially confusing data appear in Dakota's output due to an auto-generated component.
An important family of implicitly-created models is the recast models, which have the responsibility of transforming variables and responses. One type of recast, called a data transform model, is responsible for computing residuals when a user provides experimental data in a calibration study. Scaling recast models are employed when the user requests scaling of variables and/or responses.
Recast models work on the principle of function composition, and "wrap" a submodel, which may itself also be a recast model. The innermost model in the recursion often will be the simulation or surrogate model specified by the user in the input file. Dakota is capable of recording evaluation data at each level of recast.
This subsection describes how evaluation data produced by models and interfaces are organized at a high level. A detailed description of the datasets and subgroups that contain evaluation data for a specific model or interface is given in the next subsection.
Two top-level groups contain evaluation data: /interfaces and /models.
Because interfaces can be executed by more than one model, interface evaluations are more precisely thought of as evaluations of an interface/model combination. Consequently, interface evaluations are grouped not only by interface Id ('tb' in the example above), but also by the Id of the model that requested them ('sim'):
/interfaces/<interface Id>/<model Id>/
If the user does not provide an Id for an interface that they specify, Dakota assigns it the Id NO_ID. Approximation interfaces receive the Id APPROX_INTERFACE_<N>, where <N> is an incrementing integer beginning at 1. Other kinds of automatically generated interfaces are named NOSPEC_INTERFACE_ID_<N>.
The top-level group for model evaluations is /models. Within this group, model evaluations are grouped first by type: simulation, surrogate, nested, or recast, and then by model Id. That is:

/models/<type>/<model Id>/

Similar to interfaces, user-specified models that lack an Id are given one by Dakota. A single model is named NO_MODEL_ID. Some automatically generated models receive the name NOSPEC_MODEL_ID.
Recast models are a special case and receive the name RECAST_<WRAPPED-MODEL>_<TYPE>_<N>. In this string:

- WRAPPED-MODEL is the Id of the innermost wrapped model, typically a user-specified model.
- TYPE is the specific kind of recast. The three most common are:
  - RECAST: several generic responsibilities, including summing objective functions to present to a single-objective optimizer
  - DATA_TRANSFORM: computing residuals in a calibration study
  - SCALING: scaling variables and responses
- N is an incrementing integer beginning at 1, used to distinguish recasts of the same type that wrap the same underlying model.

A model's evaluations may be the result of combining information from multiple sources. A simulation (single) model will receive all the information it requires from its interface, but more complicated model types may use information not only from interfaces, but also from other models and from the results of method executions. Nested models, for instance, receive information from a sub-method (for instance, the mean of a response from a sampling study) and potentially also from an optional interface.
The sources of a model's evaluations may be roughly identified by examining the contents of that model's sources group. The sources group contains soft links (an HDF5 feature analogous to soft or symbolic links on many file systems) to the groups for the interfaces, models, or methods that the model used to produce its evaluation data. (At this time, Dakota does not report the specific interface or model evaluations or method executions that were used to produce a specific model evaluation, but this is a planned feature.)
Method results likewise have a sources group that identifies the models or methods employed by that method. By following the soft links contained in a method's or model's sources group, it is possible to "drill down" from a method to its ultimate sources of information. In the sampling example above, interface evaluations performed via the 'sim' model at the request of the 'sampling' method could be obtained at the HDF5 path /methods/sampling/sources/sim/sources/tb/.
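This drill-down can be sketched in h5py. The example below is hypothetical and self-contained: it builds an in-memory file with the same layout and link structure described above (in a real study, one would open the Dakota results file instead).

```python
import h5py

# Build an in-memory file with the sampling example's layout (hypothetical Ids).
with h5py.File("drill.h5", "w", driver="core", backing_store=False) as f:
    iface = f.create_group("/interfaces/tb/sim")   # interface evaluation data lives here
    f.create_group("/models/simulation/sim/sources")
    f.create_group("/methods/sampling/sources")
    iface.attrs["note"] = "interface evaluation data"
    # "sources" groups contain soft links to the producers of the data:
    f["/models/simulation/sim/sources/tb"] = h5py.SoftLink("/interfaces/tb/sim")
    f["/methods/sampling/sources/sim"] = h5py.SoftLink("/models/simulation/sim")
    # Drilling down from the method resolves the links to the interface group:
    note = f["/methods/sampling/sources/sim/sources/tb"].attrs["note"]
print(note)
```

The same path expression works on a real results file, because HDF5 resolves each soft link transparently during traversal.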
The evaluation data for each interface and model are stored using the same schema, in a collection of groups and datasets that reside within that interface's or model's high-level location in the HDF5 file. This subsection describes that "low-level" schema.
Data are divided first into variables, responses, and metadata groups.
The variables group contains datasets that store the variables information for each evaluation. Four datasets may be present, one for each "domain": continuous, discrete_integer, discrete_string, and discrete_real. These datasets are two-dimensional, with a row (0th dimension) for each evaluation and a column (1st dimension) for each variable. The 0th dimension has one dimension scale, for the integer-valued evaluation Ids. The 1st dimension has multiple scales: the variable descriptors, the variable Ids, and the variable types. In this context, the Ids are a 1-to-N ranking of the variables in Dakota "input spec" order.
Variables
  Description: Values of variables in evaluations
  Location: variables/{continuous, discrete_integer, discrete_string, discrete_real}
  Shape: 2-dimensional: number of evaluations by number of variables
  Type: Real, String, or Integer, as applicable
  Scales:
    Dimension 0 (Integer, "evaluation_ids"): Evaluation Ids
    Dimension 1 (String, "variables"): Variable descriptors
    Dimension 1 (Integer, "variables"): Variable Ids (1-to-N rank of the variable in Dakota input spec order)
    Dimension 1 (String, "types"): Variable types, e.g. CONTINUOUS_DESIGN, DISCRETE_DESIGN_SET_INT
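As a sketch, the descriptor scale can be used to locate a variable's column in one of these datasets. The example is hypothetical and self-contained: it builds an in-memory stand-in with made-up data and scale names, whereas a real study would open the Dakota results file and use the datasets under the model's group.

```python
import numpy as np
import h5py

# In-memory stand-in for a variables/continuous dataset (2 evaluations x 2 variables).
with h5py.File("vars.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset("variables/continuous",
                          data=np.array([[0.1, 2.0], [0.3, 2.5]]))
    ids = f.create_dataset("scales/eval_ids", data=np.array([1, 2]))
    desc = f.create_dataset("scales/descriptors", data=np.array([b"x1", b"x2"]))
    ids.make_scale("evaluation_ids")
    desc.make_scale("variables")
    ds.dims[0].attach_scale(ids)
    ds.dims[1].attach_scale(desc)
    # Find the column for descriptor "x2" via the first scale on dimension 1:
    labels = [s.decode() for s in ds.dims[1][0][()]]
    x2_column = ds[:, labels.index("x2")]
print(x2_column)
```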
The responses group contains datasets for functions and, when available, gradients and Hessians.
Functions: The functions dataset is two-dimensional and contains function values for all responses. Like the variables datasets, evaluations are stored along the 0th dimension and responses along the 1st. The evaluation Ids and response descriptors are attached as scales to these axes, respectively.
Functions
  Description: Values of functions in evaluations
  Location: responses/functions
  Shape: 2-dimensional: number of evaluations by number of responses
  Type: Real
  Scales:
    Dimension 0 (Integer, "evaluation_ids"): Evaluation Ids
    Dimension 1 (String, "responses"): Response descriptors
Gradients: The gradients dataset is three-dimensional, with shape (number of evaluations) by (number of responses) by (number of continuous variables). Dakota supports a mixed_gradients specification, and the gradients dataset is sized and organized such that only those responses for which gradients are available are stored. When mixed_gradients are employed, a response will not necessarily have the same index in the functions and gradients datasets.
Because the gradient could be computed with respect to any of the continuous variables, active or inactive, that belong to the associated model, the gradients dataset is sized to accommodate gradients taken with respect to all continuous variables. Components that were not included in a particular evaluation are set to NaN (not a number); the derivative_variables_vector (in the metadata group) for that evaluation can be examined to determine which components are meaningful.
Gradients
  Description: Values of gradients in evaluations
  Location: responses/gradients
  Shape: 3-dimensional: number of evaluations by number of responses by number of variables
  Type: Real
  Scales:
    Dimension 0 (Integer, "evaluation_ids"): Evaluation Ids
    Dimension 1 (String, "responses"): Response descriptors
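When post-processing, the NaN fill can simply be filtered out. The sketch below uses made-up values in a plain NumPy array shaped like the gradients dataset; it is not Dakota's API.

```python
import numpy as np

# gradients dataset shape: (evaluations, responses, continuous variables).
# NaN marks derivative components not computed in a given evaluation.
grads = np.full((1, 2, 3), np.nan)
grads[0, 0, :2] = [1.5, -0.5]   # response 0: gradient w.r.t. the first two variables

g = grads[0, 0]                 # evaluation 0, response 0
computed = g[~np.isnan(g)]      # keep only the components actually evaluated
print(computed)                 # [ 1.5 -0.5]
```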
Hessians: Hessians are stored in a four-dimensional dataset, with shape (number of evaluations) by (number of responses) by (number of variables) by (number of variables). The hessians dataset shares many characteristics with the gradients: in the mixed_hessians case, it will be smaller in the response dimension than the functions dataset, and unrequested components are set to NaN.
Hessians
  Description: Values of Hessians in evaluations
  Location: responses/hessians
  Shape: 4-dimensional: number of evaluations by number of responses by number of variables by number of variables
  Type: Real
  Scales:
    Dimension 0 (Integer, "evaluation_ids"): Evaluation Ids
    Dimension 1 (String, "responses"): Response descriptors
The metadata group contains up to three datasets.
Active Set Vector: The first is the active_set_vector. It is two-dimensional, with rows corresponding to evaluations and columns corresponding to responses. Each element contains an integer in the range 0-7, which encodes the request (function, gradient, Hessian) for the corresponding response in that evaluation. The 0th dimension has the evaluation Ids scale, and the 1st dimension has two scales: the response descriptors and the "default" or "maximal" ASV, an integer 0-7 for each response that indicates the information (function, gradient, Hessian) that possibly could have been requested during the study.
Active Set Vector
  Description: Values of the active set vector in evaluations
  Location: metadata/active_set_vector
  Shape: 2-dimensional: number of evaluations by number of responses
  Type: Integer
  Scales:
    Dimension 0 (Integer, "evaluation_ids"): Evaluation Ids
    Dimension 1 (String, "responses"): Response descriptors
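The 0-7 values are bit flags (1 = function value, 2 = gradient, 4 = Hessian), so an ASV element can be decoded as in this sketch:

```python
# Decode an active set vector entry into its requested components.
def decode_asv(value):
    return {
        "function": bool(value & 1),   # bit 0: function value requested
        "gradient": bool(value & 2),   # bit 1: gradient requested
        "hessian": bool(value & 4),    # bit 2: Hessian requested
    }

print(decode_asv(3))   # {'function': True, 'gradient': True, 'hessian': False}
```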
Derivative Variables Vector: The second dataset in the metadata group is the derivative_variables_vector dataset. It is included only when gradients or Hessians are available. Like the ASV, it is two-dimensional. Each column of the DVV dataset corresponds to a continuous variable and contains a 0 or 1, indicating whether gradients and Hessians were computed with respect to that variable for the evaluation. The 0th dimension has the evaluation Ids as a scale, and the 1st dimension has two scales: the 0th contains the descriptors of the continuous variables, and the 1st contains their variable Ids.
Derivative Variables Vector
  Description: Values of the derivative variables vector in evaluations
  Location: metadata/derivative_variables_vector
  Shape: 2-dimensional: number of evaluations by number of continuous variables
  Type: Integer
  Scales:
    Dimension 0 (Integer, "evaluation_ids"): Evaluation Ids
    Dimension 1 (String, "variables"): Variable descriptors
Analysis Components: The final dataset in the metadata group is the analysis_components dataset. It is one-dimensional and is present only when the user specified analysis components; it contains those components as strings.
Analysis Components
  Description: Analysis components specified by the user
  Location: metadata/analysis_components
  Shape: 1-dimensional: number of analysis components
  Type: String
When HDF5 output is enabled (by including the hdf5 keyword), evaluation data for the model used by the top-level method will be stored by default. (When the top-level method is a hybrid, no model evaluation data will be stored.) The user can override these defaults using the keywords model_selection and interface_selection.
The choices for model_selection are:
The choices for interface_selection are:
If a model or interface is excluded from storage by these selections, then it cannot appear in the sources group of any method or model.
Variables are characterized by parameters such as the mean and standard deviation or lower and upper bounds. Typically, users provide these parameters as part of their input to Dakota, but Dakota itself may also compute them as it scales and transforms variables, normalizes empirical distributions (e.g. for histogram_bin_uncertain variables), or calculates alternative parameterizations (lambda and zeta vs. mean and standard deviation for a lognormal_uncertain).
Beginning with release 6.11, models write their variables' parameters to HDF5. The information is located in each model's metadata/variable_parameters subgroup. Within this group, parameters are stored by Dakota variable type (e.g. normal_uncertain), with one 1D dataset per type. The datasets have the same names as their variable types and have one element per variable. Parameters are stored by name.
Consider the following variable specification, which includes two normal and two uniform variables:
variables
  normal_uncertain 2
    descriptors 'nuv_1' 'nuv_2'
    means 0.0 1.0
    std_deviations 1.0 0.5
  uniform_uncertain 2
    descriptors 'uuv_1' 'uuv_2'
    lower_bounds -1.0 0.0
    upper_bounds 1.0 1.0
Given this specification, and assuming a model Id of "tb_model", Dakota will write two 1D datasets, both of length 2, to the group /models/simulation/tb_model/metadata/variable_parameters: the first named normal_uncertain, and the second named uniform_uncertain. Using a JSON-like representation for illustration, the normal_uncertain dataset will appear as:
[
  { "mean": 0.0, "std_deviation": 1.0, "lower_bound": -inf, "upper_bound": inf },
  { "mean": 1.0, "std_deviation": 0.5, "lower_bound": -inf, "upper_bound": inf }
]
The uniform_uncertain dataset will contain:
[
  { "lower_bound": -1.0, "upper_bound": 1.0 },
  { "lower_bound": 0.0, "upper_bound": 1.0 }
]
In these representations of the normal_uncertain and uniform_uncertain datasets, the outer square brackets ([]) enclose the dataset, and each element within the dataset is enclosed in curly braces ({}). The curly braces indicate that the elements are dictionary-like objects that support access by string field name. More concretely, the following code snippet demonstrates reading the mean of the second normal variable, nuv_2.
The feature in HDF5 that underlies this namebased storage of fields is compound datatypes, which are similar to C/C++ structs or Python dictionaries. Further information about how to work with compound datatypes is available in the h5py documentation.
In most cases, datasets for storing parameters have names that match their variable types; the normal_uncertain and uniform_uncertain datasets illustrated above are examples. Exceptions include types such as discrete_design_set, which has string, integer, and real subtypes. For these, the dataset name is the top-level type with _string, _int, or _real appended: discrete_design_set_string, discrete_design_set_int, and discrete_design_set_real.
Most Dakota variable types have scalar parameters. For these, the names of the parameters are generally the singular form of the associated Dakota keyword. For example, triangular_uncertain variables are characterized in Dakota input using the plural keywords modes, lower_bounds, and upper_bounds; the singular field names are, respectively, "mode", "lower_bound", and "upper_bound". In this case, all three parameters are real-valued and stored as floating point numbers, but fields can also be integer-valued (e.g. binomial_uncertain/num_trials) or string-valued.
Some variable/parameter fields contain 1D arrays or vectors of information. Consider histogram_bin_uncertain variables, for which the user specifies not just one value, but an ordered collection of abscissas and corresponding ordinates or counts. Dakota stores the abscissas in the "abscissas" field, which is a 1D dataset of floatingpoint numbers. It similarly stores the counts in the "counts" field. (In this case, only the normalized counts are stored, regardless of whether the user provided counts or ordinates.)
When the user specifies more than one histogram_bin_uncertain variable, it is often also necessary to include the pairs_per_variable keyword to divide the abscissa/count pairs among the variables. This raises the question of how lists of parameters that vary in length across the variables ought to be stored.
Although HDF5 supports variable-length datasets, for simplicity (and due to limitations in h5py at the time of the 6.11 release), Dakota stores vector parameter fields in conventional fixed-length datasets. The lengths of these datasets are determined at runtime in the following way: for a particular variable type and field, the field for all variables is sized to be large enough to accommodate the variable with the longest list of parameters. Any unused space for a particular variable is filled with NaN (if the parameter is real-valued), INTMAX (integer-valued), or an empty string (string-valued). In addition, each variable has a "num_elements" field that reports the number of elements in the fields that contain actual data rather than fill values.
Consider this example, in which the user has specified a pair of histogram_bin_uncertain variables. The first has 3 pairs, and the second has 4.
variables
  histogram_bin_uncertain 2
    pairs_per_variable 3 4
    abscissas 0.0 0.5 1.0
              -1.0 -0.5 0.5 1.0
    counts 0.25 0.75 0.0
           0.2 0.4 0.2 0.0
For this specification, Dakota will write a dataset named histogram_bin_uncertain to the model's metadata/variable_parameters/ subgroup. It will be of length 2, one element for each variable, and will contain the following:
[
  { "num_elements": 3,
    "abscissas": [0.0, 0.5, 1.0, NaN],
    "counts": [0.25, 0.75, 0.0, NaN] },
  { "num_elements": 4,
    "abscissas": [-1.0, -0.5, 0.5, 1.0],
    "counts": [0.2, 0.4, 0.2, 0.0] }
]
The fields available for a variable parameters dataset can be determined in h5py by examining the datatype of the dataset.
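A sketch of that inspection, using an in-memory stand-in for the histogram_bin_uncertain dataset illustrated above (the dtype here is an assumption for illustration; a real results file would supply its own):

```python
import numpy as np
import h5py

# Compound type like the histogram_bin_uncertain dataset illustrated above.
dt = np.dtype([("num_elements", "i8"),
               ("abscissas", "f8", (4,)),
               ("counts", "f8", (4,))])

with h5py.File("fields.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset("histogram_bin_uncertain", shape=(2,), dtype=dt)
    field_names = ds.dtype.names    # fields supported by the dataset
print(field_names)                  # ('num_elements', 'abscissas', 'counts')
```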
h5py has a known bug that prevents parameters for some types of variables from being accessed (the Python interpreter crashes with a segfault). These include:

- histogram_point_uncertain string
- discrete_uncertain_set string

The variable parameter datasets have two dimension scales. The first (index 0) contains the variable descriptors, and the second (index 1) contains the variable Ids.

Available Parameters

The table below lists all Dakota variables and parameters that can be stored.
Distribution Parameters

| Variable Type | Parameter Name | Type | Rank |
|---|---|---|---|
| continuous_design | lower_bound | real | scalar |
| continuous_design | upper_bound | real | scalar |
| discrete_design_range | lower_bound | integer | scalar |
| discrete_design_range | upper_bound | integer | scalar |
| discrete_design_set_int | num_elements | integer | scalar |
| discrete_design_set_int | elements | integer | vector |
| discrete_design_set_string | num_elements | integer | scalar |
| discrete_design_set_string | elements | string | vector |
| discrete_design_set_real | num_elements | integer | scalar |
| discrete_design_set_real | elements | real | vector |
| normal_uncertain | mean | real | scalar |
| normal_uncertain | std_deviation | real | scalar |
| normal_uncertain | lower_bound | real | scalar |
| normal_uncertain | upper_bound | real | scalar |
| lognormal_uncertain | lower_bound | real | scalar |
| lognormal_uncertain | upper_bound | real | scalar |
| lognormal_uncertain | mean | real | scalar |
| lognormal_uncertain | std_deviation | real | scalar |
| lognormal_uncertain | error_factor | real | scalar |
| lognormal_uncertain | lambda | real | scalar |
| lognormal_uncertain | zeta | real | scalar |
| uniform_uncertain | lower_bound | real | scalar |
| uniform_uncertain | upper_bound | real | scalar |
| loguniform_uncertain | lower_bound | real | scalar |
| loguniform_uncertain | upper_bound | real | scalar |
| triangular_uncertain | mode | real | scalar |
| triangular_uncertain | lower_bound | real | scalar |
| triangular_uncertain | upper_bound | real | scalar |
| exponential_uncertain | beta | real | scalar |
| beta_uncertain | alpha | real | scalar |
| beta_uncertain | beta | real | scalar |
| beta_uncertain | lower_bound | real | scalar |
| beta_uncertain | upper_bound | real | scalar |
| gamma_uncertain | alpha | real | scalar |
| gamma_uncertain | beta | real | scalar |
| gumbel_uncertain | alpha | real | scalar |
| gumbel_uncertain | beta | real | scalar |
| frechet_uncertain | alpha | real | scalar |
| frechet_uncertain | beta | real | scalar |
| weibull_uncertain | alpha | real | scalar |
| weibull_uncertain | beta | real | scalar |
| histogram_bin_uncertain | num_elements | integer | scalar |
| histogram_bin_uncertain | abscissas | real | vector |
| histogram_bin_uncertain | counts | real | vector |
| poisson_uncertain | lambda | real | scalar |
| binomial_uncertain | probability_per_trial | real | scalar |
| binomial_uncertain | num_trials | integer | scalar |
| negative_binomial_uncertain | probability_per_trial | real | scalar |
| negative_binomial_uncertain | num_trials | integer | scalar |
| geometric_uncertain | probability_per_trial | real | scalar |
| hypergeometric_uncertain | total_population | integer | scalar |
| hypergeometric_uncertain | selected_population | integer | scalar |
| hypergeometric_uncertain | num_drawn | integer | scalar |
| histogram_point_uncertain_int | num_elements | integer | scalar |
| histogram_point_uncertain_int | abscissas | integer | vector |
| histogram_point_uncertain_int | counts | real | vector |
| histogram_point_uncertain_real | num_elements | integer | scalar |
| histogram_point_uncertain_real | abscissas | real | vector |
| histogram_point_uncertain_real | counts | real | vector |
| continuous_interval_uncertain | num_elements | integer | scalar |
| continuous_interval_uncertain | interval_probabilities | real | vector |
| continuous_interval_uncertain | lower_bounds | real | vector |
| continuous_interval_uncertain | upper_bounds | real | vector |
| discrete_interval_uncertain | num_elements | integer | scalar |
| discrete_interval_uncertain | interval_probabilities | real | vector |
| discrete_interval_uncertain | lower_bounds | integer | vector |
| discrete_interval_uncertain | upper_bounds | integer | vector |
| discrete_uncertain_set_int | num_elements | integer | scalar |
| discrete_uncertain_set_int | elements | integer | vector |
| discrete_uncertain_set_int | set_probabilities | real | vector |
| discrete_uncertain_set_real | num_elements | integer | scalar |
| discrete_uncertain_set_real | elements | real | vector |
| discrete_uncertain_set_real | set_probabilities | real | vector |
| continuous_state | lower_bound | real | scalar |
| continuous_state | upper_bound | real | scalar |
| discrete_state_range | lower_bound | integer | scalar |
| discrete_state_range | upper_bound | integer | scalar |
| discrete_state_set_int | num_elements | integer | scalar |
| discrete_state_set_int | elements | integer | vector |
| discrete_state_set_string | num_elements | integer | scalar |
| discrete_state_set_string | elements | string | vector |
| discrete_state_set_real | num_elements | integer | scalar |
| discrete_state_set_real | elements | real | vector |