Dakota Reference Manual  Version 6.16
Dakota HDF5 Output

Beginning with release 6.9, Dakota gained the ability to write many method results, such as the correlation matrices computed by sampling studies and the best parameters discovered by optimization methods, to disk in HDF5 format. In Dakota 6.10 and above, evaluation data (variables and responses for each model or interface evaluation) may also be written. Many users may find this format more convenient than scraping or copying and pasting from Dakota's console output.

To enable HDF5 output, the results_output keyword with the hdf5 option must be added to the Dakota input file. In addition, Dakota must have been built with HDF5 support. Beginning with Dakota 6.10, HDF5 is enabled in our publicly available downloads. HDF5 support is considered a somewhat experimental feature. The results of some Dakota methods are not yet written to HDF5, and in a few, limited situations, enabling HDF5 will cause Dakota to crash.

HDF5 Concepts

HDF5 is a format that is widely used in scientific software for efficiently storing and organizing data. The HDF5 standard and libraries are maintained by the HDF Group.

In HDF5, data are stored in multidimensional arrays called datasets. Datasets are organized hierarchically in groups, which also can contain other groups. Datasets and groups are conceptually similar to files and directories in a filesystem. In fact, every HDF5 file contains at least one group, the root group, denoted "/", and groups and datasets are referred to using slash-delimited absolute or relative paths, which are more accurately called link names.

[Figure: Example HDF5 Layout]

HDF5 has as one goal that data be "self-documenting" through the use of metadata. Dakota output files include two kinds of metadata.

  • Dimension Scales. Each dimension of a dataset may have zero or more scales, which are themselves datasets. Scales are often used to provide, for example, labels analogous to column headings in a table (see the dimension scales that Dakota applies to moments) or numerical values of an independent variable (user-specified probability levels in level mappings).
  • Attributes. key:value pairs that annotate a group or dataset. A key is always a character string, such as dakota_version, and (in Dakota output) the value can be a string-, integer-, or real-valued scalar. Dakota stores the number of samples that were requested in a sampling study in the attribute 'samples'.

Accessing Results

Many popular programming languages have support, either natively or from a third-party library, for reading and writing HDF5 files. The HDF Group itself supports C/C++ and Java libraries. The Dakota Project suggests the h5py module for Python. Examples that demonstrate using h5py to access and use Dakota HDF5 output may be found in the Dakota installation at dakota/share/dakota/examples/official/hdf5.
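
As a minimal orientation sketch (the file name dakota_results.h5 is illustrative; the actual name is determined by the results_output specification), the following h5py snippet opens a results file, lists the top-level groups, and prints the root group's attributes:

  import h5py

  # Open the results file read-only.
  with h5py.File("dakota_results.h5", "r") as h:
      print(list(h["/"]))        # top-level groups, e.g. ['methods']
      print(dict(h["/"].attrs))  # key:value metadata attached to the root group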

Organization of Results

Currently, complete or nearly complete coverage exists for the results of sampling, optimization and calibration methods, parameter studies, and stochastic expansions. Coverage will continue to expand in future releases to include not only the results of all methods, but other potentially useful information such as interface evaluations and model transformations.

Methods in Dakota have a character string Id and are executed by Dakota one or more times. (Methods are executed more than once in studies that include a nested model, for example.) The Id may be provided by the user in the input file using the id_method keyword, or it may be automatically generated by Dakota. Dakota uses the label NO_METHOD_ID for methods that are specified in the input file without an id_method, and NOSPEC_METHOD_ID_<N> for methods that it generates for its own internal use. The <N> in the latter case is an incrementing integer that begins at 1.

The results for the <N>th execution of a method that has the label <method Id> are stored in the group

/methods/<method Id>/results/execution:<N>/

The /methods group is always present in Dakota HDF5 files, provided at least one method added results to the output. (In a future Dakota release, the top level groups /interfaces and /models will be added.) The group execution:1 also is always present, even if there is only a single execution.
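
For example, assuming a method with the user-assigned Id 'sampling' that executed once (both the Id and file name are hypothetical), its results group could be read with h5py as follows:

  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      results = h["/methods/sampling/results/execution:1"]
      print(list(results))  # names of the result groups and datasets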

The groups and datasets for each type of result that Dakota is currently capable of storing are described in the following sections. Every dataset is documented in its own table. These tables include:

  • A brief description of the dataset.
  • The location of the dataset relative to /methods/<method Id>/results/execution:<N>. This path may include both literal text that is always present and replacement text. Replacement text is <enclosed in angle brackets and italicized>. Two examples of replacement text are <response descriptor> and <variable descriptor>, which indicate that the name of a Dakota response or variable makes up a portion of the path.
  • Clarifying notes, where appropriate.
  • The type (String, Integer, or Real) of the information in the dataset.
  • The shape of the dataset; that is, the number of dimensions and the size of each dimension.
  • A description of the dataset's scales, which includes
    • The dimension of the dataset that the scale belongs to.
    • The type (String, Integer, or Real) of the information in the scale.
    • The label or name of the scale.
    • The contents of the scale. Contents that appear in plaintext are literal and will always be present in a scale. Italicized text describes content that varies.
    • Notes that provide further clarification about the scale.
  • A description of the dataset's attributes, which are key:value pairs that provide helpful context for the dataset.

The Expected Output section of each method's keyword documentation indicates the kinds of output, if any, that method currently can write to HDF5. These are typically in the form of bulleted lists with clarifying notes that refer back to the sections that follow.

Study Metadata

Several pieces of information about the Dakota study are stored as attributes of the top-level HDF5 root group ("/"). These include:

Study Attributes
Label Type Description
dakota_version String Version of Dakota used to run the study
dakota_revision String Dakota version control information
output_version String Version of the output file
input String Dakota input file
top_method String Id of the top-level method
total_cpu_time Real Combined parent and child CPU time in seconds
parent_cpu_time Real Parent CPU time in seconds (when Dakota is built with UTILIB)
child_cpu_time Real Child CPU time in seconds (when Dakota is built with UTILIB)
total_wallclock_time Real Total wallclock time in seconds (when Dakota is built with UTILIB)
mpi_init_wallclock_time Real Wallclock time to MPI_Init in seconds (when Dakota is built with UTILIB and run in parallel)
run_wallclock_time Real Wallclock time since MPI_Init in seconds (when Dakota is built with UTILIB and run in parallel)
mpi_wallclock_time Real Wallclock time since MPI_Init in seconds (when Dakota is not built with UTILIB and run in parallel)
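
These attributes can be read with h5py; for example (a sketch assuming the default file name):

  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      attrs = h["/"].attrs
      print(attrs["dakota_version"])  # version of Dakota that ran the study
      print(attrs["input"])           # the Dakota input file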

A Note about Variables Storage

Variables in most Dakota output (e.g. tabular data files) and input (e.g. imported data to construct surrogates) are listed in "input spec" order. (The variables keyword section is arranged by input spec order.) In this ordering, they are sorted first by function:

  1. Design
  2. Aleatory
  3. Epistemic
  4. State

And within each of these categories, they are sorted by domain:

  1. Continuous
  2. Discrete integer (sets and ranges)
  3. Discrete string
  4. Discrete real

A shortcoming of HDF5 is that datasets are homogeneous; for example, string- and real-valued data cannot readily be stored in the same dataset. As a result, Dakota has chosen to flip "input spec" order for HDF5 and sort first by domain, then by function when storing variable information. When applicable, there may be as many as four datasets to store variable information: one to store continuous variables, another to store discrete integer variables, and so on. Within each of these, variables will be ordered by function.

Sampling Moments

sampling produces moments (e.g. mean, standard deviation or variance) of all responses, as well as 95% lower and upper confidence intervals for the 1st and 2nd moments. These are stored as described below. When sampling is used in incremental mode by specifying refinement_samples, all results, including the moments group, are placed within groups named increment:<N>, where <N> indicates the increment number beginning with 1.

Moments
Description 1st through 4th moments for each response
Location [increment:<N>]/moments/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: length of 4
Type Real
Scales Dimension Type Label Contents Notes
0 String moments mean, std_deviation, skewness, kurtosis Only for standard moments
0 String moments mean, variance, third_central, fourth_central Only for central moments
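
As an illustration, the moments and their labels can be read together by way of the dimension scale. This sketch assumes a method Id of 'sampling' and a response descriptor 'f':

  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      m = h["/methods/sampling/results/execution:1/moments/f"]
      labels = m.dims[0]["moments"][:]  # scale holding the moment names
      for label, value in zip(labels, m[:]):
          print(label.decode(), value)  # scale labels are byte strings in h5py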

Moment Confidence Intervals
Description Lower and upper 95% confidence intervals on the 1st and 2nd moments
Location moment_confidence_intervals/<response descriptor>
Shape 2-dimensional: 2x2
Type Real
Scales Dimension Type Label Contents Notes
0 String bounds lower, upper
1 String moments mean, std_deviation Only for standard moments
1 String moments mean, variance Only for central moments

Correlations

A few different methods produce information about the correlations between pairs of variables and responses (collectively: factors). The four tables in this section describe how correlation information is stored. One important note is that HDF5 has no special, native type for symmetric matrices, and so the simple correlations and simple rank correlations are stored in dense 2D datasets.

Simple Correlations
Description Simple correlation matrix
Location [increment:<N>]/simple_correlations
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 2-dimensional: number of factors by number of factors
Type Real
Scales Dimension Type Label Contents Notes
0, 1 String factors Variable and response descriptors The scales for both dimensions are identical
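
The factor labels attached as scales make it straightforward to label the matrix when reading it back. A sketch, again assuming a method Id of 'sampling':

  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      corr = h["/methods/sampling/results/execution:1/simple_correlations"]
      factors = [s.decode() for s in corr.dims[0]["factors"][:]]
      matrix = corr[:, :]  # dense symmetric matrix, factors x factors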

Simple Rank Correlations
Description Simple rank correlation matrix
Location [increment:<N>]/simple_rank_correlations
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 2-dimensional: number of factors by number of factors
Type Real
Scales Dimension Type Label Contents Notes
0, 1 String factors Variable and response descriptors The scales for both dimensions are identical

Partial Correlations
Description Partial correlations
Location [increment:<N>]/partial_correlations/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Partial Rank Correlations
Description Partial Rank correlations
Location [increment:<N>]/partial_rank_correlations/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Probability Density

Some aleatory UQ methods estimate the probability density of responses.

Probability Density
Description Probability density of a response
Location [increment:<N>]/probability_density/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of bins in probability density
Type Real
Scales Dimension Type Label Contents Notes
0 Real lower_bounds Lower bin edges
0 Real upper_bounds Upper bin edges
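
Because both sets of bin edges are attached as scales on dimension 0, the density can be reassembled into (bin, density) pairs. A sketch assuming a response descriptor 'f':

  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      d = h["/methods/sampling/results/execution:1/probability_density/f"]
      lower = d.dims[0]["lower_bounds"][:]  # lower bin edges
      upper = d.dims[0]["upper_bounds"][:]  # upper bin edges
      for lo, hi, density in zip(lower, upper, d[:]):
          print(f"[{lo}, {hi}): {density}")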

Level Mappings

Aleatory UQ methods can calculate level mappings (from user-specified probability, reliability, or generalized reliability to response, or vice versa).

Probability Levels
Description Response levels corresponding to user-specified probability levels
Location [increment:<N>]/probability_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real probability_levels User-specified probability levels
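
The user-specified levels travel with the dataset as a scale, so each computed response level can be paired with the probability level that produced it. A sketch assuming a response descriptor 'f':

  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      pl = h["/methods/sampling/results/execution:1/probability_levels/f"]
      requested = pl.dims[0]["probability_levels"][:]  # user-specified levels
      for p, response_level in zip(requested, pl[:]):
          print(f"probability {p} -> response level {response_level}")

The reliability, generalized reliability, and response level datasets described below follow the same pattern.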

Reliability Levels
Description Response levels corresponding to user-specified reliability levels
Location [increment:<N>]/reliability_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real reliability_levels User-specified reliability levels

Generalized Reliability Levels
Description Response levels corresponding to user-specified generalized reliability levels
Location [increment:<N>]/gen_reliability_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real gen_reliability_levels User-specified generalized reliability levels

Response Levels
Description Probability, reliability, or generalized reliability levels corresponding to user-specified response levels
Location [increment:<N>]/response_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real response_levels User-specified response levels

Variance-Based Decomposition (Sobol' Indices)

Dakota's sampling method can produce main and total effects; stochastic expansions (polynomial_chaos, stoch_collocation) additionally can produce interaction effects.

Main Effects
Description First-order Sobol' indices
Location main_effects/<response descriptor>
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Total Effects
Description Total-effect Sobol' indices
Location total_effects/<response descriptor>
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors
Each order (pair, 3-way, 4-way, etc.) of interaction is stored in a separate dataset. The scales are unusual in that they are two-dimensional in order to contain the labels of the variables that participate in each interaction.

Interaction Effects
Description Sobol' indices for interactions
Location order_<N>_interactions/<response descriptor>
Shape 1-dimensional: number of Nth order interactions
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Descriptors of the variables in the interaction Scales for interaction effects are 2D datasets with the dimensions (number of interactions, N)
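
Reading an interaction dataset therefore involves indexing into its two-dimensional scale to recover the participating variables. A sketch for pairwise (order-2) interactions, assuming a hypothetical method Id of 'pce' and a response descriptor 'f':

  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      inter = h["/methods/pce/results/execution:1/order_2_interactions/f"]
      labels = inter.dims[0]["variables"]  # 2D scale: (num interactions, 2)
      for i, index_value in enumerate(inter[:]):
          names = [s.decode() for s in labels[i]]
          print(names, index_value)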

Integration and Expansion Moments

Stochastic expansion methods can obtain moments in two ways: by integration and by expansion.

Integration Moments
Description Moments obtained via integration
Location integration_moments/<response descriptor>
Shape 1-dimensional: length of 4
Type Real
Scales Dimension Type Label Contents Notes
0 String moments mean, std_deviation, skewness, kurtosis Only for standard moments
0 String moments mean, variance, third_central, fourth_central Only for central moments

Expansion Moments
Description Moments obtained via expansion
Location expansion_moments/<response descriptor>
Shape 1-dimensional: length of 4
Type Real
Scales Dimension Type Label Contents Notes
0 String moments mean, std_deviation, skewness, kurtosis Only for standard moments
0 String moments mean, variance, third_central, fourth_central Only for central moments

Extreme Responses

sampling with epistemic variables produces extreme values (minimum and maximum) for each response.

Extreme Responses
Description The sample minimum and maximum of each response
Location [increment:<N>]/extreme_responses/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: length of 2
Type Real
Scales Dimension Type Label Contents Notes
0 String extremes minimum, maximum

Parameter Sets

All parameter studies (vector_parameter_study, list_parameter_study, multidim_parameter_study, centered_parameter_study) record tables of evaluations (parameter-response pairs), similar to Dakota's tabular output file. Centered parameter studies additionally store evaluations in an order that is more natural to interpret, which is described below.

In the tabular-like listing, variables are stored according to the scheme described in a previous section.

Parameter Sets
Description Parameter study evaluations in a tabular-like listing
Location parameter_sets/{continuous_variables, discrete_integer_variables, discrete_string_variables, discrete_real_variables, responses}
Shape 2-dimensional: number of evaluations by number of variables or responses
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
1 String variables or responses Variable or response descriptors

Variable Slices

Centered parameter studies store "slices" of the tabular data that make evaluating the effects of each variable on each response more convenient. The steps for each individual variable, including the initial or center point, and the corresponding responses are stored in separate groups.

Variable Slices
Description Steps, including center/initial point, for a single variable
Location variable_slices/<variable descriptor>/steps
Shape 1-dimensional: number of user-specified steps for this variable
Type Real, String, or Integer, as applicable

Variable Slices - Responses
Description Responses for variable slices
Location variable_slices/<variable descriptor>/responses
Shape 2-dimensional: number of evaluations by number of responses
Type Real
Scales Dimension Type Label Contents Notes
1 String responses Response descriptors

Best Parameters

Dakota's optimization and calibration methods report the parameters at the best point (or points, for multiple final solutions) discovered. These are stored using the scheme described in the variables section. When more than one solution is reported, the best parameters are nested in groups named set:<N>, where <N> is an integer numbering the set and beginning with 1.

State variables (and other inactive variables) are reported when using objective functions and for some calibration studies. However, when configuration variables are used in a calibration, state variables are suppressed.

Best Parameters
Description Best parameters discovered by optimization or calibration
Location [set:<N>]/best_parameters/{continuous, discrete_integer, discrete_string, discrete_real}
Notes The [set:<N>] group is present only when multiple final solutions are reported.
Shape 1-dimensional: number of variables
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Best Objective Functions

Dakota's optimization methods report the objective functions at the best point (or points, for multiple final solutions) discovered. When more than one solution is reported, the best objective functions are nested in groups named set:<N>, where <N> is an integer numbering the set and beginning with 1.

Best Objective Functions
Description Best objective functions discovered by optimization
Location [set:<N>]/best_objective_functions
Notes The [set:<N>] group is present only when multiple final solutions are reported.
Shape 1-dimensional: number of objective functions
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors

Best Nonlinear Constraints

Dakota's optimization and calibration methods report the nonlinear constraints at the best point (or points, for multiple final solutions) discovered. When more than one solution is reported, the best constraints are nested in groups named set:<N>, where <N> is an integer numbering the set and beginning with 1.

Best Nonlinear Constraints
Description Best nonlinear constraints discovered by optimization or calibration
Location [set:<N>]/best_constraints
Notes The [set:<N>] group is present only when multiple final solutions are reported.
Shape 1-dimensional: number of nonlinear constraints
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors

Calibration

When using calibration terms with an optimization method, or when using a nonlinear least squares method such as nl2sol, Dakota reports residuals and residual norms for the best point (or points, for multiple final solutions) discovered.

Best Residuals
Description Best residuals discovered
Location best_residuals
Shape 1-dimensional: number of residuals
Type Real

Best Residual Norm
Description Norm of best residuals discovered
Location best_norm
Shape Scalar
Type Real

Parameter Confidence Intervals

Least squares methods (nl2sol, nlssol_sqp, optpp_g_newton) compute confidence intervals on the calibration parameters.

Parameter Confidence Intervals
Description Lower and upper confidence intervals on calibrated parameters
Location confidence_intervals
Notes The confidence intervals are not stored when there is more than one experiment.
Shape 2-dimensional: number of calibration parameters by 2
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors
1 String bounds lower, upper

Best Model Responses (without configuration variables)

When performing calibration with experimental data (but no configuration variables), Dakota records, in addition to the best residuals, the best original model responses.

Best Model Responses
Description Original model responses for the best residuals discovered
Location best_model_responses
Shape 1-dimensional: number of model responses
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors

Best Model Responses (with configuration variables)

When performing calibration with experimental data that includes configuration variables, Dakota reports the best model responses for each experiment. These results include the configuration variables, stored in the scheme described in the variables section, and the model responses.

Best Configuration Variables for Experiment
Description Configuration variables associated with experiment N
Location best_model_responses/experiment:<N>/{continuous_config_variables, discrete_integer_config_variables, discrete_string_config_variables, discrete_real_config_variables}
Shape 1-dimensional: number of variables
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Best Model Responses for Experiment
Description Original model responses for the best residuals discovered
Location best_model_responses/experiment:<N>/responses
Shape 1-dimensional: number of model responses
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors

Multistart and Pareto Set

The multi_start and pareto_set methods are meta-iterators that control multiple optimization sub-iterators. For both methods, Dakota stores the results of the sub-iterators (best parameters and best results). For multi_start, Dakota additionally stores the initial points, and for pareto_set, it stores the objective function weights.

Starting Points (multi_start)
Description Starting points for multi_start
Location starting_points/continuous
Notes Currently only continuous starting points are supported by multi_start
Shape 2-dimensional: number of sets by number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 Integer set_id set Ids
1 String variables Variable descriptors

Weights (pareto_set)
Description Response Weights for pareto_set
Location weights
Shape 2-dimensional: number of sets by number of responses
Type Real
Scales Dimension Type Label Contents Notes
0 Integer set_id set Ids
1 String weights w1, w2, ... wN

Best Parameters (multi_start or pareto_set)
Description Best parameters discovered by multi_start or pareto_set
Location best_parameters/{continuous, discrete_integer, discrete_string, discrete_real}
Shape 2-dimensional: number of sets by number of variables
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
0 Integer set_id set Ids
1 String variables Variable descriptors

Best Responses
Description Best responses for multi_start and pareto_set
Location best_responses
Shape 2-dimensional: number of sets by number of responses
Type Real
Scales Dimension Type Label Contents Notes
0 Integer set_id set Ids
1 String responses Response descriptors

Organization of Evaluations

An evaluation is a mapping from variables to responses performed by a Dakota model or interface. Beginning with release 6.10, Dakota has the ability to report evaluation history in HDF5 format. The HDF5 format offers many advantages over existing console output and tabular output. Requiring no "scraping", it is more convenient for most users than the former, and being unrestricted to a two-dimensional, tabular arrangement of information, it is far richer than the latter.

This section begins by describing the Dakota components that can generate evaluation data. It then documents the high-level organization of the data from those components. Detailed documentation of the individual datasets (the "low-level" organization) where data are stored follows. Finally, information is provided concerning input keywords that control which components report evaluations.

Sources of Evaluation Data

Evaluation data are produced by only two kinds of components in Dakota: models and interfaces. This subsection provides a basic description of models and interfaces to equip users to manage and understand HDF5-format evaluation data.

Because interfaces and models must be specified in even simple Dakota studies, most novice users of Dakota will have some familiarity with these concepts. However, the exact nature of the relationship between methods, models, and interfaces may be unclear. Moreover, the models and interfaces present in a Dakota study are not always limited to those specified by the user. Some input keywords or combinations of components cause Dakota to create new models or interfaces "behind the scenes" and without the user's direct knowledge. Not only user-specified models and interfaces, but also these auto-generated components, can write evaluation data to HDF5. Accordingly, it may be helpful for consumers of Dakota's evaluation data to have a basic understanding of how Dakota creates and employs models and interfaces.

Consider first the input file shown here.

environment
  tabular_data
  results_output
    hdf5

method
  id_method 'sampling'
  sampling
    samples 20
  model_pointer 'sim'

model
  id_model 'sim'
  single
  interface_pointer 'tb'

variables
  uniform_uncertain 2
    descriptors 'x1' 'x2'
    lower_bounds 0.0 0.0
    upper_bounds 1.0 1.0

responses
  response_functions 1
    descriptors 'f'
  no_gradients
  no_hessians

interface
  id_interface 'tb'
  fork
    analysis_drivers 'text_book'

This simple input file specifies a single method of type sampling, which also has the Id 'sampling'. The 'sampling' method possesses a model of type single (alias simulation) named 'sim', which it uses to perform evaluations. (Dakota would have automatically generated a single model had one not been specified.) That is to say, for each variables-to-response mapping required by the method, it provides variables to the model and receives back responses from it.

Single/simulation models like 'sim' perform evaluations by means of an interface, typically an interface to an external simulation. In this case, the interface is 'tb'. The model passes the variables to 'tb', which executes the text_book driver, and receives back responses.

It is clear that two components produce evaluation data in this study. The first is the single model 'sim', which receives and fulfills evaluation requests from the method 'sampling', and the second is the interface 'tb', which similarly receives requests from 'sim' and fulfills them by running the text_book driver.

Because tabular data was requested in the environment block, a record of the model's evaluations will be reported to a tabular file. The interface's evaluations could be dumped from the restart file using dakota_restart_util.

If we compared these evaluation histories from 'sim' and 'tb', we would see that they are identical to one another. The model 'sim' is a mere "middle man" whose only responsibility is passing variables from the method down to the interface, executing the interface, and passing responses back up to the method. However, this is not always the case.

For example, if this study were converted to a gradient-based optimization using optpp_q_newton, and the user specified numerical_gradients:

# model and interface same as above. Replace the method, variables, and responses with:

method
  id_method 'opt'
  optpp_q_newton

variables
  continuous_design 2
    descriptors 'x1' 'x2'
    lower_bounds 0.0 0.0
    upper_bounds 1.0 1.0

responses
  objective_functions 1
    descriptors 'f'
  numerical_gradients
  no_hessians

Then the model would have the responsibility of performing finite differencing to estimate gradients of the response 'f' requested by the method. Multiple function evaluations of 'tb' would map to a single gradient evaluation at the model level, and the evaluation histories of 'sim' and 'tb' would contain different information.

Note that because it is unwieldy to report gradients (or Hessians) in a tabular format, they are not written to the tabular file, and historically were available only in the console output. The HDF5 format provides convenient access to both the "raw" evaluations performed by the interface and higher-level model evaluations that include estimated gradients.

This pair of examples hopefully provides a basic understanding of the flow of evaluation data between a method, model, and interface, and explains why models and interfaces are producers of evaluation data.

Next consider a somewhat more complex study that includes a Dakota model of type surrogate. A surrogate model performs evaluations requested by a method by executing a special kind of interface called an approximation interface, which Dakota implicitly creates without the user's direct knowledge. Approximation interfaces are a generic container for the various kinds of surrogates Dakota can use, such as Gaussian processes.

A Dakota model of type global surrogate may use a user-specified dace method to construct the actual underlying model(s) that it evaluates via its approximation interface. The dace method will have its own model (typically of type single/simulation), which will have a user-specified interface.

In this more complicated case there are at least four components that produce evaluation data: (1) the surrogate model and (2) its approximation interface, and (3) the dace method's model and (4) its interface. Although only components (1), (3), and (4) are user-specified, evaluation data produced by (2) may be written to HDF5 as well. (As explained below, only evaluations performed by the surrogate model and the dace interface will be recorded by default. This can be overridden using hdf5 sub-keywords.) This is an example where "extra" and potentially confusing data appears in Dakota's output due to an auto-generated component.

An important family of implicitly-created models is recast models, which have the responsibility of transforming variables and responses. One type of recast, called a data transform model, is responsible for computing residuals when a user provides experimental data in a calibration study. Scaling recast models are employed when the user requests scaling of variables and/or responses.

Recast models work on the principle of function composition, and "wrap" a submodel, which may itself also be a recast model. The innermost model in the recursion often will be the simulation or surrogate model specified by the user in the input file. Dakota is capable of recording evaluation data at each level of recast.

High-level Organization of Evaluation Data

This subsection describes how evaluation data produced by models and interfaces are organized at high level. A detailed description of the datasets and subgroups that contain evaluation data for a specific model or interface is given in the next subsection.

Two top level groups contain evaluation data, /interfaces and /models.

Interfaces

Because interfaces can be executed by more than one model, interface evaluations are more precisely thought of as evaluations of an interface/model combination. Consequently, interface evaluations are grouped not only by interface Id ('tb' in the example above), but also the Id of the model that requested them ('sim').

/interfaces/<interface Id>/<model Id>/

If the user does not provide an Id for a specified interface, Dakota assigns it the Id NO_ID. Approximation interfaces receive the Id APPROX_INTERFACE_<N>, where <N> is an incrementing integer beginning at 1. Other kinds of automatically generated interfaces are named NOSPEC_INTERFACE_ID_<N>.

Models

The top-level group for model evaluations is /models. Within this group, model evaluations are grouped by type: simulation, surrogate, nested, or recast, and then by model Id. That is:

/models/<type>/<model Id>/    

Similar to interfaces, user-specified models that lack an Id are given one by Dakota. A single model is named NO_MODEL_ID. Some automatically generated models receive the name NOSPEC_MODEL_ID.

Recast models are a special case and receive the name RECAST_<WRAPPED-MODEL>_<TYPE>_<N>. In this string:

  • WRAPPED-MODEL is the Id of the innermost wrapped model, typically a user-specified model
  • TYPE is the specific kind of recast. The three most common recasts are:
    • RECAST: several generic responsibilities, including summing objective functions to present to a single-objective optimizer
    • DATA_TRANSFORM: Compute residuals in a calibration
    • SCALING: scale variables and responses
  • N is an incrementing integer that begins with 1. It is employed to distinguish recasts of the same type that wrap the same underlying model.

The model's evaluations may be the result of combining information from multiple sources. A simulation/single model will receive all the information it requires from its interface, but more complicated model types may use information not only from interfaces, but also other models and the results of method executions. Nested models, for instance, receive information from a submethod (the mean of a response from a sampling study, for instance) and potentially also an optional interface.

The sources of a model's evaluations may be roughly identified by examining the contents of that model's sources group. The sources group contains softlinks (note: softlinks are an HDF5 feature analogous to soft or symbolic links on many file systems) to groups for the interfaces, models, or methods that the model used to produce its evaluation data. (At this time, Dakota does not report the specific interface or model evaluations or method executions that were used to produce a specific model evaluation, but this is a planned feature.)

Method results likewise have a sources group that identifies the models or methods employed by that method. By following the softlinks contained in a method's or model's sources group, it is possible to "drill down" from a method to its ultimate sources of information. In the sampling example above, interface evaluations performed via the 'sim' model at the request of the 'sampling' method could be obtained at the HDF5 path: /methods/sampling/sources/sim/sources/tb/
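
In h5py, softlinks are followed transparently when indexing, so this drill-down is a single path lookup (a sketch using the Ids from the example input above):

  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      evals = h["/methods/sampling/sources/sim/sources/tb"]
      print(list(evals))  # e.g. ['properties', 'responses', 'variables']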

Low-Level Organization of Evaluation Data

Within each model and interface's "high-level" group, evaluation data are stored according to a "low-level" schema. This section describes the "low-level" schema.

Data are divided first of all into variables, responses, and properties groups. In addition, if a user specifies metadata responses in the Dakota input, a metadata dataset will be present.

Variables

The variables group contains datasets that store the variables information for each evaluation. Four datasets may be present, one for each "domain": continuous, discrete_integer, discrete_string, and discrete_real. These datasets are two-dimensional, with a row (0th dimension) for each evaluation and a column (1st dimension) for each variable. The 0th dimension has one dimension scale for the integer-valued evaluation Id. The 1st dimension has three scales: the descriptors of the variables, their variable Ids, and their variable types. In this context, the Ids are a 1-to-N ranking of the variables in Dakota "input spec" order.

Variables
Description Values of variables in evaluations
Location variables/{continuous, discrete_integer, discrete_string, discrete_real}
Shape 2-dimensional: number of evaluations by number of variables
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String variables Variable descriptors
1 Integer variables Variable Ids 1-to-N rank of the variable in Dakota input spec order
1 String types Variable types Type of each variable, e.g. CONTINUOUS_DESIGN, DISCRETE_DESIGN_SET_INT

Responses

The responses group contains datasets for functions and, when available, gradients and Hessians.

Functions: The functions dataset is two-dimensional and contains function values for all responses. Like the variables datasets, evaluations are stored along the 0th dimension, and responses are stored along the 1st. The evaluation Ids and response descriptors are attached as scales to these axes, respectively.

Functions
Description Values of functions in evaluations
Location responses/functions
Shape 2-dimensional: number of evaluations by number of responses
Type Real
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String responses Response descriptors
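
A sketch of reading the functions dataset, assuming the interface and model Ids 'tb' and 'sim' from the example input file above:

  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      fns = h["/interfaces/tb/sim/responses/functions"]
      eval_ids = fns.dims[0]["evaluation_ids"][:]
      descriptors = fns.dims[1]["responses"][:]
      # row i of fns holds the response values for evaluation eval_ids[i]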

Gradients: The gradients dataset is three-dimensional, with shape $evaluations \times responses \times variables$. Dakota supports a specification of mixed_gradients, and the gradients dataset is sized and organized such that only those responses for which gradients are available are stored. When mixed_gradients are employed, a response will not necessarily have the same index in the functions and gradients datasets.

Because it is possible that the gradient could be computed with respect to any of the continuous variables, active or inactive, that belong to the associated model, the gradients dataset is sized to accommodate gradients taken with respect to all continuous variables. Components that were not included in a particular evaluation are set to NaN (not a number), and the derivative_variables_vector (in the properties group) for that evaluation can be examined to determine which components are meaningful.

Gradients
Description Values of gradients in evaluations
Location responses/gradients
Shape 3-dimensional: number of evaluations by number of responses by number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String responses Response descriptors
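
When reading gradients back, the NaN components can be masked off to recover only the derivatives that were actually computed. A sketch using numpy, assuming a model Id of 'sim':

  import numpy as np
  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      grads = h["/models/simulation/sim/responses/gradients"]
      g = grads[0, 0, :]          # first evaluation, first response
      computed = g[~np.isnan(g)]  # drop components not in the DVV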

Hessians: Hessians are stored in a four-dimensional dataset, $evaluations \times responses \times variables \times variables$. The hessians dataset shares many characteristics with the gradients: in the mixed_hessians case, it will be smaller in the response dimension than the functions dataset, and unrequested components are set to NaN.

Hessians
Description Values of Hessians in evaluations
Location responses/hessians
Shape 4-dimensional: number of evaluations by number of responses by number of variables by number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String responses Response descriptors

Properties

The properties group contains up to four members.

Active Set Vector: The first is the active_set_vector dataset. It is two-dimensional, with rows corresponding to evaluations and columns corresponding to responses. Each element contains an integer in the range 0-7, which indicates the request (function, gradient, Hessian) for the corresponding response for that evaluation. The 0th dimension has the evaluation Ids scale, and the 1st dimension has two scales: the response descriptors and the "default" or "maximal" ASV, an integer 0-7 for each response that indicates the information (function, gradient, Hessian) that possibly could have been requested during the study.

Active Set Vector
Description Values of the active set vector in evaluations
Location properties/active_set_vector
Shape 2-dimensional: number of evaluations by number of responses
Type Integer
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String responses Response descriptors
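
Each ASV element is a bitfield: 1 indicates a function value request, 2 a gradient, and 4 a Hessian. A sketch of decoding one element, assuming the interface and model Ids from the example above:

  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      asv = h["/interfaces/tb/sim/properties/active_set_vector"]
      request = int(asv[0, 0])            # first evaluation, first response
      wants_function = bool(request & 1)  # function value requested
      wants_gradient = bool(request & 2)  # gradient requested
      wants_hessian = bool(request & 4)   # Hessian requested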

Derivative Variables Vector: The second item in the properties group is the derivative_variables_vector dataset. It is included only when gradients or Hessians are available. Like the ASV, it is two-dimensional. Each column of the DVV dataset corresponds to a continuous variable and contains a 0 or 1, indicating whether gradients and Hessians were computed with respect to that variable for the evaluation. The 0th dimension has the evaluation Ids as a scale, and the 1st dimension has two scales: the 0th holds the descriptors of the continuous variables, and the 1st holds their variable Ids.

Derivative Variables Vector
Description Values of the derivative variables vector in evaluations
Location properties/derivative_variables_vector
Shape 2-dimensional: number of evaluations by number of continuous variables
Type Integer
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String variables Variable descriptors

Analysis Components: The third member of the properties group is the analysis_components dataset. It is a 1D dataset that is present only when the user specified analysis components, and it contains those components as strings.

Analysis Components
Description Values of the analysis components in evaluations
Location properties/analysis_components
Shape 1-dimensional: number of analysis components
Type String

The final possible member of the properties group is the variable_parameters group. It is included only for models, which possess variables, and is described in a separate section below.

Metadata

Beginning with release 6.16, Dakota supports response metadata. If configured, metadata values are stored in the metadata dataset.

Metadata
Description Values of metadata in evaluations
Location metadata
Shape 2-dimensional: number of evaluations by number of metadata
Type Real
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String metadata Metadata descriptors

Selecting Models and Interfaces to Store

When HDF5 output is enabled (by including the hdf5 keyword), then by default evaluation data for the following components will be stored:

  • The model that belongs to the top-level method. (Currently, if the top-level method is a metaiterator such as hybrid, no model evaluation data will be stored.)
  • All simulation interfaces. (interfaces of type fork, system, direct, etc).

The user can override these defaults using the keywords model_selection and interface_selection.

The choices for model_selection are:

  • top_method : (default) Store evaluation data for the top method's model only.
  • all_methods : Store evaluation data for all models that belong directly to a method. Note that these models may be recasts of user-specified models, not the user-specified models themselves.
  • all : Store evaluation data for all models.
  • none : Store evaluation data for no models.

The choices for interface_selection are:

  • simulation : (default) Store evaluation data for simulation interfaces.
  • all : Store evaluation data for all interfaces.
  • none : Store evaluation data for no interfaces.

If a model or interface is excluded from storage by these selections, it cannot appear in the sources group for methods or models.

Distribution Parameters

Variables are characterized by parameters such as the mean and standard deviation or lower and upper bounds. Typically, users provide these parameters as part of their input to Dakota, but Dakota itself may also compute them as it scales and transforms variables, normalizes empirical distributions (e.g. for histogram_bin_uncertain variables), or calculates alternative parameterizations (lambda and zeta vs mean and standard deviation for a lognormal_uncertain).

Beginning with release 6.11, models write their variables' parameters to HDF5. The information is located in each model's properties/variable_parameters subgroup. Within this group, parameters are stored by Dakota variable type (e.g. normal_uncertain), with one 1D dataset per type. The datasets have the same names as their variable types and have one element per variable. Parameters are stored by name.

Consider the following variable specification, which includes two normal and two uniform variables:

  variables
    normal_uncertain 2
      descriptors 'nuv_1' 'nuv_2'
      means 0.0 1.0
      std_deviations 1.0 0.5
    uniform_uncertain 2
      descriptors 'uuv_1' 'uuv_2'
      lower_bounds -1.0 0.0
      upper_bounds  1.0 1.0

Given this specification, and assuming a model Id of "tb_model", Dakota will write two 1D datasets, both of length 2, to the group /models/simulation/tb_model/properties/variable_parameters, the first named normal_uncertain, and the second named uniform_uncertain. Using a JSON-like representation for illustration, the normal_uncertain dataset will appear as:

  [
    {
      "mean": 0.0,
      "std_deviation": 1.0,
      "lower_bound": -inf, 
      "upper_bound": inf
    },
    { 
      "mean": 1.0,
      "std_deviation": 0.5,
      "lower_bound": -inf,
      "upper_bound": inf
    }
  ]

The uniform_uncertain dataset will contain:

  [
    {
      "lower_bound": -1.0, 
      "upper_bound":  1.0
    },
    { 
      "lower_bound": 0.0,
      "upper_bound": 1.0
    }
  ]

In these representations of the normal_uncertain and uniform_uncertain datasets, the outer square brackets ([]) enclose the dataset, and each element within the datasets is enclosed in curly braces ({}). The curly braces are meant to indicate that the elements are dictionary-like objects that support access by string field name. A bit more concretely, the following code snippet demonstrates reading the mean of the second normal variable, nuv_2.

  import h5py

  with h5py.File("dakota_results.h5") as h:
      model = h["/models/simulation/tb_model/"]
      # nu_vars is the dataset that contains distribution parameters for
      # normal_uncertain variables
      nu_vars = model["properties/variable_parameters/normal_uncertain"]
      # 1 is the 0-based index of nuv_2, and "mean" is the name of the field
      # where the mean is stored; nuv_2_mu now contains 1.0.
      nuv_2_mu = nu_vars[1]["mean"]

The feature in HDF5 that underlies this name-based storage of fields is compound datatypes, which are similar to C/C++ structs or Python dictionaries. Further information about how to work with compound datatypes is available in the h5py documentation.

Naming Conventions and Layout

In most cases, datasets for storing parameters have names that match their variable types. The normal_uncertain and uniform_uncertain datasets illustrated above are examples. Exceptions include types such as discrete_design_set, which has string, integer, and real subtypes. For these, the dataset name is the top-level type with _string, _int, or _real appended: discrete_design_set_string, discrete_design_set_int, and discrete_design_set_real.

Most Dakota variable types have scalar parameters. For these, the names of the parameters are generally the singular form of the associated Dakota keyword. For example, triangular_uncertain variables are characterized in Dakota input using the plural keywords modes, lower_bounds, and upper_bounds. The singular field names are, respectively, "mode", "lower_bound", and "upper_bound". In this case, all three parameters are real-valued and stored as floating point numbers, but variable types/fields can also be integer-valued (e.g. binomial_uncertain/num_trials) or string-valued.

Some variable/parameter fields contain 1D arrays or vectors of information. Consider histogram_bin_uncertain variables, for which the user specifies not just one value, but an ordered collection of abscissas and corresponding ordinates or counts. Dakota stores the abscissas in the "abscissas" field, which is a 1D dataset of floating-point numbers. It similarly stores the counts in the "counts" field. (In this case, only the normalized counts are stored, regardless of whether the user provided counts or ordinates.)

When the user specifies more than one histogram_bin_uncertain variable, it often is also necessary to include the pairs_per_variable keyword to divide the abscissa/count pairs among the variables. This raises the question of how lists of parameters that vary in length across the variables ought to be stored.

Although HDF5 supports variable-length datasets, for simplicity (and due to limitations in h5py at the time of the 6.11 release), Dakota stores vector parameter fields in conventional fixed-length datasets. The lengths of these datasets are determined at runtime in the following way: for a particular variable type and field, the field for all variables is sized to be large enough to accommodate the variable with the longest list of parameters. Any unused space for a particular variable is filled with NaN (if the parameter is real-valued), INTMAX (integer-valued), or an empty string (string-valued). Each variable also has a field, "num_elements", that reports the number of elements in the fields that contain actual data and not fill values.

Consider this example, in which the user has specified a pair of histogram_bin_uncertain variables. The first has 3 pairs, and the second has 4.

  variables
    histogram_bin_uncertain 2
      pairs_per_variable 3 4
      abscissas  0.0   0.5  1.0 
                -1.0  -0.5  0.5  1.0 
      counts     0.25  0.75 0.0 
                 0.2   0.4  0.2  0.0

For this specification, Dakota will write a dataset named histogram_bin_uncertain to the properties/variable_parameters/ subgroup for the model. It will be of length 2, one element for each variable, and contain the following:

  [
    {
      "num_elements": 3,
      "abscissas": [0.0, 0.5, 1.0, NaN],
      "counts": [0.25, 0.75, 0.0, NaN]
    },
    {
      "num_elements": 4,
      "abscissas": [-1.0, -0.5, 0.5, 1.0],
      "counts": [0.2, 0.4, 0.2, 0.0]
    }
  ]
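
When reading these datasets, the "num_elements" field can be used to strip off the fill values. A sketch, assuming the model was given no Id (NO_MODEL_ID):

  import h5py

  with h5py.File("dakota_results.h5", "r") as h:
      hb = h["/models/simulation/NO_MODEL_ID/properties/variable_parameters"
             "/histogram_bin_uncertain"]
      first = hb[0]                       # compound element for the first variable
      n = int(first["num_elements"])      # count of real (non-fill) entries
      abscissas = first["abscissas"][:n]  # drops the trailing NaN
      counts = first["counts"][:n]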

h5py Examples

The fields available for a variable parameters dataset can be determined in h5py by examining the datatype of the dataset.

  import h5py

  with h5py.File("dakota_results.h5") as h:
      model = h["/models/simulation/NO_MODEL_ID/"]
      vp = model["properties/variable_parameters"]
      nu = vp["normal_uncertain"]
      nu_param_names = nu.dtype.names
      # nu_param_names is a tuple of strings: ('mean', 'std_deviation',
      # 'lower_bound', 'upper_bound')

Known Limitations

h5py has a known bug that prevents parameters for some types of variables from being accessed (the Python interpreter crashes with a segfault). These include:

  • histogram_point_uncertain string
  • discrete_uncertain_set string

Metadata

The variable parameter datasets have two dimension scales. The first (index 0) contains the variable descriptors, and the second (index 1) contains variable Ids.

Available Parameters

Parameter Listing for All Types

The table below lists all Dakota variables and parameters that can be stored.

Distribution Parameters
Variable Type Parameter Name Type Rank
continuous_design lower_bound real scalar
upper_bound real scalar
discrete_design_range lower_bound integer scalar
upper_bound integer scalar
discrete_design_set_int num_elements integer scalar
elements integer vector
discrete_design_set_string num_elements integer scalar
elements string vector
discrete_design_set_real num_elements integer scalar
elements real vector
normal_uncertain mean real scalar
std_deviation real scalar
lower_bound real scalar
upper_bound real scalar
lognormal_uncertain lower_bound real scalar
upper_bound real scalar
mean real scalar
std_deviation real scalar
error_factor real scalar
lambda real scalar
zeta real scalar
uniform_uncertain lower_bound real scalar
upper_bound real scalar
loguniform_uncertain lower_bound real scalar
upper_bound real scalar
triangular_uncertain mode real scalar
lower_bound real scalar
upper_bound real scalar
exponential_uncertain beta real scalar
gamma_uncertain alpha real scalar
beta real scalar
gumbel_uncertain alpha real scalar
beta real scalar
frechet_uncertain alpha real scalar
beta real scalar
weibull_uncertain alpha real scalar
beta real scalar
histogram_bin_uncertain num_elements integer scalar
abscissas real vector
counts real vector
poisson_uncertain lambda real scalar
binomial_uncertain probability_per_trial real scalar
num_trials integer scalar
negative_binomial_uncertain probability_per_trial real scalar
num_trials integer scalar
geometric_uncertain probability_per_trial real scalar
hypergeometric_uncertain total_population integer scalar
selected_population integer scalar
num_drawn integer scalar
histogram_point_uncertain_int num_elements integer scalar
abscissas integer vector
counts real vector
histogram_point_uncertain_real num_elements integer scalar
abscissas real vector
counts real vector
continuous_interval_uncertain num_elements integer scalar
interval_probabilities real vector
lower_bounds real vector
upper_bounds real vector
discrete_interval_uncertain num_elements integer scalar
interval_probabilities real vector
lower_bounds integer vector
upper_bounds integer vector
discrete_uncertain_set_int num_elements integer scalar
elements integer vector
set_probabilities real vector
discrete_uncertain_set_real num_elements integer scalar
elements real vector
set_probabilities real vector
continuous_state lower_bound real scalar
upper_bound real scalar
discrete_state_range lower_bound integer scalar
upper_bound integer scalar
discrete_state_set_int num_elements integer scalar
elements integer vector
discrete_state_set_string num_elements integer scalar
elements string vector
discrete_state_set_real num_elements integer scalar
elements real vector