Dakota Reference Manual  Version 6.9
Dakota HDF5 Output

Beginning with release 6.9, Dakota can write many method results, such as the correlation matrices computed by sampling studies and the best parameters discovered by optimization methods, to disk in HDF5 format. Many users may find this newly supported format more convenient than scraping or copying and pasting from Dakota's console output.

To enable HDF5 output, the results_output keyword with the hdf5 option must be added to the Dakota input file. In addition, Dakota must have been built with HDF5 support. HDF5 support is considered a somewhat experimental feature in this release, and therefore it is not enabled in the binaries provided on the Download page of the Dakota website; building from source is necessary. See the instructions on the Dakota website.

HDF5 Concepts

HDF5 is a format that is widely used in scientific software for efficiently storing and organizing data. The HDF5 standard and libraries are maintained by the HDF Group.

In HDF5, data are stored in multidimensional arrays called datasets. Datasets are organized hierarchically in groups, which also can contain other groups. Datasets and groups are conceptually similar to files and directories in a filesystem. In fact, every HDF5 file contains at least one group, the root group, denoted "/", and groups and datasets are referred to using slash-delimited absolute or relative paths, which are more accurately called link names.
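These concepts can be sketched in a few lines of Python using the h5py module (recommended later on this page for accessing Dakota output). The group and dataset names here are invented for illustration, and the file is held in memory so nothing touches disk:

```python
import h5py

# Create a small in-memory HDF5 file (driver="core", backing_store=False
# keeps it out of the filesystem entirely).
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    # Groups nest like directories; datasets are multidimensional arrays.
    grp = f.create_group("methods/my_sampling")
    grp.create_dataset("moments", data=[0.5, 1.2, 0.0, 3.0])

    # Objects are addressed by slash-delimited link names rooted at "/".
    dset = f["/methods/my_sampling/moments"]
    print(f.name)      # "/"  (the root group)
    print(dset.name)   # "/methods/my_sampling/moments"
    print(list(dset[()]))
```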

[Figure: Example HDF5 Layout (hdf5_layout.png)]

One goal of HDF5 is that data be "self-documenting" through the use of metadata. Dakota output files include two kinds of metadata.

  • Dimension Scales. Each dimension of a dataset may have zero or more scales, which are themselves datasets. Scales are often used to provide, for example, labels analogous to column headings in a table (see the dimension scales that Dakota applies to moments) or numerical values of an independent variable (user-specified probability levels in level mappings).
  • Attributes. Key:value pairs that annotate a group or dataset. A key is always a character string, such as dakota_version, and (in Dakota output) the value can be a string-, integer-, or real-valued scalar. For example, Dakota stores the number of samples requested in a sampling study in the attribute 'samples'.

Accessing Results

Many popular programming languages have support, either natively or from a third-party library, for reading and writing HDF5 files. The HDF Group itself supports C/C++ and Java libraries. The Dakota Project suggests the h5py module for Python. Examples that demonstrate using h5py to access and use Dakota HDF5 output may be found in the Dakota installation at share/dakota/examples/hdf5.
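The sketch below shows the kind of h5py code involved in reading results, dimension scales, and attributes. Because the example must be self-contained, it first builds a tiny mock of the layout described on this page (the method ID, response descriptor, and scale storage location are our own choices); in practice you would simply open the file your study wrote:

```python
import h5py

# Build a tiny mock of Dakota's layout; a real study would produce this file.
with h5py.File("mock.h5", "w", driver="core", backing_store=False) as f:
    f.attrs["dakota_version"] = "6.9"
    exe = f.create_group("methods/NO_ID/execution:1")
    mom = exe.create_dataset("moments/response_fn_1",
                             data=[0.5, 1.2, 0.0, 3.0])
    scale = exe.create_dataset(
        "scales/moments",
        data=[b"mean", b"std_deviation", b"skewness", b"kurtosis"])
    scale.make_scale("moments")
    mom.dims[0].attach_scale(scale)

    # Reading works the same way on a real Dakota output file.
    moments = f["/methods/NO_ID/execution:1/moments/response_fn_1"]
    labels = [s.decode() for s in moments.dims[0][0][:]]  # dimension-0 scale
    print({label: float(v) for label, v in zip(labels, moments[()])})
    # {'mean': 0.5, 'std_deviation': 1.2, 'skewness': 0.0, 'kurtosis': 3.0}
    print(f.attrs["dakota_version"])                      # root-group attribute
```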

Organization of Results

Currently, complete or nearly complete coverage exists for results from sampling, optimization and calibration methods, parameter studies, and stochastic expansions. Coverage will continue to expand in future releases to include not only the results of all methods, but also other potentially useful information such as interface evaluations and model transformations.

Methods in Dakota have a character string ID and are executed by Dakota one or more times. (Methods are executed more than once in studies that include a nested model, for example.) The ID may be provided by the user in the input file using the id_method keyword, or it may be automatically generated by Dakota. Dakota uses the label NO_ID for methods that are specified in the input file without an id_method, and NOSPEC_ID_<N> for methods that it generates for its own internal use. The <N> in the latter case is an incrementing integer that begins at 1.

The results for the <N>th execution of a method that has the label <method_id> are stored in the group

/methods/<method_id>/execution:<N>/

The /methods group is present in every Dakota HDF5 file to which at least one method added results. (In a future Dakota release, the top-level groups /interfaces and /models will be added.) The group execution:1 is also always present, even when there is only a single execution.
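For scripting, this group name can be assembled directly; a minimal helper (the function name is our own) might look like:

```python
def execution_group(method_id: str, n: int = 1) -> str:
    """Link name of the results group for the <n>th execution of <method_id>."""
    return f"/methods/{method_id}/execution:{n}"

print(execution_group("NO_ID"))       # /methods/NO_ID/execution:1
print(execution_group("my_opt", 2))   # /methods/my_opt/execution:2
```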

The groups and datasets for each type of result that Dakota is currently capable of storing are described in the following sections. Every dataset is documented in its own table. These tables include:

  • A brief description of the dataset.
  • The location of the dataset relative to /methods/<method_id>/execution:<N>. This path may include both literal text that is always present and replacement text. Replacement text is <enclosed in angle brackets and italicized>. Two examples of replacement text are <response descriptor> and <variable descriptor>, which indicate that the name of a Dakota response or variable makes up a portion of the path.
  • Clarifying notes, where appropriate.
  • The type (String, Integer, or Real) of the information in the dataset.
  • The shape of the dataset; that is, the number of dimensions and the size of each dimension.
  • A description of the dataset's scales, which includes
    • The dimension of the dataset that the scale belongs to.
    • The type (String, Integer, or Real) of the information in the scale.
    • The label or name of the scale.
    • The contents of the scale. Contents that appear in plaintext are literal and will always be present in a scale. Italicized text describes content that varies.
    • Notes that provide further clarification about the scale.
  • A description of the dataset's attributes, which are key:value pairs that provide helpful context for the dataset.

The Expected Output section of each method's keyword documentation indicates the kinds of output, if any, that method currently can write to HDF5. These are typically in the form of bulleted lists with clarifying notes that refer back to the sections that follow.

Study Metadata

Several pieces of information about the Dakota study are stored as attributes of the top-level HDF5 root group ("/"). These include:

Study Attributes
Label Type Description
dakota_version String Version of Dakota used to run the study
dakota_revision String Dakota version control information
output_version String Version of the output file
input String Dakota input file
total_cpu_time Real Combined parent and child CPU time in seconds
parent_cpu_time Real Parent CPU time in seconds (when Dakota is built with UTILIB)
child_cpu_time Real Child CPU time in seconds (when Dakota is built with UTILIB)
total_wallclock_time Real Total wallclock time in seconds (when Dakota is built with UTILIB)
mpi_init_wallclock_time Real Wallclock time to MPI_Init in seconds (when Dakota is built with UTILIB and run in parallel)
run_wallclock_time Real Wallclock time since MPI_Init in seconds (when Dakota is built with UTILIB and run in parallel)
mpi_wallclock_time Real Wallclock time since MPI_Init in seconds (when Dakota is not built with UTILIB and run in parallel)

A Note about Variables Storage

Variables in most Dakota output (e.g. tabular data files) and input (e.g. imported data to construct surrogates) are listed in "input spec" order. (The variables keyword section is arranged by input spec order.) In this ordering, they are sorted first by function:

  1. Design
  2. Aleatory
  3. Epistemic
  4. State

And within each of these categories, they are sorted by domain:

  1. Continuous
  2. Discrete integer (sets and ranges)
  3. Discrete string
  4. Discrete real

A shortcoming of HDF5 is that datasets are homogeneous; for example, string- and real-valued data cannot readily be stored in the same dataset. As a result, Dakota has chosen to flip "input spec" order for HDF5 and sort first by domain, then by function when storing variable information. When applicable, there may be as many as four datasets to store variable information: one to store continuous variables, another to store discrete integer variables, and so on. Within each of these, variables will be ordered by function.
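The two orderings can be made concrete with a small sketch (the variable names and their function/domain tags are hypothetical):

```python
# Sorting rules described above: "input spec" order is function-major,
# HDF5 storage order is domain-major.
FUNCTIONS = ["design", "aleatory", "epistemic", "state"]
DOMAINS = ["continuous", "discrete_integer", "discrete_string", "discrete_real"]

variables = [
    ("x1", "design", "continuous"),
    ("n1", "design", "discrete_integer"),
    ("u1", "aleatory", "continuous"),
    ("s1", "state", "discrete_string"),
]

# "Input spec" order: sort by function first, then domain.
input_spec = sorted(variables, key=lambda v: (FUNCTIONS.index(v[1]),
                                              DOMAINS.index(v[2])))
# HDF5 order: sort by domain first, then function (one dataset per domain).
hdf5_order = sorted(variables, key=lambda v: (DOMAINS.index(v[2]),
                                              FUNCTIONS.index(v[1])))
print([v[0] for v in input_spec])   # ['x1', 'n1', 'u1', 's1']
print([v[0] for v in hdf5_order])   # ['x1', 'u1', 'n1', 's1']
```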

Sampling Moments

sampling produces moments (e.g. mean, standard deviation or variance) of all responses, as well as 95% lower and upper confidence intervals for the 1st and 2nd moments. These are stored as described below. When sampling is used in incremental mode by specifying refinement_samples, all results, including the moments group, are placed within groups named increment:<N>, where <N> indicates the increment number beginning with 1.

Moments
Description 1st through 4th moments for each response
Location [increment:<N>]/moments/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: length of 4
Type Real
Scales Dimension Type Label Contents Notes
0 String moments mean, std_deviation, skewness, kurtosis Only for standard moments
0 String moments mean, variance, third_central, fourth_central Only for central moments

Moment Confidence Intervals
Description Lower and upper 95% confidence intervals on the 1st and 2nd moments
Location moment_confidence_intervals/<response descriptor>
Shape 2-dimensional: 2x2
Type Real
Scales Dimension Type Label Contents Notes
0 String bounds lower, upper
1 String moments mean, std_deviation Only for standard moments
1 String moments mean, variance Only for central moments
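As a concrete (hypothetical) illustration of how this 2x2 dataset and its two scales combine, the confidence interval on each moment can be read off by pairing the bounds dimension with the moments dimension:

```python
# Hypothetical 95% moment confidence intervals laid out as above:
# dimension 0 is the bound (lower, upper), dimension 1 the moment.
bounds = ["lower", "upper"]
moments = ["mean", "std_deviation"]
ci = [[4.8, 0.9],   # lower bounds on mean and std_deviation
      [5.2, 1.1]]   # upper bounds on mean and std_deviation

# Map each moment name to its (lower, upper) interval.
table = {m: (ci[0][j], ci[1][j]) for j, m in enumerate(moments)}
print(table["mean"])            # (4.8, 5.2)
print(table["std_deviation"])   # (0.9, 1.1)
```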

Correlations

A few different methods produce information about the correlations between pairs of variables and responses (collectively: factors). The four tables in this section describe how correlation information is stored. One important note is that HDF5 has no special, native type for symmetric matrices, and so the simple correlations and simple rank correlations are stored in dense 2D datasets.

Simple Correlations
Description Simple correlation matrix
Location [increment:<N>]/simple_correlations
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 2-dimensional: number of factors by number of factors
Type Real
Scales Dimension Type Label Contents Notes
0, 1 String factors Variable and response descriptors The scales for both dimensions are identical

Simple Rank Correlations
Description Simple rank correlation matrix
Location [increment:<N>]/simple_rank_correlations
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 2-dimensional: number of factors by number of factors
Type Real
Scales Dimension Type Label Contents Notes
0, 1 String factors Variable and response descriptors The scales for both dimensions are identical

Partial Correlations
Description Partial correlations
Location [increment:<N>]/partial_correlations/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Partial Rank Correlations
Description Partial rank correlations
Location [increment:<N>]/partial_rank_correlations/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Probability Density

Some aleatory UQ methods estimate the probability density of responses.

Probability Density
Description Probability density of a response
Location [increment:<N>]/probability_density/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of bins in probability density
Type Real
Scales Dimension Type Label Contents Notes
0 Real lower_bounds Lower bin edges
0 Real upper_bounds Upper bin edges
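Because the lower_bounds and upper_bounds scales carry the bin edges, the density values can be converted back to per-bin probabilities. A small sketch with invented numbers:

```python
# Hypothetical probability-density output for one response: bin edges come
# from the lower_bounds/upper_bounds scales, densities from the dataset.
lower = [0.0, 1.0, 2.0, 3.0]
upper = [1.0, 2.0, 3.0, 4.0]
density = [0.1, 0.4, 0.4, 0.1]

# The probability mass in each bin is density * bin width; the masses of
# all bins sum to 1.
masses = [d * (u - l) for d, u, l in zip(density, upper, lower)]
print(masses)
print(sum(masses))
```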

Level Mappings

Aleatory UQ methods can calculate level mappings (from user-specified probability, reliability, or generalized reliability to response, or vice versa).

Probability Levels
Description Response levels corresponding to user-specified probability levels
Location [increment:<N>]/probability_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real probability_levels User-specified probability levels

Reliability Levels
Description Response levels corresponding to user-specified reliability levels
Location [increment:<N>]/reliability_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real reliability_levels User-specified reliability levels

Generalized Reliability Levels
Description Response levels corresponding to user-specified generalized reliability levels
Location [increment:<N>]/gen_reliability_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real gen_reliability_levels User-specified generalized reliability levels

Response Levels
Description Probability, reliability, or generalized reliability levels corresponding to user-specified response levels
Location [increment:<N>]/response_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real response_levels User-specified response levels
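In every level-mapping dataset above, the user-specified levels ride along as the dimension-0 scale, so results pair naturally with the inputs that produced them. A sketch with invented numbers for the response_levels case:

```python
# Hypothetical response_levels result for one response: Dakota computed a
# probability for each user-specified response level carried on the scale.
response_levels = [0.5, 1.0, 1.5]    # scale: user-specified response levels
probabilities = [0.12, 0.50, 0.88]   # dataset: computed probabilities

mapping = dict(zip(response_levels, probabilities))
print(mapping[1.0])   # 0.5
```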

Variance-Based Decomposition (Sobol' Indices)

Dakota's sampling method can produce main and total effects; stochastic expansions (polynomial_chaos, stoch_collocation) additionally can produce interaction effects.

Main Effects
Description First-order Sobol' indices
Location main_effects/<response descriptor>
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Total Effects
Description Total-effect Sobol' indices
Location total_effects/<response descriptor>
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors
Each order (pairwise, 3-way, 4-way, etc.) of interaction is stored in a separate dataset. The scales are unusual in that they are two-dimensional so that they can contain the labels of the variables that participate in each interaction.

Interaction Effects
Description Sobol' indices for interactions
Location order_<N>_interactions/<response descriptor>
Shape 1-dimensional: number of Nth order interactions
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Descriptors of the variables in the interaction Scales for interaction effects are 2D datasets with the dimensions (number of interactions, N)
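A sketch of how the 2D scale pairs with the 1D index dataset (variable names and index values are invented):

```python
# Hypothetical order-2 interaction indices for one response. The scale is
# two-dimensional: row i names the N variables in interaction i.
interaction_vars = [["x1", "x2"],
                    ["x1", "x3"],
                    ["x2", "x3"]]
indices = [0.05, 0.02, 0.01]

# Map each interaction (as a tuple of descriptors) to its Sobol' index.
effects = {tuple(vs): s for vs, s in zip(interaction_vars, indices)}
print(effects[("x1", "x2")])   # 0.05
```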

Integration and Expansion Moments

Stochastic expansion methods can obtain moments in two ways.

Integration Moments
Description Moments obtained via integration
Location integration_moments/<response descriptor>
Shape 1-dimensional: length of 4
Type Real
Scales Dimension Type Label Contents Notes
0 String moments mean, std_deviation, skewness, kurtosis Only for standard moments
0 String moments mean, variance, third_central, fourth_central Only for central moments

Expansion Moments
Description Moments obtained via expansion
Location expansion_moments/<response descriptor>
Shape 1-dimensional: length of 4
Type Real
Scales Dimension Type Label Contents Notes
0 String moments mean, std_deviation, skewness, kurtosis Only for standard moments
0 String moments mean, variance, third_central, fourth_central Only for central moments

Extreme Responses

sampling with epistemic variables produces extreme values (minimum and maximum) for each response.

Extreme Responses
Description The sample minimum and maximum of each response
Location [increment:<N>]/extreme_responses/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: length of 2
Type Real
Scales Dimension Type Label Contents Notes
0 String extremes minimum, maximum

Parameter Sets

All parameter studies (vector_parameter_study, list_parameter_study, multidim_parameter_study, centered_parameter_study) record tables of evaluations (parameter-response pairs), similar to Dakota's tabular output file. Centered parameter studies additionally store evaluations in an order that is more natural to interpret, which is described below.

In the tabular-like listing, variables are stored according to the scheme described in a previous section.

Parameter Sets
Description Parameter study evaluations in a tabular-like listing
Location parameter_sets/{continuous_variables, discrete_integer_variables, discrete_string_variables, discrete_real_variables, responses}
Shape 2-dimensional: number of evaluations by number of variables or responses
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
1 String variables or responses Variable or response descriptors

Variable Slices

Centered parameter studies store "slices" of the tabular data that make evaluating the effects of each variable on each response more convenient. The steps for each individual variable, including the initial or center point, and the corresponding responses are stored in separate groups.

Variable Slices
Description Steps, including center/initial point, for a single variable
Location variable_slices/<variable descriptor>/steps
Shape 1-dimensional: number of user-specified steps for this variable
Type Real, String, or Integer, as applicable

Variable Slices - Responses
Description Responses for variable slices
Location variable_slices/<variable descriptor>/responses
Shape 2-dimensional: number of evaluations by number of responses
Type Real
Scales Dimension Type Label Contents Notes
1 String responses Response descriptors

Best Parameters

Dakota's optimization and calibration methods report the parameters at the best point (or points, for multiple final solutions) discovered. These are stored using the scheme described in the variables section. When more than one solution is reported, the best parameters are nested in groups named set:<N>, where <N> is an integer numbering the set and beginning with 1.

State variables (and other inactive variables) are reported when using objective functions and for some calibration studies. However, when configuration variables are used in a calibration, state variables are suppressed.

Best Parameters
Description Best parameters discovered by optimization or calibration
Location [set:<N>]/best_parameters/{continuous, discrete_integer, discrete_string, discrete_real}
Notes The [set:<N>] group is present only when multiple final solutions are reported.
Shape 1-dimensional: number of variables
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Best Objective Functions

Dakota's optimization methods report the objective functions at the best point (or points, for multiple final solutions) discovered. When more than one solution is reported, the best objective functions are nested in groups named set:<N>, where <N> is an integer numbering the set and beginning with 1.

Best Objective Functions
Description Best objective functions discovered by optimization
Location [set:<N>]/best_objective_functions
Notes The [set:<N>] group is present only when multiple final solutions are reported.
Shape 1-dimensional: number of objective functions
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors

Best Nonlinear Constraints

Dakota's optimization and calibration methods report the nonlinear constraints at the best point (or points, for multiple final solutions) discovered. When more than one solution is reported, the best constraints are nested in groups named set:<N>, where <N> is an integer numbering the set and beginning with 1.

Best Nonlinear Constraints
Description Best nonlinear constraints discovered by optimization or calibration
Location [set:<N>]/best_constraints
Notes The [set:<N>] group is present only when multiple final solutions are reported.
Shape 1-dimensional: number of nonlinear constraints
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors

Calibration

When using calibration terms with an optimization method, or when using a nonlinear least squares method such as nl2sol, Dakota reports residuals and residual norms for the best point (or points, for multiple final solutions) discovered.

Best Residuals
Description Best residuals discovered
Location best_residuals
Shape 1-dimensional: number of residuals
Type Real

Best Residual Norm
Description Norm of best residuals discovered
Location best_norm
Shape Scalar
Type Real

Parameter Confidence Intervals

Least squares methods (nl2sol, nlssol_sqp, optpp_g_newton) compute confidence intervals on the calibration parameters.

Parameter Confidence Intervals
Description Lower and upper confidence intervals on calibrated parameters
Location confidence_intervals
Notes The confidence intervals are not stored when there is more than one experiment.
Shape 2-dimensional: number of parameters by 2
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors
1 String bounds lower, upper

Best Model Responses (without configuration variables)

When performing calibration with experimental data (but no configuration variables), Dakota records, in addition to the best residuals, the best original model responses.

Best Model Responses
Description Original model responses for the best residuals discovered
Location best_model_responses
Shape 1-dimensional: number of model responses
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors

Best Model Responses (with configuration variables)

When performing calibration with experimental data that includes configuration variables, Dakota reports the best model responses for each experiment. These results include the configuration variables, stored in the scheme described in the variables section, and the model responses.

Best Configuration Variables for Experiment
Description Configuration variables associated with experiment <N>
Location best_model_responses/experiment:<N>/{continuous_config_variables, discrete_integer_config_variables, discrete_string_config_variables, discrete_real_config_variables}
Shape 1-dimensional: number of variables
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Best Model Responses for Experiment
Description Original model responses for the best residuals discovered
Location best_model_responses/experiment:<N>/responses
Shape 1-dimensional: number of model responses
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors