Dakota Reference Manual  Version 6.10
Dakota HDF5 Output

Beginning with release 6.9, Dakota gained the ability to write many method results, such as the correlation matrices computed by sampling studies and the best parameters discovered by optimization methods, to disk in HDF5 format. In Dakota 6.10 and above, evaluation data (variables and responses for each model or interface evaluation) may also be written. Many users may find this format more convenient than scraping or copying and pasting from Dakota's console output.

To enable HDF5 output, the results_output keyword with the hdf5 option must be added to the Dakota input file. In addition, Dakota must have been built with HDF5 support. Beginning with Dakota 6.10, HDF5 is enabled in our publicly available downloads. HDF5 support is considered somewhat experimental. The results of some Dakota methods are not yet written to HDF5, and in a few limited situations, enabling HDF5 will cause Dakota to crash.

HDF5 Concepts

HDF5 is a format that is widely used in scientific software for efficiently storing and organizing data. The HDF5 standard and libraries are maintained by the HDF Group.

In HDF5, data are stored in multidimensional arrays called datasets. Datasets are organized hierarchically in groups, which can also contain other groups. Datasets and groups are conceptually similar to files and directories in a filesystem. In fact, every HDF5 file contains at least one group, the root group, denoted "/", and groups and datasets are referred to using slash-delimited absolute or relative paths, which are more accurately called link names.

[Figure: Example HDF5 Layout (hdf5_layout.png)]

One goal of HDF5 is that data be "self-documenting" through the use of metadata. Dakota output files include two kinds of metadata.

  • Dimension Scales. Each dimension of a dataset may have zero or more scales, which are themselves datasets. Scales are often used to provide, for example, labels analogous to column headings in a table (see the dimension scales that Dakota applies to moments) or numerical values of an independent variable (user-specified probability levels in level mappings).
  • Attributes. Key:value pairs that annotate a group or dataset. A key is always a character string, such as dakota_version, and (in Dakota output) the value can be a string-, integer-, or real-valued scalar. Dakota stores the number of samples that were requested in a sampling study in the attribute 'samples'.
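
Both kinds of metadata can be read in a few lines with a Python library such as h5py (discussed under Accessing Results below). In this sketch the file name, method Id ('sampling'), and response descriptor ('f') are placeholders for values from an actual study:

import h5py

with h5py.File("dakota_results.h5", "r") as f:
    # A moments dataset produced by a sampling study (placeholder path)
    dset = f["/methods/sampling/results/execution:1/moments/f"]
    # Attributes: key:value pairs annotating the dataset
    for key, value in dset.attrs.items():
        print(key, "=", value)
    # Dimension scales: datasets attached to dimension 0 of this dataset
    for name, scale in dset.dims[0].items():
        print(name, scale[()])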

Accessing Results

Many popular programming languages have support, either natively or from a third-party library, for reading and writing HDF5 files. The HDF Group itself supports C/C++ and Java libraries. The Dakota Project suggests the h5py module for Python. Examples that demonstrate using h5py to access and use Dakota HDF5 output may be found in the Dakota installation at share/dakota/examples/hdf5.
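
As a first step, a short h5py script can open a results file and list its entire hierarchy of groups and datasets. This is a minimal sketch, assuming the default output file name dakota_results.h5:

import h5py

def show(name, obj):
    # Report each object in the hierarchy as a group or a dataset
    kind = "group" if isinstance(obj, h5py.Group) else "dataset"
    print(kind, name)

with h5py.File("dakota_results.h5", "r") as f:
    f.visititems(show)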

Organization of Results

Currently, complete or nearly complete coverage exists for results from sampling, optimization and calibration methods, parameter studies, and stochastic expansions. Coverage will continue to expand in future releases to include not only the results of all methods, but other potentially useful information such as interface evaluations and model transformations.

Methods in Dakota have a character string Id and are executed by Dakota one or more times. (Methods are executed more than once in studies that include a nested model, for example.) The Id may be provided by the user in the input file using the id_method keyword, or it may be automatically generated by Dakota. Dakota uses the label NO_METHOD_ID for methods that are specified in the input file without an id_method, and NOSPEC_METHOD_ID_<N> for methods that it generates for its own internal use. The <N> in the latter case is an incrementing integer that begins at 1.

The results for the <N>th execution of a method that has the label <method Id> are stored in the group

/methods/<method Id>/results/execution:<N>/

The /methods group is present in every Dakota HDF5 file in which at least one method wrote results. The group execution:1 is likewise always present, even if there is only a single execution.
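
Because groups and datasets are addressed by slash-delimited paths, the results of a particular execution can be accessed directly in h5py; the method Id 'sampling' here is a placeholder:

import h5py

with h5py.File("dakota_results.h5", "r") as f:
    execution = f["/methods/sampling/results/execution:1"]
    # Each child group or dataset holds one kind of result for this execution
    for name in execution:
        print(name)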

The groups and datasets for each type of result that Dakota is currently capable of storing are described in the following sections. Every dataset is documented in its own table. These tables include:

  • A brief description of the dataset.
  • The location of the dataset relative to /methods/<method Id>/results/execution:<N>. This path may include both literal text that is always present and replacement text. Replacement text is <enclosed in angle brackets and italicized>. Two examples of replacement text are <response descriptor> and <variable descriptor>, which indicate that the name of a Dakota response or variable makes up a portion of the path.
  • Clarifying notes, where appropriate.
  • The type (String, Integer, or Real) of the information in the dataset.
  • The shape of the dataset; that is, the number of dimensions and the size of each dimension.
  • A description of the dataset's scales, which includes
    • The dimension of the dataset that the scale belongs to.
    • The type (String, Integer, or Real) of the information in the scale.
    • The label or name of the scale.
    • The contents of the scale. Contents that appear in plaintext are literal and will always be present in a scale. Italicized text describes content that varies.
    • Notes that provide further clarification about the scale.
  • A description of the dataset's attributes, which are key:value pairs that provide helpful context for the dataset.

The Expected Output section of each method's keyword documentation indicates the kinds of output, if any, that the method currently can write to HDF5. These are typically in the form of bulleted lists with clarifying notes that refer back to the sections that follow.

Study Metadata

Several pieces of information about the Dakota study are stored as attributes of the top-level HDF5 root group ("/"). These include:

Study Attributes
Label Type Description
dakota_version String Version of Dakota used to run the study
dakota_revision String Dakota version control information
output_version String Version of the output file
input String Dakota input file
total_cpu_time Real Combined parent and child CPU time in seconds
parent_cpu_time Real Parent CPU time in seconds (when Dakota is built with UTILIB)
child_cpu_time Real Child CPU time in seconds (when Dakota is built with UTILIB)
total_wallclock_time Real Total wallclock time in seconds (when Dakota is built with UTILIB)
mpi_init_wallclock_time Real Wallclock time to MPI_Init in seconds (when Dakota is built with UTILIB and run in parallel)
run_wallclock_time Real Wallclock time since MPI_Init in seconds (when Dakota is built with UTILIB and run in parallel)
mpi_wallclock_time Real Wallclock time since MPI_Init in seconds (when Dakota is not built with UTILIB and run in parallel)
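
These attributes can be read directly from the root group. A minimal sketch; the timing attributes are read with .get() because, as noted above, they are present only for certain builds and run modes:

import h5py

with h5py.File("dakota_results.h5", "r") as f:
    attrs = f["/"].attrs
    print("Dakota version:", attrs["dakota_version"])
    print("Output version:", attrs["output_version"])
    # The 'input' attribute holds the text of the Dakota input file
    print(attrs["input"])
    # May be absent, depending on the build and run mode
    print("Total CPU time:", attrs.get("total_cpu_time"))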

A Note about Variables Storage

Variables in most Dakota output (e.g. tabular data files) and input (e.g. imported data to construct surrogates) are listed in "input spec" order. (The variables keyword section is arranged by input spec order.) In this ordering, they are sorted first by function:

  1. Design
  2. Aleatory
  3. Epistemic
  4. State

And within each of these categories, they are sorted by domain:

  1. Continuous
  2. Discrete integer (sets and ranges)
  3. Discrete string
  4. Discrete real

A shortcoming of HDF5 is that datasets are homogeneous; for example, string- and real-valued data cannot readily be stored in the same dataset. As a result, Dakota has chosen to flip "input spec" order for HDF5 and sort first by domain, then by function when storing variable information. When applicable, there may be as many as four datasets to store variable information: one to store continuous variables, another to store discrete integer variables, and so on. Within each of these, variables will be ordered by function.
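
A reader must therefore check for up to four datasets to recover all of the variables. A sketch, using the parameter_sets group (described in a later section) as an example location and 'study' as a hypothetical method Id:

import h5py

DOMAINS = ("continuous_variables", "discrete_integer_variables",
           "discrete_string_variables", "discrete_real_variables")

with h5py.File("dakota_results.h5", "r") as f:
    group = f["/methods/study/results/execution:1/parameter_sets"]
    for domain in DOMAINS:
        # A dataset is present only if the study has variables of that domain
        if domain in group:
            dset = group[domain]
            print(domain, dset.shape, dset.dtype)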

Sampling Moments

sampling produces moments (e.g. mean, standard deviation or variance) of all responses, as well as 95% lower and upper confidence intervals for the 1st and 2nd moments. These are stored as described below. When sampling is used in incremental mode by specifying refinement_samples, all results, including the moments group, are placed within groups named increment:<N>, where <N> indicates the increment number beginning with 1.

Moments
Description 1st through 4th moments for each response
Location [increment:<N>]/moments/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: length of 4
Type Real
Scales Dimension Type Label Contents Notes
0 String moments mean, std_deviation, skewness, kurtosis Only for standard moments
0 String moments mean, variance, third_central, fourth_central Only for central moments
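
As an example of reading these results, the following h5py sketch prints each moment with its label from the 'moments' scale; the method Id 'sampling' and response descriptor 'f' are placeholders:

import h5py

with h5py.File("dakota_results.h5", "r") as f:
    moments = f["/methods/sampling/results/execution:1/moments/f"]
    labels = moments.dims[0][0]  # scale on dimension 0: mean, std_deviation, ...
    # .asstr() (h5py 3.x) converts the stored strings from bytes
    for label, value in zip(labels.asstr()[()], moments[()]):
        print(label, "=", value)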

Moment Confidence Intervals
Description Lower and upper 95% confidence intervals on the 1st and 2nd moments
Location moment_confidence_intervals/<response descriptor>
Shape 2-dimensional: 2x2
Type Real
Scales Dimension Type Label Contents Notes
0 String bounds lower, upper
1 String moments mean, std_deviation Only for standard moments
1 String moments mean, variance Only for central moments

Correlations

A few different methods produce information about the correlations between pairs of variables and responses (collectively: factors). The four tables in this section describe how correlation information is stored. One important note is that HDF5 has no special, native type for symmetric matrices, and so the simple correlations and simple rank correlations are stored in dense 2D datasets.
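
Because the full matrix is stored, either triangle may be read. A sketch that looks up a single entry of the simple correlation matrix by factor label; the descriptors 'x1' and 'f' are placeholders:

import h5py

with h5py.File("dakota_results.h5", "r") as f:
    corr = f["/methods/sampling/results/execution:1/simple_correlations"]
    # The same 'factors' scale is attached to both dimensions
    factors = list(corr.dims[0][0].asstr()[()])
    matrix = corr[()]
    # Correlation between variable 'x1' and response 'f'
    print(matrix[factors.index("x1"), factors.index("f")])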

Simple Correlations
Description Simple correlation matrix
Location [increment:<N>]/simple_correlations
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 2-dimensional: number of factors by number of factors
Type Real
Scales Dimension Type Label Contents Notes
0, 1 String factors Variable and response descriptors The scales for both dimensions are identical

Simple Rank Correlations
Description Simple rank correlation matrix
Location [increment:<N>]/simple_rank_correlations
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 2-dimensional: number of factors by number of factors
Type Real
Scales Dimension Type Label Contents Notes
0, 1 String factors Variable and response descriptors The scales for both dimensions are identical

Partial Correlations
Description Partial correlations
Location [increment:<N>]/partial_correlations/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Partial Rank Correlations
Description Partial rank correlations
Location [increment:<N>]/partial_rank_correlations/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Probability Density

Some aleatory UQ methods estimate the probability density of responses.

Probability Density
Description Probability density of a response
Location [increment:<N>]/probability_density/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of bins in probability density
Type Real
Scales Dimension Type Label Contents Notes
0 Real lower_bounds Lower bin edges
0 Real upper_bounds Upper bin edges

Level Mappings

Aleatory UQ methods can calculate level mappings (from user-specified probability, reliability, or generalized reliability to response, or vice versa).

Probability Levels
Description Response levels corresponding to user-specified probability levels
Location [increment:<N>]/probability_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real probability_levels User-specified probability levels

Reliability Levels
Description Response levels corresponding to user-specified reliability levels
Location [increment:<N>]/reliability_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real reliability_levels User-specified reliability levels

Generalized Reliability Levels
Description Response levels corresponding to user-specified generalized reliability levels
Location [increment:<N>]/gen_reliability_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real gen_reliability_levels User-specified generalized reliability levels

Response Levels
Description Probability, reliability, or generalized reliability levels corresponding to user-specified response levels
Location [increment:<N>]/response_levels/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: number of requested levels for the response
Type Real
Scales Dimension Type Label Contents Notes
0 Real response_levels User-specified response levels

Variance-Based Decomposition (Sobol' Indices)

Dakota's sampling method can produce main and total effects; stochastic expansions (polynomial_chaos, stoch_collocation) additionally can produce interaction effects.

Main Effects
Description First-order Sobol' indices
Location main_effects/<response descriptor>
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Total Effects
Description Total-effect Sobol' indices
Location total_effects/<response descriptor>
Shape 1-dimensional: number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors
Each order (pairwise, 3-way, 4-way, etc.) of interaction is stored in a separate dataset. The scales are unusual in that they are two-dimensional to contain the labels of the variables that participate in each interaction.

Interaction Effects
Description Sobol' indices for interactions
Location order_<N>_interactions/<response descriptor>
Shape 1-dimensional: number of Nth order interactions
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Descriptors of the variables in the interaction Scales for interaction effects are 2D datasets with the dimensions (number of interactions, N)
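
A sketch of reading pairwise interactions, where the two-dimensional 'variables' scale supplies the participating descriptors for each index; the method Id 'pce' and response descriptor 'f' are placeholders:

import h5py

with h5py.File("dakota_results.h5", "r") as f:
    indices = f["/methods/pce/results/execution:1/order_2_interactions/f"]
    # The scale is itself a 2D dataset: (number of interactions, 2)
    pairs = indices.dims[0][0].asstr()[()]
    for pair, sobol_index in zip(pairs, indices[()]):
        print(list(pair), sobol_index)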

Integration and Expansion Moments

Stochastic expansion methods can obtain moments in two ways.

Integration Moments
Description Moments obtained via integration
Location integration_moments/<response descriptor>
Shape 1-dimensional: length of 4
Type Real
Scales Dimension Type Label Contents Notes
0 String moments mean, std_deviation, skewness, kurtosis Only for standard moments
0 String moments mean, variance, third_central, fourth_central Only for central moments

Expansion Moments
Description Moments obtained via expansion
Location expansion_moments/<response descriptor>
Shape 1-dimensional: length of 4
Type Real
Scales Dimension Type Label Contents Notes
0 String moments mean, std_deviation, skewness, kurtosis Only for standard moments
0 String moments mean, variance, third_central, fourth_central Only for central moments

Extreme Responses

sampling with epistemic variables produces extreme values (minimum and maximum) for each response.

Extreme Responses
Description The sample minimum and maximum of each response
Location [increment:<N>]/extreme_responses/<response descriptor>
Notes The [increment:<N>] group is present only for sampling with refinement
Shape 1-dimensional: length of 2
Type Real
Scales Dimension Type Label Contents Notes
0 String extremes minimum, maximum

Parameter Sets

All parameter studies (vector_parameter_study, list_parameter_study, multidim_parameter_study, centered_parameter_study) record tables of evaluations (parameter-response pairs), similar to Dakota's tabular output file. Centered parameter studies additionally store evaluations in an order that is more natural to interpret, which is described below.

In the tabular-like listing, variables are stored according to the scheme described in a previous section.

Parameter Sets
Description Parameter study evaluations in a tabular-like listing
Location parameter_sets/{continuous_variables, discrete_integer_variables, discrete_string_variables, discrete_real_variables, responses}
Shape 2-dimensional: number of evaluations by number of variables or responses
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
1 String variables or responses Variable or response descriptors

Variable Slices

Centered parameter studies store "slices" of the tabular data that make evaluating the effects of each variable on each response more convenient. The steps for each individual variable, including the initial or center point, and the corresponding responses are stored in separate groups.

Variable Slices
Description Steps, including center/initial point, for a single variable
Location variable_slices/<variable descriptor>/steps
Shape 1-dimensional: number of user-specified steps for this variable
Type Real, String, or Integer, as applicable

Variable Slices - Responses
Description Responses for variable slices
Location variable_slices/<variable descriptor>/responses
Shape 2-dimensional: number of evaluations by number of responses
Type Real
Scales Dimension Type Label Contents Notes
1 String responses Response descriptors

Best Parameters

Dakota's optimization and calibration methods report the parameters at the best point (or points, for multiple final solutions) discovered. These are stored using the scheme described in the variables section. When more than one solution is reported, the best parameters are nested in groups named set:<N>, where <N> is an integer numbering the set and beginning with 1.

State variables (and other inactive variables) are reported when using objective functions and for some calibration studies. However, when using configuration variables in a calibration, state variables are suppressed.

Best Parameters
Description Best parameters discovered by optimization or calibration
Location [set:<N>]/best_parameters/{continuous, discrete_integer, discrete_string, discrete_real}
Notes The [set:<N>] group is present only when multiple final solutions are reported.
Shape 1-dimensional: number of variables
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Best Objective Functions

Dakota's optimization methods report the objective functions at the best point (or points, for multiple final solutions) discovered. When more than one solution is reported, the best objective functions are nested in groups named set:<N>, where <N> is an integer numbering the set and beginning with 1.

Best Objective Functions
Description Best objective functions discovered by optimization
Location [set:<N>]/best_objective_functions
Notes The [set:<N>] group is present only when multiple final solutions are reported.
Shape 1-dimensional: number of objective functions
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors

Best Nonlinear Constraints

Dakota's optimization and calibration methods report the nonlinear constraints at the best point (or points, for multiple final solutions) discovered. When more than one solution is reported, the best constraints are nested in groups named set:<N>, where <N> is an integer numbering the set and beginning with 1.

Best Nonlinear Constraints
Description Best nonlinear constraints discovered by optimization or calibration
Location [set:<N>]/best_constraints
Notes The [set:<N>] group is present only when multiple final solutions are reported.
Shape 1-dimensional: number of nonlinear constraints
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors

Calibration

When using calibration terms with an optimization method, or when using a nonlinear least squares method such as nl2sol, Dakota reports residuals and residual norms for the best point (or points, for multiple final solutions) discovered.

Best Residuals
Description Best residuals discovered
Location best_residuals
Shape 1-dimensional: number of residuals
Type Real

Best Residual Norm
Description Norm of best residuals discovered
Location best_norm
Shape Scalar
Type Real

Parameter Confidence Intervals

Least squares methods (nl2sol, nlssol_sqp, optpp_g_newton) compute confidence intervals on the calibration parameters.

Parameter Confidence Intervals
Description Lower and upper confidence intervals on calibrated parameters
Location confidence_intervals
Notes The confidence intervals are not stored when there is more than one experiment.
Shape 2-dimensional: number of variables by 2
Type Real
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors
1 String bounds lower, upper

Best Model Responses (without configuration variables)

When performing calibration with experimental data (but no configuration variables), Dakota records, in addition to the best residuals, the best original model responses.

Best Model Responses
Description Original model responses for the best residuals discovered
Location best_model_responses
Shape 1-dimensional: number of model responses
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors

Best Model Responses (with configuration variables)

When performing calibration with experimental data that includes configuration variables, Dakota reports the best model responses for each experiment. These results include the configuration variables, stored in the scheme described in the variables section, and the model responses.

Best Configuration Variables for Experiment
Description Configuration variables associated with experiment N
Location best_model_responses/experiment:<N>/{continuous_config_variables, discrete_integer_config_variables, discrete_string_config_variables, discrete_real_config_variables}
Shape 1-dimensional: number of variables
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
0 String variables Variable descriptors

Best Model Responses for Experiment
Description Original model responses for the best residuals discovered
Location best_model_responses/experiment:<N>/responses
Shape 1-dimensional: number of model responses
Type Real
Scales Dimension Type Label Contents Notes
0 String responses Response descriptors

Organization of Evaluations

An evaluation is a mapping from variables to responses performed by a Dakota model or interface. Beginning with release 6.10, Dakota can report evaluation history in HDF5 format. The HDF5 format offers many advantages over existing console output and tabular output. Because it requires no "scraping", it is more convenient for most users than the former, and because it is not restricted to a two-dimensional, tabular arrangement of information, it is far richer than the latter.

This section begins by describing the Dakota components that can generate evaluation data. It then documents the high-level organization of the data from those components. Detailed documentation of the individual datasets (the "low-level" organization) where data are stored follows. Finally, information is provided concerning input keywords that control which components report evaluations.

Sources of Evaluation Data

Evaluation data are produced by only two kinds of components in Dakota: models and interfaces. This subsection provides a basic description of models and interfaces to equip users to manage and understand HDF5-format evaluation data.

Because interfaces and models must be specified in even simple Dakota studies, most novice users of Dakota will have some familiarity with these concepts. However, the exact nature of the relationship between methods, models, and interfaces may be unclear. Moreover, the models and interfaces present in a Dakota study are not always limited to those specified by the user. Some input keywords or combinations of components cause Dakota to create new models or interfaces "behind the scenes" and without the user's direct knowledge. Not only user-specified models and interfaces, but also these auto-generated components, can write evaluation data to HDF5. Accordingly, it may be helpful for consumers of Dakota's evaluation data to have a basic understanding of how Dakota creates and employs models and interfaces.

Consider first the input file shown here.

environment
  tabular_data
  results_output
    hdf5

method
  id_method 'sampling'
  sampling
    samples 20
  model_pointer 'sim'

model
  id_model 'sim'
  single
  interface_pointer 'tb'

variables
  uniform_uncertain 2
    descriptors 'x1' 'x2'
    lower_bounds 0.0 0.0
    upper_bounds 1.0 1.0

responses
  response_functions 1
    descriptors 'f'
  no_gradients
  no_hessians

interface
  id_interface 'tb'
  fork
    analysis_drivers 'text_book'

This simple input file specifies a single method of type sampling, which also has the Id 'sampling'. The 'sampling' method possesses a model of type single (alias simulation) named 'sim', which it uses to perform evaluations. (Dakota would have automatically generated a single model had one not been specified.) That is to say, for each variables-to-response mapping required by the method, it provides variables to the model and receives back responses from it.

Single/simulation models like 'sim' perform evaluations by means of an interface, typically an interface to an external simulation. In this case, the interface is 'tb'. The model passes the variables to 'tb', which executes the text_book driver, and receives back responses.

It is clear that two components produce evaluation data in this study. The first is the single model 'sim', which receives and fulfills evaluation requests from the method 'sampling', and the second is the interface 'tb', which similarly receives requests from 'sim' and fulfills them by running the text_book driver.

Because tabular data was requested in the environment block, a record of the model's evaluations will be reported to a tabular file. The interface's evaluations could be dumped from the restart file using dakota_restart_util.

If we compared these evaluation histories from 'sim' and 'tb', we would see that they are identical to one another. The model 'sim' is a mere "middle man" whose only responsibility is passing variables from the method down to the interface, executing the interface, and passing responses back up to the method. However, this is not always the case.

For example, if this study were converted to a gradient-based optimization using optpp_q_newton, and the user specified numerical_gradients:

# model and interface same as above. Replace the method, variables, and responses with:

method
  id_method 'opt'
  optpp_q_newton

variables
  continuous_design 2
    descriptors 'x1' 'x2'
    lower_bounds 0.0 0.0
    upper_bounds 1.0 1.0

responses
  objective_functions 1
    descriptors 'f'
  numerical_gradients
  no_hessians

Then the model would have the responsibility of performing finite differencing to estimate gradients of the response 'f' requested by the method. Multiple function evaluations of 'tb' would map to a single gradient evaluation at the model level, and the evaluation histories of 'sim' and 'tb' would contain different information.

Note that because it is unwieldy to report gradients (or Hessians) in a tabular format, they are not written to the tabular file, and historically were available only in the console output. The HDF5 format provides convenient access both to the "raw" evaluations performed by the interface and to higher-level model evaluations that include estimated gradients.

This pair of examples is intended to provide a basic understanding of the flow of evaluation data between a method, model, and interface, and to explain why models and interfaces are the producers of evaluation data.

Next consider a somewhat more complex study that includes a Dakota model of type surrogate. A surrogate model performs evaluations requested by a method by executing a special kind of interface called an approximation interface, which Dakota implicitly creates without the direct knowledge of the user. Approximation interfaces are a generic container for the various kinds of surrogates Dakota can use, such as Gaussian processes.

A Dakota model of type global surrogate may use a user-specified dace method to construct the actual underlying model(s) that it evaluates via its approximation interface. The dace method will have its own model (typically of type single/simulation), which will have a user-specified interface.

In this more complicated case there are at least four components that produce evaluation data: (1) the surrogate model and (2) its approximation interface, and (3) the dace method's model and (4) its interface. Although only components (1), (3), and (4) are user-specified, evaluation data produced by (2) may be written to HDF5 as well. (As explained below, only evaluations performed by the surrogate model and the dace interface will be recorded by default. This can be overridden using hdf5 sub-keywords.) This is an example where "extra" and potentially confusing data appear in Dakota's output due to an auto-generated component.

An important family of implicitly-created models is the recast models, which have the responsibility of transforming variables and responses. One type of recast, called a data transform model, is responsible for computing residuals when a user provides experimental data in a calibration study. Scaling recast models are employed when the user requests scaling of variables and/or responses.

Recast models work on the principle of function composition, and "wrap" a submodel, which may itself also be a recast model. The innermost model in the recursion often will be the simulation or surrogate model specified by the user in the input file. Dakota is capable of recording evaluation data at each level of recast.

High-level Organization of Evaluation Data

This subsection describes how evaluation data produced by models and interfaces are organized at a high level. A detailed description of the datasets and subgroups that contain evaluation data for a specific model or interface is given in the next subsection.

Two top-level groups contain evaluation data: /interfaces and /models.

Interfaces

Because interfaces can be executed by more than one model, interface evaluations are more precisely thought of as evaluations of an interface/model combination. Consequently, interface evaluations are grouped not only by interface Id ('tb' in the example above), but also the Id of the model that requested them ('sim').

/interfaces/<interface Id>/<model Id>/

If the user does not provide an Id for a specified interface, Dakota assigns it the Id NO_ID. Approximation interfaces receive the Id APPROX_INTERFACE_<N>, where <N> is an incrementing integer beginning at 1. Other kinds of automatically generated interfaces are named NOSPEC_INTERFACE_ID_<N>.

Models

The top-level group for model evaluations is /models. Within this group, model evaluations are grouped by type: simulation, surrogate, nested, or recast, and then by model Id. That is:

/models/<type>/<model Id>/

Similar to interfaces, user-specified models that lack an Id are given one by Dakota. A single model is named NO_MODEL_ID. Some automatically generated models receive the name NOSPEC_MODEL_ID.

Recast models are a special case and receive the name RECAST_<WRAPPED-MODEL>_<TYPE>_<N>. In this string:

  • WRAPPED-MODEL is the Id of the innermost wrapped model, typically a user-specified model
  • TYPE is the specific kind of recast. The three most common recasts are:
    • RECAST: several generic responsibilities, including summing objective functions to present to a single-objective optimizer
    • DATA_TRANSFORM: Compute residuals in a calibration
    • SCALING: scale variables and responses
  • N is an incrementing integer that begins with 1. It is employed to distinguish recasts of the same type that wrap the same underlying model.

The model's evaluations may be the result of combining information from multiple sources. A simulation/single model will receive all the information it requires from its interface, but more complicated model types may use information not only from interfaces, but also other models and the results of method executions. Nested models, for instance, receive information from a submethod (the mean of a response from a sampling study, for instance) and potentially also an optional interface.

The sources of a model's evaluations may be roughly identified by examining the contents of that model's sources group. The sources group contains softlinks (note: softlinks are an HDF5 feature analogous to soft or symbolic links on many file systems) to the groups for the interfaces, models, or methods that the model used to produce its evaluation data. (At this time, Dakota does not report the specific interface or model evaluations or method executions that were used to produce a specific model evaluation, but this is a planned feature.)

Method results likewise have a sources group that identifies the models or methods employed by that method. By following the softlinks contained in a method's or model's sources group, it is possible to "drill down" from a method to its ultimate sources of information. In the sampling example above, interface evaluations performed via the 'sim' model at the request of the 'sampling' method could be obtained at the HDF5 path: /methods/sampling/sources/sim/sources/tb/
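
Because h5py follows soft links transparently, this drill-down is an ordinary path lookup. A sketch using the Ids from the sampling example above:

import h5py

with h5py.File("dakota_results.h5", "r") as f:
    # Follow the sources links from the method down to the interface evaluations
    evals = f["/methods/sampling/sources/sim/sources/tb"]
    print(list(evals))  # the evaluation groups, e.g. metadata, responses, variables
    # The link resolves to the same group as the canonical location
    canonical = f["/interfaces/tb/sim"]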

Low-Level Organization of Evaluation Data

The evaluation data for each interface and model are stored using the same schema in a collection of groups and datasets that reside within that interface or model's high-level location in the HDF5 file. This section describes that "low-level" schema.

Data are divided into variables, responses, and metadata groups.

Variables

The variables group contains datasets that store the variables information for each evaluation. Four datasets may be present, one for each "domain": continuous, discrete_integer, discrete_string, and discrete_real. These datasets are two-dimensional, with a row (0th dimension) for each evaluation and a column (1st dimension) for each variable. The 0th dimension has one dimension scale containing the integer-valued evaluation Ids. The 1st dimension has two scales: the 0th contains the descriptors of the variables, and the 1st contains their variable Ids. In this context, the Ids are a 1-to-N ranking of the variables in Dakota "input spec" order.

Variables
Description Values of variables in evaluations
Location variables/{continuous, discrete_integer, discrete_string, discrete_real}
Shape 2-dimensional: number of evaluations by number of variables
Type Real, String, or Integer, as applicable
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String variables Variable descriptors
1 Integer variables Variable Ids 1-to-N rank of the variable in Dakota input spec order

Responses

The responses group contains datasets for functions and, when available, gradients and Hessians.

Functions: The functions dataset is two-dimensional and contains function values for all responses. Like the variables datasets, evaluations are stored along the 0th dimension, and responses are stored along the 1st. The evaluation Ids and response descriptors are attached as scales to these axes, respectively.

Functions
Description Values of functions in evaluations
Location responses/functions
Shape 2-dimensional: number of evaluations by number of responses
Type Real
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String responses Response descriptors
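
A sketch of reading the functions dataset together with the scales on both of its dimensions, using the interface 'tb' and model 'sim' from the earlier example:

import h5py

with h5py.File("dakota_results.h5", "r") as f:
    functions = f["/interfaces/tb/sim/responses/functions"]
    eval_ids = functions.dims[0][0][()]             # integer evaluation Ids
    descriptors = functions.dims[1][0].asstr()[()]  # response descriptors (h5py 3.x)
    print(list(descriptors))
    for eid, row in zip(eval_ids, functions[()]):
        print(eid, row)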

Gradients: The gradients dataset is three-dimensional, with shape $evaluations \times responses \times variables$. Dakota supports a mixed_gradients specification, and the gradients dataset is sized and organized such that only those responses for which gradients are available are stored. When mixed_gradients are employed, a response will not necessarily have the same index in the functions and gradients datasets.

Because the gradient could be computed with respect to any of the continuous variables, active or inactive, that belong to the associated model, the gradients dataset is sized to accommodate gradients taken with respect to all continuous variables. Components that were not included in a particular evaluation are set to NaN (not a number), and the derivative_variables_vector (in the metadata group) for that evaluation can be examined to determine which components were requested.

Gradients
Description Values of gradients in evaluations
Location responses/gradients
Shape 3-dimensional: number of evaluations by number of responses by number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String responses Response descriptors
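
A sketch that extracts the gradient components actually computed in one evaluation, using the derivative_variables_vector (described below) to mask the NaN padding; the model path assumes the gradient-based example above:

import h5py

with h5py.File("dakota_results.h5", "r") as f:
    model = f["/models/simulation/sim"]
    gradients = model["responses/gradients"][()]             # evals x responses x variables
    dvv = model["metadata/derivative_variables_vector"][()]  # evals x continuous variables
    active = dvv[0].astype(bool)  # variables differentiated in evaluation 0
    # Keep only the columns that were actually computed; the rest are NaN
    print(gradients[0][:, active])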

Hessians: Hessians are stored in a four-dimensional dataset with shape $evaluations \times responses \times variables \times variables$. The hessians dataset shares many characteristics with the gradients dataset: in the mixed_hessians case, it is smaller in the response dimension than the functions dataset, and unrequested components are set to NaN.

Hessians
Description Values of Hessians in evaluations
Location responses/hessians
Shape 4-dimensional: number of evaluations by number of responses by number of variables by number of variables
Type Real
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String responses Response descriptors

Metadata

The metadata group contains up to three datasets.

Active Set Vector: The first is the active_set_vector dataset. It is two-dimensional, with rows corresponding to evaluations and columns corresponding to responses. Each element contains an integer in the range 0-7, which encodes the request (function, gradient, Hessian) for the corresponding response in that evaluation. The 0th dimension has the evaluation Ids scale, and the 1st dimension has two scales: the response descriptors and the "default" or "maximal" ASV, an integer 0-7 for each response that indicates the information (function, gradient, Hessian) that could possibly have been requested during the study.

Active Set Vector
Description Values of the active set vector in evaluations
Location metadata/active_set_vector
Shape 2-dimensional: number of evaluations by number of responses
Type Integer
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String responses Response descriptors
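
The request codes are bit flags: 1 = function value, 2 = gradient, 4 = Hessian (so, for example, 3 means value plus gradient). A sketch that decodes them, reusing the model path from the gradients example:

import h5py

with h5py.File("dakota_results.h5", "r") as f:
    asv = f["/models/simulation/sim/metadata/active_set_vector"][()]
    for eval_index, row in enumerate(asv):
        for response_index, code in enumerate(row):
            requested = [name for bit, name in
                         ((1, "function"), (2, "gradient"), (4, "Hessian"))
                         if code & bit]
            print(eval_index, response_index, requested)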

Derivative Variables Vector: The second dataset in the metadata group is the derivative_variables_vector dataset. It is included only when gradients or Hessians are available. Like the ASV, it is two-dimensional. Each column of the DVV dataset corresponds to a continuous variable and contains a 0 or 1, indicating whether gradients and Hessians were computed with respect to that variable in the evaluation. The 0th dimension has the evaluation Ids as a scale, and the 1st dimension has two scales: the 0th contains the descriptors of the continuous variables, and the 1st contains their variable Ids.

Derivative Variables Vector
Description Values of the derivative variables vector in evaluations
Location metadata/derivative_variables_vector
Shape 2-dimensional: number of evaluations by number of continuous variables
Type Integer
Scales Dimension Type Label Contents Notes
0 Integer evaluation_ids Evaluation Ids
1 String variables Variable descriptors

Analysis Components: The final dataset in the metadata group is the analysis_components dataset. It is a 1D dataset that is present only when the user specified analysis components, and it contains those components as strings.

Analysis Components
Description Values of the analysis components in evaluations
Location metadata/analysis_components
Shape 1-dimensional: number of analysis components
Type String

Selecting Models and Interfaces to Store

When HDF5 output is enabled (by including the hdf5 keyword), then by default evaluation data for the following components will be stored:

  • The model that belongs to the top-level method. (Currently, if the top-level method is a metaiterator such as hybrid, no model evaluation data will be stored.)
  • All simulation interfaces (interfaces of type fork, system, direct, etc.).

The user can override these defaults using the keywords model_selection and interface_selection.

The choices for model_selection are:

  • top_method : (default) Store evaluation data for the top method's model only.
  • all_methods : Store evaluation data for all models that belong directly to a method. Note that these models may be recasts of user-specified models, not the user-specified models themselves.
  • all : Store evaluation data for all models.
  • none : Store evaluation data for no models.

The choices for interface_selection are:

  • simulation : (default) Store evaluation data for simulation interfaces.
  • all : Store evaluation data for all interfaces.
  • none : Store evaluation data for no interfaces.

If a model or interface is excluded from storage by these selections, then it will not appear in the sources group of any method or model.