Dakota Reference Manual  Version 6.4
Large-Scale Engineering Optimization and Uncertainty Analysis
calibration_data_file


Specify a text file containing calibration data for scalar responses

Specification

Alias: least_squares_data_file

Argument(s): STRING

Default: none

| Required/Optional | Description of Group | Dakota Keyword | Dakota Keyword Description |
|---|---|---|---|
| Optional (Choose One) | tabular format (Group 1) | annotated | Selects annotated tabular file format for experiment data |
| | | custom_annotated | Selects custom-annotated tabular file format for experiment data |
| | | freeform | Selects free-form tabular file format for experiment data |
| Optional | | num_experiments | Add context to data: number of different experiments |
| Optional | | num_config_variables | Add context to data: number of configuration variables |
| Optional | | variance_type | Add context to experiment data description by specifying the type of experimental error |

Description

Enables text file import of experimental observations for use in calibration, for scalar responses only. Dakota will calibrate model variables to best match these data. Key options include:

  • format: whether the data file is in annotated, custom_annotated, or freeform format
  • content: num_experiments, num_config_variables, and variance_type indicate which columns appear in the data file.

While some components may be omitted, the most complete version of an annotated calibration data file could include columns corresponding to:

exp_id | configuration xvars | y data observations | y data variances

Each row in the file corresponds to an experiment or replicate observation of an experiment to be compared to the model output.
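To make the column layout above concrete, the following Python sketch splits one whitespace-separated data row into its four groups. The function name split_row and its parameters are hypothetical and are not part of Dakota; the sketch assumes that the exp_id column is present (as in the most complete annotated layout) and that variances, when present, occupy the last $ N_{terms} $ columns.

```python
def split_row(line, n_terms, n_cfg=0, has_exp_id=True, has_variance=False):
    """Split one data row into (exp_id, config, y_data, y_var).

    Hypothetical helper for illustration only; this is not Dakota's parser.
    """
    tokens = line.split()
    expected = int(has_exp_id) + n_cfg + n_terms * (2 if has_variance else 1)
    if len(tokens) != expected:
        raise ValueError(f"expected {expected} columns, found {len(tokens)}")

    pos = 0
    exp_id = None
    if has_exp_id:
        exp_id = int(tokens[pos])
        pos += 1
    # Configuration variable values (experimental conditions)
    config = [float(t) for t in tokens[pos:pos + n_cfg]]
    pos += n_cfg
    # Observed data, one value per calibration term
    y_data = [float(t) for t in tokens[pos:pos + n_terms]]
    pos += n_terms
    # Measurement error variances, one per calibration term (if present)
    y_var = [float(t) for t in tokens[pos:pos + n_terms]] if has_variance else []
    return exp_id, config, y_data, y_var


# Made-up row: exp_id, one configuration variable, three observations, three variances
row = "1  300.0  2.5 3.5 4.5  0.01 0.04 0.09"
exp_id, config, y_data, y_var = split_row(row, n_terms=3, n_cfg=1, has_variance=True)
```

The column-count check reflects the point that omitting a component (for example the variances) changes how many columns each row must have.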

Usage Tips

  • The calibration_data_file keyword is used when ONLY scalar calibration terms are present. If there are field calibration terms, instead use calibration_data. For mixed scalar and field calibration terms, one may use the calibration_data specification, together with its sub-specification scalar_data_file, which uses the format described here.

Simple Case

In the simplest case, no data content descriptors are specified, so the data file must contain only the $ y^{Data} $ values for a single experimental observation. In this case, the data file should have $ N_{terms} $ columns and 1 row, where $ N_{terms} $ is the value of calibration_terms.

For each function evaluation, Dakota will run the analysis driver, which must return $ N_{terms} $ model responses. Then the residuals are computed as:

\[ R_{i} = y^{Model}_i - y^{Data}_{i}. \]

These residuals can be weighted using weights.
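The residual definition above can be sketched in a few lines of Python. The function name residuals and the numeric values are made up for illustration; in practice Dakota performs this computation internally from the analysis driver output and the data file.

```python
def residuals(y_model, y_data):
    """Return R_i = y_model[i] - y_data[i] for each calibration term."""
    if len(y_model) != len(y_data):
        raise ValueError("model and data must have the same number of terms")
    return [m - d for m, d in zip(y_model, y_data)]


# One experimental observation with N_terms = 3 (illustrative values)
y_data = [2.5, 3.5, 4.5]
# Responses returned by the analysis driver for one function evaluation
y_model = [3.0, 3.0, 5.0]
r = residuals(y_model, y_data)
```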

With experimental variances

If the measurement error variance is known, it can be supplied to Dakota. In this case, the keyword variance_type is added, followed by a list of variance type strings of length one or of length $ N_{terms} $, where $ N_{terms} $ is the value of calibration_terms. The variance_type for each response can be 'none' or 'scalar'. NOTE: you must specify the same variance_type for all scalar terms. That is, they must all be 'none' or all be 'scalar'.

For each response that has a 'scalar' variance type, each row of the data file will now have $ N_{terms} $ columns of $ y $ data values followed by $ N_{terms} $ columns of $ y $ variances that specify the measurement error (in units of variance, not standard deviation).

Dakota will run the analysis driver, which must return $ N_{terms} $ responses. Then the residuals are computed as:

\[ R_{i} = \frac{y^{Model}_i - y^{Data}_{i}}{\sqrt{{var}_i}} \]

for $ i = 1 \dots N_{terms} $.
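A minimal Python sketch of the variance-weighted residual above follows. The function name weighted_residuals and the values are hypothetical, and the check for strictly positive variances is an assumption of this sketch rather than documented Dakota behavior.

```python
import math


def weighted_residuals(y_model, y_data, variances):
    """Return R_i = (y_model[i] - y_data[i]) / sqrt(variances[i])."""
    if not (len(y_model) == len(y_data) == len(variances)):
        raise ValueError("all inputs must have N_terms entries")
    # Assumption of this sketch: variances must be strictly positive
    if any(v <= 0.0 for v in variances):
        raise ValueError("variances must be positive")
    return [(m - d) / math.sqrt(v)
            for m, d, v in zip(y_model, y_data, variances)]


y_model = [3.0, 3.0, 5.0]
y_data = [2.5, 3.5, 4.5]
# Given in units of variance, not standard deviation
variances = [0.25, 0.25, 4.0]
r = weighted_residuals(y_model, y_data, variances)
```

Note that the third term has the same raw mismatch as the first but a larger variance, so its weighted residual is smaller.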

Fully general case

In the most general case, the content of the data file is described by the arguments of three parameters. The parameters are optional, and defaults are described below.

  • num_experiments ( $ N_{exp} $ )

    Default: $ N_{exp} = 1 $

    This indicates that the data represents multiple experiments, where each experiment might be conducted with different values of configuration variables. An experiment can also be thought of as a replicate, where the experiments are run at the same values of the configuration variables.

  • num_config_variables ( $ N_{cfg} $ )

    This is not yet supported, but will specify the values of experimental conditions at which data were collected.

  • variance_type ('none' or 'scalar')

    This indicates whether the data file contains variances for the measurement error of the experimental data. The default is 'none'.

If the user does not specify variance_type, or if variance_type = 'none', only the actual observations are specified in the calibration_data_file. If the user specifies variance_type = 'scalar', then the calibration_data_file must contain twice as many columns as there are calibration_terms. The first calibration_terms columns are the experimental data, and the second calibration_terms columns are the experimental measurement error variances. For example, if the user has three calibration terms and specifies variance_type = 'scalar', then the calibration data must contain six columns. The first three columns will contain the data, and the second three columns will contain the experimental error (in units of variance) for the data in the first three columns. These variances are used to weight the residuals in the sum-of-squares objective.
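As a short worked step that follows from the variance-weighted residual defined earlier (it is not an additional Dakota feature), squaring that residual shows how each term enters the sum of squares:

\[ R_i^2 = \frac{(y^{Model}_i - y^{Data}_{i})^2}{{var}_i} \]

For instance, a mismatch of 1 with a variance of 4 contributes 0.25, whereas the same mismatch with a variance of 0.25 contributes 4, so noisier measurements carry less weight in the objective.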

A more advanced use of the calibration_data_file might specify num_experiments ( $ N_{exp} $ ), indicating that there are multiple experiments. When multiple experiments are present, Dakota will expand the number of residuals to cover the repeated measurement data and difference the model responses with the data accordingly. For example, if the user has five experiments in the example above with three calibration terms, the calibration_data_file would need to contain five rows (one for each experiment), and each row should contain three experimental data values that will be differenced with respect to the appropriate model response. In this example, $ N_{exp} = 5 $. To summarize, Dakota will calculate the sum of the squared residuals as:

\[ f = \sum_{j=1}^{N_{exp}} \sum_{i=1}^{N_{terms}} R_{i,j}^2 \]

where the residuals now are calculated as:

\[ R_{i,j} = y^{Model}_i(\theta) - y^{Data}_{i,j}, \]

with $ i $ indexing the calibration terms and $ j $ indexing the experiments.
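The multi-experiment objective can be sketched in Python as follows. This is an illustration under stated assumptions, not Dakota's implementation: it interprets the sum as running over every residual from every experiment, it compares the same model responses against each data row (no configuration variables), and it omits variance weighting. The function name sum_of_squares and the data values are hypothetical.

```python
def sum_of_squares(y_model, experiments):
    """Sum squared residuals over all experiments and calibration terms.

    y_model: N_terms model responses for the current parameter values.
    experiments: list of rows, each holding N_terms observed values.
    """
    total = 0.0
    for y_data in experiments:
        if len(y_data) != len(y_model):
            raise ValueError("each experiment row needs N_terms values")
        for m, d in zip(y_model, y_data):
            total += (m - d) ** 2
    return total


# Three calibration terms, five experiments (five rows in the data file)
y_model = [1.0, 2.0, 3.0]
experiments = [
    [1.0, 2.0, 3.0],
    [1.5, 2.0, 3.0],
    [1.0, 2.5, 3.0],
    [1.0, 2.0, 3.5],
    [0.5, 2.0, 3.0],
]
f = sum_of_squares(y_model, experiments)
n_residuals = len(experiments) * len(y_model)
```

With five experiments and three calibration terms, the expanded residual vector in this sketch has 15 entries.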