Interfacing with Dakota as a Library

Dakota Library Table of Contents

Introduction

It is possible to link the Dakota toolkit into another application for use as an algorithm library. This section describes facilities which permit this type of integration.

When compiling Dakota with CMake, files in Dakota/src (excepting the main.cpp, restart_util.cpp, and library_mode.cpp main programs) are compiled into libraries that get installed to CMAKE_INSTALL_PREFIX/lib. C/C++ code is in the library dakota_src, while Fortran code lives in the dakota_src_fortran library. Applications may link against these Dakota libraries by specifying appropriate include and link directives. Depending on the configuration used when building this library, other libraries for the vendor optimizers and vendor packages will also be needed to resolve Dakota symbols for DOT, NPSOL, OPT++, NCSUOpt, LHS, Teuchos, etc. Copies of these libraries are also placed in Dakota/lib. Refer to Linking against the Dakota library for additional information.

Warning:
Users may interface to Dakota as a library within other software applications provided that they abide by the terms of the GNU Lesser General Public License (LGPL). Refer to http://www.gnu.org/licenses/lgpl.html or contact the Dakota team for additional information.
Attention:
The use of Dakota as an algorithm library should be distinguished from the linking of simulations within Dakota using the direct application interface (see DirectApplicInterface). In the former, Dakota is providing algorithm services to another software application, and in the latter, a linked simulation is providing analysis services to Dakota. It is not uncommon for these two capabilities to be used in combination, where a simulation framework provides both the "front end" and the "back end" for Dakota.

Quick start: examples and test code

To learn by example, refer to the files PluginSerialDirectApplicInterface.[CH] and PluginParallelDirectApplicInterface.[CH] in Dakota/src for simple examples of serial and parallel plug-in interfaces. The file library_mode.cpp in Dakota/src provides example usage of these plug-ins within a mock simulator program that demonstrates the required object instantiation syntax in combination with the three problem database population approaches (input file parsing, data node insertion, and mixed mode). All of this code may be compiled and tested by configuring Dakota using the --with-plugin option.

Comparison to main.cpp

The procedure for utilizing Dakota as a library within another application involves a number of steps that are similar to those used in the stand-alone Dakota application. The stand-alone procedure can be viewed in the file main.cpp, and the differences for the library approach are most easily explained with reference to that file. The basic steps of executing Dakota include instantiating the ParallelLibrary, CommandLineHandler, and ProblemDescDB objects; managing the Dakota input file (ProblemDescDB::manage_inputs()); specifying restart files and output streams (ParallelLibrary::specify_outputs_restart()); and instantiating the Strategy and running it (Strategy::run_strategy()). When using Dakota as an algorithm library, the operations are quite similar, although command line information (argc, argv, and therefore CommandLineHandler) will not in general be accessible. In particular, main.cpp can pass argc and argv into the ParallelLibrary and CommandLineHandler constructors and then pass the CommandLineHandler object into ProblemDescDB::manage_inputs() and ParallelLibrary::specify_outputs_restart(). In an algorithm library approach, a CommandLineHandler object is not instantiated and overloaded forms of the ParallelLibrary constructor, ProblemDescDB::manage_inputs(), and ParallelLibrary::specify_outputs_restart() are used.

The overloaded forms of these functions are as follows. For instantiation of the ParallelLibrary object, the default constructor may be used. This constructor assumes that MPI is administered by the parent application such that the MPI configuration will be detected rather than explicitly created (i.e., Dakota will not call MPI_Init or MPI_Finalize). In code, the instantiation

  ParallelLibrary parallel_lib(argc, argv);

is replaced with

  ParallelLibrary parallel_lib;

In the case of specifying restart files and output streams, the call to

  parallel_lib.specify_outputs_restart(cmd_line_handler);

should be replaced with its overloaded form in order to pass the required information through the parameter list

  parallel_lib.specify_outputs_restart(std_output_filename, std_error_filename,
    read_restart_filename, write_restart_filename, stop_restart_evals);

where file names for standard output and error and restart read and write as well as the integer number of restart evaluations are passed through the parameter list rather than read from the command line of the main Dakota program. The definition of these attributes is performed elsewhere in the parent application (e.g., specified in the parent application input file or GUI). In this function call, specify NULL for any files not in use, which will elicit the desired subset of the following defaults: standard output and standard error are directed to the terminal, no restart input, and restart output to file dakota.rst. The stop_restart_evals specification is an optional parameter with a default of 0, which indicates that restart processing should process all records. If no overrides of these defaults are intended, the call to specify_outputs_restart() may be omitted entirely.

With respect to alternate forms of ProblemDescDB::manage_inputs(), the following section describes different approaches to populating data within Dakota's problem description database. It is this database from which all Dakota objects draw data upon instantiation.

Problem database population

Now that the ProblemDescDB object has been instantiated, we must populate it with data, either via parsing an input file, direct data insertion, or a mixed approach, as described in the following sections.

Input file parsing

The simplest approach to linking an application with the Dakota library is to rely on Dakota's normal parsing system to populate Dakota's problem database (ProblemDescDB) through the reading of an input file. The disadvantage to this approach is the requirement for an additional input file beyond those already required by the parent application.

In this approach, the main.cpp call to

  problem_db.manage_inputs(cmd_line_handler);

would be replaced with its overloaded form

  problem_db.manage_inputs(dakota_input_file);

where the file name for the Dakota input is passed through the parameter list rather than read from the command line of the main Dakota program. Again, the definition of the Dakota input file name is performed elsewhere in the parent application (e.g., specified in the parent application input file or GUI). Refer to run_dakota_parse() in library_mode.cpp for a complete example listing.

ProblemDescDB::manage_inputs() invokes ProblemDescDB::parse_inputs() (which in turn invokes ProblemDescDB::check_input()), ProblemDescDB::broadcast(), and ProblemDescDB::post_process(), which are lower level functions that will be important in the following two sections. Thus, the input file parsing approach may employ a single coarse grain function to coordinate all aspects of problem database population, whereas the two approaches to follow will use lower level functions to accomplish a finer grain of control.

Data node insertion

This approach is more involved than the previous approach, but it allows the application to publish all needed data to Dakota's database directly, thereby eliminating the need for the parsing of a separate Dakota input file. In this case, ProblemDescDB::manage_inputs() is not called. Rather, DataStrategy, DataMethod, DataModel, DataVariables, DataInterface, and DataResponses objects are instantiated and populated with the desired problem data. These objects are then published to the problem database using ProblemDescDB::insert_node(), e.g.:

  // instantiate the data object
  DataMethod data_method;

  // set the attributes within the data object
  data_method.methodName = "nond_sampling";
  ...

  // publish the data object to the ProblemDescDB
  problem_db.insert_node(data_method);

The data objects are populated with their default values upon instantiation, so only the non-default values need to be specified. Refer to the DataStrategy, DataMethod, DataModel, DataVariables, DataInterface, and DataResponses class documentation and source code for lists of attributes and their defaults.

The default strategy is single_method, which runs a single iterator on a single model, and the default model is single, so it is not necessary to instantiate and publish a DataStrategy or DataModel object if advanced multi-component capabilities are not required. Rather, instantiation and insertion of a single DataMethod, DataVariables, DataInterface, and DataResponses object is sufficient for basic Dakota capabilities.

Once the data objects have been published to the ProblemDescDB object, calls to

  problem_db.check_input();
  problem_db.broadcast();
  problem_db.post_process();

will perform basic database error checking, broadcast a packed MPI buffer of the specification data to other processors, and post-process specification data to fill in vector defaults (scalar defaults are handled in the Data class constructors), respectively. For parallel applications, processor rank 0 should be responsible for Data node population and insertion and the call to ProblemDescDB::check_input(), and all processors should participate in ProblemDescDB::broadcast() and ProblemDescDB::post_process(). Moreover, preserving the order shown assures that large default vectors are not transmitted by MPI. Refer to run_dakota_data() in library_mode.cpp for a complete example listing.

Mixed mode

In this case, we will combine the parsing of a Dakota input file with some direct database updates. The motivation for this approach arises in large-scale applications where large vectors can be awkward to specify in a Dakota input file. The first step is to parse the input file, but rather than using

  problem_db.manage_inputs(dakota_input_file);

as described in Input file parsing, we will use the lower level function

  problem_db.parse_inputs(dakota_input_file);

to provide a finer grain of control. The passed input file dakota_input_file must contain all required inputs. Since vector data like variable values/bounds/tags, linear/nonlinear constraint coefficients/bounds, etc. are optional, these potentially large vector specifications can be omitted from the input file. Only the variable/response counts, e.g.:

method
        linear_inequality_constraints = 500

variables
        continuous_design = 1000

responses
        objective_functions = 1
        nonlinear_inequality_constraints = 100000

are required in this case. To update the data omissions from their defaults, one uses the ProblemDescDB::set() family of overloaded functions, e.g.

  Dakota::RealVector drv(1000, 1.); // vector of length 1000, values initialized to 1.
  problem_db.set("variables.continuous_design.initial_point", drv);

where the string identifiers are the same identifiers used when pulling information from the database using one of the get_<datatype>() functions (refer to the source code of ProblemDescDB.cpp for a full list). However, the supported ProblemDescDB::set() options are a restricted subset of the database attributes, focused on vector inputs that can be large scale.

If performing these updates within the constructor of a DirectApplicInterface extension/derivation (see Defining the direct application interface), then this code is sufficient since the database is unlocked, the active list nodes of the ProblemDescDB have been set for you, and the correct strategy/method/model/variables/interface/responses specification instance will get updated. The difficulty in this case stems from the order of instantiation. Since the Variables and Response instances are constructed in the base Model class, prior to construction of Interface instances in derived Model classes, database information related to Variables and Response objects will have already been extracted by the time the Interface constructor is invoked and the database update will not propagate.

Therefore, it is preferred to perform these operations at a higher level (e.g., within your main program), prior to Strategy instantiation and execution, such that instantiation order is not an issue. However, in this case, it is necessary to explicitly manage the list nodes of the ProblemDescDB using a specification instance identifier that corresponds to an identifier from the input file, e.g.:

  problem_db.set_db_variables_node("MY_VARIABLES_ID");
  Dakota::RealVector drv(1000, 1.); // vector of length 1000, values initialized to 1.
  problem_db.set("variables.continuous_design.initial_point", drv);

Alternatively, rather than setting just a single data node, all data nodes may be set using a method specification identifier:

  problem_db.set_db_list_nodes("MY_METHOD_ID");

since the method specification is responsible for identifying a model specification, which in turn identifies variables, interface, and responses specifications. If hardwiring specification identifiers is undesirable, then

  problem_db.resolve_top_method();

can also be used to deduce the active method specification and set all list nodes based on it. This is most appropriate in the case where only single specifications exist for method/model/variables/interface/responses. In each of these cases, setting list nodes unlocks the corresponding portions of the database, allowing set/get operations.

Once all direct database updates have been performed in this manner, calls to ProblemDescDB::broadcast() and ProblemDescDB::post_process() should be used on all processors. The former will broadcast a packed MPI buffer with the aggregated set of specification data from rank 0 to other processors, and the latter will post-process specification data to fill in any vector defaults that have not yet been provided through either file parsing or direct updates (Note: scalar defaults are handled in the Data class constructors). Refer to run_dakota_mixed() in library_mode.cpp for a complete example listing.

Instantiating the strategy

With the ProblemDescDB object populated with problem data, we may now instantiate the strategy.

  // instantiate the strategy
  Strategy selected_strategy(problem_db);

Following strategy construction, all MPI communicator partitioning has been performed and the ParallelLibrary instance may be interrogated for parallel configuration data. For example, the lowest level communicators in Dakota's multilevel parallel partitioning are the analysis communicators, which can be retrieved using:

  // retrieve the set of analysis communicators for simulation initialization:
  // one analysis comm per ParallelConfiguration (PC), one PC per Model.
  Array<MPI_Comm> analysis_comms = parallel_lib.analysis_intra_communicators();

These communicators can then be used for initializing parallel simulation instances, where the number of MPI communicators in the array corresponds to one communicator per ParallelConfiguration instance.

Defining the direct application interface

When employing a library interface to Dakota, it is frequently desirable to also use a direct interface between Dakota and the simulation. There are two approaches to defining this direct interface.

Extension

The first approach involves extending the existing DirectApplicInterface class to support additional direct simulation interfaces. In this case, a new simulation interface function can be added to Dakota/src/DirectApplicInterface.[CH] for the simulation of interest. If the new function will not be a member function, then the following prototype should be used in order to pass the required data:

  int sim(const Dakota::Variables& vars, const Dakota::ActiveSet& set, 
          Dakota::Response& response);

If the new function will be a member function, then this can be simplified to

  int sim();

since the data access can be performed through the DirectApplicInterface class attributes.

This simulation can then be added to the logic blocks in DirectApplicInterface::derived_map_ac(). In addition, DirectApplicInterface::derived_map_if() and DirectApplicInterface::derived_map_of() can be extended to perform pre- and post-processing tasks if desired, but this is not required.

While this approach is the simplest, it has the disadvantage that the Dakota library may need to be recompiled when the simulation or its direct interface is modified. If it is desirable to maintain the independence of the Dakota library from the host application, then the following derivation approach should be employed.

Derivation

The second approach is to derive a new interface from DirectApplicInterface in order to redefine several virtual functions. A typical derived class declaration might be

namespace SIM {

class SerialDirectApplicInterface: public Dakota::DirectApplicInterface
{
public:

  // Constructor and destructor

  SerialDirectApplicInterface(const Dakota::ProblemDescDB& problem_db);
  ~SerialDirectApplicInterface();

protected:

  // Virtual function redefinitions

  int derived_map_if(const Dakota::String& if_name);
  int derived_map_ac(const Dakota::String& ac_name);
  int derived_map_of(const Dakota::String& of_name);

private:

  // Data
}

} // namespace SIM

where the new derived class resides in the simulation's namespace. Similar to the case of Extension, the DirectApplicInterface::derived_map_ac() function is the required redefinition, and DirectApplicInterface::derived_map_if() and DirectApplicInterface::derived_map_of() are optional.

The new derived interface object (from namespace SIM) must now be plugged into the strategy. In the simplest case of a single model and interface, one could use

  // retrieve the interface of interest
  ModelList& all_models  = problem_db.model_list();
  Model&     first_model = *all_models.begin();
  Interface& interface   = first_model.interface();
  // plug in the new direct interface instance (DB does not need to be set)
  interface.assign_rep(new SIM::SerialDirectApplicInterface(problem_db), false);

from within the Dakota namespace. In a more advanced case of multiple models and multiple interface plug-ins, one might use

  // retrieve the list of Models from the Strategy
  ModelList& models = problem_db.model_list();
  // iterate over the Model list
  for (ModelLIter ml_iter = models.begin(); ml_iter != models.end(); ml_iter++) {
    Interface& interface = ml_iter->interface();
    if (interface.interface_type() == "direct" &&
        interface.analysis_drivers().contains("SIM") ) {
      // set the correct list nodes within the DB prior to new instantiations
      problem_db.set_db_model_nodes(ml_iter->model_id());
      // plug in the new direct interface instance
      interface.assign_rep(new SIM::SerialDirectApplicInterface(problem_db), false);
    }
  }

In the case where the simulation interface instance should manage parallel simulations within the context of an MPI communicator, one should pass in the relevant analysis communicator(s) to the derived constructor. For the latter case of looping over a set of models, the simplest approach of passing a single analysis communicator would use code similar to

  const ParallelLevel& ea_level =
    ml_iter->parallel_configuration_iterator()->ea_parallel_level();
  const MPI_Comm& analysis_comm = ea_level.server_intra_communicator();
  interface.assign_rep(new SIM::ParallelDirectApplicInterface(problem_db, analysis_comm),
                       false);

Since Models may be used in multiple parallel contexts and may therefore have a set of parallel configurations, a more general approach would extract and pass an array of analysis communicators to allow initialization for each of the parallel configurations.

New derived direct interface instances inherit various attributes of use in configuring the simulation. In particular, the ApplicationInterface::parallelLib reference provides access to MPI communicator data (e.g., the analysis communicators discussed in Instantiating the strategy), DirectApplicInterface::analysisDrivers provides the analysis driver names specified by the user in the input file, and DirectApplicInterface::analysisComponents provides additional analysis component identifiers (such as mesh file names) provided by the user which can be used to distinguish different instances of the same simulation interface. It is worth noting that inherited attributes that are set as part of the parallel configuration (instead of being extracted from the ProblemDescDB) will be set to their defaults following construction of the base class instance for the derived class plug-in. It is not until run-time (i.e., within derived_map_if/derived_map_ac/derived_map_of) that the parallel configuration settings are re-propagated to the plug-in instance. This is the reason that the analysis communicator should be passed in to the constructor of a parallel plug-in, if the constructor will be responsible for parallel application initialization.

Additional updates

As part of strategy instantiation, all problem specification data is extracted from ProblemDescDB as various objects are constructed. Therefore, any updates that need to be performed following strategy instantiation must be performed through direct set operations on the constructed objects. In the previous section, the process for updating the Interface object used within a Model was shown. To update other data such as variable values/bounds/tags or response bounds/targets/tags, refer to the set functions documented in Iterator and Model. As an example, the following code updates the active continuous variable values, which will be employed as the initial guess for certain classes of Iterators:

  ModelList& all_models  = problem_db.model_list();
  Model&     first_model = *all_models.begin();
  Dakota::RealVector drv(1000, 1.); // vector of length 1000, values initialized to 1.
  first_model.continuous_variables(drv);

Executing the strategy

Finally, with simulation configuration and plug-ins completed, we execute the strategy:

  // run the strategy
  selected_strategy.run_strategy();

Retrieving data after a run

After executing the strategy, final results can be obtained through the use of Strategy::variables_results() and Strategy::response_results(), e.g.:

  // retrieve the final parameter values
  const Variables& vars = selected_strategy.variables_results();

  // retrieve the final response values
  const Response& resp  = selected_strategy.response_results();

In the case of optimization, the final design is returned, and in the case of uncertainty quantification, the final statistics are returned.

Linking against the Dakota library

This section presumes Dakota has been configured with CMake, compiled, and installed to a CMAKE_INSTALL_PREFIX using 'make install' or equivalent. The Dakota libraries against which you must link will install to CMAKE_INSTALL_PREFIX/bin and CMAKE_INSTALL_PREFIX/lib. When running CMake, Dakota and Dakota-included third-party libraries will be output as TPL LIBS, e.g.,

-- TPL LIBS: nidr;teuchos;pecos;pecos_src;lhs;mods;mod;dfftpack;sparsegrid;
surfpack;surfpack;surfpack_fortran;utilib;colin;interfaces;scolib;3po;pebbl;
tinyxml;conmin;dace;analyzer;random;sampling;bose;dot;fsudace;hopspack;jega;
jega_fe;moga;soga;eutils;utilities;ncsuopt;nlpql;cport;npsol;optpp;psuade;
DGraphics;amplsolver

While external dependencies will be output as EXTRA TPL LIBS:

-- EXTRA TPL LIBS: /usr/lib64/openmpi/lib/libmpi_cxx.so;optimized;
/usr/lib64/libboost_regex-mt.so;debug;/usr/lib64/libboost_regex-mt.so;
optimized;/usr/lib64/libboost_filesystem-mt.so;debug;
/usr/lib64/libboost_filesystem-mt.so;optimized;/usr/lib64/libboost_system-mt.so;
debug;/usr/lib64/libboost_system-mt.so;optimized;
/usr/lib64/libboost_signals-mt.so;debug;/usr/lib64/libboost_signals-mt.so;
/usr/lib64/libSM.so;/usr/lib64/libICE.so;/usr/lib64/libX11.so;
/usr/lib64/libXext.so;/usr/lib64/libXm.so;/usr/lib64/libXpm.so;
/usr/lib64/libXmu.so;/usr/lib64/libXt.so;-lpthread;/usr/lib64/liblapack.so;
/usr/lib64/libblas.so

Note that depending on how you configured Dakota, some of the libraries may not be included (for example NPSOL, DOT, NLPQL). Optional libraries like GSL (discouraged due to GPL license) may also be needed if Dakota was configured with them. Check which appear in CMAKE_INSTALL_PREFIX/bin CMAKE_INSTALL_PREFIX/lib.

Note that as of Dakota 5.2, -lnewmat is no longer required but additional Boost libraries are needed (-lboost_regex -lboost_filesystem -lboost_system) as a result of migration from legacy Dakota utilities to more modern Boost components.

You may also need funcadd0.o, -lfl, -lexpat, and, if linking with system-provided GSL, -lgslcblas. The AMPL solver library may require -ldl. System compiler and math libraries may also need to be included. If configuring with graphics, you will need to add -lDGraphics and system X libraries (partial list here):

-lXpm -lXm -lXt -lXmu -lXp -lXext -lX11 -lSM -lICE

We have experienced problems with the creation of libamplsolver.a on some platforms. Please use the Dakota mailing lists for help with any problems.

Finally, it is important to use the same C++ compiler (possibly an MPI wrapper) for compiling Dakota and your application and potentially include Dakota-related preprocessor defines as emitted by CMake during compilation of Dakota. This ensures that the platform configuration settings are properly propagated.

Summary

To utilize the Dakota library within a parent software application, the basic steps of main.cpp and the order of invocation of these steps should be mimicked from within the parent application. Of these steps, ParallelLibrary instantiation, ProblemDescDB::manage_inputs() and ParallelLibrary::specify_outputs_restart() require the use of overloaded forms in order to function in an environment without direct command line access and, potentially, without file parsing. Additional optional steps not performed in main.cpp include the extension/derivation of the direct interface and the retrieval of strategy results after a run.

Dakota's library mode is now in production use within several Sandia and external simulation codes/frameworks.


Generated on 1 Feb 2013 for Dakota by  doxygen 1.6.1