IMP
2.2.1
The Integrative Modeling Platform
|
Detailed structural characterization of macromolecular assemblies is usually more difficult than that of single proteins because assemblies often don’t crystallize or are too large for NMR spectroscopy. This challenge can be addressed by an “integrative” or “hybrid” approach that simultaneously considers all available information about a given assembly. The integrative approach has several advantages. First, synergy among the input data minimizes the drawbacks of sparse, noisy, ambiguous and incoherent datasets. Each individual piece of data contains little structural information, but by simultaneously fitting a model to all data derived from independent experiments, the degeneracy of the structures that fit the data can be markedly reduced. Second, this approach has the potential to produce all structures that are consistent with the data, not only one structure. Third, an analysis of the structures allows us to estimate the precision of both the data and the structures. Last, this approach can make the process of structure determination more efficient, by indicating which measurements would be most informative.
Hybrid structures based on our integrative approach:
The integrative structure determination is an iterative process consisting of four stages:
IMP provides tools to implement the computational parts of the integrative modeling iterative process, steps 2-4. This computation can be driven from Python scripts or C++ programs. The examples below will use Python scripts.
In IMP, the model is represented as a collection of data, called particles, each of which has associated attributes (e.g. an atom with associated coordinates, mass, radius etc). In IMP, the attributes can be numbers, strings, or lists of other particles, among other things. Each particle is identified by an index (IMP::kernel::ParticleIndex) and has an associated name, in order to make it easier to understand. Finally, attributes are identified by keys (e.g. IMP::kernel::StringKey for string attributes). The key identifies one type of data that may be contained in many particles.
At the most basic, to create particles and manipulate attributes you can do
import IMP.kernel model= IMP.kernel.Model() particle_0= model.add_particle("my first particle") string_key = IMP.kernel.StringKey("my first data") model.add_attribute(string_key, particle_0, "Hi, particle 0") particle_1= model.add_particle("my second particle") model.add_attribute(string_key, particle_1, "Hi, particle 1") print model.get_attribute(string_key, particle_0)
Certain of the attributes can be marked as parameters of the model. These are attributes that you want to sample or optimize. To do so you can do model.set_is_optimized(float_key, particle_0)
Accessing all your data at such a low level can get tiresome, so we provide decorators to make it easier. Each type of decorator provides an interface to manipulate a particular type of data easier. For example, an IMP.atom.Residue decorator provides access to residue associated information (e.g. the index of the residue, or its type) in particles that have it.
residue= IMP.atom.Residue(model, my_residue) print residue.get_residue_type()
Decorators provide a standard interface to add their data to a particle, decorate a particle that already has the needed data or check if a particle is appropriate to have the decorator used with it.
# add coordinates to a particle decorated_particle = IMP.core.XYZ.setup_particle(model, my_particle, IMP.algebra.Vector3D(1,2,3)) print decorated_particle.get_coordinates() # my_other_particle has already been set up, so we can just decorate it directly another_decorated_particle = IMP.core.XYZ(model, my_other_particle) print another_decorated_particle.get_coordinates() # we can change the coordinates too another_decorated_particle.set_coordinates(IMP.algebra.Vector3D(5,4,3))
Decorators can also be used to create relationships between particles. For example, rigid bodies are implemented using the IMP::core::RigidBody decorator. Each rigid body has a collection of other particles that move along with it, the IMP::core::RigidMember particles.
Biological modules are represented hierarchically in IMP using the IMP::atom::Hierarchy. These hierarchies follow the natural hierarchical nature of most biomolecules. A protein from a PDB would be a hierarchy with a root for the whole PDB file with a child per chain. Each chain particle has a child for each residue in the chain, and each residue has a child for each atom. Each particle has various types of associated data. For example and atom has data using the IMP::atom::Atom, IMP::core::XYZR, IMP::atom::Mass and IMP::atom::Hierarchy decorators.
The structures represented do not have to be atomic and can be multi-resolution. That is can have coordinates at any level of the hierarchy. The invariants are that the leaves must have coordinates, radii and mass. Pieces of the hierarchy can be picked out using the IMP::atom::Selection using the standard sort of biological criteria:
# Select residues 10 through 49. my_residues= IMP.atom.Selection(my_pdb, residue_indexes=range(10,50)).get_particles()
You can manipulate and maintain collections of particles using IMP::Container. A collection can be anything from a list of particles gathered manually, to all pairs of particles from some list that are closer than a certain distance to one another. For example, to maintain a list of all close pairs of particles you can do
# all particle pairs closer than 3A # it is always good to give things name, that is what the last argument does close_pairs= IMP.container.ClosePairContainer(all_my_particles, 3, "My close pairs")
These containers can then be used to create scoring functions or analyze the data.
Many things such as rigid bodies and lists of all close pairs depend on maintaining some property as the model changes. These properties are maintained by IMP::kernel::Constraint objects. Since the invariants may depend on things that are reasonably expensive to compute, these invariants are updated only when requested. This means that if you change the coordinates of some particles, the contents of the close pairs list might be incorrect until it is updated. The required update can be triggered implicitly, for example when some scoring function needs it, or explicitly, when IMP::kernel::Model::update() is called.
Behind the scenes, IMP maintains an IMP::kernel::DependencyGraph that tracks how information flows between the particles and the containers, based on the constraints. It is used to detect, for example, that a particular particle is part of a rigid body, and so if its coordinates are needed for scoring, the rigid body must be brought up to date and the appropriate constraint must be asked to update the member particle's coordinates. In order to be able to track this information, relevant objects (IMP::kernel::ModelObject) have methods IMP::kernel::ModelObject::get_inputs() and IMP::kernel::ModelObject::get_outputs() that return the things that are read and written respectively.
One then needs to be able to evaluate how well the current configuration of the model fits this data that one is using to model. In addition to scores, when requested derivatives of the total score as a function of each parameter can be computed.
An IMP::kernel::Restraint computes a score on some set of particles. For example, a restraint be used to penalize configurations of the model that have collisions
# penalize collisions with a spring constant of 10 kcal/mol A soft_sphere_pair_score= IMP.core.SoftSpherePairScore(10) my_excluded_volume_restraint= IMP.container.PairsRestraint(soft_sphere_pair_score, close_pairs, "excluded volume")
To get the score of an individual restraint, you can use its IMP::kernel::Restraint::get_score() method.
Scoring in IMP is done by creating an IMP::kernel::ScoringFunction. A scoring function consists of the sum of terms, each called a Restraint. You can create many different scoring functions for different purposes and each restraint can be part of multiple scoring functions.
my_scoring_function= IMP.core.RestraintsScoringFunction([my_excluded_volume_restraint], "score excluded volume")
It is now time to find configurations of the model that score well with regards to the scoring function you developed. IMP provides a number of tools for that.
An IMP::kernel::Optimizer takes the current configuration of the model and perturbs it, typically trying to make it better (but perhaps just into a different configuration following some rule, such as molecular dynamics). They using a scoring function you provide to guide the process.
my_optimizer= IMP.core.ConjugateGradients(m) my_optimizer.set_scoring_function(my_scoring_function) my_optimizer.optimize(1000)
A IMP::kernel::Sampler produces a set of configurations of the model using some sampling scheme.
Configurations of the model can be saved and visualized in a variety of ways. Atomic structures can be written as pdb files using IMP::atom::write_pdb(). More flexibly, coarse grained models, geometry and information about the scoring function can be written to RMF files.
my_rmf= RMF.create_rmf_file("my.rmf") IMP.rmf.add_hierarchy(my_rmf, my_hierarchy) IMP.rmf.add_restraint(my_rmf, my_excluded_volume_restraint) IMP.rmf.save_frame(my_rmf, 0)
Functionality in IMP
is grouped into modules, each with its own namespace (in C++) or package (in Python). For example, the functionality for IMP::core can be found like
IMP::core::XYZ(p)
in C++ and
IMP.core.XYZ(p)
in Python.
A module contains classes, methods and data which are related and controlled by a set of authors. The names of the authors, the license for the module, its version and an overview of the module can be found on the module main page (e.g. IMP::example). See the "Namespaces" tab above for a complete list of modules in this version of IMP
.
IMP provides two sorts of tools to help you understand what is going on when you write a script. Both logging and checks are disabled if you use a fast
build, so make sure you have access to a non-fast
build.
Many operations in IMP can print out log messages as they work, allowing one to see what is being done. The amount of logging can be controlled globally by using IMP::base::set_log_level() or for individual objects by calling, for example model.set_log_level(IMP.base.VERBOSE)
.
IMP implements lots of runtime checks to make sure both that it is used properly and that it is working correctly. These can be turned on and off globally using IMP::base::set_checks_level() or for individual objects.
IMP tries to make things simpler to use by adhering to various naming and interface conventions.
Unless documented otherwise, the following units are used
CamelCase
are class names, for example IMP::RestraintSet_
) in them are functions or methods, for example IMP::Model::update().ClassName
are passed using type type ClassNames
. This type is a list
in Python and a IMP::base::Vector<ClassName> or, more or less, IMP::base::Vector<ClassName*> in C++.set_
change some stored valueget_
create or return a value
object or which return an existing object
class
object.create_
create a new IMP::base::Object class objectadd_
, remove_
or clear_
manipulate the contents of a collection of datashow_
prints things in a human-readable formatload_
and save_
or read_
and write
move data between files and memorylink_
creates an connection between something and an IMP::base::Objectupdate_
changes the internal state of an IMP::base::Objectdo_
is a virtual method as part of a non-virtual interface patternhandle_
takes action when an event occursvalidate_
checks the state of data and prints messages and throws exceptions if something is corruptedsetup_
and teardown_
create or destroy some type of invariant (e.g. the constraints for a rigid body)apply_
either applies a passed object to each piece of data in some collection or applies the object itself to a particular piece of passed data (yeah, it is a bit ambiguous).IMP_
are preprocessor symbols (C++ only)Graphs in IMP are represented in C++ using the Boost Graph Library. All graphs used in IMP are VertexAndEdgeListGraphs, have vertex_name properties, are BidirectionalGraphs if they are directed.
The Boost.Graph interface cannot be easily exported to Python so we instead provide a simple wrapper IMP::PythonDirectedGraph. There are methods to translate the graphs into various common Python and other formats (e.g. graphviz).
As is conventional in C++, IMP classes are divided into broad, exclusive types
const&
), never by pointer. Equality is defined based on the data stored in the value. Most value types in IMP are always valid, but a few, mostly geometric types (IMP::algebra::Vector3D) are designed for fast, low-level use and are left in an uninitialized state by their default constructorAll types in IMP, with a few documented exceptions, can be
IMP tries to maintain backwards compatibility, however, this is not always feasible. Our general policy is that functionality that is deprecated in one release (e.g. 2.1) is removed in the next one (2.2). Deprecated functionality should produce warnings when use (e.g. compile time messages for deprecated macros and runtime messages for deprecated functions called from Python). In addition, bugs discovered in deprecated functionality are not fixed.
Probably the best thing to do next is to read the kernel/nup84.py example.