IMP  2.2.1
The Integrative Modeling Platform
design_example.md
# Design example

# Overview # {#designexample}

[TOC]
This page walks through an iterative design process as an example of
which issues matter and what to think about when choosing how to
implement a piece of functionality.

# Original Description # {#design_original}

 Hao wants to implement ligand/protein scoring in IMP so that he can
 take advantage of the existing infrastructure. The details of the scoring
 function are currently experimental. The code does the following:

1. Read in the protein pdb and the small ligand mol2. The protein is in
 a pdb file and so can be read with IMP::atom::read_pdb. The ligand is in
 a mol2 file, which defines its own set of pdb-compatible atom types.
2. He proposed storing the coordinates and atom types in vectors outside
 of the decorators to speed up scoring.
3. Read in the potential of mean force (PMF) table from a file with
 a custom format. The number of dimensions can be treated as constant:
 the two atom types of a pair of atoms and the distance between that
 pair. The values stored in the table do not change during the
 program and need to be looked up quickly given the dimension data.
 The PMF table uses different atom names than the mol2 file.
4. Score a conformation by looping over all ligand-protein atom
 pairs. For each pair, look up the PMF value in the table by the
 two atom types and the distance, and sum up all the PMF values.

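Steps 3 and 4 can be sketched without any IMP machinery. The following is a minimal illustration, assuming a table indexed by two integer atom types and a uniformly binned distance; `PMFTable`, `Atom`, `atom_distance` and `score_all_pairs` are hypothetical names invented for this sketch and are not part of IMP or of Hao's code.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat PMF table laid out as value[type_a][type_b][bin].
struct PMFTable {
  std::size_t n_types;         // number of distinct atom types
  std::size_t n_bins;          // number of distance bins
  double bin_width;            // width of each distance bin
  std::vector<double> values;  // n_types * n_types * n_bins entries

  // Return the tabulated value, or 0 beyond the last bin.
  double lookup(std::size_t a, std::size_t b, double distance) const {
    const std::size_t bin = static_cast<std::size_t>(distance / bin_width);
    if (bin >= n_bins) return 0.0;
    return values[(a * n_types + b) * n_bins + bin];
  }
};

// Hypothetical atom record: an integer type index and coordinates.
struct Atom {
  std::size_t type;
  std::array<double, 3> xyz;
};

double atom_distance(const Atom &p, const Atom &q) {
  double sum = 0.0;
  for (std::size_t i = 0; i < 3; ++i) {
    const double d = p.xyz[i] - q.xyz[i];
    sum += d * d;
  }
  return std::sqrt(sum);
}

// Step 4: sum the PMF value over every ligand-protein atom pair.
double score_all_pairs(const PMFTable &table, const std::vector<Atom> &ligand,
                       const std::vector<Atom> &protein) {
  double total = 0.0;
  for (const Atom &l : ligand)
    for (const Atom &p : protein)
      total += table.lookup(l.type, p.type, atom_distance(l, p));
  return total;
}
```

Note that this version ties the choice of pairs (all of them) to the way each pair is scored, which is the coupling the later sections try to remove.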
## Comments on the original description ## {#design_original_comments}

1. mol2 is a standard file format so it makes sense to have a reader
 for it in IMP. We can adopt the mol2 atom names as the standard names
 for ligand atoms in IMP.
2. How the coordinates are stored and accessed is an implementation
 detail, and worrying about it should probably be delayed until the
 other considerations are figured out.
3. Loading the PMF table is a natural operation for an initialization
 function. However, since the PMF table is not a standard file format,
 it doesn't make sense for it to go into IMP, at least not until a file
 format for the protein-ligand scoring has been worked out. Also, there is
 little reason to keep the PMF table atom types around, and they probably
 should be converted to more standard atom types on load. Finally, since
 the data in the PMF file is directly the scoring data, there isn't a
 real need to have a special representation for it in memory.
4. There are two different considerations here: which pairs of atoms to
 use and how to score each pair.

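The conversion suggested in comment 3 could be as small as a lookup performed while the file is being read. The sketch below is only an illustration: the names in the map are placeholders rather than real PMF or mol2 atom types, and `pmf_to_standard` and `translate_pmf_type` are hypothetical helpers, not IMP functions.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical translation from PMF-file atom names to the names used
// elsewhere. The entries are placeholders, not real atom types.
const std::map<std::string, std::string> &pmf_to_standard() {
  static const std::map<std::string, std::string> table = {
      {"pmf_carbon", "std_carbon"},
      {"pmf_nitrogen", "std_nitrogen"},
  };
  return table;
}

// Translate one name at load time. Unknown names fail loudly so that a
// mismatch is caught while reading rather than during scoring.
std::string translate_pmf_type(const std::string &pmf_name) {
  const auto it = pmf_to_standard().find(pmf_name);
  if (it == pmf_to_standard().end())
    throw std::invalid_argument("unknown PMF atom type: " + pmf_name);
  return it->second;
}
```

Once every name has been translated on load, nothing downstream needs to know that the PMF file used its own naming scheme.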
# Design Proposal for Reading # {#design_reading}
Since the mol2 reader is quite separate from the scoring, we will consider
it on its own first. In analogy to the pdb reader, it makes sense to
provide a function `read_mol2(std::istream &in, Model *m)` which returns
an IMP::atom::Hierarchy.

The mol2 atom types can either be added at runtime using
IMP::atom::add_atom_type() or a list of predefined constants can be added,
similar to IMP::atom::AT_N. The latter requires editing both
IMP/atom/Atom.h and modules/atom/src/Atom.cpp and so is a bit harder
to get right.

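To give a feel for what such a reader has to handle, here is a minimal standalone sketch that pulls the atom records out of the `@<TRIPOS>ATOM` section of a mol2 stream. It is not the IMP reader: `Mol2Atom` and `read_mol2_atoms` are hypothetical names, and the sketch ignores bonds, substructures and charges, and builds no hierarchy.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record holding the leading fields of a mol2 ATOM line.
struct Mol2Atom {
  int id;
  std::string name;
  double x, y, z;
  std::string type;  // mol2 atom type string, e.g. "C.3"
};

// Read the records of the @<TRIPOS>ATOM section from a mol2 stream.
std::vector<Mol2Atom> read_mol2_atoms(std::istream &in) {
  std::vector<Mol2Atom> atoms;
  std::string line;
  bool in_atoms = false;
  while (std::getline(in, line)) {
    if (line.rfind("@<TRIPOS>", 0) == 0) {  // any section header
      in_atoms = (line.rfind("@<TRIPOS>ATOM", 0) == 0);
      continue;
    }
    if (!in_atoms) continue;
    std::istringstream fields(line);
    Mol2Atom a;
    // Fields: atom_id atom_name x y z atom_type; the rest is ignored.
    if (fields >> a.id >> a.name >> a.x >> a.y >> a.z >> a.type)
      atoms.push_back(a);
  }
  return atoms;
}
```

The `type` strings collected this way are the names that would be registered as atom types, whether at runtime or as predefined constants.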
# Implementing Scoring as an IMP::Restraint # {#design_restraint}

First, this functionality should probably go in a new module since it
is experimental. One can use the scratch module in a separate `git` branch,
for example.

One could then have a `PMFRestraint` which loads a PMF file from the
module data directory (or from a user-specified path). It would
also take two IMP::atom::Hierarchy decorators, one for the ligand and
one for the protein, and score all pairs between the two. For each pair of
atoms, it would look at the IMP::atom::Atom::get_type() value and use that
to look up the scoring function in a stored table.

Such a design requires a fair amount of implementation work, especially
once one is interested in accelerating the scoring by only scoring nearby
pairs. The `PMFRestraint` could use a IMP::core::ClosePairsScoreState
internally if needed.

# Implementing Scoring as an IMP::PairScore # {#design_score}

One could instead separate the scoring from the pair generation by implementing
the scoring as an IMP::PairScore. Then the user could choose how the pairs are
generated, for example with an IMP::core::ClosePairsScoreState, when
experimenting to find the fastest way to implement things.

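The separation described above can be illustrated with plain functions. In this sketch, which does not use IMP, pair generation and pair scoring only communicate through a list of index pairs; `close_pairs`, `score_pairs` and `distance_between` are hypothetical names, and the brute-force cutoff filter merely stands in for a real close-pair finder.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Point = std::array<double, 3>;
using IndexPair = std::pair<std::size_t, std::size_t>;

double distance_between(const Point &p, const Point &q) {
  double sum = 0.0;
  for (std::size_t i = 0; i < 3; ++i) {
    const double d = p[i] - q[i];
    sum += d * d;
  }
  return std::sqrt(sum);
}

// Pair generation: a brute-force cutoff filter standing in for a real
// close-pair finder. Replacing it does not touch the scoring code.
std::vector<IndexPair> close_pairs(const std::vector<Point> &ligand,
                                   const std::vector<Point> &protein,
                                   double cutoff) {
  std::vector<IndexPair> pairs;
  for (std::size_t i = 0; i < ligand.size(); ++i)
    for (std::size_t j = 0; j < protein.size(); ++j)
      if (distance_between(ligand[i], protein[j]) <= cutoff)
        pairs.emplace_back(i, j);
  return pairs;
}

// Pair scoring: any callable mapping a distance to a score.
double score_pairs(const std::vector<IndexPair> &pairs,
                   const std::vector<Point> &ligand,
                   const std::vector<Point> &protein,
                   const std::function<double(double)> &pair_score) {
  double total = 0.0;
  for (const IndexPair &ij : pairs)
    total += pair_score(distance_between(ligand[ij.first], protein[ij.second]));
  return total;
}
```

Either half can now be swapped independently: a faster pair finder changes only `close_pairs`, and a different scoring function changes only the callable passed to `score_pairs`.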
As with the restraint solution, the IMP::PairScore would use the
IMP::atom::Atom::get_type() value to look up the correct function to use.

If you look around in \imp for similar pair scores (see IMP::PairScore and the
inheritance diagram) you see there is an IMP::core::TypedPairScore which
already does what you need. That is, it takes a pair of particles, looks up
their types, and then applies a particular IMP::PairScore based on their types.
IMP::core::TypedPairScore expects an IMP::IntKey to describe the type. The
appropriate key can be obtained from IMP::atom::Atom::get_type_key().

Then all that needs to be implemented is a function, say
IMP::hao::create_pair_score_from_pmf(), which creates an
IMP::core::TypedPairScore, loads a PMF file and then calls
IMP::core::TypedPairScore::set_pair_score() for each pair stored in the PMF
file after translating PMF types to the appropriate IMP::atom::AtomType.

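The dispatch idea behind a typed pair score can be sketched in a few lines. The class below is a hypothetical stand-in written for illustration, not the IMP::core::TypedPairScore implementation; in particular, treating the type pair as unordered and returning zero for unregistered pairs are assumptions of this sketch.

```cpp
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for a typed pair score: one distance-dependent
// function per unordered pair of integer atom types.
class TypedScore {
 public:
  using DistanceScore = std::function<double(double)>;

  // Register the function used for the pair (type_a, type_b).
  void set_pair_score(int type_a, int type_b, DistanceScore score) {
    scores_[key(type_a, type_b)] = std::move(score);
  }

  // Apply the registered function. Pairs with no registered function
  // contribute nothing (an assumption of this sketch).
  double evaluate(int type_a, int type_b, double distance) const {
    const auto it = scores_.find(key(type_a, type_b));
    return it == scores_.end() ? 0.0 : it->second(distance);
  }

 private:
  // Order the two types so that (a, b) and (b, a) share one entry.
  static std::pair<int, int> key(int a, int b) {
    return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
  }

  std::map<std::pair<int, int>, DistanceScore> scores_;
};
```

A loader in the spirit of the proposed function would then read the PMF file, translate each pair of type names, and call `set_pair_score` once per pair with a function built from that pair's tabulated values.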
This design has the advantage of requiring very little code. As a result it
is easy to experiment (move to 3D tables or change the set of close pairs).
Also, different, non-overlapping PMFs can be combined by just adding more
terms to the IMP::core::TypedPairScore.

The disadvantage is that the scoring passes through more layers of function
calls, making it hard to use optimizations such as storing all the coordinates
in a central place.

# Some final thoughts # {#design_final}

1. Figure out the orthogonal degrees of freedom and try to split
 functionality into pieces that control each. Here they are the set
 of pairs and how to score each of them. Doing this makes it
 easier to reuse code.
2. Don't create two classes when you only have one set of work. Here,
 all you have is a mapping from a pair of types and a
 distance to a score. Having both a PMFTable and PMFPairScore
 locks you into that aspect of the interface without giving you
 any real flexibility.
3. Implementing things in terms of many small classes makes the
 design much more flexible. You can easily replace a piece
 without touching anything else and, since each part is simple,
 replacing a particular piece doesn't take much work. The added
 complexity can easily be hidden away using helper functions in
 your code (or, if the action is very common, in IMP).